The R User Conference 2016

June 27 - June 30 2016
Stanford University, Stanford, California

Small Area Estimation with R

Virgilio Gómez-Rubio - University of Castilla-La Mancha

Post-tutorial notes

The materials used in the tutorial are available here.

Tutorial Description

The tutorial will introduce statistical methods for the analysis of survey data to produce estimates for small domains (sometimes termed ‘small areas’). These include design-based estimators, which rely only on the study design and the observed data, and model-based estimators, which rely on an underlying model to provide estimates. The tutorial will cover frequentist and Bayesian inference for Small Area Estimation. All methods will be accompanied by several examples that attendees will be able to reproduce.

This tutorial will be loosely based on the tutorial presented at useR! 2008 but will include updated materials. In particular, it will cover new R packages that have appeared since then.


The tutorial aims to provide the following:

  1. Brief introduction to R packages for the analysis of survey data
  2. Summary of the main packages in the ‘Official Statistics and Survey Methodology’ Task View
  3. Design-based estimation (with examples)
  4. Model-based estimation (with examples), using frequentist and Bayesian inference
  5. Visualization of small area estimates

Tutorial Outline

The tutorial will use some case studies to present the main statistical methods in Small Area Estimation. This includes:

  1. Short introduction to survey sampling strategies with R: simple random sampling, systematic sampling, clustered sampling, two-stage sampling
  2. Design-based estimators: Horvitz-Thompson, generalized regression (GREG) and calibration estimators
  3. Model-based estimators: Fay-Herriot estimator, linear regression for area and unit level models
  4. Synthetic and composite estimators
  5. Mixed-effects models: area and unit level models with random effects, EBLUP estimators
  6. Models with spatial random effects: spatial EBLUP for area and unit level models
  7. Bayesian inference for Small Area Estimation: area and unit level models
  8. Non-linear models: disease mapping, estimation of unemployment
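
As a rough illustration of the design-based estimators in item 2 (a minimal sketch, not taken from the tutorial materials), the base R code below computes Horvitz-Thompson estimates of area totals and the corresponding direct estimates of area means. The toy population, the area labels and all variable names are hypothetical and chosen only for this example; the sampling design is assumed to be simple random sampling without replacement.

```r
# Hypothetical toy population with three areas (assumption for illustration)
set.seed(1)
N <- 1000
pop <- data.frame(
  area = sample(c("A", "B", "C"), N, replace = TRUE),
  y    = rgamma(N, shape = 2, rate = 0.1)
)

# Simple random sample without replacement of size n
n <- 100
s <- pop[sample(N, n), ]
s$pi <- n / N      # first-order inclusion probability under this design
s$w  <- 1 / s$pi   # design weight

# Horvitz-Thompson estimate of the total of y in each area:
# sum of y_i / pi_i over the sampled units that fall in the area
ht_total <- tapply(s$w * s$y, s$area, sum)

# Estimated area sizes (sum of weights) and direct estimates of area means
n_hat       <- tapply(s$w, s$area, sum)
direct_mean <- ht_total / n_hat
```

Because every unit has the same weight under this design, direct_mean coincides with the per-area sample mean. With unequal inclusion probabilities the same code applies once s$pi holds the actual design probabilities. In areas with few sampled units these direct estimates become very variable, which is the motivation for the model-based estimators in the later items.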

The tutorial will be split into six 30-minute blocks. Each block will provide a short introduction to the statistical methodology and show examples with R using well-known datasets.

Materials will include lecture notes, datasets (in an appropriate format) and the R code used to run the examples in the tutorial. Attendees will be encouraged to inspect and run the code for the examples.

Background Knowledge

General knowledge of survey data analysis and mixed-effects models would be useful. Attendees are assumed to be R users.

Instructor Biography

Virgilio Gómez-Rubio is a lecturer in the Department of Mathematics, University of Castilla-La Mancha in Albacete (Spain). He is co-author of Applied Spatial Data Analysis with R, 2nd ed., published in Springer’s useR! series. He was Chair of the Local Organizing Committee of useR! 2013. He has also taught tutorials on ‘Applied Spatial Data Analysis with R’ at useR! 2010, useR! 2014 and useR! 2015, and on ‘Small Area Estimation’ at useR! 2008 in Dortmund (Germany). Virgilio has been involved as co-mentor and program co-administrator for the R Project for Statistical Computing in ‘Google Summer of Code’ for several years. He has also supported the use of R in Spain and has been a member of the local organizing committee of the Spanish R Users Meeting since 2011.

He has been involved in research projects on Small Area Estimation as a researcher at Imperial College London, where he collaborated with the Office for National Statistics and developed some of the early software for Small Area Estimation with R. In addition, Virgilio has taught courses on Small Area Estimation at the Office for National Statistics (Titchfield, United Kingdom) and Ipsos MORI (London, United Kingdom), a leading market research company.

Virgilio often teaches courses on spatial data analysis with R at conferences and universities by invitation. He maintains several CRAN packages (DCluster, RArcInfo, INLABMA, spatialkernel and RGIFT) and has contributed to many others (sp, spdep, maptools).

Currently, he is preparing (jointly with Matthias Templ) a book on ‘Small Area Estimation with R’ that will be published by Wiley.
