- Created by Ann-Kristin Kreutzmann, last modified by Haoyi Chen on Feb 10, 2022
This section describes basic approaches for domain estimation and a typical notational framework.
From direct estimation to small area models
Many options exist for estimating domain indicators. Traditionally, national statistical offices prefer design-based approaches for estimating population indicators, since valid inference requires no model assumptions and the properties under the survey design, such as design-unbiasedness, are known. These approaches include direct estimators, which apply weights to the sampled units belonging to the domains or small areas, and model-assisted estimators such as the generalized regression estimator (see Cassel et al. 1976, Särndal et al. 1992). Both tend to have large variances when the sample sizes in the domains of interest are not large enough. Assuming that a large domain contains the small domains of interest (subdomains) and that all of them share the same characteristics, a reliable direct estimator for the large domain can be used to obtain indirect estimates for the subdomains. These so-called synthetic estimators produce predictions with low variability but possibly large biases when the assumption that the relevant characteristics follow the same distribution in all domains is not fulfilled. Composite estimators combine the design-based and synthetic approaches to balance the possible bias against the variability.
Among others, ESSnetSAE (2012) proposes to start with a triplet of estimates:
- direct,
- synthetic, and
- composite estimates.
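The triplet can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of ESSnetSAE (2012): the toy data, the function names, and in particular the rule used for the composite weight `phi` (sample size divided by sample size plus 4) are assumptions made only for this example.

```python
import numpy as np

def direct_estimate(y, w):
    """Weighted mean of y using survey weights w (a direct estimator)."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

def composite_estimate(direct, synthetic, phi):
    """Convex combination; phi in [0, 1] is the weight on the direct estimate."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return float(phi * direct + (1.0 - phi) * synthetic)

# Hypothetical toy data: one large domain containing subdomains A and B.
y = np.array([1, 0, 1, 1, 0, 1, 0, 0])          # binary indicator per sampled unit
w = np.array([10, 10, 20, 20, 5, 5, 15, 15])    # survey weights
sub = np.array(["A", "A", "A", "A", "A", "A", "B", "B"])

# Synthetic estimate: the large-domain direct estimate is reused for every subdomain.
synthetic = direct_estimate(y, w)

estimates = {}
for d in np.unique(sub):
    mask = sub == d
    n_d = int(mask.sum())
    direct = direct_estimate(y[mask], w[mask])
    # Illustrative weight only (assumption): more sample, more trust in the direct estimate.
    phi = n_d / (n_d + 4)
    estimates[str(d)] = {
        "direct": direct,
        "synthetic": synthetic,
        "composite": composite_estimate(direct, synthetic, phi),
    }
```

In this toy example subdomain B has only two sampled units, so its composite estimate is pulled strongly towards the large-domain value. In practice the weight would be chosen from estimated variances and biases rather than from a fixed rule.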
The simpler approaches can be used if their results are sufficient, e.g., if the coefficient of variation (CV) is below the required threshold. For example, at the Italian National Institute of Statistics (ISTAT), coefficients of variation should not exceed 15% for domains and 18% for small domains. Statistics Canada uses three categories of reliability for the Labor Force Survey: no release restriction for CV ≤ 16.5%, an added warning for 16.5% < CV ≤ 33.3%, and otherwise the data are not recommended for release. The Handbook on precision requirements and variance estimation for ESS households surveys gives a good overview of different practices for defining precision requirements. Even if the obtained estimates are not reliable, they can still be useful for comparison with results obtained from more complex model-based approaches. Other reasons for choosing a simpler approach are the requirement to produce a large number of indicators in a timely manner and constraints on overall capacity.
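A reliability check of this kind is easy to automate. The sketch below encodes the three Statistics Canada categories quoted above; the function names and the category labels are hypothetical and chosen only for illustration.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV in percent: the standard error relative to the size of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)

def release_category(cv_percent):
    """Map a CV (in percent) to the three categories quoted in the text."""
    if cv_percent <= 16.5:
        return "no release restriction"
    if cv_percent <= 33.3:
        return "release with warning"
    return "not recommended for release"

# Hypothetical domain estimate: a proportion of 0.40 with standard error 0.10.
cv = coefficient_of_variation(0.40, 0.10)
category = release_category(cv)
```

With these hypothetical inputs the CV is 25%, so the estimate would fall into the middle category.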
ESTIMATORS | PROS | CONS |
---|---|---|
Direct | | |
Synthetic | | |
Composite | | |
The example is described in detail in Bertolini Coelho et al. (2020).
Goal: A department of the Brazilian Network Information Center (NIC.br) called Regional Center for Studies on the Development of the Information Society (Cetic.br) collects data about access and use of information and communication technologies (ICT) in Brazil. Data users are interested in timely publication of main ICT indicators for the 27 Brazilian states.
Indicators of interest: Proportion of households with computers and proportion of households with Internet access (the latter is similar to SDG indicator 17.8.1, Proportion of individuals using the Internet).
Disaggregation dimension: Brazilian states.
Data availability: The annual Survey on the Use of ICT in Brazilian Households contains almost 33,000 households. Reliable estimates for the five larger regions North, Northeast, Southeast, South and Center-West can be produced.
SAE methods: Four approaches are considered: averaging the estimates of consecutive years; pooling the samples of consecutive years; a single-year composite estimator that uses the regional estimates as synthetic estimates; and a composite estimator that pools the samples of two consecutive years and uses the regional estimates as synthetic estimates. The simpler approaches are chosen because a wide range of indicators needs to be produced in a timely manner after data collection.
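The difference between averaging yearly estimates and pooling yearly samples can be illustrated with a small Python sketch. It is a generic illustration on invented data and does not reproduce the estimators of Bertolini Coelho et al. (2020); the function name and the numbers are assumptions.

```python
import numpy as np

def weighted_proportion(y, w):
    """Weighted proportion of a binary indicator y with survey weights w."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical samples for one state in two consecutive years.
y_year1, w_year1 = [1, 0, 1], [10, 10, 20]
y_year2, w_year2 = [1, 1, 0, 0, 1], [16, 16, 16, 16, 16]

# Option 1: estimate each year separately, then average the two estimates.
average = 0.5 * (weighted_proportion(y_year1, w_year1)
                 + weighted_proportion(y_year2, w_year2))

# Option 2: pool both samples and estimate once. Rescaling all weights by a
# common factor (e.g. dividing by the number of years) would not change a proportion.
pooled = weighted_proportion(np.concatenate([y_year1, y_year2]),
                             np.concatenate([w_year1, w_year2]))
```

The two options coincide only when both years carry the same total weight; here the second year has the larger total weight and therefore dominates the pooled estimate.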
Small area estimation models
In these guidelines, the focus is on small area estimation models. These help to obtain predictors (estimators) with a lower variability at domain level, and with a possible bias that tends to be moderate, if the model is appropriate. The models can be summarized as mixed models with random domain-specific effects accounting for variation between domains that is not explained by auxiliary information.
While SAE models may differ in their specifications, they are built on the same notational framework. A finite population of size $N$ is partitioned into $D$ domains of sizes $N_1, \dots, N_D$, where $i = 1, \dots, D$ refers to the ith domain and $j = 1, \dots, N_i$ to the jth household/individual. A random sample of size $n$ is drawn from this population, which leads to $n_1, \dots, n_D$ observations in the domains. If $n_i = 0$, domain $i$ is not in the sample.
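The bookkeeping behind this notation can be sketched in Python. The symbols used here follow the common convention in the small area literature and are assumptions for this example: `N` and `N_i` for the population and domain sizes, `n` and `n_i` for the overall and domain sample sizes, and `D` for the number of domains. The domain labels are invented.

```python
from collections import Counter

# Hypothetical domain label of every population unit and of every sampled unit.
population_domains = ["d1"] * 5 + ["d2"] * 3 + ["d3"] * 4
sample_domains = ["d1", "d1", "d3"]

N_i = Counter(population_domains)   # domain population sizes
n_i = Counter(sample_domains)       # domain sample sizes (missing keys count as 0)

N = sum(N_i.values())               # population size
n = sum(n_i.values())               # sample size
D = len(N_i)                        # number of domains

# Domains with a sample size of zero are out-of-sample domains.
out_of_sample = sorted(d for d in N_i if n_i[d] == 0)
```

For out-of-sample domains no direct estimate exists, so any estimate has to come from a synthetic or model-based approach.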
The wide range of small area estimation models can roughly be classified into two model types:
- Area-level models relate a domain indicator with domain-specific auxiliary information.
- Unit-level models use the unit-level survey data for fitting a model and unit-level auxiliary information for producing estimates in all domains.
The basic area-level and unit-level models are also known as the Fay-Herriot model (Fay and Herriot 1979) and the Battese-Harter-Fuller model (Battese et al. 1988), respectively.
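For orientation, the two basic models are commonly written as below. The symbols are not defined on this page and are assumptions following the usual convention in the literature: $\hat{\theta}_i^{\mathrm{dir}}$ is the direct estimate for domain $i$, $\mathbf{x}_i$ and $\mathbf{x}_{ij}$ are area-level and unit-level auxiliary variables, $\boldsymbol{\beta}$ is the vector of regression coefficients, $u_i$ is the random domain-specific effect, and $e_i$, $e_{ij}$ are the sampling and unit-level errors.

```latex
% Fay-Herriot (area-level) model, for domains i = 1, ..., D
\hat{\theta}_i^{\mathrm{dir}} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + e_i,
\qquad u_i \sim N(0, \sigma_u^2), \quad e_i \sim N(0, \sigma_{e_i}^2)

% Battese-Harter-Fuller (unit-level) model, for unit j in domain i
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)
```

In both models the random effects and the errors are assumed to be independent. In the Fay-Herriot model the sampling variances $\sigma_{e_i}^2$ are treated as known, typically taken from the variance estimates of the direct estimator.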
MODEL CHARACTERISTICS | AREA-LEVEL MODELS | UNIT-LEVEL MODELS |
---|---|---|
PROS | | |
CONS | | |
Terms and definitions
Chapter 2 of the Guidelines on small area estimation for city statistics and other functional geographies provided by Eurostat offers a nice overview of standard terms and definitions in small area estimation.
Pros and Cons
The guidelines provided by Molina give an extensive overview of the advantages and disadvantages of the most common small area estimation models (in Spanish).
Extended/Adjusted models
The Pros and Cons apply to the standard models. There are several extensions and adjustments in the literature of small area estimation that address some of the issues.
Some examples
Auto-benchmarking for the Fay-Herriot area-level model:
Pseudo-EBLUP to consider sampling weights:
References
Särndal, C. E., Swensson, B., & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag.