Info | ||||
---|---|---|---|---|
| ||||
This section describes basic approaches for domain estimation and a typical notational framework.
|
Info | ||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||||||||||
From direct estimation to small area models
The options for estimating domain indicators are extensive. Traditionally, national statistical offices prefer design-based approaches for estimating population indicators, because valid inference requires no model assumptions and the properties with respect to the survey design, such as design-unbiasedness, are known. These approaches include direct estimators, which apply weights to the sampled units belonging to the domains or small areas, and model-assisted estimators such as the generalized regression estimator (see Cassel et al. 1976, Särndal et al. 1992). Both options can be expected to have large variances when the sample sizes in the domains of interest are not large enough. If a large domain contains the small domains of interest (subdomains) and all of them share the same characteristics, a reliable direct estimator for the large domain can be used to obtain indirect estimates for the subdomains. These so-called synthetic estimators produce predictions with low variability but possibly large bias when the assumption of the same distribution of relevant characteristics in all domains is not fulfilled. Composite estimators combine the design-based and the synthetic approach in order to balance possible bias against variability. Among others, ESSnetSAE (2012) proposes to start with a triplet of estimates:
The simpler approaches can be used if their results are sufficient, e.g., if the coefficient of variation (CV) is below the required threshold. For example, at the Italian National Institute of Statistics (ISTAT), CVs should not exceed 15% for domains and 18% for small domains. Statistics Canada uses three categories of reliability for the Labour Force Survey: no release restriction for CV ≤ 16.5%, release with an added warning for 16.5% < CV ≤ 33.3%, and data not recommended for release otherwise. The Handbook on precision requirements and variance estimation for ESS household surveys gives a good overview of different practices for defining precision requirements. Even if the obtained estimates are not reliable, they remain useful for comparison with results obtained from more complex model-based approaches. Another reason for choosing a simpler approach can be the requirement to produce a large number of indicators in a timely manner with limited overall capacity.
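The direct, synthetic and composite estimators described above, together with a CV-based reliability check, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official implementation: the function names are hypothetical, the direct estimator is a simple weighted (Hájek-type) mean, the composite weight `phi` is supplied by the user rather than estimated, and the standard error is assumed to come from a separate variance estimation step. The thresholds are the Statistics Canada categories quoted above.

```python
from typing import Sequence


def direct_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted (Hajek-type) mean of the sampled units in one domain."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def composite(direct: float, synthetic: float, phi: float) -> float:
    """Convex combination of a direct and a synthetic estimate.

    phi = 1 returns the direct estimate, phi = 0 the synthetic one.
    Here phi is an assumed, user-supplied weight.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return phi * direct + (1.0 - phi) * synthetic


def cv_percent(estimate: float, std_error: float) -> float:
    """Coefficient of variation in percent: 100 * SE / |estimate|."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * std_error / abs(estimate)


def release_category(cv: float) -> str:
    """Reliability category using the 16.5% and 33.3% thresholds from the text."""
    if cv <= 16.5:
        return "no release restriction"
    if cv <= 33.3:
        return "release with warning"
    return "not recommended for release"


if __name__ == "__main__":
    # Hypothetical toy data: two sampled units in a small domain.
    domain_direct = direct_mean([1.0, 3.0], [1.0, 1.0])
    # Synthetic estimate: assumed to be the direct estimate of the large domain.
    large_domain_mean = 4.0
    print(composite(domain_direct, large_domain_mean, phi=0.5))
    print(release_category(cv_percent(estimate=50.0, std_error=5.0)))
```

In practice the composite weight would be chosen from the estimated variance and bias of the two components, so the fixed `phi` here only illustrates the trade-off.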
Small area estimation models
In these guidelines, the focus is on small area estimation (SAE) models. These help to obtain predictors (estimators) with lower variability at the domain level and with a possible bias that tends to be moderate if the model is appropriate. The models can be summarized as mixed models with random domain-specific effects that account for variation between domains not explained by the auxiliary information. While SAE models may differ in their specifications, they are built on the same notational framework. A finite population $U$ of size $N$ is partitioned into $D$ domains $U_1, \dots, U_D$ of sizes $N_1, \dots, N_D$, where $i = 1, \dots, D$ refers to the ith domain and $j = 1, \dots, N_i$ to the jth household/individual. A random sample of size $n$ is drawn from this population, which leads to $n_i$ observations in each domain. If $n_i = 0$, the domain is not in the sample. The wide range of small area estimation models can roughly be classified into two model types:
The basic area-level and unit-level models are also known as the Fay-Herriot (Fay and Herriot 1979) and the Battese-Harter-Fuller (Battese et al. 1988) models, respectively.
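For orientation, the two basic models can be written in their usual textbook form. The symbols below are assumptions in the sense that they follow common conventions rather than notation fixed by this page: $i$ indexes domains and $j$ units within a domain, $\hat{\theta}_i^{\mathrm{dir}}$ is the direct estimate of the domain indicator $\theta_i$, $\mathbf{x}_i$ and $\mathbf{x}_{ij}$ are auxiliary covariates at area and unit level, $u_i$ is the random domain effect, and the sampling variances $\sigma_{e_i}^2$ in the area-level model are treated as known.

```latex
\begin{align*}
\text{Area level (Fay-Herriot):} \quad
  \hat{\theta}_i^{\mathrm{dir}} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + e_i,
  & u_i &\sim N(0, \sigma_u^2), & e_i &\sim N(0, \sigma_{e_i}^2), \\
\text{Unit level (Battese-Harter-Fuller):} \quad
  y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
  & u_i &\sim N(0, \sigma_u^2), & e_{ij} &\sim N(0, \sigma_e^2).
\end{align*}
```

In both cases the random effects $u_i$ are assumed independent of the error terms, and they capture the between-domain variation that the auxiliary information does not explain.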
|
Info | ||
---|---|---|
| ||
Chapter 2 of the Guidelines on small area estimation for city statistics and other functional geographies provided by Eurostat offers a nice overview of standard terms and definitions in small area estimation. |
Info | ||
---|---|---|
| ||
The guidelines provided by Molina give an extensive overview of the advantages and disadvantages of the most common small area estimation models (in Spanish). |
Info | ||
---|---|---|
| ||
The pros and cons apply to the standard models. Several extensions and adjustments in the small area estimation literature address some of these issues. Some examples: auto-benchmarking for the Fay-Herriot area-level model, and the pseudo-EBLUP, which takes sampling weights into account. |
Info | ||
---|---|---|
| ||
References
Särndal, C. E., Swensson, B., & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag. |