Office for National Statistics

https://www.ons.gov.uk/

SAE motivation

Data on income at the smallest possible geographical level helps to identify communities with high levels of deprivation or inequality, and can therefore support work to address inequality. To gain this information, we need answers to questions such as the following:

What is the average household disposable income in different local areas in England and Wales? 

Where are the highest and lowest income areas located? 

What is the average income adjusted for housing costs in different areas?

Including questions on income in the England and Wales Census has been considered regularly. However, due to concerns about the public acceptability of asking people about their income, such a question has never been incorporated in the Census. This concern was confirmed by a Census test conducted in 2007, in which including a question on income led to lower response rates. Instead, small area estimation methods have been implemented to produce official income estimates at the small area level for England and Wales.


Indicators in the scope of the study and input data

The Family Resources Survey (FRS) is the largest available survey that collects information on household income. It covers around 19,000 to 20,000 private UK households and, through questionnaires and interviews, asks respondents about income and its components, including benefits, tax, pensions and tax credits. Based on FRS responses, four different measures of household income are derived for use in modelling:

  • Total household weekly income (unequivalised):

This is the sum of the gross income of every member of the household, that is, wages and salaries, self-employment, pensions, investments, plus any income from benefits, such as Working Families Tax Credits. 

  • Net household weekly income (unequivalised):  

This is calculated as total household weekly income but is net of income tax payments, national insurance contributions, domestic rates/council tax, contributions to occupational pension schemes, all maintenance and child support payments (which are deducted from the income of the person making the payments), and parental contributions to students living away from home.

...

  

  • Net household weekly income before housing costs (equivalised):

This is calculated in the same way as net household weekly income but with the application of OECD’s equivalisation scale to adjust the household income values to represent the income level of every individual in the household. 

  • Net household weekly income after housing costs (equivalised):  

This is calculated in the same way as equivalised net household weekly income before housing cost but is subject to the deduction of rent, water rates, community water charges and council water charges, mortgage interest payments, structural insurance premiums (for owner occupiers), ground rent and service charges. 

All of the variables above are obtained from the FRS; however, net equivalised household weekly income before and after housing costs are defined and calculated using the Households Below Average Income (HBAI) methodology. For further detail on the FRS or HBAI, please see the note Family Resources Survey: methodology and background notes.
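As a rough illustration of how the four measures relate to one another, the sketch below derives them for a single household. It is not the HBAI implementation: the field names are hypothetical, the deductions and housing costs are collapsed into single totals, and the equivalisation weights are assumed to be those of the standard modified OECD scale (1.0 for the first adult, 0.5 for each further person aged 14 or over, 0.3 for each child under 14). HBAI may rescale or vary these weights, so the figures are indicative only.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # All field names are hypothetical; amounts are pounds per week.
    gross_weekly: float          # wages, self-employment, pensions, investments, benefits
    deductions_weekly: float     # income tax, national insurance, council tax, etc.
    housing_costs_weekly: float  # rent, water charges, mortgage interest, etc.
    people_14_plus: int          # household members aged 14 or over (at least 1)
    children_under_14: int


def modified_oecd_factor(people_14_plus: int, children_under_14: int) -> float:
    """Equivalisation factor under the ASSUMED modified OECD weights."""
    return 1.0 + 0.5 * (people_14_plus - 1) + 0.3 * children_under_14


def income_measures(h: Household) -> dict:
    """Return the four weekly income measures for one household."""
    net = h.gross_weekly - h.deductions_weekly
    factor = modified_oecd_factor(h.people_14_plus, h.children_under_14)
    return {
        "total": h.gross_weekly,
        "net": net,
        "net_equivalised_bhc": net / factor,
        "net_equivalised_ahc": (net - h.housing_costs_weekly) / factor,
    }


# Made-up example: two adults, no children.
example = income_measures(Household(900.0, 150.0, 150.0, 2, 0))
```

For this made-up household the factor is 1.5, so a net income of 750 becomes 500 before housing costs and 400 after housing costs.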

The following additional data sources were available during the model selection process, but each year's final estimation model included a different subset of them, depending on data availability and the outcome of model selection:

  • Census
  • Department for Work and Pensions benefit claimant counts
  • Valuation Office Agency (VOA) Council Tax Bandings
  • Office for National Statistics, House Price Statistics for Small Areas
  • Department of Energy and Climate Change, Energy Consumption data
  • Her Majesty’s Revenue and Customs, Pay As You Earn (PAYE) data
  • Regional or country identification variable

Many of these sources are administrative data sources that were not collected for statistical purposes and thus have differing coverage and definitions, especially compared with the FRS. Nevertheless, they are thought to contain relevant information associated with income and could be useful predictors for estimates at an area level. The list of alternative data sources available for potential inclusion in the model is growing as more data are shared across departments. For example, HMRC self-assessment data on self-employment income are being considered for inclusion in the estimates for ONS's 2022 publication.

Small area estimation modelling techniques enable us to build models that draw on the strengths of survey, census and administrative data sources. The final dataset for modelling combines these sources: survey household records are linked to area-level auxiliary data from the census and administrative sources, using the postcode variable as the match key.
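The linkage step can be pictured with a small sketch. Everything in it is hypothetical: the record layout, the postcode-to-area lookup, the covariate names and the made-up postcodes are for illustration only and do not describe the actual ONS pipeline.

```python
def normalise_postcode(postcode: str) -> str:
    """Upper-case and strip spaces so that formatting differences do not block a match."""
    return postcode.replace(" ", "").upper()


def link_households(households, postcode_to_area, area_covariates):
    """Attach area-level covariates to household records via postcode.

    Returns (linked_records, unmatched_household_ids).
    """
    linked, unmatched = [], []
    for hh in households:
        area = postcode_to_area.get(normalise_postcode(hh["postcode"]))
        covariates = area_covariates.get(area)
        if covariates is None:
            # No area or no auxiliary data found: keep track rather than drop silently.
            unmatched.append(hh["hh_id"])
            continue
        linked.append({**hh, "area": area, **covariates})
    return linked, unmatched


# Made-up survey records, lookup and area-level auxiliary data.
households = [
    {"hh_id": 1, "postcode": "zz1 1aa", "net_weekly": 520.0},
    {"hh_id": 2, "postcode": "ZZ1 1AB", "net_weekly": 610.0},
    {"hh_id": 3, "postcode": "ZZ9 9ZZ", "net_weekly": 480.0},
]
postcode_to_area = {"ZZ11AA": "AREA001", "ZZ11AB": "AREA002"}
area_covariates = {
    "AREA001": {"claimant_rate": 0.12, "band_a_share": 0.40},
    "AREA002": {"claimant_rate": 0.05, "band_a_share": 0.15},
}

linked, unmatched = link_households(households, postcode_to_area, area_covariates)
```

Each linked record keeps its household-level income and gains the covariates of the area it falls in, which is the shape of data a household-level model with area-level predictors needs.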

Data challenges

The survey is designed to provide good-quality direct estimates at national and regional levels, but not at any lower level: the sample is too small to provide reliable direct estimates for small areas. Model-based methods are therefore required for small areas; these estimates are built from fitted model parameters and the values of the covariate (auxiliary) data. Furthermore, only 14% of areas (postcode sectors) were sampled for the latest income estimates (tax year ending 2018). The Great Britain FRS uses a stratified, clustered probability sample design and samples 1,417 postcode sectors out of 9,200. As with any sample, the resulting estimates are subject to sampling error and may be biased. The small area modelling approaches include random effects to account for the clustering.

SAE model building

SAE methods/Specification 

The first stage is to determine which independent variables from census and administrative sources are relevant to each of the four variables of interest (dependent variables). Forward and backward selection is used to help identify significant covariates for inclusion, with region/country indicator terms forced into the model. Selected covariates and two-way interactions are retained for estimation. This step is rerun for each new publication to ensure that the data sources remain relevant over time. However, because the selected covariates can change between publications, conclusions about change over time are limited: part of any observed change in the estimates could be due to the change in covariates rather than a real change in income.
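The idea of selecting covariates while forcing certain terms into the model can be sketched as follows. This is a simplified stand-in, not the procedure used for the published estimates: it performs forward selection only, uses AIC as the entry criterion instead of significance tests (an assumption), fits ordinary least squares rather than a multilevel model, and ignores interactions. The variable names and data are made up.

```python
import math


def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit, solving the normal equations
    by Gauss-Jordan elimination with partial pivoting."""
    p, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y].
    A = [[sum(columns[a][i] * columns[b][i] for i in range(n)) for b in range(p)]
         + [sum(columns[a][i] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    beta = [A[c][p] / A[c][c] for c in range(p)]
    fitted = [sum(beta[j] * columns[j][i] for j in range(p)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(columns, y):
    """Gaussian AIC up to an additive constant."""
    n = len(y)
    return n * math.log(ols_rss(columns, y) / n) + 2 * len(columns)


def forward_select(forced, candidates, y):
    """Start from the forced terms and add the candidate that lowers AIC most,
    stopping when no remaining candidate improves it."""
    selected, remaining = dict(forced), dict(candidates)
    best = aic(list(selected.values()), y)
    while remaining:
        scores = {name: aic(list(selected.values()) + [col], y)
                  for name, col in remaining.items()}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        best = scores[name]
        selected[name] = remaining.pop(name)
    return list(selected)


# Made-up data: income depends on region and benefit_rate, not on band_a_share.
region = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
benefit_rate = [5, 12, 8, 20, 15, 7, 18, 10, 25, 3]
band_a_share = [0.3, 0.5, 0.2, 0.4, 0.6, 0.5, 0.1, 0.7, 0.3, 0.4]
noise = [3, -2, 1, -4, 2, -1, 4, -3, 2, -2]
income = [600 + 40 * r - 10 * b + e for r, b, e in zip(region, benefit_rate, noise)]

forced = {"intercept": [1.0] * 10, "region": region}
candidates = {"benefit_rate": benefit_rate, "band_a_share": band_a_share}
selected = forward_select(forced, candidates, income)
```

The forced terms are never tested for removal; only the candidate covariates compete for entry, which mirrors the way the region/country indicators are always kept in the model.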

Once covariates have been determined for each dependent variable of interest, multilevel linear models are used to produce Middle Layer Super Output Area (MSOA) level estimates of mean household income for the four dependent variables: average weekly gross and net household income, average weekly net household income (equivalised), and average weekly equivalised net household income after housing costs. Weekly household income is used as the dependent variable and the area-level covariates as explanatory variables. The models relate the household-level survey variable to the covariates for the small area in which that household is located. Although the model outputs provide MSOA-level mean estimates, the model inputs are at household level. Non-linear estimates, such as medians and percentiles, cannot currently be derived in this way. The MSOAs discussed in this report are based on 2011 Census data; they have a mean population of 7,200 and a minimum of 5,000.
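A generic two-level random-intercept model of this kind can be written as below. The notation is an assumption made for illustration and is not taken from the published methodology: y_{id} is the weekly income of household i in area d, x_d is the vector of area-level covariates (including the forced region/country indicators), u_d is the area random effect and e_{id} is the household-level residual. The second line shows one common way of turning the fitted model into an area-level mean, the synthetic estimate; whether the published estimates use exactly this form is not stated here.

```latex
y_{id} = \mathbf{x}_d^{\top}\boldsymbol{\beta} + u_d + e_{id},
\qquad u_d \sim N(0, \sigma_u^2), \quad e_{id} \sim N(0, \sigma_e^2)

\hat{\mu}_d = \mathbf{x}_d^{\top}\hat{\boldsymbol{\beta}}
```

Because the covariates are the same for every household in an area, the fitted coefficients can be applied to areas with no sampled households, provided their covariate values are known.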



Statistical software

The multilevel models were fitted using SAS, which was the office standard software at the time of the initial model development. Development of a pipeline to convert the code from SAS to R is currently being explored.

...