B.2.  Purpose and description

10.6.        In the modern world, ever more data are generated automatically by a variety of devices, including mobile phones and sensors, and by many computer applications. The volume of these data and the frequency with which they are produced have given rise to the concept of big data. Examples of different classes of big data include the following:

(a) Commercial or transactional data arising from transactions between two entities, including credit card transactions, bank transactions, online transactions (including through mobile devices) and retailers’ sales records;

(b) Data from sensors, including from satellite imaging, road sensors and climate sensors;

(c) Data from tracking devices, including tracking data from mobile telephones and global positioning systems (GPS);

(d) Behavioural data, including online searches for a product, service or any other type of information and online page views;

(e) Opinion data, such as comments on social media.

10.7.        Administrative data arise from the administration of a programme, governmental or otherwise, that involves collecting certain information. Although they are usually regarded as a standard data source, they share some characteristics of big data sources and could therefore also be included here. Those characteristics are high volume and rapid availability, as with electronic records of medical procedures, hospital visits, insurance transactions, education programmes and value added tax records.

10.8.        Common criticisms of official statistics are a lack of timeliness and high production costs. Because big data are often generated automatically and can be accessed in real time, they could complement official statistics, improving timeliness and reducing costs.

10.9.        However, many challenges must be addressed before big data can be used effectively in official statistics. They include the following:

(a) Legislative challenges with respect to access to and the use of data; 

(b) Privacy issues, including complying with confidentiality rules, gaining and maintaining public trust and achieving acceptance of data reuse and links to other sources; 

(c) Financial challenges regarding the potentially ongoing costs of acquiring, hosting and processing large data sources;

(d) Management issues concerning the policies and directives that set out the rules, roles and regulations for adequately protecting and securing sensitive data sources;

(e) Methodological challenges, in particular the representativeness of the data, the volatility of data sources over time and the need for adequate estimation and modelling techniques to make the data useful and compliant with quality standards;

(f) Technological issues related to hosting and accessing data sources, as well as data processing, system maintenance and the long-term storage of very large amounts of data.
