7.9. Integration of different data sources. To achieve full coverage of the international merchandise trade statistics, data compilers often have to merge and cross-check data collected from customs and non-customs sources, which is a highly complex and time-consuming activity. Merging customs and non-customs data includes adding non-customs data to the customs data and substituting non-customs data for the customs data. For the purpose of quality control and/or for the information of the users, compilers might wish to differentiate data based on customs data sources and data based on non-customs data sources.[1]
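The two merging operations described above (adding and substituting) can be illustrated with a minimal sketch. The record key layout, the function names `merge_sources` and `total_by_source`, and the source labels are hypothetical choices made for illustration only; an actual compilation system will have its own record structure. The sketch also assumes that the compiler has already decided which non-customs values replace customs values and which are added alongside them.

```python
from typing import Dict, List, Tuple

# Hypothetical record key: (period, flow, commodity code, partner).
Key = Tuple[str, str, str, str]


def merge_sources(
    customs: Dict[Key, float],
    additions: Dict[Key, float],
    substitutions: Dict[Key, float],
) -> List[dict]:
    """Combine customs and non-customs values, tagging each row with its source."""
    rows = []
    # Keep a customs row unless a non-customs value is meant to replace it.
    for key, value in customs.items():
        if key not in substitutions:
            rows.append({"key": key, "value": value, "source": "customs"})
    # Substitution: the non-customs value takes the place of the customs value.
    for key, value in substitutions.items():
        rows.append({"key": key, "value": value, "source": "non-customs"})
    # Addition: the non-customs value is appended alongside the customs rows.
    for key, value in additions.items():
        rows.append({"key": key, "value": value, "source": "non-customs"})
    return rows


def total_by_source(rows: List[dict]) -> Dict[str, float]:
    """Sum values per source label, e.g. for quality control or user information."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["source"]] = totals.get(row["source"], 0.0) + row["value"]
    return totals
```

Keeping the source label on every row is what allows the compiler to differentiate customs-based and non-customs-based data later, whether for quality control or for the information of users.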
7.10. Issues encountered when merging data from different sources. Compilers should be aware that the following issues need to be addressed when merging data from different sources:
(a) Different sources may provide different data elements or levels of detail, e.g., parcel and letter post records might not contain any commodity detail; cross-border surveys might provide data only at the higher HS levels (e.g., that of HS chapters); and commodities that are difficult to classify might be allocated to a few broad categories in non-customs sources, making it difficult to merge them with the more detailed customs data (see the example of Uganda’s Informal Cross Border Trade Survey below);
(b) Some transactions might be subject to simplified reporting requirements at customs;
(c) There may be conceptual differences between sources: e.g., enterprise records might contain the country of purchase and sale but not the country of origin or last known destination;
(d) There may be delays in data forwarding by some source agencies or these agencies may use different release calendars, which may lead to unsynchronized provision of data;
(e) There may be a risk of double counting due to overlaps in the information provided by different sources: e.g., between data on goods on consignment supplied by customs, and data on sales of the same goods reported by the controlling governmental agency;
(f) It may be difficult to organize data processing in an efficient manner, since source agencies may use different data submission media (hard copies, portable storage, electronic transmission, e-mail, etc.) or incompatible computer data files (the integration of different hardware and software systems is a problem in numerous cases);
(g) Data entry from certain sources (e.g., postal forms, passenger manifests) may require a disproportionate amount of time and resources;
(h) There is a need to cross-check data from complementary sources (e.g., customs and commodity boards) and to determine which sets are of greater reliability;
(i) Survey results that apply to a period longer than the reference period used for the compilation of trade statistics cannot be easily added to the customs data;
(j) It is not always possible to identify partner countries in detail, so residual categories will sometimes need to be used;
(k) The statistical value is made up of several components, some of which may not be available in some cases;
(l) In enterprise surveys, quantity information is frequently not collected, or cannot be provided at a sufficient level of detail.
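Items (a) and (h) above can be illustrated together with a short sketch: detailed customs records are aggregated to the HS chapter level (the first two digits of the HS code) so that they can be compared with a source that reports only chapter totals. The function names, the input layout and the 10 per cent tolerance are assumptions made for illustration; a suitable threshold depends on the sources concerned, and a flagged chapter indicates only that a closer review is needed, not which source is more reliable.

```python
from collections import defaultdict


def to_chapter_totals(records):
    """Sum (hs_code, value) pairs by HS chapter (first two digits of the code)."""
    totals = defaultdict(float)
    for hs_code, value in records:
        totals[hs_code[:2]] += value
    return dict(totals)


def compare_chapters(customs_records, survey_chapter_totals, tolerance=0.10):
    """Label each chapter by how customs and survey totals relate.

    `tolerance` is a hypothetical relative threshold, not a recommended value.
    """
    customs_totals = to_chapter_totals(customs_records)
    report = {}
    for chapter in sorted(set(customs_totals) | set(survey_chapter_totals)):
        c = customs_totals.get(chapter)
        s = survey_chapter_totals.get(chapter)
        if c is None:
            status = "survey only"
        elif s is None:
            status = "customs only"
        elif abs(c - s) <= tolerance * max(abs(c), abs(s)):
            status = "consistent"
        else:
            status = "check"
        report[chapter] = status
    return report
```

Aggregating the more detailed source upward, rather than attempting to split the coarser source downward, avoids inventing commodity detail that the non-customs source never recorded.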