  AQ Data Quality Requirements, Bristol case

Introduction

To maintain the monitoring network in Bristol, sites are visited every two weeks to calibrate the continuous analysers using zero air generators and calibration span gases. Excel spreadsheets have been developed to record the instrument test measurements and the concentrations measured when zero air and calibration gas are passed through the analyser. A laptop computer is used at each site to record the data directly into the spreadsheets. The spreadsheet archives the data so that previous calibration data can easily be reviewed to see how the analyser is performing over time. It also calculates the offsets and multipliers needed to re-scale the ambient data collected up until the next calibration.

The Airviro system has been used to collect data from the continuous analysers and is gradually being replaced by the Opsis system. A certain amount of automation has been developed to speed up the data scaling process. All the site-specific spreadsheets are linked to a master tables spreadsheet, which creates an ASCII table used by specially created scripts in the Airviro system. These scripts take the raw data from the sites, rescale it using the offsets and multipliers, and create a new scaled data set which is used for the data ratification process. The raw data therefore remains available if mistakes are made in the scaling process and it has to be repeated.
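As an illustration only, the offset-and-multiplier scaling described above can be sketched in a few lines of Python. The file names, column layout and calibration figures below are invented for the example and are not part of the Bristol system; the offset and multiplier definitions are those given in the glossary at the end of this document.

    # Sketch only: derive an offset and multiplier from a calibration and
    # apply them to raw readings, writing a separate scaled file so that
    # the original raw data set is left untouched.
    import csv

    def calibration_factors(zero_observed, span_observed, span_actual):
        # Offset = observed zero reading; multiplier = actual span /
        # (observed span - observed zero), as in the glossary definitions.
        offset = zero_observed
        multiplier = span_actual / (span_observed - zero_observed)
        return offset, multiplier

    def scale_file(raw_path, scaled_path, offset, multiplier):
        with open(raw_path, newline="") as src, open(scaled_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for timestamp, raw_value in csv.reader(src):
                scaled = (float(raw_value) - offset) * multiplier
                writer.writerow([timestamp, round(scaled, 1)])

    # Example figures (invented): zero gas reads 2 ppb, a 400 ppb span gas reads 390 ppb.
    offset, multiplier = calibration_factors(2.0, 390.0, 400.0)
    scale_file("no2_raw.csv", "no2_scaled.csv", offset, multiplier)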

Data Validation & Ratification Procedures

The objective of data validation/ratification as a data management process is to ensure that the data is consistent, reliable, credible and fit for purpose.

These procedures have been composed as standardised guidance only and should not be taken as a definitive methodology for the validation and ratification of continuous analyser data. Effectively, it is the experience of the operator, coupled with detailed knowledge of the operational status and performance of each analyser in the network, that allows data to be accepted or rejected as valid.

These guidance notes have been taken (in brief) from the following sources: -

Local Air Quality Management Technical Guidance LAQM.TG(03), DEFRA Publications, Crown copyright 2003.

Automatic Urban Monitoring Network, Site Operator’s Manual, NETCEN, #3.097, October 1998.

QA/QC Data Ratification Report for the Automatic Urban Network, DEFRA (quarterly publications).

A Summary of the Ratification Process, NETCEN, 2003.

1) Initial Data Validation

This process involves the daily viewing and rapid screening of the raw data and identification of possible faults in the monitoring network. It may be seen, therefore, as the initial stage in the ratification process.

To rapidly detect ‘unusual’ data and faulty analysers, and thereby maintain high data capture rates, the raw data sets for each analyser must be viewed at regular and frequent intervals. It is recommended that this ‘screening’ occurs at least once daily. Any suspicious data identified should then be noted or flagged for further investigation as part of the full ratification process.

It is preferable for raw data to be scaled prior to initial validation, as this means the appropriate offsets and multipliers have already been applied. In practice, simple validation/screening can be conducted before the data scaling takes place; the whole purpose of the rapid screening is to ensure that possible faults are noted quickly so that the system fault can be responded to. Scaling the data after the screening process allows the full validation to take place later. OPSIS is configured so that all scaling and data manipulation is conducted on a duplicated data set (ASCII format), leaving the original data set as received.

The following listing highlights some of the ‘anomalies’ that may occur in the raw data stream. The experienced operator will be able to distinguish between most of the various types of data anomaly itemised below.

Large data spikes

This is possibly the most common ‘anomaly’ found in the raw data stream; the causes can be many and varied, including machine faults or acute localised events.

Machine faults

These may include: -

Internal zero/span enabled during daytime. This type of fault will occur at or about the same time every day.

Calibration spikes, where the analyser is not taken out of service prior to the calibration. These can be easily identified from calibration records and also by the magnitude of the peak.

Acute localised events

These may include: -

A car or heavy transport idling near to the analyser.

A local bonfire.

Emissions from industry (local or remote).
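Whatever the cause, large spikes can be picked out automatically so that they are noted for investigation rather than deleted outright. The sketch below is an illustration only and not part of the Bristol procedure; the window size, multiplying factor and floor value are arbitrary choices.

    # Sketch only: flag values that are far above the median of the
    # surrounding window so they can be noted for investigation.
    from statistics import median

    def flag_spikes(values, window=12, factor=5.0, floor=10.0):
        flagged = []
        for i, v in enumerate(values):
            neighbours = values[max(0, i - window):i] + values[i + 1:i + 1 + window]
            baseline = median(neighbours) if neighbours else 0.0
            if v > max(baseline * factor, floor):
                flagged.append(i)
        return flagged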

Episodes of unusually high/low values.

As with the above, there may be machine problems or ‘natural’ reasons why the data has unusually high or low values. Comparisons with other nearby sites may offer supporting evidence as to possible causes for the unusual data. The OPSIS software has page layouts designed to enable comparisons between nearby sites.

Some episodes of unusually high concentrations can be easily identified as probably genuine or spurious by comparison of data with other sites either in the national networks or locally operated. Examples of these are:

High concentrations of ozone at one site only - probably spurious.

High concentrations of ozone at more than one site - probably genuine.

Elevated concentrations of SO2 at a number of sites simultaneously, with either no known local sources or a local source near only one site - probably genuine, a long range transport episode (e.g. power station emissions). This is especially likely if concentrations of PM10 are also elevated.

Elevated concentrations of SO2 at one site with known local source(s) - probably genuine.

Elevated concentrations of SO2 at one site with no known local source(s) - probably spurious, but could be genuine.

[NO] or [NO2] greater than [NOX] - possible wrongly connected outputs or mis-assignment of channels, otherwise instrument malfunction. One (simple) possibility is a broken chopper belt or failed chopper motor.
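The last of these checks lends itself to automation. In the sketch below the record layout and tolerance are assumptions, not part of the Bristol system; it simply flags any record where the reported NO or NO2 exceeds the reported NOX.

    # Sketch only: flag records where NO or NO2 exceeds NOx, allowing a
    # small tolerance for instrument noise near zero.
    def flag_no_nox_inconsistency(records, tolerance=2.0):
        # records: iterable of (timestamp, no, no2, nox) tuples in ppb.
        return [t for t, no, no2, nox in records
                if no > nox + tolerance or no2 > nox + tolerance]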

Use of data files

The data files generated directly from the loggers contain all data, including data which have been flagged as bad/out of service. Data processed by the OPSIS software into ASCII files do not contain these values, so the ASCII files should be used rather than the logger files.

Zero truncation

This type of fault is apparent from the way in which the data are cut off at the zero baseline of the graph. It is caused by the analyser or data logger being unable to record negative values; old Environnement CO analysers may exhibit this type of data anomaly. It may be rectified by applying an offset value to the analyser.
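A crude way of spotting possible zero truncation in an archived data set is to check whether a large share of the record sits exactly on zero while no negative values appear at all. This is an illustration only and the threshold is arbitrary.

    # Sketch only: a record with many values at exactly zero and none below
    # zero may be truncated rather than genuinely clean.
    def looks_zero_truncated(values, share_at_zero=0.05):
        at_zero = sum(1 for v in values if v == 0.0)
        below_zero = sum(1 for v in values if v < 0.0)
        return below_zero == 0 and at_zero / max(len(values), 1) > share_at_zero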

Missing data

Data that is missing or lost during the data collection process may have several causes depending on the type of equipment used in the monitoring and transmission process.

During the transmission of data using GSM modems there may be interruption or transposition of the data, leading to corruption and loss. With the later API analysers the large onboard memory allows retrieval of data from several days to several months back. If small ‘chunks’ of data are found to be missing it is relatively easy to set retrieval from the source to a time period earlier than that of the missing data.
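Identifying which periods need to be re-requested is straightforward once the data are time-stamped. A minimal sketch follows, assuming hourly records held as a sorted list of Python datetime objects; this is an illustration rather than part of the Bristol system.

    # Sketch only: list the gaps in an hourly time series so that the
    # missing period can be retrieved again from the analyser's memory.
    from datetime import timedelta

    def find_gaps(timestamps, step=timedelta(hours=1)):
        gaps = []
        for earlier, later in zip(timestamps, timestamps[1:]):
            if later - earlier > step:
                gaps.append((earlier + step, later - step))
        return gaps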

Repetitious (identical) data.

As with the previous ‘missing data’ section, repetitions in the raw data stream can result from the transmission of data through GSM phones. The OPSIS software itself has algorithms that replace missing data with the last valid analyser measurement; this is a recognised and authenticated method of infilling gaps in the data stream. Another possible cause on NOX and SO2 analysers is a broken chopper belt or failed chopper motor.

It is recommended that care is used when isolating and rejecting these repetitious sequences, as there may be valid reasons why the data has long time periods of the same values. These include: -

extended time periods of little/no change in the pollutants being measured.

meteorological conditions.

analyser off-line.

analyser fault.

Consideration also needs to be given to the pollutant and to the location of the monitoring equipment. Concentrations of SO2 tend to be very low except in the vicinity of major sources (industrial processes, large combustion plant or railway locomotives). As a consequence of this there is no immediately obvious reason to suspect long sequences of 1 or 2 ppb concentrations of SO2. Similarly at background locations concentrations of CO will usually vary by only small amounts whereas at roadside locations larger variations are normal.

In the case of traffic related pollutants variations are usually (but not always) greater during daylight hours than during the night and also on weekdays when compared to weekends. As a result of this a sequence of 5 or 6 hours of 0 or 1 ppb of nitric oxide between midnight and early morning is not necessarily indicative of a problem at a background site although it may indicate a problem at a roadside site.

In contrast to these situations, extended periods of repeated higher concentrations should be regarded as dubious at best and more probably as spurious. Possible causes include instrument malfunction (a broken chopper belt or chopper motor, for example) or leakage of span gas. The latter is only possible where an analyser is fitted with an internal zero and span system or where calibration cylinders are stored on site.
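Runs of identical values can be picked out automatically and then judged against the considerations above. The sketch below is an illustration only; the minimum run length of six hours is an arbitrary choice.

    # Sketch only: report runs of identical consecutive values longer than
    # a chosen length; the operator decides whether each run is genuine.
    def identical_runs(values, min_length=6):
        runs, start = [], 0
        for i in range(1, len(values) + 1):
            if i == len(values) or values[i] != values[start]:
                if i - start >= min_length:
                    runs.append((start, i - 1, values[start]))
                start = i
        return runs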

[NO2]:[NOX] concentration ratios.

The ratio of NO2 to NOX concentrations can provide clues to instrument malfunction or to unusual conditions. Typically, at an extreme kerbside site this ratio will be low, in the region of 0.25 to 0.30, while at less extreme roadside sites it will be higher, typically 0.35 to 0.45. At an Urban Background/Urban Centre site it will usually be in the range 0.55 to 0.70, at a Suburban site higher again at around 0.75, and at Rural sites about 0.80. The highest ratios are observed at Remote Rural sites. The concentration ratios vary throughout the day at all sites; the greatest variations are at urban sites, where the ratio is higher than average during the night and lower than average during the day. This is also the case at rural sites, but to a lesser degree.

An abnormally high or low [NO2]:[NOX] ratio does not necessarily indicate instrument malfunction, as extreme meteorological conditions can also cause this. The most obvious example is during a prolonged period of cold weather associated with an inversion layer, when the [NO2]:[NOX] ratio will be lower than normal in spite of high concentrations of NO2.
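The typical ratios quoted above can be turned into a simple advisory check. In the sketch below the figures come from this section, but the exact bands for the Suburban and Rural classes are assumptions around the single figures quoted (about 0.75 and 0.80), and a result outside a band is a prompt to investigate rather than proof of a fault.

    # Sketch only: compare a period-average NO2:NOx ratio with the typical
    # range quoted for each site type.
    TYPICAL_NO2_NOX_RATIO = {
        "kerbside": (0.25, 0.30),
        "roadside": (0.35, 0.45),
        "urban background": (0.55, 0.70),
        "suburban": (0.70, 0.80),   # assumed band around the quoted 0.75
        "rural": (0.75, 0.85),      # assumed band around the quoted 0.80
    }

    def ratio_in_typical_range(mean_no2, mean_nox, site_type):
        low, high = TYPICAL_NO2_NOX_RATIO[site_type]
        ratio = mean_no2 / mean_nox
        return low <= ratio <= high, ratio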

2) Data Ratification

The previous section has been primarily concerned with the ‘day to day’ analysis of data. The ratification process is essentially related to the longer-term assessment of data trends and analyser performance over time periods of three, six or twelve months.

This is to ensure that any long-term drift in analyser response to zero and span checks becomes evident where, over the short term, it would not, and therefore allows drift adjustments to be made. Further to this, any adjustments made to the monitoring equipment will effectively alter its performance characteristics.

It is imperative that detailed records are kept of all equipment associated with or used within the monitoring network. All relevant data and records of servicing, repairs and analyser performance are subsequently compiled and compared with the results for each site. This process assigns missing or spurious data to specific analyser faults or analyser performance over the ratification time period.

Effectively, using the full ratification process, a complete history of the individual site operations is ratified (audited) and the data resulting from that site is therefore of a known quality. It represents the final stage of data acceptance prior to its use.

Procedure (preliminary listing)

Data scaling

Examine calibration data for analyser drift and performance.

The calibration data must be inspected for excessive analyser (zero/span) drift before being applied to the raw data. Within the AURN data validation procedures, excessive drift is defined as a change of more than 5% relative to the previous results. The data storage (Excel) software should indicate zero and span results outside this range and so provide instant recognition of the situation. Further to this, the quality of the analyser data depends on the machine functioning correctly within its design limits and operational parameters. The fortnightly site visits are at present the only way of recording this information on analyser performance. It is vital that all of the data relating to analyser performance and quality obtained from these visits is inspected and approved prior to use. The Excel software should also be capable of distinguishing when the design and operational parameters are exceeded.
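The 5% criterion can be checked automatically whenever a new calibration is entered. The sketch below is an illustration only, assuming the previous and current span responses are available as numbers.

    # Sketch only: flag excessive span drift between successive fortnightly
    # calibrations using the >5% criterion quoted above. (Zero drift would
    # normally be judged in absolute concentration terms instead.)
    def excessive_span_drift(previous_span, current_span, limit=0.05):
        return abs(current_span - previous_span) / previous_span > limit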

Apply fortnightly calibration results.

The calibration results should be applied to one channel (the ASCII data sets in OPSIS) of the raw data set as soon as they have been received and audited for analyser performance.

Note. At present the application of calibration data to scale raw data can be conducted within OPSIS, but the procedure is rather long-winded for the number of sites. The OPSIS software developers are working on enhanced scaling of raw data. In-house automated data scaling is also being researched at present, using Excel software to scale raw data from the OPSIS database.

Note all site characteristics and analyser performance.

Detailed records of analyser performance and site characteristics should be noted and entered onto the database for each site. All changes to buildings and infrastructure within the vicinity of the site, including changes to road layout and local construction work etc. should also be included.

Data validation

Daily checks on raw analyser data

Note all anomalous data spikes, excursions and trends

Compare with other nearby network sites

Compare local meteorology to data

Data ratification

View data in time series over ratification period

Compare all site and service records to scaled data

Compare with other sites and levels of other pollutants

Examine calibration drift records

Completion

When all of the above methodologies have been conducted, the data should be fit for the purpose of Bristol City Council’s Air Quality Assessment. The systems in use at present should produce results of good accuracy and precision; it is considered that +/- 15% accuracy is achievable through a dedicated approach to consistency.

Glossary.

Offset - the difference between the observed concentration when running zero gas and true zero.

Multiplier - the ratio of the actual concentration of span gas to the observed concentration: [Span gas (actual)] / ([Span gas (observed)] - [Zero gas (observed)]).

Validation (screening) - the initial identification and removal of obviously spurious data, or flagging of possibly dubious data.

Ratification - the final scaling of data and removal of dubious data where these are positively identified as spurious.
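As a worked illustration with invented figures: if the zero gas reads 2 ppb and a 400 ppb span gas reads 390 ppb, the offset is 2 ppb and the multiplier is 400 / (390 - 2) = 1.031, so a raw reading of 100 ppb would be re-scaled to (100 - 2) x 1.031, approximately 101 ppb.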

Last Updated: 13th January 2005
