- ¹Department of Environmental Sciences and Engineering, NOVA School of Sciences and Technology, NOVA University Lisbon, Lisbon, Portugal
- ²Center for Environmental and Sustainability Research, NOVA School of Sciences and Technology, NOVA University Lisbon, Lisbon, Portugal
Air pollution is a major concern for most countries in the world. In Portugal and Macao, the concentrations of nitrogen dioxide (NO2), particulate matter (PM) and ozone (O3) are frequently above the thresholds accepted as “good air quality.” Portugal follows the European Union (EU) legislation (Directive 2008/50/EC) on air quality and Macao the air quality guidelines (AQG) from the WHO. Air quality forecasts are important mitigation tools because they anticipate pollution events and support early warnings, so that preventive measures can be taken and impacts reduced by avoiding exposure. The work presented here refers to the statistical forecast of air pollutants for three regions: Greater Lisbon Area, Madeira Autonomous Region (both located in Portugal), and Macao Special Administrative Region (in Southern China). The presented statistical approach combines Classification and Regression Tree (CART) and multiple regression (MR) analysis to obtain optimized regression models. This consolidated methodology has been in operation for more than a decade in Portugal, and is subject to regular updates that reflect the ongoing research and the changes in the air quality monitoring network. Recently, the same methodology was applied to Macao in collaboration with the Macao Meteorological and Geophysical Bureau (SMG). The statistical approach described here has proven successful in forecasting next-day PM10, PM2.5, NO2, and O3 concentrations. In general, all the models showed good agreement between the observed and forecasted concentrations (with R2 from 0.50 to 0.89), and were able to follow the concentration evolution trend. In some cases, there is a slight delay in the prediction trend. Moreover, the results obtained for pollution episodes indicate that statistical forecasting can be an effective way of protecting public health.
Introduction
The Ambient Air Quality Directives of the European Union (EU) set standards for key air pollutants. These values take into account the 2005 WHO guidelines and considerations of technical and economic feasibility at the time of their adoption.
Air quality forecasting, if reliable and sufficiently accurate, can play an important role as part of an air quality management system (NOAA, 2001). Its applications fall into several broad areas. One is health alerts: many cities currently warn the public when air pollution exceeds specified levels, with those warnings directed at populations that are particularly sensitive to air pollution (e.g., asthmatics) (Liu et al., 2018). In addition, air quality forecasts can supplement existing emission control programs or emergency responses. Examples include cities offering free access to public transportation on pollution episode days to reduce vehicle emissions (Quarmby et al., 2019), and regions implementing a “No-Burn day” (AQMD, 2022), a ban on wood burning in residential fireplaces, stoves, or outdoor fire pits when particulate matter concentrations are expected to reach unhealthy levels due to air emissions and stagnant weather conditions.
To predict the next-day daily average concentrations of particulate matter (PM10 and PM2.5), and the daily hourly maximum concentrations of ozone (O3) and nitrogen dioxide (NO2), at the locations of air quality monitoring stations, forecast models were developed based on statistical methods using multiple linear regression (MR) and Classification and Regression Tree (CART) analysis. NOVA University Lisbon (NOVA School of Science and Technology), in collaboration with the Portuguese Environment Agency (APA) and the Portuguese Institute for Sea and Atmosphere (IPMA), runs and disseminates daily air quality forecasts based on a statistical approach first used by Cassmassi (1987) at the South Coast Air Quality Management District, California, USA. This statistical methodology has been in operation in Portugal (Neto et al., 2005) for more than a decade and is subject to regular updates, reflecting the ongoing research and the changes in the air quality monitoring network. Recently, the same methodology was extended to the Madeira Autonomous Region, in Portugal, and was also applied to the Macao Special Administrative Region of the People's Republic of China (MSAR), as a result of a collaboration with the Macao Meteorological and Geophysical Bureau (SMG) (Lei et al., 2019, 2020).
Air pollution is a major concern for most countries in the world. The global burden of disease associated with air pollution exposure exacts a massive toll on human health worldwide: it is estimated to cause millions of deaths and lost years of healthy life annually. The burden of disease attributable to air pollution is now estimated to be on a par with other major global health risks, such as unhealthy diet and tobacco smoking, and air pollution is now recognized as the single biggest environmental threat to human health (World Health Organization, 2021). In Portugal, despite improvements in the past two decades, there are still exceedances, mainly of the nitrogen dioxide (NO2) annual limit value, the particulate matter (PM10) daily limit value, and the ozone (O3) target value. In Macao, concentrations of these pollutants are frequently above the thresholds accepted as “good air quality.”
High concentrations of NO2, PM, and O3 in the lower troposphere are an additional risk factor for cardiovascular and respiratory diseases and contribute to mortality all over the world (Sheng and Tang, 2013; Lee et al., 2017). Surface O3 is known for its negative impacts on the respiratory system, leading to more hospitalizations (Entwistle et al., 2019). PM, in particular the smaller fractions such as PM2.5, is a major concern because these particles can get deep into the lungs and some may even enter the bloodstream (Wiśniewska et al., 2019). Finally, specific combinations of concentration levels of these pollutants may be more dangerous than equally high levels of all the pollutants (Sheng and Tang, 2013). Portugal follows the European Union (EU) legislation (Directive 2008/50/EC) (European Union Legislation, 2008) on air quality, and Macao follows the Chinese National Ambient Air Quality Standards (NAAQS), which in turn are based on the Interim target-1 Air Quality Guidelines from the WHO (WHO, 2006). The legally binding EU limit values for human health protection were set, among other pollutants and averaging periods, at 50 μg/m3 for PM10 (daily basis), 25 μg/m3 for PM2.5 (annual basis), and 40 μg/m3 for NO2 (annual basis), and the O3 target value is 120 μg/m3 (referring to the maximum daily 8-h mean, represented as O3MAX). The NAAQS set the threshold for PM10 at 150 μg/m3 (daily basis), PM2.5 at 75 μg/m3 (daily basis), NO2 at 40 μg/m3 (annual basis), and O3MAX at 160 μg/m3 (daily maximum 8-h mean). Some of these values contrast with the WHO Air Quality Guideline levels, updated in 2021 after a systematic review of accumulated evidence, which are set at 45 μg/m3 for PM10 (daily basis), 15 μg/m3 (daily basis) and 5 μg/m3 (annual basis) for PM2.5, 10 μg/m3 for NO2 (annual basis), and 100 μg/m3 for O3 (daily maximum 8-h mean) (World Health Organization, 2021).
High-density and high-rise cities have become increasingly common in Asia (Lee et al., 2017). Air quality is a significant public health risk in many of these cities (World Health Organization, 2016). Macao, a coastal city located in southern China, is one example of a high-density, high-rise city with air quality issues. Macao was listed as the most densely populated region in the world (Sheng and Tang, 2013), with a population density of about 20,000 inhabitants/km2, corresponding to a population of 680,000 within an area of 32.9 km2. The clustering effect is further enhanced by the prevalence of high-rise buildings. European cities likely have lower pollution and building densities, and fewer small-scale dispersed pollution sources, than high-density high-rise cities.
Factors leading to variation in pollution levels are diverse, and include both human activities and meteorological factors (Boubel et al., 1994). Meteorology plays a fundamental role in the redistribution of air pollutants after their release into the atmosphere (Boubel et al., 1994). Characterizing local and large-scale wind circulation and vertical atmospheric stability makes it possible to account for transport, mixing, and dispersion processes (Boubel et al., 1994). Wet deposition refers to the natural processes by which material is removed by atmospheric hydrometeors (cloud and fog drops, rain, and snow) and delivered to the Earth's surface (Seinfeld and Pandis, 2006).
The association between specific meteorological parameters and air quality can be quantified using a variety of statistical techniques. Statistical forecast methods analyze events without knowing the mechanism of change; therefore, they do not depend on a description of the physical, chemical, or biological processes involved (Bai et al., 2018). Instead, methods such as regression analysis investigate relationships between variables. Forecasting is an essential part of the science of big data, and can be used to infer the future development of an object from previous information (Bai et al., 2018). Pollution forecasting can be understood as the estimation of a pollutant concentration at a specified future date.
This work aims to provide an overall description of the statistical methods currently used by the NOVA University Lisbon air quality group to forecast air pollutant concentrations. The aspects discussed include data requirements, the steps involved in model development, and the advantages and disadvantages of this approach. Model performance indicators are presented for each region and pollutant. Finally, examples of model performance are presented for pollution episodes that occurred in 2019 over the three studied regions. In this context, air quality forecast models are relevant tools because of their ability to anticipate and follow pollution episodes. They support decisions such as early warnings to the population, who can then take preventive measures and avoid exposure, reducing negative health impacts.
Methods
Forecasting by statistical methods now has a wide range of applications and is used all over the world, based on a multitude of algorithms. These methods can achieve good accuracy and enable a better understanding of the relationships between the behavior of air quality data and the underlying meteorology (Bai et al., 2018). In the present work, statistical models were developed based on the MR and CART techniques. Both techniques rely on historical data of meteorological and air quality variables.
Regression analysis methods are based on the association between pollutant levels and meteorological and aerometric variables, which can be quantified by analyzing historical datasets with standard statistical analysis packages, as shown in previous works (Cassmassi, 1987; US EPA, 2003; Choi et al., 2013; Durão et al., 2016; Oduro et al., 2016). The resulting multiple linear regression equation is then used to forecast future pollution levels. The CART technique identifies the variables (meteorological or air quality) that are most strongly correlated with ambient pollution levels. These variables are then used to predict next-day pollution levels, either daily average or maximum hourly concentrations depending on the pollutant, based on same-day air quality levels and next-day forecasted meteorology.
The statistical models described above were applied to forecast, for the next day, the average daily concentrations of PM10 and PM2.5 and the hourly maximum concentrations of O3 and NO2 (referred to as O3MAX and NO2MAX, respectively) at each air quality monitoring station (AQMS) location. For the work presented here, a set of AQMS was selected in the Greater Lisbon Area (4 AQMS), Madeira Autonomous Region (2 AQMS), and Macao (4 AQMS), represented in Figure 1.
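As an illustration of the regression step, the following minimal sketch fits a multiple linear regression by ordinary least squares and uses it for a next-day forecast. It is not the SPSS workflow used in this work: the two predictors (same-day PM10 and a forecast 925 hPa temperature), the sample values, and the function names are hypothetical, and the training targets are synthetic.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows    -- list of predictor tuples, one per day
    targets -- list of next-day concentrations
    Returns [intercept, coef_1, ..., coef_k].
    """
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs, row):
    """Evaluate the fitted equation for one day's predictor values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical training days: (same-day PM10 in ug/m3, forecast 925 hPa temperature in degC)
history = [(20.0, 10.0), (35.0, 14.0), (28.0, 9.0), (50.0, 18.0), (41.0, 12.0)]
# Synthetic next-day PM10 built from 2 + 0.6*pm10 + 0.5*temp, for illustration only
next_day = [2.0 + 0.6 * pm + 0.5 * t for pm, t in history]

coefs = fit_linear_regression(history, next_day)
forecast = predict(coefs, (30.0, 15.0))
```

Because the synthetic targets are exactly linear in the predictors, the fit recovers the generating coefficients; with real observations the residual scatter would determine how much variance the equation explains.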
Figure 1. (A) Madeira island air quality observation network (modeled subset); (B) Macao air quality observation network (source: https://www.smg.gov.mo/en/subpage/182/page/123); and (C) Lisbon air quality observation network.
The Greater Lisbon Area includes the Portuguese capital, Lisbon, and is the main economic subregion of the country. It covers 1,376 km2 and is the most densely populated Portuguese subregion (with 2,042,477 inhabitants and 1,484 inhabitants/km2). The Greater Lisbon Area has a mild subtropical, hot-summer Mediterranean climate (Csa), according to Köppen's classification (Köppen, 1936), based on temperature and precipitation mean values. The Lisbon summer is mild to hot, with significant temperature variations related, in part, to the distance from the coast. The wind blows most frequently from the north quadrant, according to the 1971–2000 climatological normals (IPMA, 2022).
Madeira, an autonomous region of Portugal, is an archipelago comprising four islands off the northwest coast of Africa. The region has a population of 253,945 inhabitants within an area of 801 km2. The capital of Madeira is Funchal, which has a subtropical Mediterranean climate (Csa), according to Köppen's classification (Köppen, 1936). Funchal's climate is predominantly determined by the Atlantic Ocean and, as a result, weather extremes are rare. Temperature usually rises significantly when the influence of persistent easterly winds from northern Africa is felt. The annual average maximum temperature is 22.1°C and the annual average minimum temperature is 15.8°C (IPMA, 2022).
Located along the southeast coast of Mainland China, Macao is surrounded by water on three sides and has a subtropical oceanic monsoon climate characterized by high temperatures, high levels of atmospheric moisture and abundant rainfall (SMG, 2022). In winter, Macao is cold and dry with predominantly northern winds, and the summer brings heavy rains due to the strong southwest monsoon. Spring and autumn are transition periods. The winter northeast monsoon is known to transport pollutants from northern and eastern China (Tong et al., 2018). In the summer season, from June to August, rainfall increases, providing better atmospheric mixing, and persistent southern winds occur, causing PM levels to decrease (Lopes et al., 2016).
Daily observation series covering 3 to 6 years were used to develop the forecast models, and each model was evaluated using 2019 data. The time period used to build each model equation differs by region and AQMS, according to data availability (Table 1). Selecting a representative modeling period is important, and at least 2 years of data are recommended.
Table 1. Modeling and validation periods considered by region and air pollutants forecasted at each air quality monitoring station (AQMS).
In the data collection phase, a large set of meteorological and air quality data was gathered, namely: (i) meteorological surface observations: hourly observations from automatic weather stations, such as temperature, relative humidity, and dew point temperature; (ii) upper-air observations, such as geopotential heights, temperature, relative humidity, and dew point temperature at various altitudes; and (iii) surface air quality measurements of PM10, PM2.5, NO2, and O3 from the AQMS network. Other variables were added to the analysis, such as a weekday/weekend flag and the daily duration of sunlight. A list of the independent variables and data sources used as potential predictors in the modeling phase is presented in Table 2. The model development flowchart is represented in Figure 2.
Series with low annual data availability (<75%) were rejected. For operational purposes, limitations on the data expected to be available each day to perform the forecast were also considered. Outliers were identified and excluded from the data series.
During MR model development, several of the initial variables were removed from the final models, either because of high correlation among variables or because of the aim of obtaining the simplest model, with the smallest number of variables that maximizes the explained variance. In addition, fewer variables mean that, in operational mode, missing data have a lower impact on the quality of the forecast. In MR analysis, one has to seek a compromise between the model improvement obtained by adding variables and the increase in complexity and uncertainty introduced by each new variable.
Another important aspect is the effort to achieve accurate forecasts when higher concentrations are predicted, since these are associated with greater negative health impacts. One of the advantages of the CART technique is its ability to establish specific model equations that accommodate trends caused by the meteorological circumstances that trigger high concentrations (Choi et al., 2013). This is important for issuing advisories ahead of major pollution episodes, so that excessive exposure can be avoided.
The CART analysis defines a path with several nodes, where threshold values on specific variables produce binary splits, each chosen to give the largest reduction in the variation of the target variable within the new branches (Choi et al., 2013). CART analysis produces a tree representation, as exemplified in Figure 3 for PM10, according to some parameterizations, such as the intended tree depth. The CART analysis and the MR model development were performed using the IBM software SPSS (Version 25).
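The availability criterion can be sketched as a simple filter. This is only an illustration: the station names and values are hypothetical, gaps are represented by `None`, and the short samples stand in for full annual series. The outlier screening is not shown because the text does not specify the rule that was applied.

```python
def availability(series):
    """Fraction of non-missing values in a series (None marks a gap)."""
    if not series:
        return 0.0
    return sum(v is not None for v in series) / len(series)


def select_series(series_by_station, min_fraction=0.75):
    """Keep only the stations whose series meets the availability threshold."""
    return {
        station: values
        for station, values in series_by_station.items()
        if availability(values) >= min_fraction
    }


# Hypothetical 8-day samples; a real check would cover a full year of data
data = {
    "station_a": [21.0, 19.5, None, 30.2, 28.1, 25.0, 22.4, 24.9],  # 7 of 8 valid
    "station_b": [None, None, 18.0, None, 20.3, 19.9, 21.1, 17.5],  # 5 of 8 valid
}
kept = select_series(data)
```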
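One possible way to screen out highly correlated predictors is sketched below. The greedy rule, the 0.9 cut-off, and the sample values are assumptions made for illustration; the text does not state which threshold or procedure was used. `TAIR_850` is a hypothetical variable name, whereas `TAIR_925` and `RHMEA` follow the naming used in this work.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(predictors, target, max_abs_r=0.9):
    """Greedy screening: visit predictors from most to least correlated with
    the target, keeping one only if it is not highly correlated with any
    predictor already kept."""
    ranked = sorted(predictors, key=lambda name: -abs(pearson(predictors[name], target)))
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Synthetic values for illustration only
predictors = {
    "TAIR_925": [10.0, 12.0, 15.0, 18.0, 20.0, 22.0],
    "TAIR_850": [6.1, 8.0, 11.2, 13.9, 16.1, 18.0],   # nearly collinear with TAIR_925
    "RHMEA": [80.0, 60.0, 75.0, 55.0, 70.0, 65.0],
}
target = [19.0, 23.0, 29.0, 35.0, 39.0, 43.0]

selected = drop_correlated(predictors, target)
```

In this synthetic case the two temperature series carry almost the same information, so only one of them survives the screening alongside relative humidity.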
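The splitting criterion can be illustrated with a minimal search for a single node. This is a sketch of the general regression-tree idea, not the SPSS implementation: the variable names, the minimum leaf size, and the data are hypothetical, and variation is measured here as the sum of squared deviations from the mean.

```python
def sum_squared_dev(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(samples, min_leaf=2):
    """Find the (variable, threshold) pair whose binary split gives the
    largest reduction in the target's sum of squared deviations.

    samples -- list of (predictors_dict, target) pairs
    Returns (variable, threshold, reduction) or None if no split is allowed.
    """
    parent = sum_squared_dev([y for _, y in samples])
    best = None
    for var in samples[0][0]:
        levels = sorted({x[var] for x, _ in samples})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in samples if x[var] <= threshold]
            right = [y for x, y in samples if x[var] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sum_squared_dev(left) - sum_squared_dev(right)
            if best is None or gain > best[2]:
                best = (var, threshold, gain)
    return best


# Synthetic days: hypothetical predictors and PM10 (ug/m3), illustration only
days = [
    ({"wind_speed": 1.0, "temperature": 12.0}, 62.0),
    ({"wind_speed": 1.5, "temperature": 20.0}, 58.0),
    ({"wind_speed": 2.0, "temperature": 15.0}, 65.0),
    ({"wind_speed": 5.0, "temperature": 13.0}, 22.0),
    ({"wind_speed": 6.0, "temperature": 21.0}, 25.0),
    ({"wind_speed": 7.0, "temperature": 16.0}, 20.0),
]
split = best_split(days)
```

Applying the same search recursively to each branch, down to the chosen depth, yields the tree; a separate MR equation can then be fitted to the days falling in each terminal node.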
Model performance was evaluated by computing the most common scores: (i) bias (Equation 1), (ii) mean absolute error (MAE) (Equation 2), (iii) root mean square error (RMSE) (Equation 3), (iv) coefficient of determination (R2) (Equation 4), and (v) relative mean absolute error (RMAE) (Equation 5), where f represents the forecasted value, o the observed value, n the number of forecast/observation pairs, f̄ the mean forecasted value, and ō the mean observed value. These measures of agreement were obtained by comparing the forecasts for the 2019 validation data set with the air pollutant levels observed in that year.
The resulting statistical models make it possible to produce, in operational mode, a daily forecast of next-day PM10, PM2.5, NO2, and O3 concentrations for the three studied regions. The prediction models run daily, after 16 h for Macao and after 13 h for both the Greater Lisbon Area and the Madeira Autonomous Region, according to the daily schedules on which air quality data are made available.
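Since Equations 1 to 5 are not reproduced in this text, the sketch below uses common textbook forms of these scores, which may differ in detail from the ones applied here. In particular, taking the bias as forecast minus observation, R2 as the squared Pearson correlation, and RMAE as the MAE divided by the mean observed value are assumptions. The sample values are hypothetical.

```python
from math import sqrt


def forecast_scores(forecasts, observations):
    """Agreement scores between paired forecasts (f) and observations (o)."""
    n = len(forecasts)
    errors = [f - o for f, o in zip(forecasts, observations)]
    f_mean = sum(forecasts) / n
    o_mean = sum(observations) / n

    bias = sum(errors) / n                      # mean of (f - o)
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    rmse = sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation between forecasts and observations
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecasts, observations))
    var_f = sum((f - f_mean) ** 2 for f in forecasts)
    var_o = sum((o - o_mean) ** 2 for o in observations)
    r2 = cov * cov / (var_f * var_o)

    rmae = mae / o_mean                         # MAE relative to the observed mean
    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2, "rmae": rmae}


# Hypothetical daily values in ug/m3
fcst = [22.0, 30.0, 41.0, 35.0]
obs = [20.0, 34.0, 40.0, 38.0]
scores = forecast_scores(fcst, obs)
```

With this sign convention, a negative bias indicates that the forecasts are, on average, below the observations.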
In the final stage of the operational process, a forecasted air quality index (AQI) is produced for each pollutant for the next day based on the daily air pollutant concentrations, mean or maximum daily values, depending on the pollutant. The final AQI, for each location, corresponds to the worst level of air quality among the forecasted pollutants.
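The "worst level wins" rule can be sketched as follows. The class labels and breakpoints are hypothetical placeholders chosen for illustration and are not the official index of any of the three regions; only the aggregation rule reflects the text.

```python
# Index classes ordered from best to worst. Labels and breakpoints are
# HYPOTHETICAL placeholders, not an official air quality index.
CLASSES = ["very good", "good", "medium", "poor", "bad"]
BREAKPOINTS = {  # upper bounds (ug/m3) of the first four classes
    "PM10": [20, 35, 50, 120],
    "O3": [60, 120, 180, 240],
    "NO2": [100, 140, 200, 400],
}


def pollutant_class(pollutant, concentration):
    """Return the position in CLASSES of the forecast concentration."""
    for level, upper in enumerate(BREAKPOINTS[pollutant]):
        if concentration <= upper:
            return level
    return len(CLASSES) - 1


def final_index(forecast):
    """Overall index: the worst (highest) class among the forecast pollutants."""
    worst = max(pollutant_class(p, c) for p, c in forecast.items())
    return CLASSES[worst]


# Hypothetical next-day forecast for one station (ug/m3)
station_forecast = {"PM10": 42.0, "O3": 95.0, "NO2": 60.0}
overall = final_index(station_forecast)
```

Here PM10 falls in the least favorable class of the three pollutants, so it alone determines the station's overall index.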
Results
Model Selected Variables and Performance Indicators
The statistical models based on MR and CART analysis were developed to forecast NO2, PM10, PM2.5, and O3 concentrations. The objective is to produce a daily forecast for the next day, in operational mode, by running the prediction models after 16 h for Macao and after 13 h for both the Greater Lisbon Area and the Madeira Autonomous Region.
CART analysis was tested mainly to better predict high concentration levels. For the Macao region, CART analysis did not improve the overall quality of the predictions, so in this case the prediction models are based on a single MR equation. We believe that, in Macao, pollution is frequently due to distant sources, with pollutants being transported through the advection of air masses by large-scale circulation. Therefore, local meteorology is not as critical, and one equation is sufficient to explain and predict next-day pollutant concentrations at each monitoring location. The exception was the O3MAX prediction at two AQMS. In these cases, CART analysis identified split nodes, and an O3 prediction equation was then determined by MR for each node. In contrast to Macao, in the Greater Lisbon Area almost every AQMS and pollutant is forecasted with CART and MR.
The most prevalent variable, selected in all the forecast equations, is the one that represents the last 24-h pollutant concentrations (16 h from yesterday to 15 h today in the case of Macao, and 11 h of yesterday to 12 h of today for the Greater Lisbon Area and Madeira Autonomous Region). Regarding the meteorological variables selected as predictors (Table 3), the geopotential height at 850 hPa (H_850), an indicator of the synoptic-scale weather pattern, is frequently present in the forecast of NO2 and PM, both in Lisbon and Macao. For the Lisbon and Madeira stations, the most frequent weather variable is air temperature at 925 hPa (TAIR_925), a measure of the strength and height of the subsidence inversion, especially for PM forecasts. In Macao, in addition to H_850 (the most common variable), the mean relative humidity (RHMEA) also appears, indicating the relevance of relative humidity for the air quality forecast in this region. In Lisbon, the final set of selected variables covers 13 weather variables. In Madeira and Macao, the selected weather variables are less diverse: for Madeira, TAIR_925 and the average dew point temperature (DEWPMEA) are the most common, along with four other variables, and in Macao, H_850 and RHMEA are the most common, along with seven other variables. In Table 4, the MR model equations obtained are presented for one AQMS selected in each studied region.
Table 4. Model equations obtained for Greater Lisbon Area (Entrecampos), Madeira Autonomous Region (São João), and Macao Administrative Region (Taipa Ambient).
Models were validated with collected data from 2019. For validation purposes, it is important to use at least 1 year of data, to accommodate for all the seasonal variations. Model performance indicators are summarized, by region, in Table 5 (Greater Lisbon Area), Table 6 (Madeira Autonomous Region) and Table 7 (Macao Administrative Region). The referred tables contain the obtained model performance indicators, such as, R2, RMSE, MAE, Bias, and RMAE. For each station and pollutant, the forecasted time series was plotted against observations (Figures 4–6).
Table 5. Model performance indicators for validation with 2019 data, by AQMS and pollutant, at Greater Lisbon Area.
Table 6. Model performance indicators for validation with 2019 data, by AQMS and pollutant, at Madeira Autonomous Region.
Table 7. Model performance indicators for validation with 2019 data, by AQMS and pollutant, at Macao Administrative Region.
Figure 4. Daily observations (OBS) and forecasts (FCST) at three monitoring stations (AVL, Avenida da Liberdade; MEM, Mem Martins; ENT, Entrecampos) in Greater Lisbon Area, for 2019.
Figure 5. Daily observations (OBS) and forecasts (FCST) at three monitoring stations (Macao Residential, Taipa Residential, and Taipa Ambient) in Macao Administrative Region, for 2019.
Figure 6. Daily observations (OBS) and forecasts (FCST) at two monitoring stations (SJO, São João; SGO, São Gonçalo) in Madeira Autonomous Region, for 2019.
The results show good agreement between modeled and observed concentrations, which is statistically significant at the 95% confidence level. The selected models capture the relationship between meteorological and air quality variables well when forecasting air quality under different situations.
The time series plots display a good overall correlation between observations and forecasted values; however, there is a slight tendency to underestimate the maximum peaks. The statistical scores are comparable across the regions under analysis.
Regarding the obtained R2 for modeled vs. observed concentrations (Tables 5–7), the following aspects can be highlighted:
• the R2 is, on average, lower for Lisbon than for Madeira and Macao, ranging from 0.5 in Olivais for PM2.5 to 0.81 in Mem Martins for PM10;
• Macao presents R2-values ranging from 0.78 at Taipa Residential for O3 to 0.89 at Macao Roadside and Macao Residential (for NO2 and PM10, respectively);
• Madeira R2-values range from 0.67 at São Gonçalo for O3 to 0.85 at São João for PM2.5.
In general, the bias stays very close to zero, with the maximum value of 3.6 obtained for NO2 at the Olivais station (Lisbon). When comparing the bias and MAE, there are significant differences between pollutants, some of them related to the different ranges of variation of the daily concentrations. The RMAE ranges from 8.7% for PM10 at Taipa Residential (Macao) to 34.3% for PM2.5 at Olivais. Comparing the RMAE across regions, Lisbon displays, on average, higher values than Macao and Madeira, with PM2.5 being the pollutant with the lowest performance.
Atmospheric Pollution Episodes
As examples of the response of developed models, in situations where air pollutant concentrations rise significantly, a few pollution episodes were chosen for each region under study, considering different pollutants: PM10 and O3.
Long-range transport of desert dust from North Africa is not infrequent and significantly affects the ground-level particle concentrations recorded during these events in the Iberian Peninsula (Querol et al., 2009). In Portugal, both in Lisbon and Madeira, these natural dust intrusion episodes are common and contribute to higher PM10 concentrations, frequently above the daily limit value of 50 μg/m3, as represented in Figures 7, 8, often due to the persistence of specific synoptic patterns. In both the Lisbon and Madeira case studies, the forecast models slightly underestimated PM10 concentrations but were able to follow the general PM10 evolution profile, showing a small delay in the prediction trend.
Figure 7. Particulate matter (PM10) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the natural dust episode that occurred in 2019 (20-25/02/2019), at four air quality monitoring stations at Greater Lisbon Area.
Figure 8. Particulate matter (PM10) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the natural dust episode that occurred in 2019 (22-26/02/2019), at São João air quality monitoring station at Madeira Autonomous Region.
Concerning the Macao Administrative Region, a period covering the Chinese National Holiday was chosen, in which a rise of PM10 concentrations to values over 120 μg/m3 was measured at four air quality monitoring stations (Figure 9). This holiday is known as a golden week of tourism, with Macao being one of the favorite destinations for Chinese tourists (Lee et al., 2017), and is also characterized by the release of a considerable amount of fireworks. The PM10 peak concentration, which occurred on the 1st of October, was well predicted for the Taipa monitoring locations and slightly underestimated for the Macao stations.
Figure 9. Particulate matter (PM10) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the Chinese National Holiday in 2019 (01/10/2019), at four monitoring stations at Macao Administrative Region.
Regarding O3, a set of pollution episodes that occurred in 2019 in the three case study regions is presented in Figures 10–12. The marked high-pollution intervals in these figures correspond to exceedances of pollutant-specific legal thresholds. Mechanisms for near-surface ozone formation and depletion are complex. Previous studies have shown that ozone production accelerates at high temperatures, which may be attributed not only to the temperature dependence of chemical reactions, but also to the weak winds that accompany high temperatures and heatwaves and cause the atmosphere to stagnate and ozone levels to build up (Pyrgou et al., 2018). In general, all the models showed good agreement between the observed and forecasted concentrations and were able to forecast the pollution peaks with a good degree of precision. However, in the case of Madeira, an island with extreme altitude variations, no next-day meteorological variable was found that could be integrated in the model to anticipate some of the higher ozone levels. Ozone, as a secondary pollutant, has a complex formation process that makes forecasting more difficult in certain geographical areas. Therefore, in some of these situations, the lag shown in Figure 11 between observed and predicted ozone concentrations is mostly a consequence of the daily evolution trend from the day before.
Figure 10. Ozone (O3) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the pollution episodes that occurred in 2019 (04-08/09/2019 and 12-15/09/2019), at three air quality monitoring stations at Greater Lisbon Area.
Figure 11. Ozone (O3) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the pollution episodes that occurred in 2019 (01-06/05/2019 and 13-19/05/2019), at São Gonçalo air quality monitoring station at Madeira Autonomous Region.
Figure 12. Ozone (O3) observed (OBS) and forecasted (FCST) concentrations, with emphasis on the pollution episode that occurred in 2019 (18-19/10/2019), at Taipa Ambient air quality monitoring station at Macao Administrative Region.
Discussion
The described statistical approach to air quality forecasting has proven to be successful, being able to forecast next-day NO2, PM10, PM2.5, and O3 concentrations with a good performance, as reflected by the presented evaluation scores. The results differ slightly between stations and pollutants, but overall the variables included in each model explain more than 90% of the variance of the dependent variable in the development stage, a value that usually decreases in the validation period. The application to different regions is intended to demonstrate the versatility of the methodology. A small degradation in the performance of the models is expected in operation, due to several factors, such as the uncertainty of meteorological forecasts.
Statistical models should be updated on a regular basis if there are, for instance, significant changes in local sources of air pollution. They can also be improved by introducing new predictor variables that better explain part of the pollutant variance.
The forecast models can show a slight delay in response to sudden short-term variations in concentrations, because the previous-day concentration is the independent variable considered the best predictor, since it explains most of the model variance. However, as shown in the selected PM10 and O3 pollution episodes, this methodology was able to reproduce the trend and variations of the monitored air pollutant concentrations. This shows that the regression models obtained can be reliably applied to forecast next-day pollutant concentrations across different magnitudes of air pollution, making them a useful tool for mitigating air pollution impacts.
The method has a few advantages when compared with numerical modeling, namely the lower complexity of development and implementation, and the smaller computing resources needed. Its main disadvantage is the high dependence on a well-functioning air quality monitoring network and on meteorological forecasts. In the case of Portugal, next-day forecasts provided by the Portuguese Environment Agency are calculated by two methods (deterministic and stochastic) as part of an ensemble approach for both PM10 and O3. The quality data for 2019 show that the probability of detection of the stochastic model was higher in all but one of the six regions within the country.
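For reference, the probability of detection is commonly computed as the share of observed exceedances that were also forecast. The sketch below uses that common definition; the threshold (the PM10 daily limit value of 50 μg/m3 cited earlier), the function name, and the sample values are chosen for illustration, and the exact procedure behind the 2019 figures is not described in the text.

```python
def probability_of_detection(forecasts, observations, threshold):
    """POD = hits / (hits + misses), counting only days on which the
    observation exceeded the threshold. Returns None if no such day exists."""
    hits = sum(1 for f, o in zip(forecasts, observations)
               if o > threshold and f > threshold)
    misses = sum(1 for f, o in zip(forecasts, observations)
                 if o > threshold and f <= threshold)
    events = hits + misses
    return hits / events if events else None


# Hypothetical daily PM10 values in ug/m3
obs = [30.0, 55.0, 62.0, 48.0, 71.0, 40.0]
fcst = [28.0, 52.0, 47.0, 51.0, 66.0, 35.0]
pod = probability_of_detection(fcst, obs, threshold=50.0)
```

Note that the POD ignores false alarms (such as the fourth day above), so it is usually read together with a false-alarm measure.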
Data Availability Statement
The presented datasets belong to public institutions. Requests to access these datasets should be directed to lc.mendes@fct.unl.pt.
Author Contributions
LM, JM, and FF: data curation and writing—review and editing. FF: funding acquisition. LM: methodology, software, and writing—original draft. JM and FF: supervision. LM and JM: validation. All authors have read and agreed to the published version of the manuscript.
Funding
This research is based on the outcomes of the Portuguese PrevQualar Project, supported by the Portuguese Environment Agency. The preparatory work performed for the Macao Special Administrative Region was supported by the Macao Meteorological and Geophysical Bureau (SMG). The research activities developed at CENSE are financed by the Portuguese Foundation for Science and Technology (FCT) through the Strategic Project UIDB/04085/2020. The presented work was made possible with the support of the Portuguese Institute for Sea and Atmosphere (IPMA). The Macao modelling tasks were supported by Dr. Man Tat-Lei.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
AQMD (2022). South Coast AQMD, https://www.aqmd.gov/home/programs/community/community-detail?title=check-before-you-burn (accessed November 22, 2021).
Bai, L., Wang, J., Ma, X., and Lu, H. (2018). Air pollution forecasts: an overview. Int. J. Environ. Res. Public Health 15, 780. doi: 10.3390/ijerph15040780
Boubel, R. W., Fox, D. L., Bruce Turner, D., and Stern, A. C. (1994). Fundamentals of Air Pollution, 3rd Edn. Cambridge, MA: Academic Press.
Cassmassi, J. C. (1987). Development of an objective ozone forecast model for the South Coast Air Basin. in Annual Meeting of the Air Pollution Control Association, Conference: 80, Vol. 4 (New York, NY),21–6.
Choi, W., Paulson, S. E., Casmassi, J., and Winer, A. M. (2013). Evaluating meteorological comparability in air quality studies: classification and regression trees for primary pollutants in California's South Coast Air Basin. Atmos Environ. 64, 150–159. doi: 10.1016/j.atmosenv.2012.09.049
Durão, R. M., Mendes, M. T., and Pereira, M. J. (2016). Forecasting O3 levels in industrial area surroundings up to 24 h in advance, combining classification trees and MLP models. Atmos. Pollut. Res. 7, 961–970 doi: 10.1016/j.apr.2016.05.008
Entwistle, M. R., Gharibi, H., Tavallali, P., Cisneros, R., Schweizer, D., Brown, P., et al. (2019). Ozone pollution and asthma emergency department visits in Fresno, CA, USA, during the warm season (June–September) of the years 2005 to 2015: A time-stratified case-crossover analysis. Air Qual. Atmos. Heal. 12, 661–672. doi: 10.1007/s11869-019-00685-w
European Union Legislation (2008). Directive 2008/50/EC. Available online at: https://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:32008L0050 (accessed November 20, 2021).
IPMA (2022). Portuguese Institute for Sea and Atmosphere. Available online at: https://www.ipma.pt/pt/oclima/normais.clima/1971-2000/normalclimate7100.jsp (accessed November 11, 2021).
Köppen, W. (1936). Das geogrphische system der climate. In: Handbuch der klimatologie, editors W. Köppen, R. Geiger, Grebrüder Borntraeger, Berlin, 1–44.
Lee, M., Brauer, M., Wong, P., Tang, R., Tsui, T. H., Choi, C., et al. (2017). Land use regression modelling of air pollution in high density high rise cities: a case study in Hong Kong. Sci. Total Environ. 592, 306–315, doi: 10.1016/j.scitotenv.2017.03.094
Lei, M. T., Monjardino, J., Mendes, L., and Ferreira, F. (2019). Macao air quality forecast using statistical methods. Air Qual. Atmos. Health 12, 1049–1057 doi: 10.1007/s11869-019-00721-9
Lei, M. T., Monjardino, J., Mendes, L., Gonçalves, D., and Ferreira, F. (2020). Statistical forecast of pollution episodes in macao during national holiday and COVID-19. Int. J. Environ. Res. Public Health 17, 5124. doi: 10.3390/ijerph17145124
Liu, H.-Y., Dunea, D., Iordache, S., and Pohoata, A. (2018). A review of airborne particulate matter effects on young children's respiratory symptoms and diseases. Atmosphere 9, 150. doi: 10.3390/atmos9040150
Lopes, D., Hoi, K. I., Mok, K. M., Miranda, A. I., Yuen, K. V., and Borrego, C. (2016). Air quality in the main cities of the pearl river delta region. Glob. Nest J. 18, 794–802. doi: 10.30955/gnj.002007
Neto, J., Torres, P., Ferreira, F., and Boavida, F. (2005). Lisbon air quality forecast using statistical methods. Int. J. Environ. Pollut. 39, e028695. doi: 10.1504/IJEP.2009.028695
NOAA (2001). National Oceanic and Atmospheric Administration. Available online at: https://csl.noaa.gov/aqrsd/reports/forecasting.pdf (accessed November 22, 2021).
Oduro, S. D., Ha, Q. P., and Duc, H. (2016). Vehicular emissions prediction with CART-BMARS hybrid models. Transp. Res. Part D Transp. Environ. 49, 188–202. doi: 10.1016/j.trd.2016.09.012
Pyrgou, A., Hadjinicolaou, P., and Santamouris, M. (2018). Enhanced near-surface ozone under heatwave conditions in a Mediterranean island. Sci. Rep. 8, 9191. doi: 10.1038/s41598-018-27590-z
Quarmby, S., Santos, G., and Mathias, M. (2019). Air quality strategies and technologies: a rapid review of the international evidence. Sustainability 11, 2757. doi: 10.3390/su11102757
Querol, X., Pey, J., Pandolfi, M., Alastuey, A., Cusack, M., Pérez, N., et al. (2009). African dust contributions to mean ambient PM 10 mass-levels across the Mediterranean Basin. Atmos. Environ. 43, 4266–4277. doi: 10.1016/j.atmosenv.2009.06.013
Seinfeld, J. H., and Pandis, S. N. (2006). Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, 2nd Edn. New York, NY: John Wiley and Son, Inc.
Sheng, N., and Tang, U. W. (2013). Risk assessment of traffic-related air pollution in a world heritage city. Int. J. Environ. Sci. Technol. 10, 11–18. doi: 10.1007/s13762-012-0030-1
SMG (2022). Climate in Macao. Available online at: https://www.smg.gov.mo/en/subpage/124/page/28 (accessed November 10, 2021).
Tong, C. H. M., Yim, S. H. L., Rothenberg, D., Wang, C., Lin, C.-Y., Chen, Y. D., et al. (2018). Projecting the impacts of atmospheric conditions under climate change on air quality over the Pearl River Delta region. Atmos. Environ. 193, 79–87. doi: 10.1016/j.atmosenv.2018.08.053
US EPA. (2003). Guidelines for Developing an Air Quality (Ozone and PM2.5) Forecasting Program. New York, NY: US EPA.
WHO. (2006). Air Quality Guidelines: Global Update 2005: Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide. Geneva: WHO.
Wiśniewska, K., Lewandowska, A. U., and Staniszewska, M. (2019). Air quality at two stations (Gdynia and Rumia) located in the region of Gulf of Gdansk during periods of intensive smog in Poland. Air Qual. Atmos. Heal. 12, 879–890. doi: 10.1007/s11869-019-00708-6
World Health Organization (2016). Ambient (Outdoor) Air Pollution Database, By Country and City (Data file). Available online at: http://www.who.int/phe/health_topics/outdoorair/databases/cities/en/ (accessed November 20, 2021).
Keywords: particulate matter, ozone, nitrogen dioxide, air quality, classification and regression trees, multiple regression
Citation: Mendes L, Monjardino J and Ferreira F (2022) Air Quality Forecast by Statistical Methods: Application to Portugal and Macao. Front. Big Data 5:826517. doi: 10.3389/fdata.2022.826517
Received: 30 November 2021; Accepted: 08 February 2022;
Published: 10 March 2022.
Edited by:
Rasa Zalakeviciute, University of the Americas, Ecuador
Reviewed by:
Steffen M. Noe, Estonian University of Life Sciences, Estonia
Qutu Jiang, The University of Hong Kong, Hong Kong SAR, China
Copyright © 2022 Mendes, Monjardino and Ferreira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Luísa Mendes, lc.mendes@fct.unl.pt