The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Public Health
Sec. Infectious Diseases: Epidemiology and Prevention
Volume 12 - 2024
doi: 10.3389/fpubh.2024.1497100
Wastewater-based Epidemiology: Deriving a SARS-CoV-2 Data Validation Method to Assess Data Quality and to Improve Trend Recognition
Provisionally accepted
- 1 German Federal Environment Agency, Berlin, Germany
- 2 Robert Koch Institute (RKI), Berlin, Berlin, Germany
- 3 Technical University of Munich, Munich, Bavaria, Germany
Accurate and consistent data play a critical role in enabling health officials to make informed decisions regarding emerging trends in SARS-CoV-2 infections. Alongside traditional indicators such as the 7-day incidence rate, wastewater-based epidemiology can provide valuable insights into changes in SARS-CoV-2 concentrations. However, wastewater composition and wastewater systems are complex. Multiple effects, such as precipitation events or industrial discharges, might affect the quantification of SARS-CoV-2 concentrations. Hence, analysing data from more than 150 wastewater treatment plants (WWTPs) in Germany necessitates an automated and reliable method to evaluate data validity, identify potential extreme events, and, if possible, improve overall data quality.
We developed a method that first categorizes the data quality of WWTPs and corresponding laboratories based on the number of outliers in the reproduction rate as well as the number of implausible inflection points within the SARS-CoV-2 time series. Subsequently, we scrutinized statistical outliers in several standard quality control parameters (QCPs) that are routinely collected during the analysis process, such as the flow rate, the electrical conductivity, or surrogate viruses like the pepper mild mottle virus. Furthermore, we investigated outliers in the ratio of the analysed gene segments that might indicate laboratory errors. To evaluate the success of our method, we measured the degree of accordance between identified QCP outliers and outliers in the SARS-CoV-2 concentration curves.
Our analysis reveals that the flow and gene segment ratios are typically best at identifying outliers in the SARS-CoV-2 concentration curve, albeit with variations across WWTPs and laboratories. The exclusion of datapoints based on QCP plausibility checks predominantly improves data quality. Our derived data quality categories are in good accordance with visual assessments.
Good data quality is crucial for trend recognition, both at the WWTP level and when aggregating data from several WWTPs into regional or national trends. Our model can help to improve data quality in the context of health-related monitoring and can be optimised for each individual WWTP to account for the large diversity among WWTPs.
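The outlier comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which outlier statistic was used, so the median/MAD-based modified z-score, the threshold of 3.5, the function names `mad_outliers` and `accordance`, and the sample values are all assumptions made for illustration only. The sketch flags outliers in one QCP series (here a hypothetical flow rate) and in a SARS-CoV-2 concentration series, then reports the share of concentration outliers that coincide with a QCP outlier.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the set of indices whose modified z-score exceeds `threshold`.

    Assumption: a median/MAD rule stands in for the (unspecified)
    outlier criterion of the article.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread around the median: nothing can be flagged by this rule.
        return set()
    return {i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold}


def accordance(qcp_flags, target_flags):
    """Share of target outliers that coincide with a QCP outlier.

    Returns None when the target series has no outliers at all.
    """
    if not target_flags:
        return None
    return len(qcp_flags & target_flags) / len(target_flags)


# Hypothetical sample data for one WWTP (one value per sampling day).
flow_rate = [100, 102, 98, 101, 250, 99, 100]
sars_cov_2 = [5.0, 5.2, 4.9, 5.1, 1.0, 5.0, 5.3]

flow_flags = mad_outliers(flow_rate)
sars_flags = mad_outliers(sars_cov_2)
share = accordance(flow_flags, sars_flags)
```

The same two functions could be applied to any other QCP series, for example a per-sample ratio of two gene segment concentrations, which would allow the parameters to be ranked per WWTP by their accordance value.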
Keywords: SARS-CoV-2, Data plausibility, Automated quality control, Wastewater-based epidemiology, wastewater treatment plant classification, Outlier detection
Received: 16 Sep 2024; Accepted: 27 Nov 2024.
Copyright: © 2024 Saravia Arzabe, Pütz, Wurzbacher, Uchaikina, Drewes, Braun, Bannick and Obermaier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Cristina Saravia Arzabe, German Federal Environment Agency, Berlin, Germany
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.