AUTHOR=Mugotitsa Bylhah, Bhattacharjee Tathagata, Ochola Michael, Mailosi Dorothy, Amadi David, Andeso Pauline, Kuria Joseph, Momanyi Reinpeter, Omondi Evans, Kajungu Dan, Todd Jim, Kiragga Agnes, Greenfield Jay TITLE=Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub JOURNAL=Frontiers in Big Data VOLUME=7 YEAR=2024 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2024.1435510 DOI=10.3389/fdata.2024.1435510 ISSN=2624-909X ABSTRACT=Background

Longitudinal studies are essential for understanding how mental health disorders progress over time, but combining data collected through different methods to assess conditions such as depression, anxiety, and psychosis is challenging. This study describes a mapping technique that converts diverse longitudinal data into a standardized staging database, using the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.

Methods

The “INSPIRE” project integrates longitudinal data from African studies into a staging database that follows metadata documentation standards and is structured as a snowflake schema. This structure facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into the OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.
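
As a rough illustration of what a snowflake-structured staging schema for wave-based studies might look like, the sketch below builds a toy in-memory database with Python's standard `sqlite3` module. Every table name, column name, and sample value (`study`, `wave`, `instrument`, `item`, `response`, the screener names, the protocol versions) is a hypothetical assumption made for this example and is not taken from the INSPIRE staging database itself.

```python
import sqlite3

# Hypothetical snowflake layout: the fact table (response) points to
# dimension tables (wave, item), which are themselves normalized into
# further tables (study, instrument).
SCHEMA = """
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE wave (
    wave_id          INTEGER PRIMARY KEY,
    study_id         INTEGER NOT NULL REFERENCES study(study_id),
    wave_number      INTEGER NOT NULL,
    protocol_version TEXT NOT NULL
);
CREATE TABLE instrument (
    instrument_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE item (
    item_id       INTEGER PRIMARY KEY,
    instrument_id INTEGER NOT NULL REFERENCES instrument(instrument_id),
    question_text TEXT NOT NULL
);
CREATE TABLE response (
    response_id    INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL,
    wave_id        INTEGER NOT NULL REFERENCES wave(wave_id),
    item_id        INTEGER NOT NULL REFERENCES item(item_id),
    value          REAL
);
"""


def build_staging_db():
    """Create the toy schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO study VALUES (1, 'Example cohort')")
    # Two waves of the same study, run under different protocol versions.
    conn.executemany(
        "INSERT INTO wave VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "v1.0"), (2, 1, 2, "v2.0")],
    )
    conn.executemany(
        "INSERT INTO instrument VALUES (?, ?)",
        [(1, "Depression screener A"), (2, "Anxiety screener B")],
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?)",
        [(1, 1, "Example depression item"), (2, 2, "Example anxiety item")],
    )
    # The same participant answers a different instrument in each wave.
    conn.executemany(
        "INSERT INTO response VALUES (?, ?, ?, ?, ?)",
        [(1, 101, 1, 1, 2.0), (2, 101, 2, 2, 1.0)],
    )
    return conn


def instruments_by_wave(conn):
    """List which instrument was used in which wave and protocol version."""
    query = """
        SELECT DISTINCT w.wave_number, w.protocol_version, i.name
        FROM response AS r
        JOIN wave AS w ON w.wave_id = r.wave_id
        JOIN item AS it ON it.item_id = r.item_id
        JOIN instrument AS i ON i.instrument_id = it.instrument_id
        ORDER BY w.wave_number, i.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in instruments_by_wave(build_staging_db()):
        print(row)
```

Keeping the protocol version on the wave and the instrument behind the item means a change of instrument between waves is recorded as data rather than as a schema change, which is one plausible way to capture the protocol and instrument changes described above.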

Results

Using this mapping method, we streamlined data migration into the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.
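
The hand-off from a staging record to the OMOP CDM could be sketched as follows. This is a minimal illustration under stated assumptions, not the ETL used in the study: `StagingResponse`, `SOURCE_TO_CONCEPT`, and the item code `DEP_A_Q1` are hypothetical, and the concept ID is a placeholder chosen from the range that OMOP conventionally reserves for local concepts (above 2,000,000,000). The output keys are a subset of the columns of the OMOP `OBSERVATION` table; a real load would also need fields such as `observation_type_concept_id`.

```python
from dataclasses import dataclass
from datetime import date

# OMOP convention: concept_id 0 marks a source value with no mapped concept.
NO_MATCHING_CONCEPT = 0

# Hypothetical source-code-to-concept lookup. The ID is a placeholder in the
# local-concept range, not a real standard OMOP concept.
SOURCE_TO_CONCEPT = {"DEP_A_Q1": 2_000_000_001}


@dataclass(frozen=True)
class StagingResponse:
    """One answer as it might sit in a staging database (hypothetical shape)."""
    participant_id: int
    item_code: str
    collected_on: date
    value: float


def to_observation(row: StagingResponse) -> dict:
    """Map a staging response onto a subset of OMOP OBSERVATION columns."""
    return {
        "person_id": row.participant_id,
        "observation_concept_id": SOURCE_TO_CONCEPT.get(
            row.item_code, NO_MATCHING_CONCEPT
        ),
        "observation_date": row.collected_on,
        "value_as_number": row.value,
        # Keep the original code so unmapped items remain traceable.
        "observation_source_value": row.item_code,
    }


if __name__ == "__main__":
    mapped = to_observation(StagingResponse(101, "DEP_A_Q1", date(2024, 1, 15), 2.0))
    unmapped = to_observation(StagingResponse(101, "UNKNOWN_Q", date(2024, 1, 15), 1.0))
    print(mapped)
    print(unmapped)
```

Carrying the source code through in `observation_source_value` lets items without a mapped concept be loaded with concept ID 0 and revisited later, rather than being dropped during transformation.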

Conclusion

The staging database serves as an innovative tool for managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into the OMOP CDM.