AUTHOR=Rehman Rana Zia Ur, Guan Yu, Shi Jian Qing, Alcock Lisa, Yarnall Alison J., Rochester Lynn, Del Din Silvia TITLE=Investigating the Impact of Environment and Data Aggregation by Walking Bout Duration on Parkinson’s Disease Classification Using Machine Learning JOURNAL=Frontiers in Aging Neuroscience VOLUME=14 YEAR=2022 URL=https://www.frontiersin.org/journals/aging-neuroscience/articles/10.3389/fnagi.2022.808518 DOI=10.3389/fnagi.2022.808518 ISSN=1663-4365 ABSTRACT=
Parkinson’s disease (PD) is a common neurodegenerative disease, and misdiagnosis can occur in its early stages. Gait impairment is typical in PD and is linked with an increased fall risk and poorer quality of life. Machine learning (ML) models applied to real-world gait may classify PD more sensitively than models applied to laboratory data. Real-world gait yields multiple walking bouts (WBs), so selecting the optimal method to aggregate the data (e.g., by WB duration) is essential, as this choice may influence classification performance. The objective of this study was to investigate the impact of environment (laboratory vs. real world) and data aggregation on ML performance, with the aim of optimizing the sensitivity of PD classification. Gait was assessed in 47 people with PD (age: 68 ± 9 years) and 52 healthy controls (HCs; age: 70 ± 7 years). In the laboratory, participants walked at their normal pace for 2 min; in the real world, they were assessed over 7 days. In both environments, 14 gait characteristics were derived from a single tri-axial accelerometer attached to the lower back. The ability of each individual gait characteristic to differentiate PD from HC was evaluated using the area under the receiver operating characteristic curve (AUC). ML models (support vector machine, random forest, and ensemble models) showed better classification performance when applied to real-world gait than to laboratory data. Real-world gait characteristics aggregated over longer WBs (WB 30–60 s, WB > 60 s, WB > 120 s) gave superior discriminative performance (PD vs. HC) compared with laboratory gait characteristics (0.51 ≤ AUC ≤ 0.77). Real-world gait speed showed the highest AUC, 0.77. Overall, a random forest trained on the 14 gait characteristics aggregated over WBs > 60 s achieved a higher F1 score (77.20 ± 5.51%) than that obtained from laboratory data (F1 score = 68.75 ± 12.80%).
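The aggregation-by-WB-duration step and the per-characteristic AUC described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout (a list of (duration, gait speed) bouts per participant), the use of the mean as the aggregate, the inclusive 30–60 s bin edges, and every function name here are hypothetical. AUC is estimated with the Mann-Whitney pairwise-comparison formula, and F1 is computed as the harmonic mean of precision and recall.

```python
from statistics import mean

# Hypothetical input layout: each walking bout is (duration_s, gait_speed_m_per_s).
Bout = tuple[float, float]

# Duration filters mirroring the WB groupings named in the abstract.
# The exact bin edges are an assumption; note that "> 60 s" also contains "> 120 s".
WB_FILTERS = {
    "WB 30-60 s": lambda d: 30 <= d <= 60,
    "WB > 60 s": lambda d: d > 60,
    "WB > 120 s": lambda d: d > 120,
}


def aggregate(bouts, keep):
    """Mean gait speed over the bouts whose duration passes `keep` (None if no bouts qualify)."""
    speeds = [speed for duration, speed in bouts if keep(duration)]
    return mean(speeds) if speeds else None


def auc(pd_values, hc_values, lower_is_pd=True):
    """Mann-Whitney AUC: probability that a random PD participant looks
    'more PD-like' than a random HC participant; ties count as 0.5."""
    wins = 0.0
    for p in pd_values:
        for h in hc_values:
            if p == h:
                wins += 0.5
            elif (p < h) == lower_is_pd:
                wins += 1.0
    return wins / (len(pd_values) * len(hc_values))


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up bouts for illustration only (not study data).
    pd_group = {"pd1": [(45, 0.95), (90, 1.00), (150, 1.02)],
                "pd2": [(35, 0.90), (70, 0.98)]}
    hc_group = {"hc1": [(40, 1.10), (80, 1.20), (200, 1.25)],
                "hc2": [(50, 1.05), (130, 1.15)]}
    for label, keep in WB_FILTERS.items():
        # Drop participants with no bouts in this duration group.
        pd_vals = [v for v in (aggregate(b, keep) for b in pd_group.values()) if v is not None]
        hc_vals = [v for v in (aggregate(b, keep) for b in hc_group.values()) if v is not None]
        print(label, "gait speed AUC:", auc(pd_vals, hc_vals))
```

Because slower gait is expected in PD, `lower_is_pd=True` treats a lower value as evidence for PD; for characteristics where higher values indicate PD, the flag would be flipped. Participants with no bouts in a given duration group are simply dropped here, which is one of several possible handling choices.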
Findings from this study suggest that the choice of environment and data aggregation method is important for achieving maximum discrimination performance and directly affects ML performance for PD classification. This study highlights the importance of a harmonized approach to data analysis to drive future implementation and clinical use.
[09/H0906/82].