AUTHOR=Pienaar Michael A., Sempa Joseph B., Luwes Nicolaas, George Elizabeth C., Brown Stephen C. TITLE=Elicitation of domain knowledge for a machine learning model for paediatric critical illness in South Africa JOURNAL=Frontiers in Pediatrics VOLUME=11 YEAR=2023 URL=https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2023.1005579 DOI=10.3389/fped.2023.1005579 ISSN=2296-2360 ABSTRACT=Objectives

Delays in identification, resuscitation and referral have been identified as a preventable contributor to severity of illness and mortality in South African children. To address this problem, a machine learning model was developed to predict a compound outcome of death prior to discharge from hospital and/or admission to the paediatric intensive care unit (PICU). A key aspect of developing such models is the integration of human domain knowledge. The objective of this study is to describe how this domain knowledge was elicited, including the use of a documented literature search and a Delphi procedure.
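
As an illustration only, the compound outcome described above can be read as a logical OR of its two components. The sketch below assumes a hypothetical per-admission record with two boolean fields; the names `Admission`, `died_before_discharge` and `admitted_to_picu` are not taken from the study and stand in for whatever the actual dataset uses.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical field names, assumed for illustration only.
    died_before_discharge: bool
    admitted_to_picu: bool


def compound_outcome(admission: Admission) -> int:
    """Return 1 if either component of the compound outcome occurred, else 0."""
    return int(admission.died_before_discharge or admission.admitted_to_picu)


# A child admitted to the PICU who survived to discharge still meets the outcome.
label = compound_outcome(Admission(died_before_discharge=False, admitted_to_picu=True))
```

Under this reading, a record is labelled positive when either event occurs, and negative only when neither does.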

Design

A prospective mixed methodology development study was conducted that included qualitative aspects in the elicitation of domain knowledge, together with descriptive and analytical quantitative and machine learning methodologies.

Setting

A single-centre tertiary hospital providing acute paediatric services.

Participants

Three paediatric intensivists, six specialist paediatricians and three specialist anaesthesiologists.

Interventions

None.

Measurements and main results

The literature search identified 154 full-text articles reporting risk factors for mortality in hospitalised children. These factors were most commonly features of specific organ dysfunction. Of these publications, 89 studied children in lower- and middle-income countries. The Delphi procedure included 12 expert participants and was conducted over three rounds. Respondents identified a need to reach a compromise among model performance, comprehensiveness, veracity and practicality of use. Participants achieved consensus on a range of clinical features associated with severe illness in children. No special investigations were considered for inclusion in the model except point-of-care capillary blood glucose testing. The researcher integrated the results and compiled a final list of features.
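
A minimal sketch of how consensus in a Delphi round might be tallied is given below. The abstract does not state the consensus rule used, so the 70% agreement threshold, the function name `consensus_features` and the feature names and votes are all assumptions made for illustration; they are not the study's data or method.

```python
def consensus_features(votes_by_feature, threshold=0.7):
    """Return the features whose share of 'include' votes meets the threshold.

    votes_by_feature maps a feature name to a list of booleans, one per panellist.
    The threshold of 0.7 is an assumed value, not one reported in the study.
    """
    selected = []
    for feature, votes in votes_by_feature.items():
        if votes and sum(votes) / len(votes) >= threshold:
            selected.append(feature)
    return sorted(selected)


# Hypothetical votes from a 12-member panel (placeholder feature names).
example_votes = {
    "feature_a": [True] * 10 + [False] * 2,  # 10 of 12 in favour
    "feature_b": [True] * 3 + [False] * 9,   # 3 of 12 in favour
}
agreed = consensus_features(example_votes)
```

Features that fall short of the threshold in one round would typically be fed back to the panel for re-rating in the next round rather than discarded outright.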

Conclusion

The elicitation of domain knowledge is important in effective machine learning applications. The documentation of this process enhances rigour in such models and should be reported in publications. A documented literature search, Delphi procedure and the integration of the domain knowledge of the researchers contributed to problem specification and selection of features prior to feature engineering, pre-processing and model development.