AUTHOR=Abed Ibrahim Lina, Fekete István

TITLE=What Machine Learning Can Tell Us About the Role of Language Dominance in the Diagnostic Accuracy of German LITMUS Non-word and Sentence Repetition Tasks

JOURNAL=Frontiers in Psychology

VOLUME=9

YEAR=2019

URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2018.02757

DOI=10.3389/fpsyg.2018.02757

ISSN=1664-1078

ABSTRACT=<p>The present study investigates the performance of 21 monolingual and 56 bilingual children aged 5;6–9;0 on German LITMUS-sentence-repetition (SRT; <xref ref-type="bibr" rid="B84">Hamann et al., 2013</xref>) and non-word-repetition-tasks (NWRT; <xref ref-type="bibr" rid="B75">Grimm et al., 2014</xref>), which were constructed in accordance with the <bold>LITMUS</bold>-principles (<bold>L</bold>anguage <bold>I</bold>mpairment <bold>T</bold>esting in <bold>M</bold>ultilingual <bold>S</bold>ettings; <xref ref-type="bibr" rid="B10">Armon-Lotem et al., 2015</xref>). Both tasks incorporate phonologically and syntactically complex structures shown to be cross-linguistically challenging for children with Specific Language Impairment (SLI) and aim at minimizing bias against bilingual children while still being indicative of the presence of language impairment across language combinations (see <xref ref-type="bibr" rid="B123">Marinis and Armon-Lotem, 2015</xref> for sentence-repetition; <xref ref-type="bibr" rid="B29">Chiat, 2015</xref> for non-word-repetition). Given the great variability in bilingual language exposure and the potential effect of language experience on language performance in bilingual children, we examined whether background variables related to bilingualism, particularly the degree of language dominance as measured by relative amount of use and exposure, could compromise the diagnostic accuracy of the German LITMUS-SRT and NWRT. We further investigated whether a combination of the two tasks provides better diagnostic accuracy and helps avoid cases of misdiagnosis.
To address this, we used an unsupervised machine learning algorithm, Partitioning Around Medoids (PAM; <xref ref-type="bibr" rid="B103">Kaufman and Rousseeuw, 2009</xref>), to derive a clinical category for the children as ± language-impaired based on their performance scores on the SRT and NWRT (in isolation and combined) while withholding information about their clinical status based on standardized assessment in their first (home language, L1) and second language (societal language, L2). Subsequently, we calculated diagnostic accuracy and used regression analysis to investigate which background variables (age of onset, length of exposure, degree of language dominance, socio-economic status, and risk factors for SLI) best explained the clinical group membership yielded by the PAM analysis based on the children’s NWRT and SRT performance scores. Results show that although language dominance clearly influences the performance of bilingual typically developing children, especially in the SRT, the diagnostic accuracy of the tools is not compromised by language dominance: while risk factors for SLI were significant predictors of clinical group membership in all models, language dominance did not contribute at all to explaining clinical cluster membership as typically developing or SLI based on any of the combinations of the SRT and NWRT variables. Additionally, results confirm that a combination of the SRT scored by correct target structure and the structurally more complex language-dependent part of the NWRT yields better diagnostic accuracy than single measures and is sensitive only to risk factors for SLI and not to dominance levels or SES.</p>
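
The clustering-then-accuracy procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified PAM (greedy BUILD, then first-improvement SWAP) written against the standard library only, and the score pairs, the clinical labels, and every function and variable name below are hypothetical, invented purely for illustration. It assumes two clusters and treats the cluster whose medoid has the lower combined score as the "impaired" one.

```python
from math import dist


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k=2):
    """Simplified PAM: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: try exchanging each medoid with each non-medoid point.
    improved = True
    while improved:
        improved = False
        current = total_cost(points, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


def sensitivity_specificity(predicted, actual):
    """Assumes at least one truly impaired and one truly unimpaired case."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical (SRT, NWRT) percent-correct scores; NOT data from the study.
scores = [(90, 85), (88, 92), (95, 80), (85, 88), (40, 35), (35, 45), (50, 30)]
# Hypothetical clinical status, withheld from the clustering step.
clinical = [False, False, False, False, True, True, True]

medoids, labels = pam(scores, k=2)
# Assumption: the cluster whose medoid has the lower combined score is "impaired".
impaired = min(range(2), key=lambda j: sum(scores[medoids[j]]))
predicted = [lab == impaired for lab in labels]
sens, spec = sensitivity_specificity(predicted, clinical)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The clinical labels enter only at the final comparison step, mirroring the abstract's point that clinical status is withheld while the clusters are derived.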