AUTHOR=Valdez Danny , Goodson Patricia
TITLE=Language Bias in Health Research: External Factors That Influence Latent Language Patterns
JOURNAL=Frontiers in Research Metrics and Analytics
VOLUME=5
YEAR=2020
URL=https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2020.00004
DOI=10.3389/frma.2020.00004
ISSN=2504-0537
ABSTRACT=
Background: Concerns about problematic research are primarily attributed to the statistics and methods used to analyze and support data. Language, another component of problematic research in published work, rarely receives the same attention, even though it plays an equally important role in shaping how presented data are discussed and framed.
Purpose: This study uses a topic modeling approach to examine language as a predictor of potential bias in the collected publication histories of several health research areas.
Methods: We applied Latent Dirichlet Allocation (LDA) topic models to dissect publication histories disaggregated by three factors commonly cited as language influencers: (1) time, to study ADHD pharmacotherapy; (2) funding source, to study sugar consumption; and (3) nation of origin, to study Pediatric Highly-Active Anti-Retroviral Therapy (P-HAART).
Results: For each factor, we found notable differences in language across the subcorpora that the factor defined. Over time, article content shifted to reflect new trends and research practices for the commonly prescribed ADHD medication Ritalin. By funding source, industry-funded and federally funded studies had different foci despite testing the same hypothesis. By nation of origin, differing regulatory structures in the United States and Europe seemingly influenced the direction of research.
Conclusion: This work makes two contributions to ethics research: (1) language and language framing should be studied as carefully as numeric data in studies of rigor, reproducibility, and transparency; and (2) the scientific community should continue to apply topic models as tools for answering hypothesis-driven research questions.
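The Methods step (splitting a corpus by one factor, then fitting an LDA topic model to each resulting subcorpus and comparing topics) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which LDA implementation, inference method, or settings were used, so the sketch assumes a collapsed Gibbs sampler written in pure Python. The function names (fit_lda, top_words, topics_by_factor), the hyperparameters (alpha, beta, iters, k), and the RECORDS toy data are all hypothetical.

```python
import random
from collections import defaultdict


def fit_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens per (document, topic)
    nkw = [[0] * n_vocab for _ in range(k)]  # tokens per (topic, word)
    nk = [0] * k                             # tokens per topic
    z = []                                   # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)  # random initial topic
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w_id[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                # Unnormalized full conditional p(z_i = j | all other assignments).
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + n_vocab * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


def topics_by_factor(records, factor, k=2, n=3, **lda_kwargs):
    """Disaggregate records by a factor, fit one LDA per subcorpus, return top words."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[factor]].append(rec["tokens"])
    return {level: top_words(*fit_lda(docs, k, **lda_kwargs), n=n)
            for level, docs in groups.items()}


# Hypothetical, pre-tokenized toy records; NOT data from the study.
RECORDS = [
    {"funding": "industry", "tokens": ["sucrose", "energy", "intake", "energy", "trial"]},
    {"funding": "industry", "tokens": ["beverage", "taste", "energy", "trial", "intake"]},
    {"funding": "federal", "tokens": ["sugar", "obesity", "risk", "children", "risk"]},
    {"funding": "federal", "tokens": ["caries", "sugar", "children", "dental", "risk"]},
]

if __name__ == "__main__":
    for level, topics in topics_by_factor(RECORDS, "funding", k=2, iters=100).items():
        print(level, topics)
```

Comparing the top words for each level of a factor (here, hypothetical funding labels) is one simple way to inspect cross-group differences of the kind the Results describe. A real analysis would add text preprocessing, choose the number of topics deliberately, run far more sampling iterations, and likely use an established topic modeling library rather than this hand-rolled sampler.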