ORIGINAL RESEARCH article
Front. Ecol. Evol.
Sec. Evolutionary and Population Genetics
Volume 13 - 2025 | doi: 10.3389/fevo.2025.1572596
This article is part of the Research Topic Forensic Investigative Genetic Genealogy and Fine-Scale Structure of Human Populations, Volume II
Fine-Scale Biogeographical Ancestry Inference in Southeast and East Asians via High-Efficiency Markers and Machine Learning Approaches
Provisionally accepted
- 1 Kunming Medical University, Kunming, Yunnan Province, China
- 2 West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- 3 Anti-Drug Technology Center of Guangdong Province, Guangzhou, China
Biogeographical ancestry inference offers valuable clues for forensic cold cases, but it typically yields limited information for substructured populations within continental East Asian and Southeast Asian groups. This study presents an integrative genomic dataset of 3,461 individuals from East Asia and Southeast Asia to elucidate fine-scale population substructure and its role in precision forensic medicine. Six nested panels of increasing ancestry-informative marker (AIM) density (from 50 to 2,000 SNPs) were developed to distinguish fine genetic differences among the six language groups and among populations within the Sino-Tibetan language family. The 2,000-AIM panel separated populations in principal component analysis (PCA) about as well as the full set of loci. Additionally, we constructed a machine learning classification model with an average prediction accuracy of 84%, highlighting the critical role of geographical information in improving model accuracy. Furthermore, we validated the accuracy of the deep learning method Locator in predicting geographical coordinates from genetic information alone. This work highlights the power of integrating genetic and geographic data with artificial intelligence to refine fine-scale biogeographical ancestry inference, offering deeper insight into population structure in East Asia and Southeast Asia, with significant implications for forensic applications.
Keywords: Biological ancestry inference, Ancestry-informative markers, East Asian, Southeast Asian, machine learning
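The nested panel design mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how AIMs were ranked, so the scoring rule here (variance of allele frequencies across groups) is purely an assumption, as are the function names, the toy genotypes, and the toy panel sizes. The only point it demonstrates is that ranking SNPs once and taking the top k for each size makes every smaller panel a subset of every larger one.

```python
from collections import defaultdict
from statistics import pvariance


def group_allele_freqs(genotypes, labels):
    """Alt-allele frequency per SNP for each group.

    genotypes: one list per individual of 0/1/2 alt-allele counts.
    labels: one group label per individual (hypothetical groups).
    """
    by_group = defaultdict(list)
    for row, label in zip(genotypes, labels):
        by_group[label].append(row)
    n_snps = len(genotypes[0])
    return {
        label: [sum(r[j] for r in rows) / (2 * len(rows)) for j in range(n_snps)]
        for label, rows in by_group.items()
    }


def snp_scores(genotypes, labels):
    """ASSUMED informativeness score: between-group variance of allele frequency."""
    freqs = group_allele_freqs(genotypes, labels)
    n_snps = len(genotypes[0])
    return [pvariance([f[j] for f in freqs.values()]) for j in range(n_snps)]


def nested_panels(scores, sizes):
    """Rank SNPs once (ties broken by index), then take the top k per size."""
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return {k: ranked[:k] for k in sorted(sizes)}


# Toy data: 4 individuals, 2 groups, 4 SNPs (not from the study).
genotypes = [
    [2, 0, 1, 1],
    [2, 0, 1, 0],
    [0, 0, 1, 2],
    [0, 1, 1, 2],
]
labels = ["A", "A", "B", "B"]

panels = nested_panels(snp_scores(genotypes, labels), sizes=(1, 2, 3))
for size, snps in panels.items():
    print(size, snps)
```

With real data the sizes would span the 50 to 2,000 SNP range reported above, and the score would be replaced by whatever informativeness measure the full article specifies.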
Received: 07 Feb 2025; Accepted: 17 Mar 2025.
Copyright: © 2025 Yang, Chen, Nie, Liu, Deng and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Hong Deng, Kunming Medical University, Kunming, 650500, Yunnan Province, China
Guanglin He, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.