ORIGINAL RESEARCH article

Front. Ecol. Evol.

Sec. Evolutionary and Population Genetics

Volume 13 - 2025 | doi: 10.3389/fevo.2025.1572596

This article is part of the Research Topic Forensic Investigative Genetic Genealogy and Fine-Scale Structure of Human Populations, Volume II

Fine-Scale Biogeographical Ancestry Inference in Southeast and East Asians via High-Efficiency Markers and Machine Learning Approaches

Provisionally accepted
  • 1 Kunming Medical University, Kunming, Yunnan Province, China
  • 2 West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
  • 3 Anti-Drug Technology Center of Guangdong Province, Guangzhou, China

The final, formatted version of the article will be published soon.

    Biogeographical ancestry inference offers valuable clues for forensic cold cases, but it typically yields limited information for substructured populations within continental East Asian and Southeast Asian groups. This study presents an integrative genomic dataset of 3,461 individuals from East Asia and Southeast Asia to elucidate fine-scale population substructure and its role in precision forensic medicine. Six nested panels were developed with increasing ancestry-informative marker (AIM) density (ranging from 50 to 2,000 SNPs) to distinguish fine genetic differences among the six language groups and among populations within the Sino-Tibetan language family. We found that the 2,000-AIM panel exhibited differentiation efficiency in PCA comparable to that of all loci. Additionally, we constructed a machine learning classification model with an average prediction accuracy of 84%, highlighting the critical role of geographical information in improving model accuracy. Furthermore, we validated the accuracy of the deep learning method Locator in predicting geographical coordinates based solely on genetic information. This work highlights the power of integrating genetic and geographic data with artificial intelligence to refine fine-scale biogeographical ancestry inference, offering deeper insights into population structure in East Asia and Southeast Asia, with significant implications for forensic applications.
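To illustrate what "nested" means for the AIM panels, the following is a minimal sketch and not the authors' pipeline. The abstract does not state the marker-selection criterion, so the F_ST-like score used here (between-group variance of allele frequency, scaled by p(1 - p)) is an assumption, the function names `rank_markers`, `nested_panels`, and `simulate` are hypothetical, and the genotypes are simulated rather than drawn from the study's dataset. The panel sizes are also illustrative. The point is only that taking prefixes of a single ranking makes every smaller panel a subset of every larger one.

```python
import numpy as np

def rank_markers(genotypes, groups):
    """Rank SNPs by an F_ST-like score (assumed criterion, not the paper's)."""
    labels = np.unique(groups)
    # Per-group allele frequency: mean genotype (0/1/2 copies) divided by 2.
    freqs = np.stack([genotypes[groups == g].mean(axis=0) / 2.0 for g in labels])
    p_bar = freqs.mean(axis=0)
    denom = p_bar * (1.0 - p_bar)
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero at fixed loci
    score = np.where(denom > 0, freqs.var(axis=0) / safe, 0.0)
    # Highest score first; a stable sort keeps ties in original SNP order.
    return np.argsort(-score, kind="stable")

def nested_panels(ranking, sizes):
    """Each panel is a prefix of one ranking, so smaller panels nest in larger ones."""
    return {k: ranking[:k] for k in sorted(sizes)}

def simulate(n_per_group=50, n_groups=3, n_snps=500, n_informative=40, seed=0):
    """Simulated genotypes: only the first n_informative SNPs differ between groups."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, size=n_snps)
    freqs = np.tile(base, (n_groups, 1))
    freqs[:, :n_informative] = rng.uniform(0.05, 0.95, size=(n_groups, n_informative))
    groups = np.repeat(np.arange(n_groups), n_per_group)
    genotypes = rng.binomial(2, freqs[groups])  # shape: (individuals, SNPs)
    return genotypes, groups

genotypes, groups = simulate()
ranking = rank_markers(genotypes, groups)
panels = nested_panels(ranking, sizes=[10, 20, 50])
```

In a real analysis, each panel's columns (`genotypes[:, panels[k]]`) would then be passed to PCA or a classifier and compared against the result from all loci, which is the kind of comparison the abstract reports for the 2,000-AIM panel.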

    Keywords: Biological ancestry inference, Ancestry-informative markers, East Asian, Southeast Asian, machine learning

    Received: 07 Feb 2025; Accepted: 17 Mar 2025.

    Copyright: © 2025 Yang, Chen, Nie, Liu, Deng and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Hong Deng, Kunming Medical University, Kunming, 650500, Yunnan Province, China
    Guanglin He, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
