DATA REPORT article

Front. Genet.
Sec. Livestock Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1466382

Whole genome sequencing and de novo genome assembly of the Kazakh native horse Zhabe

Provisionally accepted
  • 1 NJSC «Toraigyrov University», Pavlodar, Kazakhstan
  • 2 NJSC «S. Seifullin Kazakh Agrotechnical University», Pavlodar, Kazakhstan
  • 3 Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
  • 4 Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
  • 5 School of Sciences and Humanities, Nazarbayev University, Astana, Kazakhstan

The final, formatted version of the article will be published soon.

    The horse (Equus caballus) is a domesticated animal of great significance in human civilization and history, having played a crucial role in transportation, agriculture, and warfare. Over millennia, intentional breeding has produced approximately 500 distinct horse breeds, each selected for specific performance qualities, appearance, and behavior (Petersen et al., 2013). The earliest evidence of horse domestication dates back to the Eneolithic Botai culture (3500 BCE) in prehistoric Northern Kazakhstan, where horses continue to hold cultural significance (Outram et al., 2009; Levine, 1999; Sarbassova, 2015). Although domestication in Botai occurred independently of the main domestication path, horses have been an essential part of steppe pastoralism in the region of modern Kazakhstan since the Bronze Age (Kyselý and Peške, 2022; Frachetti and Benecke, 2009; Outram et al., 2012). As a result, traditional selection over hundreds to thousands of years has shaped the Kazakh horse breed (Kabylbekova et al., 2024).
Zhabe is an intrabreed type of Kazakh horse that originated in Western Kazakhstan and is currently used throughout the country (Figure 1A). This type is known for its strong, slightly rough constitution and high endurance. Horses of this type are characterized by a coarse head, a short fleshy neck, a wide and deep body, a broad back, a muscular croup, and strong, bony legs. They also have a thick, long mane and tail, short fetlocks on the legs, and dense skin. Their colors are typically bay or dark red, but can also be mousey, gray, or black (Dmitriev and Ėrnst, 1989).
In state farm conditions, Kazakh horses, including Zhabe, have been selectively bred for increased size and weight. They are well adapted to traditional Kazakh methods of seasonal pasturing and are bred in herds, even in harsh winter climatic conditions, to produce working horses, meat, and milk (Omarov et al., 2019).
Previous studies have characterized Kazakh horses using array-based genotyping, RNA-seq, and WGBS-seq (Pozharskiy et al., 2023; Liu et al., 2018; Yu et al., 2021; Liu et al., 2023). This study presents the first high-quality genome assemblies for six Kazakh horses of the Zhabe type, providing a valuable resource for genetic research and comparative genomics. Conserving genetic diversity is vital for the present and future maintenance of the valuable traits of the breed (Bruford et al., 2015). It is also widely acknowledged that comprehensive molecular genetic data characterizing inter- and intraspecies diversity are important for the efficient management of the genetic resources of economically important animal varieties (Ruane, 2000; Simianer, 2005; Toro et al., 2009). Here, we present six new de novo genome assemblies, generated using Oxford Nanopore Technology, for Kazakh horses of the Zhabe traditional type.
Peripheral blood samples from six horses (2H, 7H, 16H, 25H, 30H, and 57H) were collected in 1 mL volumes at the "Akzhar Ondiris" horse farm (51°32'07.4"N 77°27'16.9"E) in the Pavlodar region of Kazakhstan (Figure 1A). All samples were anticoagulated with EDTA and refrigerated at 4 °C. The phenotypic characteristics of these horses are detailed in Supplementary Table S1. Genomic DNA was extracted from the samples using the Illustra Blood Kit (Cytiva, USA) and the Gentra Puregene Blood Kit (Qiagen, Germany) following the manufacturers' protocols. The concentration and quality of the extracted DNA were checked using a Qubit fluorometer (Invitrogen, USA), a Nanodrop 2000 spectrophotometer (Thermo Scientific, USA), and 1% agarose gel electrophoresis.
This high-molecular-weight DNA was then used for library construction and subsequent Nanopore sequencing. To generate Oxford Nanopore long reads, 3 µg of genomic DNA was randomly sheared to a target size of 20 kbp using a g-TUBE (Covaris, USA) and processed according to the Ligation Sequencing Kit (SQK-LSK110) protocol (Oxford Nanopore Technologies, UK). For genome sequencing, at least 1 µg of sheared DNA from each sample was used for library construction. DNA fragments were repaired using NEBNext FFPE Repair Mix (New England Biolabs, USA). End repair and A-tailing were performed using the NEBNext End Repair/dA-Tailing Module kit (New England Biolabs, USA), followed by ligation of Oxford Nanopore sequencing adapters with the NEBNext Quick Ligation Module (E6056) (New England Biolabs, USA). The constructed libraries were sequenced on R9.4.1 flow cells of a PromethION sequencer (Oxford Nanopore Technologies, UK) for 72 hours. Basecalling of the raw signal data was performed using Guppy v.5.1.13, which also trimmed adapters and removed low-quality sequencing reads with a Q-score below 9.0. All DNA samples were sequenced with an average coverage of 26X. A summary of the sequenced reads is provided in Supplementary Table S2.
Draft assemblies were produced using one round of Flye v.2.9.2 (Kolmogorov et al., 2019), followed by one round of polishing with the Oxford Nanopore Technologies (ONT) reads using Medaka v.1.11.1 (https://github.com/nanoporetech/medaka). To evaluate the quality of the final assemblies, we aligned the ONT contigs to the EquCab3.0 reference genome assembly (NCBI Accession No. GCF_002863925.1) and assessed them with QUAST v.5.2.0 (Gurevich et al., 2013). Reflecting the long reads produced by ONT sequencing, the longest contig among the assembled genomes was 92.32 Mb, and the largest contig N50 was 28.26 Mb.
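For readers unfamiliar with the two summary statistics quoted above, the following is a minimal Python sketch of how a contig N50 and a mean sequencing depth are conventionally computed. It is not the authors' code; the function names and the toy numbers (including the round 2.5 Gb genome size) are assumptions chosen purely for illustration.

```python
def n50(contig_lengths):
    """Return the contig N50.

    N50 is the length L of the contig at which, going from the longest
    contig downward, the running total first reaches half of the
    assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def mean_coverage(total_read_bases, genome_size):
    """Return mean depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Toy contig lengths (hypothetical, not taken from the assemblies).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Hypothetical yield and an assumed round genome size of 2.5 Gb.
toy_depth = mean_coverage(65_000_000_000, 2_500_000_000)
```

In practice QUAST reports N50 directly; the sketch only makes the definition explicit.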
The completeness of the genome assemblies was further assessed using BUSCO v.5.4.6 (Simão et al., 2015), which compared each assembly against the laurasiatheria_odb10 database containing 12,234 orthologous genes. BUSCO assessment scores ranged from 93% to 95% (Figure 1B, Table 1), indicating high completeness for the obtained assemblies.
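As an illustration of how such completeness scores are read, the sketch below parses the one-line summary notation that BUSCO prints (complete, single-copy, duplicated, fragmented, missing, and total gene count). The example line is hypothetical and does not reproduce any value from Table 1; the function and variable names are likewise assumptions.

```python
import re

# Pattern for BUSCO's compact summary notation, e.g.
# C:94.0%[S:92.5%,D:1.5%],F:1.5%,M:4.5%,n:12234
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line")
    result = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "n"
    }
    result["n"] = int(match.group("n"))
    return result


# Hypothetical summary line, for illustration only.
example = parse_busco_summary("C:94.0%[S:92.5%,D:1.5%],F:1.5%,M:4.5%,n:12234")
```

The completeness score C is the sum of the single-copy (S) and duplicated (D) fractions, and C, F, and M together account for all n genes in the lineage dataset.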

    Keywords: Kazakh horse, Oxford Nanopore Technologies (ONT), de novo assembly, Kazakhstan, whole genome sequencing (WGS)

    Received: 17 Jul 2024; Accepted: 07 Oct 2024.

    Copyright: © 2024 Assanbayev, Sharapatov, Akilzhanov, Bektayev, Samatkyzy, Karabayev, Gabdulkayum, Daniyarov, Rakhimova, Kozhamkulov, Sarbassov, Akilzhanova and Kairov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Ainur Akilzhanova, Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
    Ulykbek Kairov, Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.