DATA REPORT article

Front. Plant Sci.

Sec. Functional and Applied Plant Genomics

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1520832

This article is part of the Research Topic Multi-Omics, Genetic Evolution and Crop Domestication

Transcriptome Map and Genome Annotation of Flax Line 3896

Provisionally accepted
  • 1 Engelhardt Institute of Molecular Biology (RAS), Moscow, Russia
  • 2 Federal Research Center for Bast Fiber Crops (Russia), Torzhok, Russia

The final, formatted version of the article will be published soon.

    Flax (Linum usitatissimum L.) is valued and cultivated worldwide for its dual-purpose use as both a seed and fiber crop. It is one of the oldest domesticated and most widely used crops (Kvavadze et al., 2009; Ottai et al., 2011). The flax genome has a diploid chromosome set (2n = 2x = 30) and a size of ~450 Mb (Dvorianinova et al., 2022; You et al., 2023). Flax seeds are rich in omega-3 fatty acids, digestible proteins, dietary fiber, and lignans (Goyal et al., 2014). Consumption of linseed oil has beneficial effects on human health, reducing the risks of many disorders and promoting immunity (Mali et al., 2019; Saini et al., 2021; Al-Madhagy et al., 2023). In addition, flax seed oil is used for technical purposes and also serves as a highly nutritious feed for livestock (Xu et al., 2022; Yadav et al., 2024). Flax fiber goes into the production of eco-friendly textiles with high absorption capacity and composite materials (Asyraf et al., 2022; More, 2022).

    Different flax varieties are grown for different purposes and vary considerably in their characteristics. Today, traditional breeding is aided by biotechnology and molecular genetics, and high-quality genome assemblies and their annotations make it possible to select individuals with the desired traits more quickly and efficiently. Transcriptome studies of various tissues and organs, in turn, help to establish associations between a valuable trait and its causative region in the genome (Dmitriev et al., 2020; Guo et al., 2020). To date, a great deal of disparate data has accumulated on gene expression in different flax organs and tissues, under different growth conditions and at different stages of ontogeny.
Much work has been devoted to the involvement of specific genes in important agronomic traits, resistance to biotic and abiotic stressors, the regulation of organogenesis in flax plants, and other characteristics.

    Agriculturally valuable traits include those related to yield and to the quality of oil and fiber. The yield and quality of oil have been the focus of many studies (Xie et al., 2019; Miart et al., 2021; Gao et al., 2022; Jiang et al., 2022; Wang et al., 2022b; Dvorianinova et al., 2023b; Pushkova et al., 2024). Since fiber is another valuable product obtained from flax, the characteristics of phloem fibers have also been actively studied (Roach and Deyholos, 2007; Zhang and Deyholos, 2016; Gorshkov et al., 2017; Gorshkova et al., 2018; Gorshkov et al., 2019; Galinousky et al., 2020; Mokshina et al., 2020; Petrova et al., 2021; Guo et al., 2022; Mokshina et al., 2022; Yu et al., 2022; Bao et al., 2023; Gorshkova et al., 2023; Liu et al., 2023; Ibragimova and Mokshina, 2024).
In addition, transcriptome analysis has allowed the identification of genes associated with flax plant height (Guo et al., 2021), the length of the growing season, the time of flowering, and the duration of ripening (Gao et al., 2022; House et al., 2022).

    Data on flax gene expression and co-expression under suboptimal environmental conditions allowed researchers to assess the association of genes with resistance to pathogen infection (Galindo-González and Deyholos, 2016; Dmitriev et al., 2017; Wu et al., 2019b; Boba et al., 2021; He et al., 2022) and abiotic stressors (Yu et al., 2014; Dmitriev et al., 2016; Dash et al., 2017; Wu et al., 2018; Krasnov et al., 2019; Wu et al., 2019a; Huang et al., 2021; Wang et al., 2021; Soto-Cerda et al., 2022; Wang et al., 2022a; Garoosi et al., 2023; Kostyn et al., 2023; Qiu et al., 2023; Wang et al., 2023; Zhang et al., 2024).

    Knowledge of the regulation of organogenesis in flax is important for understanding how valuable flax traits develop (Saha et al., 2021; Yuan et al., 2021; Qi et al., 2023).

    Comparative studies of the expression profiles of linseed and fiber flax varieties identified genes associated with flax plant type and flax oil odor and yielded paleohistorical data (Sveinsson et al., 2014; Griffiths and Datla, 2019; Dmitriev et al., 2020; Povkhova et al., 2021; Yang et al., 2022).

    Several articles presenting annotated genome assemblies of flax varieties have been published, including those for linseed CDC Bethune (Wang et al., 2012) and fiber flax YY5 (Sa et al., 2021). However, the CDC Bethune genome contains some errors because it was assembled only from Illumina reads, which were insufficient to resolve its complexity (Sa et al., 2021; Dvorianinova et al., 2023a). The YY5 genome was annotated with transcriptome data from a different variety and for only five samples of mature flax plants: leaf, stem, root, flower, and fruit.

    In the NCBI database, the reference genome of L. usitatissimum is currently represented by a high-quality assembly of line 3896 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_030674075.2/, accessed on 12 October 2024) that we obtained earlier (Dvorianinova et al., 2023a). Line 3896 belongs to the linseed group of flax and is characterized by resistance to Fusarium wilt (Rozhmina and Loshakova, 2016; Dmitriev et al., 2017) and to an edaphic stressor (low acidity) (Rozhmina et al., 2020), as well as by high seed yield and oil content (our observations). In the present study, we complement previous studies of line 3896 with a transcriptome map and genome annotation, which were necessary to make further progress in flax genome research. Our annotation was obtained with the use of RNA-Seq data, whose positive effect on annotation results was shown previously (Salzberg, 2019; Gabriel et al., 2024). The study of flax genome organization and gene expression will support the development of efficient methods for obtaining improved varieties with desired traits. The results of the study are useful for identifying genes and polymorphisms responsible for valuable traits and for developing modern breeding technologies: genome editing, marker-assisted selection, and genomic selection.

    Seeds of linseed line 3896 were provided by the Institute for Flax (Torzhok, Russia). To obtain transcriptome data, we collected a set of organs (Table 1, Figure 1) from line 3896 plants at different stages of vegetation under the optimal growth conditions described below.

    Flax seeds were sterilized in 1% sodium hypochlorite and 96% ethanol, after which the seeds were washed twice with water and placed in a 0.05% solution of the fungicide Maxim (Syngenta, Gaillon, France) in Petri dishes for 7 days.
Seedlings were then planted in soil and grown under greenhouse conditions at 20°C and ~50-70% relative humidity with regular watering.

    For transcriptome analysis, a set of different organs/tissues at different development stages was collected. The plant parts and ontogenetic stages used in the study are listed in Table 1. Each sample was collected in two biological replicates. Moreover, each sample was a pool of organs from 2-10 different plants (except capsules and seeds), which is necessary to level out differences between samples and to capture trends common to the species (Takele Assefa et al., 2020). Each flower was marked with the date on which it opened (day of flowering). Seeds from the same capsules were pooled. Capsules were not pooled. The age of the plants at the time of each collection and the collection conditions are shown in Table 1. Samples were collected in the middle of the day, between 12:00 and 15:00, under similar conditions to minimize the influence of circadian rhythms on gene expression profiles. Samples were collected in liquid nitrogen and stored at -70°C.

    Samples #1-6 and #8-18 were ground using a TissueLyser II homogenizer (Qiagen, Hilden, Germany) with the addition of 3 ceramic beads for two minutes. The harder samples (#7 and #19-28) were homogenized to a fine powder in 1.5 ml tubes in liquid nitrogen, without allowing the sample to thaw, using a disposable pestle inserted in a DeWALT DCD701D2 cordless drill/driver (DeWALT, Towson, MD, USA) at 1200-1500 rpm. RNA isolation from samples #1-19 was performed using the Quick-RNA Miniprep Kit (Zymo Research, Irvine, CA, USA). RNA isolation from capsule and seed samples (#19-28) was performed by the CTAB method with modifications described previously (Pushkova et al., 2024).
After that, total RNA was additionally cleaned using the CleanRNA Standard kit (Evrogen, Moscow, Russia) according to the manufacturer's protocol, with a DNase I treatment step using the RNase-Free DNase Set (Qiagen). The quality of the isolated RNA was checked by horizontal electrophoresis in a 2% agarose gel and on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and concentrations were measured using a Qubit 4.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).

    The cDNA libraries were prepared using the QIAseq Stranded mRNA Select Kit (Qiagen) according to the manufacturer's protocol. The quality of the obtained cDNA libraries (agreement of the library length with the expected one and the absence of adapter dimers) was assessed on the Qsep1-Plus capillary electrophoresis system (BiOptic, New Taipei City, Taiwan), and the concentration was evaluated on a Qubit 4 fluorometer (Thermo Fisher Scientific). The cDNA libraries (all samples from Table 1) were mixed equimolarly and sequenced on a NextSeq 2000 instrument (Illumina, San Diego, CA, USA) using the NextSeq 2000 P3 Reagents (100 Cycles) kit (Illumina) in a 51+51 nucleotide format.

    RNA-Seq reads were trimmed with fastp 0.23.4 using default parameters (Chen et al., 2018). Structural annotation of the flax line 3896 genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_030674075.2/, accessed on 12 October 2024) (Dvorianinova et al., 2023a) was performed using BRAKER3 3.0.8 (Gabriel et al., 2024). For gene prediction, this process used our RNA-Seq transcriptome data in combination with known Viridiplantae sequences from the OrthoDB protein database (Kuznetsov et al., 2023). Functional annotation of the predicted genes was carried out using a local version of InterProScan 5.69-101.0 to assign functional domains and predict gene functions (Jones et al., 2014). The completeness of the annotation was assessed using BUSCO 5.7.1 in protein mode (the eudicots_odb10 dataset).
Default program parameters were used in the data analysis.

    Gene expression analysis was performed with PPline (Krasnov et al., 2015) using default parameters and included the following steps:
  • Alignment of the trimmed RNA-Seq reads to the reference genome of flax line 3896 annotated in the present study using STAR 2.7.2b (Dobin et al., 2013).
  • Quantification of gene expression: read overlaps with annotated genomic features were counted using featureCounts 1.6.0 (Liao et al., 2014).
  • Calculation of the normalized gene expression values, in counts per million (CPM), using edgeR.
  • Sample normalization using TMM to account for differences in library sizes and composition between samples.

    A custom Gene Ontology (GO) annotation for line 3896 was constructed using EggNOG-mapper (--tax_scope 33090) (emapper 2.1.12, eggNOG DB version: 5.0.2) for the longest proteins in the constructed gene annotation (Huerta-Cepas et al., 2019; Cantalapiedra et al., 2021). Differential gene expression analysis was conducted for a balanced subset of flax samples: mature leaf blades from the shoot at 10 cm from the top at 30 days after germination (DAG), stem fragments 1-3 cm from the top at 30 DAG, roots of seedlings at 5 DAG, hypocotyls of seedlings at 5 DAG, mature flower carpels before pollination at 56 DAG, capsules without seeds at 14 days after flowering (DAF), seeds without capsules at 14 DAF, and the shoot apical meristem (SAM) of seedlings at 5 DAG. Gene expression in each tissue was compared to that of the other tissues from this subset. Genes with FDR (QLF) < 0.05 and logFC > 1.5 were selected for gene enrichment analysis.
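
The CPM calculation and the gene selection thresholds described above can be illustrated with a minimal Python sketch. It is not the PPline or edgeR code used in the study: the function names, the toy counts, and the toy test results are hypothetical, and the normalization factor stands in for a TMM factor that would be computed elsewhere. Dividing by the library size multiplied by that factor reflects the usual way such factors enter CPM values.

```python
from typing import Dict, List, Tuple


def counts_per_million(counts: Dict[str, int], norm_factor: float = 1.0) -> Dict[str, float]:
    """Convert raw read counts of one library to counts per million (CPM).

    The effective library size is the total read count multiplied by a
    normalization factor (for example, a TMM factor computed elsewhere).
    """
    effective_size = sum(counts.values()) * norm_factor
    if effective_size <= 0:
        raise ValueError("library has no reads or a non-positive factor")
    return {gene: count * 1e6 / effective_size for gene, count in counts.items()}


def select_genes(
    results: Dict[str, Tuple[float, float]],
    fdr_cutoff: float = 0.05,
    logfc_cutoff: float = 1.5,
) -> List[str]:
    """Keep genes with FDR below the cutoff and logFC above the cutoff.

    `results` maps a gene ID to a (logFC, FDR) pair taken from a
    differential expression test run elsewhere.
    """
    return [
        gene
        for gene, (logfc, fdr) in results.items()
        if fdr < fdr_cutoff and logfc > logfc_cutoff
    ]


# Toy input (made-up numbers, not data from this study).
toy_counts = {"gene_a": 50, "gene_b": 150, "gene_c": 800}
toy_cpm = counts_per_million(toy_counts)
print(toy_cpm["gene_a"])  # 50000.0

toy_results = {
    "gene_a": (2.3, 0.001),   # passes both thresholds
    "gene_b": (1.2, 0.0001),  # logFC too low
    "gene_c": (3.0, 0.2),     # FDR too high
    "gene_d": (-2.5, 0.001),  # lower expression in the tissue of interest
}
print(select_genes(toy_results))  # ['gene_a']
```

Note that the selection keeps only genes with a positive logFC above the cutoff, that is, genes with higher expression in the tissue of interest, as the stated threshold implies.
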
Gene enrichment was performed using the constructed annotation, the selected gene lists, and the topGO 2.54.0 package (Alexa and Rahnenfuhrer, 2023).

    This article presents transcriptome data for 28 organs/tissues of flax line 3896: 7 samples of vegetative organs of actively growing individuals, 4 samples of seedlings, 7 samples of various parts of generative organs at the flowering stage, and 10 samples of fruits and seeds at different stages of maturation (Table 1, Figure 1). We set out to cover all growth stages of flax and all key events in the development of valuable flax traits. From 5.4 to 20.5 million raw reads (51+51 bp) were obtained for each sample on the Illumina platform (two biological replicates were sequenced for each sample). The raw data were deposited in the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA1172129.

    After trimming, the reads were mapped to the genome assembly of line 3896. On average, ~95% of the reads in each library were mapped (~90% uniquely), confirming the high quality of the transcriptome data.

    We collected extensive data on gene expression in 28 organs/tissues of flax line 3896 at different development stages. Using the obtained transcriptome data and the Viridiplantae protein sequence database, we annotated the line 3896 genome assembly with BRAKER3: 39,055 genes and 45,154 transcripts were predicted, and 37,787 of these genes were annotated using InterProScan (Supplementary Data 1). According to BUSCO (Benchmarking Universal Single-Copy Orthologs), the annotation was highly complete: 95.6% (eudicots). Notably, 22.2% of all BUSCO were complete and single-copy and 73.4% were complete and duplicated.
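
As a quick consistency check, the annotation figures reported above can be combined in a few lines of Python. This is only a worked-arithmetic sketch: the variable names are hypothetical, the input numbers are those quoted in the text, and the derived shares are not values stated by the authors.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures quoted in the text above.
busco_single = 22.2        # % complete and single-copy
busco_duplicated = 73.4    # % complete and duplicated
predicted_genes = 39055
predicted_transcripts = 45154
interpro_genes = 37787

# Complete BUSCO = single-copy + duplicated.
busco_complete = round(busco_single + busco_duplicated, 1)
print(busco_complete)  # 95.6

# Derived values (computed here, not reported in the article).
interpro_share = percent(interpro_genes, predicted_genes)
transcripts_per_gene = predicted_transcripts / predicted_genes
print(round(interpro_share, 1), round(transcripts_per_gene, 2))
```

The sum of the two BUSCO categories reproduces the reported 95.6% completeness, and the remaining two values give the share of predicted genes with an InterProScan annotation and the average number of transcripts per gene.
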
Such a high percentage of duplicated BUSCO was expected for an ancient tetraploid (Bolsheva et al., 2017).

    Using the obtained annotation of line 3896, we analyzed our transcriptome data and identified genes with both tissue-specific and developmental stage-specific expression patterns in flax organs/tissues. This analysis resulted in a comprehensive transcriptome map for line 3896. To present the data in a convenient format for further analysis, we used PPline and RTrans (https://github.com/gskrasnov/RTrans, accessed on 17 October 2024). These tools were used to evaluate the expression levels of the identified genes as read counts per million reads (CPM). Our transcriptome map is summarized in Supplementary Table 1 and presented as a heatmap (Supplementary Figure 1).

    The gene enrichment analysis yielded the following results. For leaves collected at a distance of 10 cm from the apex, compared to other tissues/organs, the GO terms are represented by the processes of photosynthesis, carbohydrate metabolism, plastid organization, electron transport chain, pigment synthesis, and transmembrane transport. These processes are characteristic of the main photosynthetic organ of the plant in the active phase (Müller and Munné-Bosch, 2021; Leister, 2023). In the stem fragment located 1-3 cm from the top, the processes of vascular tissue histogenesis, vascular and phloem transport, stem morphogenesis, and response to auxin synthesized in the apical meristem predominate. This is consistent with the stem being an axial organ whose main function is the transport of metabolites and with the fragment being taken close to the site where vascular tissues differentiate (Yoshida et al., 2009; Kułak et al., 2023).
In the seedling root, the major enriched processes include water and solute transport, root hair formation and growth, response to chemical and mechanical stimuli, and metabolism of auxin, other hormones, and secondary metabolites, which are the main processes occurring in the roots of vascular plants (Vissenberg et al., 2020; Li et al., 2021; Castillo-Jiménez et al., 2023). GO analysis of the hypocotyl transcriptome revealed differential expression of genes related to amino acid biosynthesis, as well as to active regulation of biosynthetic processes, brassinosteroid metabolism, and cell growth. These findings indicate the presence of active developmental processes in the axial organs of seedlings (Favero et al., 2021). GO analysis for the flax pistil showed the representation of genes associated with the formation and development of generative structures, pollination, and pollen tube growth. It also showed the representation of genes associated with active ion metabolism, which is necessary for directed pollen tube growth (Zhou et al., 2022). GO analysis for capsules (14 DAF) indicated that GO terms associated with the formation of the secondary cell wall, synthesis of its components, and lignification prevail. These processes prepare dry fruits for opening and seed dispersal (Seymour et al., 2013). Additionally, catabolism of organic compounds used for cell wall construction and seed maturation is active in ripening capsules. The differential expression pattern of flax seeds (14 DAF) is dominated by processes related to seed and fruit development and maturation, lipid storage and fatty acid synthesis, as well as abscisic acid (ABA) metabolism and regulation. Oil accumulation corresponds to the primary function of the seed as a reproductive organ, and ABA regulates its maturation (Sano and Marion-Poll, 2021; Dvorianinova et al., 2023b).
The shoot apical meristem displays a hallmark pattern indicative of actively dividing cells. The SAM is characterized by the processes of macromolecule biosynthesis, ribosome assembly, translation, RNA processing, DNA repair, organelle formation, nucleosome assembly, and chromatin remodeling (Xue et al., 2020; Burian, 2021). Thus, the results of the gene enrichment analysis for the subset of flax samples were biologically consistent overall and confirmed the adequacy of the obtained data. They are presented in detail in Supplementary Table 2.

    Flax is of great industrial and nutritional value and is therefore actively studied at the molecular-genetic level. To date, there are several flax genome assemblies, some with annotations, and many scattered gene expression data for different varieties. In this study, we used Illumina sequencing to obtain comprehensive transcriptome data for flax line 3896, whose genome is currently the reference for the species Linum usitatissimum L. in the NCBI database. Gene expression profiles were analyzed in 28 flax organs/tissues at different stages of ontogenesis. With these data, we were able to annotate the genome of line 3896 and generate a high-quality transcriptome map. The transcriptome map will allow the identification of genes that have a high expression level in a particular organ/tissue. Such genes may play a key role in the biological processes taking place in that organ/tissue. In addition, data on gene expression profiles during plant development can help to determine the most important time points at which the processes of interest occur. The transcriptome map also allows gene functions to be determined based not only on homology analysis but also on gene expression patterns in different organs/tissues.
Thus, the transcriptome map and annotation presented in this work provide a basis for advancing molecular-genetic studies of flax, searching for key genes responsible for valuable traits, developing new approaches in flax breeding, and creating improved varieties.
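
One simple way to look for genes with high expression in a particular organ/tissue in a CPM table such as the transcriptome map is sketched below in Python. This is an illustrative assumption rather than a method from the study: the function name, the pseudocount, and the toy CPM values are hypothetical, and the ratio of the highest to the second-highest CPM is just one possible specificity measure.

```python
from typing import Dict, Tuple


def top_organ(cpm_by_organ: Dict[str, float], pseudocount: float = 1.0) -> Tuple[str, float]:
    """Return the organ with the highest CPM and its fold over the runner-up.

    A pseudocount is added to both values so that genes silent in the
    second-ranked organ do not cause a division by zero.
    """
    if len(cpm_by_organ) < 2:
        raise ValueError("at least two organs are needed for a comparison")
    ranked = sorted(cpm_by_organ.items(), key=lambda item: item[1], reverse=True)
    (best_organ, best_cpm), (_, second_cpm) = ranked[0], ranked[1]
    fold = (best_cpm + pseudocount) / (second_cpm + pseudocount)
    return best_organ, fold


# Toy CPM values for one gene (made-up numbers, not data from this study).
toy_gene = {"leaf": 199.0, "stem": 9.0, "root": 4.0, "seed": 1.0}
organ, fold = top_organ(toy_gene)
print(organ, fold)  # leaf 20.0
```

Genes with a large fold value are candidates for organ-specific expression; any cutoff on that value would have to be chosen and validated by the user.
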

    Keywords: Flax, Linum usitatissimum, line 3896, Linseed, transcriptome map, Reference genome, Genome annotation

    Received: 31 Oct 2024; Accepted: 25 Mar 2025.

    Copyright: © 2025 Zhernova, Arkhipov, Rozhmina, Zhuchenko, Bolsheva, Sigova, Dvorianinova, Borkhert, Pushkova, Melnikova and Dmitriev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Daiana Zhernova, Engelhardt Institute of Molecular Biology (RAS), Moscow, Russia
    Alexey A. Dmitriev, Engelhardt Institute of Molecular Biology (RAS), Moscow, Russia

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
