Advances in whole-genome sequencing technology facilitate the identification of key genetic traits or alterations that lead to disease development. The major function of current massively parallel genome sequencers is de novo assembly or resequencing of euchromatic regions, which make up about 94% of the entire human genome and constitute the reference genome first obtained on completion of the Human Genome Project. Massively parallel short or medium reads of tens to a few thousand bases, given sufficient coverage, can be pieced together into contiguous euchromatic sequences while tolerating a 1-4% error rate that arises primarily from inherent DNA polymerase infidelity. However, these short- and medium-read sequencers are not suitable for determining heterochromatic sequences, which cover the yet-to-be-completed 6% of the human genome and contain long repetitive nuclear elements. These elements include the 45S rDNA, at about 45 kb per copy with an estimated 400 copies distributed across the short arms of the five human acrocentric chromosomes, and the satellite DNAs in centromeres. The tandemly repeated 45S rDNA and satellite DNAs are recombinational hot spots and are conserved throughout eukaryotes. Recent evidence further suggests that 45S rDNA rearrangement and concurrent epigenetic changes play a role in mammalian ontogeny and tissue differentiation, and are associated with speciation, aging, cancers, psychological disorders, and neurodegenerative diseases. There is therefore an urgent need for sequencing platforms that can read a single long stretch of a DNA/RNA strand continuously and with high accuracy. Long and accurate reads are essential for obtaining full-length heterochromatic sequences or haplotypes, because both properties are required to extend the sequencing process across polymorphic sites within or flanking the repetitive sequences and to ensure correct genome assembly.
Several newly proposed approaches show the potential to achieve long reads of 50 kb or more and are at various stages of development toward commercialization. These include new versions of single-molecule real-time (SMRT) technology, biological and solid-state nanopores, nanogaps, nanoribbons, nanochannels, and electron microscopy, which are moving away from fluorescence-based detection toward electronic sensing or imaging. Over the past 10 years, innovation in high-throughput short- and medium-read sequencing technology has significantly driven down the unit cost of sequencing an individual genome, but bioinformatic analysis of the resulting massive sequencing data has become more challenging and laborious than ever. A technology that reads a single long stretch of a DNA/RNA strand would enable direct sequencing of native nucleic acids by eliminating the fragmentation and amplification steps of pre-sequencing preparation. It would also reduce the hours spent on bioinformatic analysis and allow sequences to be differentiated in mixed microbiomes and in mammalian cells that differ because of mosaicism.
Important Note:
All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.