Overview of sequencing technologies over time.
Open access article
This Article is part of Bioinformatics Section
Article Type: Editorial
Date of acceptance: December 2023
Date of publication: December 2023
DOI: 10.5772/dmht.21
Copyright: ©2023 The Author(s), Licensee IntechOpen, License: CC BY 4.0
“… [A] knowledge of sequences could contribute much to our understanding of living matter.” [Frederick Sanger [1]]
The order of nucleotides inside a DNA sequence is critical as it serves as the basis for the “Genetic Code” or “Code of Life”. It supports subsequent protein function and, in the case of mutations, disease manifestation. Identifying the order of nucleotides in biological samples is essential for various research applications.
This year, we celebrated the twentieth anniversary of the conclusion of the Human Genome Project, which aimed to sequence, for the first time, the complete human genome. The initiative followed the
This unprecedented breakthrough in human genetics and biology spanned 13 years, starting in 1990 and ending in 2003 (International Human Genome Sequencing Consortium [4]). Today, only two decades later, a complete genome can be sequenced in approximately five hours, at 1/1000 of the cost.
The achievement of such efficiency took, however, more than a century from the time Friedrich Miescher first isolated DNA in 1869 [5]. Between 1871 and 1929, scientists including Miescher himself, Walther Flemming, Albrecht Kossel, and Phoebus Levene established the fundamental principles of cell chemistry and nucleic acids, which facilitated the development of DNA sequencing. By 1929, it was determined that nucleic acids consisted of five nitrogen-containing bases: adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). The DNA backbone is composed of alternating sugar and phosphate groups, with each sugar molecule being connected to a nitrogen-containing base, which can be either A, T, C, or G [2]. At that time, nonetheless, DNA was seen as a structurally insignificant molecule; proteins, by contrast, were believed to constitute the “genetic” material, due to their link with chromosomes and their intricate structure. In 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty published a paper providing evidence that DNA, rather than proteins, had the ability to alter the characteristics of cells [6]. Based on this new evidence, DNA research grew rapidly. In 1952, Erwin Chargaff and Linus Pauling made significant discoveries regarding the molecular characteristics of DNA and the nature of the chemical links that connect its molecules [7]. Chargaff’s findings established that the quantity of guanine (G) should be equivalent to the quantity of cytosine (C) in any living organism, while the quantity of adenine (A) should be equal to the quantity of thymine (T).
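The base-equivalence rule follows directly from base pairing in double-stranded DNA, and a short Python sketch can make this concrete. The strand below is made up for illustration and the helper names are hypothetical; the only point is that counting bases across both strands of a duplex always gives A = T and G = C.

```python
from collections import Counter

# Watson-Crick pairing partners for the four DNA bases.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand) + Counter(reverse_complement(strand))


# Hypothetical single strand, chosen arbitrarily for the illustration.
example_strand = "ATGCGGATTTACGCCA"
counts = duplex_base_counts(example_strand)

# Every A on one strand faces a T on the other (and every G a C),
# so the duplex totals must match pairwise.
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

On a single strand taken alone the counts need not match; the equality appears only once the paired strand is included.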
In 1953, James Watson and Francis Crick determined the 3D structure of DNA (“
In 1965, Robert Holley and colleagues created the first complete nucleic acid of alanine tRNA from
Around the same time, Allan Maxam and Walter Gilbert published their chemical cleavage approach, which became the first widely used method for DNA sequencing [15].
Originally, in the Sanger method, the dideoxy terminator bases were radioactively labeled with 32P, which required carrying out four separate reactions. The current method instead uses modified bases, each tagged with one of four fluorescent markers. Applied Biosystems had commercialized and automated the process by 1987.
The devices designed for executing this DNA sequencing technique generate reads of less than one kilobase (typically, 500–1000 bp) [16]. To investigate extended segments of DNA, researchers can adopt shotgun sequencing technology. This technique revolves around fragmenting the DNA into multiple pieces, sequencing each fragment, and subsequently using computer software created by Rodger Staden to accurately align the resulting collection of gel readings into a single continuous sequence, commonly referred to as a
Sanger’s initial approach was soon automated. Large slab gels were replaced with acrylic-fiber capillaries, with the output displayed as an electropherogram. This technology was critical to the Human Genome Project’s completion in 2003. However, even after the Human Genome Project, capillary electrophoresis remained prohibitively expensive for large-scale sequencing programs. By the mid-2000s, there were various efforts to reduce the costs of sequencing, and laboratories across the globe were putting novel methodologies and techniques for higher-throughput screening to the test.
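The shotgun strategy described above (fragment the DNA, sequence each piece, then align the overlapping readings into one continuous sequence) can be illustrated with a toy greedy merge. This is a minimal sketch under strong assumptions (error-free reads, no repeats, all reads from the same strand) and not a reconstruction of Staden's software; the function names, the example reads, and the minimum overlap of three bases are all hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical overlapping reads, given in arbitrary order.
example_reads = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(greedy_assemble(example_reads))
```

Greedy merging is easy to follow but is misled by repeated regions, which is one reason real assemblers use more elaborate graph-based methods.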
The second generation of DNA sequencing arose after the introduction of pyrosequencers in 1996 [18]. The approach relied on sequencing-by-synthesis (SBS) technology, in which individual nucleotides are incorporated and identified step by step by quantifying the light produced when pyrophosphate is released. Sequences were inferred by detecting pyrophosphate generation as each nucleotide was passed through the device [18–20]. Pyrosequencing, introduced by Pål Nyrén’s group, had several advantages: it used natural nucleotides (rather than the heavily modified dNTPs employed in chain termination) and could be monitored in real time [18–20].
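How a series of light signals becomes a sequence can be shown with a small sketch. It assumes that nucleotides are flowed one at a time in a fixed repeating order and that the light signal is roughly proportional to the number of bases incorporated in that flow. The flow order "TACG", the signal values, and the function name are illustrative assumptions, not taken from the cited works.

```python
def decode_flowgram(signals: list[float], flow_order: str = "TACG") -> str:
    """Turn per-flow light intensities into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    bases; a value near 0 means the flowed nucleotide was not incorporated,
    and a value near 2 means a run of two identical bases.
    """
    bases = []
    for index, signal in enumerate(signals):
        base = flow_order[index % len(flow_order)]
        count = int(signal + 0.5)  # round half up for non-negative signals
        bases.append(base * count)
    return "".join(bases)


# Hypothetical normalized intensities for two full flow cycles (T, A, C, G, T, A, C, G).
example_signals = [1.02, 0.05, 0.97, 2.10, 0.00, 1.10, 0.03, 0.90]
print(decode_flowgram(example_signals))
```

The decoded string is the newly synthesized strand, that is, the complement of the template being read. Because run lengths are inferred from signal magnitude, long runs of the same base are the hardest part of such a signal to call reliably.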
Pyrosequencing was licensed to 454 Life Sciences (later acquired by Roche), where it became the first major commercially viable NGS (Next Generation Sequencing) technique. Water-in-oil emulsion PCR was used to attach individual fragments (libraries) of DNA molecules to beads. Pyrosequencing can then take place by washing the plate with smaller bead-linked enzymes and dNTPs. In clonal amplification, primers bind to single-stranded DNA templates and a DNA polymerase drives a reaction that produces many copies of the identical sequence. This parallelization (from which the term “massively parallel sequencing” derives, as a synonym for NGS) significantly boosted the yield of sequencing efforts. During the sequencing reaction, nucleotides are introduced sequentially, and the signal generated by each nucleotide is measured and categorized (Figure 2) [14]. This breakthrough led to the initiation of various genome sequencing programs, including the sequencing of the genome of James Watson, co-discoverer of the Watson–Crick structure of DNA, which was completed in 2007 [21]. Further, the Neanderthal Genome Project led to the successful sequencing of the Neanderthal genome, published in 2009 [22].
Subsequently, numerous sequencing platforms appeared, including Illumina (formerly Solexa), Ion Torrent, and Pacific Biosciences. The detected signal varies depending on the equipment used and may involve the release of pyrophosphate, fluorescence emission, or a change in electric current.
Solexa introduced a sequencing technique in which a single-stranded DNA library is created and then passed over a flow cell that contains oligonucleotides complementary to one of the two adaptor sequences on the DNA fragments [23–25]. Dense clusters of amplified fragments are then created by a process known as “bridge amplification”, in which the replicating DNA strands arch over to prime the next round of polymerization from neighboring surface-bound oligonucleotides, so that each original DNA strand gives rise to a cluster of clonal copies. As sequencing-by-synthesis proceeds, a fluorescent signal is detected each time a single dNTP is added. The number of clusters read grows over time.
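Reading one fluorescent signal per added base can be sketched as a per-cycle base call. The sketch assumes one incorporated base per cycle and four intensity channels, one per base, with the brightest channel taken as the call. The channel order, the intensity values, and the function name are hypothetical; real instruments apply considerably more signal correction than this.

```python
# Assumed channel order: one fluorescence channel per base.
CHANNELS = "ACGT"


def call_bases(cycles: list[tuple[float, float, float, float]]) -> str:
    """Call one base per cycle by picking the brightest of four channels."""
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda k: intensities[k])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over three cycles (A, C, G, T channels).
example_cycles = [
    (0.90, 0.10, 0.05, 0.02),
    (0.10, 0.05, 0.80, 0.10),
    (0.00, 0.10, 0.10, 0.95),
]
print(call_bases(example_cycles))
```

Unlike the pyrosequencing case, each cycle here contributes exactly one base, so runs of identical bases are read as separate cycles rather than inferred from signal strength.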
Solexa performed sequencing on the bacteriophage 𝛷X174, which had previously been sequenced by Sanger using the Sanger sequencing technique [23–25]. A single run of Solexa SBS technology yielded more than 3 million bases. In 2006, Solexa introduced the Genome Analyzer, which further enhanced sequencing capabilities by enabling the sequencing of 1 gigabase (Gb) of data in a single run. Illumina bought Solexa in 2007, establishing itself as a frontrunner in short read sequencing technology, also known as fragment-based approaches (Figure 3) [14]. In parallel, other methods appeared and rounded out the NGS landscape, such as Ion Torrent, which monitors the pH change during polymerization, and SOLiD, which employs sequencing-by-ligation rather than sequencing-by-synthesis (which is catalyzed by a polymerase).
In general, the implementation of these next-generation sequencers greatly reduced the expense of sequencing, driving scientific investigation and industrial use of sequencing into a new epoch. However, all of these methods are limited in read length, typically yielding reads of between 50 and 500 bp.
The third- and fourth-generation sequencing methods involve single-molecule technologies capable of producing reads of over 10,000 base pairs, known as long read technologies. These technologies offer substantial enhancements in addressing the difficulties associated with
| First generation |
1972 | Sanger started work on DNA sequencing |
1977 | Sanger developed the dideoxy chain termination method of DNA sequencing |
1977 | Maxam and Gilbert developed the chemical degradation method of DNA sequencing |
1977 | First DNA-based genome sequenced (𝛷X174 bacteriophage) |
1995 | First bacterium, Haemophilus influenzae, was sequenced by the shotgun method |
1996 | Applied Biosystems developed automated DNA sequencing based on Sanger’s method |
1996 | First eukaryotic genome (Saccharomyces cerevisiae) was sequenced |
2001 | First human genome draft was published by two independent teams |
| Second generation |
2005 | First NGS platform released: Roche 454 GS-20 |
2006 | Introduction of second NGS platform – Solexa Genome Analyzer |
2006 | Initiation of the 1000 Genomes Project |
2007 | Introduction of Roche 454 GS-FLX & ABI SOLiD sequencer |
2008 | Development of Illumina GA-II |
2009 | Introduction of Roche 454 GS-FLX Titanium |
2010 | Introduction of Roche 454 GS-Junior |
2011 | Introduction of SOLiD 550 W & Illumina MiniSeq |
2012 | Introduction of Illumina HiSeq |
2013 | Introduction of SOLiD 5500xl W & Illumina MiniSeq |
2014 | Introduction of Roche 454 GS-Junior+, Illumina NextSeq 500 & Illumina HiSeq X Ten |
2017 | Introduction of Illumina iSeq 100 |
| Third generation |
2008 | Development of first commercial platform of third-generation technology (HeliScope, Helicos BioSciences) |
2010 | Ion Torrent released the Personal Genome Machine (PGM) |
2011 | Introduction of PacBio RS C1/C2 |
2012 | Introduction of PacBio RS C2 XL & PacBio RS II C2 XL; Ion Torrent released Ion Proton |
2013 | Introduction of PacBio RS II C2 XL |
2014 | Introduction of PacBio RS II P5 C3 & PacBio RS II P6 C4 |
2015 | Introduction of Ion S5/S5XL 520/530/540 |
2016 | Introduction of PacBio Sequel |
| Fourth generation |
2014 | Release of MinION platform by Oxford Nanopore Technologies |
2017 | Release of PromethION, GridION & SmidgION X5 platforms by Oxford Nanopore Technologies |
2018 | Commercialization of PromethION platform by Oxford Nanopore Technologies |
The Oxford Nanopore sequencing platform (ONT), conversely, is considered a fourth-generation technology [29]. Its potential, however, was recognized even before second-generation sequencing emerged, as it had previously been established that single-stranded RNA or DNA can be driven across a lipid bilayer through large α-haemolysin ion channels by electrophoresis. ONT uses a nanopore inserted into an electrically resistant membrane. The disruption of the current caused by the passage of bases through the pore is monitored in real time to determine the exact sequence of single molecules [30]. As far as DNA library preparation is concerned, in ONT sequencing a hairpin structure is attached to the double-stranded DNA, allowing the machine to read both strands in a single uninterrupted sequence. ONT sequencing can produce exceptionally long reads, exceeding 300 kilobases in length [31]. ONT later attracted considerable attention with its nanopore platforms, which include GridION and MinION. The MinION, released in 2014, is a compact, mobile phone-sized USB device that has been used in the field; its rapid run times and small design also allow for decentralized sequencing. This method has greatly enhanced the process of identifying and screening for pathogens such as the Ebola and Lassa viruses [32]. Nanopore sequencers, with future developments, could revolutionize not only the composition of data produced, but also where, when, and by whom it can be produced.
Since the first recognition of specific genetic lesions in human cancer in the seventies, DNA sequencing has become increasingly important in molecular pathology for the definition of entities, prognostication, and prediction. The major example is the current classification of hematopoietic and lymphoid neoplasms. Indeed, several entities among leukemias and lymphomas are diagnosed by identifying specific translocations or gene mutations [33, 34]. In addition, genetic testing is used for prognostication and for selecting patients for targeted therapies, as in the case of
As far as solid tumors are concerned, several new therapeutic options are offered based on the mutation profile, and so-called companion tests (i.e., genetic testing mandatory for prescribing certain drugs) have been developed. Among others, mutations affecting
The identification and characterization of microorganisms have evolved with the progress of various technologies and procedures. Until recently, bacteria were categorized using the DNA–DNA hybridization technique [36]. The process was arduous and severely constrained by the absence of a central database. The development of sequencing technology and the utilization of ribosomal genes, such as the 16S rRNA gene, have facilitated the establishment of a comprehensive database for precise microbiological identification. Due to the decreasing cost of sequencing and the availability of advanced algorithmic methods for analyzing microbiological databases, numerous organism-specific databases have been developed and utilized. These databases can even offer identification at the strain level [2].
It should be noted, however, that despite the adoption of NGS in most recent studies, Sanger sequencing still plays a crucial role in supplying comparative data. Recent developments in NGS have enabled researchers to characterize and investigate species and organisms that cannot be cultivated under typical laboratory conditions. The use of both short and long reads in whole genome sequencing, along with the rapid sequencing offered by ONT, has had substantial effects on microbiological research and development. We are on the verge of applying these technical developments in an industrial environment, where scientific instruments are utilized to enhance our understanding of microbes and offer quick diagnostic and screening techniques. Accugenix® is eagerly anticipating the outcomes that will arise from the integration of these advanced methods in microbial identification.
Over the last 50 years, researchers from all over the world have worked hard to develop and improve the technologies that support DNA sequencing. Future perspectives include, first, a further drop in NGS costs, making these technologies routine instruments in medical diagnostics, the agri-food industry, and scientific research. Furthermore, a parallel development of bioinformatic approaches is needed: the tremendous amount of additional information provided by NGS needs refined bioinformatic deconvolution to become clinically useful. This is the case, for instance, for intron characterization deriving from whole genome sequencing, which still represents an unsolved issue. Nonetheless, it must be acknowledged that NGS will soon become an integral part of routine diagnostics for most if not all cancer patients and most infectious diseases, thus contributing an essential step toward precision medicine.
The author declares no conflict of interest.
© The Author(s) 2023. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.