The Human Genome Project and the constant advances in genetic testing and sequencing technologies led to the emergence of what we call the “Post-Genomic Era”. Today it is possible to sequence all or part of an organism's DNA quickly and at an affordable price through Next-Generation Sequencing.
This technique has enabled important achievements in the diagnosis of pathologies and in precision medicine. Understand all about Next-Generation Sequencing below!
DNA sequencing is the exact determination of the order of the nucleotides in a DNA molecule. It relies on equipment that reads a DNA sample and generates an electronic file. This file contains symbols that represent the sequence of nitrogenous bases (A, C, G, T) present in the sample.
First Generation Sequencing
The first DNA sequencing method was developed by Maxam and Gilbert; however, it was only capable of sequencing a few nucleotides at a time. Frederick Sanger then developed the chain-termination sequencing method, published in 1977, which became popular and is still used by research centers around the world today.
The Sanger sequencing method is considered “first generation” technology, and it is useful and effective for sequencing DNA fragments of 500-900 base pairs.
Therefore, Sanger sequencing is widely used for short DNA sequences, such as bacterial plasmids or DNA fragments amplified by PCR (polymerase chain reaction).
In 1986, the first automatic DNA sequencer, the ABI 370, was launched, and in 1998, the first capillary electrophoresis sequencer, the ABI 3700.
With automation, it became possible to carry out large sequencing projects, such as the Human Genome Project. However, this came at the cost of years of research, billions of dollars, and an international effort by large centers with dozens of machines installed.
Despite its success in sequencing the complete human genome, Sanger sequencing is an expensive and inefficient method for large-scale projects. For tasks like this, Next-Generation Sequencing techniques are more efficient and cost less.
Next-Generation Sequencing (NGS)
NGS stands for Next-Generation Sequencing, also called Massively Parallel Sequencing or High-Throughput Sequencing. These different terms all refer to a group of modern DNA sequencing methodologies.
NGS can be characterized as automated, parallel, high-throughput DNA and RNA sequencing.
Next-Generation Sequencing makes it possible to sequence DNA on a massive scale, much more quickly and cheaply than the Sanger sequencing previously used, and it has revolutionized the study of genomics and molecular biology.
However, unlike the Sanger method, NGS sequences short DNA fragments (reads) that typically range from 50 to 300 nucleotides in length.
Although they differ considerably from each other, all NGS sequencing platforms are based on massively parallel processing of DNA fragments.
That is, while a capillary electrophoresis sequencer processes a maximum of 96 fragments at a time, next-generation sequencers can read up to billions of fragments at the same time.
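To get a feel for what billions of short reads mean in practice, the sketch below estimates the average number of times each base of a genome would be read. The formula (number of reads times read length, divided by genome size) is a common back-of-the-envelope estimate; the function name and the run figures, including the genome size of about 3.1 billion base pairs, are assumptions chosen for illustration and do not come from any specific platform.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is expected to be read."""
    return num_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 nucleotides each,
# sequenced from a genome of roughly 3.1 billion base pairs (assumed size).
coverage = mean_coverage(600_000_000, 150, 3_100_000_000)
print(f"Mean coverage: {coverage:.1f}x")
```

The same function shows why short reads are workable: even though each read covers only a tiny part of the genome, the sheer number of reads means every position is expected to be read many times.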
With NGS, it is possible to sequence the entire genome or just specific areas of interest, including all of the approximately 20,000 coding genes or a small number of individual genes.
What are the main Next-Generation Sequencing (NGS) platforms?
NGS platforms are the equipment where the sequencing takes place. There are different platforms available on the market and they differ in technology and method used for bulk sequencing. The main NGS platforms are:
The Roche platform was a pioneer in NGS and was launched in 2005. It uses the pyrosequencing method, which is based on the detection of the pyrophosphate released when a nucleotide is incorporated into the newly synthesized DNA strand.
The platform can generate reads with lengths of up to 1000 base pairs for genomic DNA and up to 600 bp for amplicons, and it can produce about 1 million reads per run.
The Ion Torrent platform uses a methodology similar to that of Roche; however, the addition of nucleotides is detected through the hydrogen ion generated during the incorporation of a dNTP.
The release of H+ ions during the process changes the pH of the solution, and a sensor detects this change as a voltage signal. This method is capable of sequencing reads from
Illumina is the most popular next-generation sequencing platform, as it is capable of performing large-scale sequencing with high-quality reads at a more affordable price.
The Illumina platform uses the sequencing-by-synthesis method. The process identifies nucleotides as they are incorporated into a nucleic acid chain.
Basically, chemically modified nucleotides bind to the DNA template strand by complementarity. These nucleotides have a fluorescent tag and a reversible terminator that blocks the incorporation of a next base.
The fluorescent signal indicates which nucleotide has been added and the terminator is cleaved so that the next base can be attached and thus complete the sequencing of t
The SOLiD platform uses the sequencing-by-ligation method, which relies on the mismatch sensitivity of DNA ligase to determine the sequence of nucleotides in a fragment.
The system uses four dyes to identify all 16 possible combinations between two nucleotides and several modified probes. At one end of the probe is the combination of two nucleotides and at the opposite end is the fluorophore.
When the probe anneals to the DNA sequence, the system identifies the light signal and consequently the sequence of nucleotides.
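Representing 16 two-nucleotide combinations with only four dyes can be sketched in a few lines. The mapping below, in which each base gets a number from 0 to 3 and the colour of a pair is the XOR of the two numbers, is the scheme commonly described for two-base encoding; the text above does not spell out the exact mapping, so treat it as an assumption. The function names are hypothetical.

```python
from typing import List

# Assumed numbering of the bases; the colour of a pair is the XOR of both numbers.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def to_color_space(sequence: str) -> List[int]:
    """Encode each pair of adjacent bases as one of four colours (0-3)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def from_color_space(first_base: str, colors: List[int]) -> str:
    """Decode colours back to bases, given the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = to_color_space("ATGGC")
print(colors)                         # [3, 1, 0, 3]
print(from_color_space("A", colors))  # ATGGC
```

Because each colour depends on two neighbouring bases, four colours are enough to cover all 16 pairs, and the original sequence can be recovered as long as the first base is known.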
Also known as Third-Generation Sequencing. These more modern platforms can overcome problems found in other sequencing methods, such as short reads, reading errors, and the extensive bench work needed for sample preparation.
This method detects a single DNA molecule per reaction. The basic principle is to measure changes in the electrical properties of DNA as it translocates through channels. When the fragment crosses the channel, the platform identifies the exact nucleotide base
What are the types of DNA sequencing?
There are three main types of DNA sequencing done by NGS: Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Panels. The type of NGS chosen for research or clinical diagnosis depends on what you are looking for.
These types do not necessarily differ in the platform used to perform the sequencing, but rather in which sequences are obtained. Therefore, it is necessary to know what you want to find at the end of the sequencing before requesting a type of NGS.
In the following topics, we will explain what each type of NGS is and what each one is indicated for.
Whole Genome Sequencing (WGS)
Whole Genome Sequencing (WGS) is the sequencing of all the genetic material (DNA) of an individual. It includes both coding and non-coding DNA.
WGS results in a very large file with a multitude of information about the genetic material, which may or may not have a clinical interpretation.
However, in most cases, only alterations in the coding portion of the DNA, which represents approximately 2% of the entire genome, allow clinical interpretation. That is, changes that occur in non-coding regions often do not provide information about their possible consequences for the individual.
In addition, WGS has a higher cost both for its processing and for storing the result.
Therefore, the sequencing of the complete genome is more indicated in scientific studies aiming to study the consequences of alterations in regions not yet studied or the search for alterations that cause diseases still without an elucidated cause, for example.
Whole Exome Sequencing (WES)
Whole Exome Sequencing (WES) is the sequencing of the coding portion of DNA, that is, all the DNA sequences that encode the approximately 20,000 genes of the human genome.
The term exome refers to the set of exons, which are the DNA sequences that are transcribed and then translated into proteins.
Changes in genes can cause changes in protein structure and therefore are more likely to cause disease. Exome mutations are responsible for the vast majority of genetic diseases.
Thus, WES is an efficient laboratory test and is more suitable for clinical diagnosis, especially in cases where the diagnostic hypothesis covers more than one specific disease, or when there is no specific diagnostic hypothesis at all.
The result of a WES is a file with the mutations found in all of the individual’s genes, which requires a longer analysis time than a targeted test.
Target gene sequencing or gene panel
Target gene sequencing or gene panel sequencing is the Next-Generation Sequencing performed on just a group of genes of interest.
This type of sequencing is similar to WES. However, instead of sequencing all of the individual’s genes, it covers only a set of specific genes, for example genes already associated with one or more diseases.
Therefore, when there is a diagnostic hypothesis of a specific disease, the gene panel is preferable to WES for clinical diagnosis.
Currently, laboratories that carry out Next-Generation Sequencing provide panels of genes associated with various genetic diseases, such as breast cancer and other cancers, skeletal and muscular diseases, or leukodystrophies.
The advantages of gene panel sequencing over WES are the lower cost, as fewer genes are sequenced, and the shorter analysis time.
How are Next-Generation Sequencing (NGS) results analysed?
Regardless of the platform used and the type of sequencing, the result of an NGS run is a file containing the reads of up to billions of DNA fragments.
These files need to be processed before the genetic material can be analyzed. Bioinformatics tools are used to define the exact position of each nucleotide in the genome, and from there it is possible to analyze the changes in the individual’s DNA sequences.
Bioinformatics organizes and analyzes complex genetic information using a combination of computational, mathematical and statistical tools.
“ Bioinformatics algorithms run through a predetermined sequence to process Next-Generation Sequencing data are collectively referred to as a bioinformatics pipeline. A bioinformatics pipeline guides and progressively processes NGS data through a series of data conversions, utilizing various software components, databases and operating environments.”
Therefore, the data received from sequencing is first checked for quality and then aligned with a reference genome, that is, a known nucleotide sequence that serves as a guide for assembling the sequenced genome.
Once all the fragments have been organized into sequences aligned with the reference, the genomic regions can be analyzed in search of alterations known as variants.
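The alignment and variant search described above can be illustrated with a deliberately naive sketch. It places each read at the position of the reference where it has the fewest mismatches and then reports the positions where the read differs. Real pipelines use indexed aligners and statistical variant callers; the function names, the toy reference and the reads below are all made up for illustration.

```python
def best_alignment(read, reference):
    """Return (position, mismatches) for the placement with the fewest mismatches."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def find_variants(reads, reference):
    """Collect (position, reference_base, read_base) wherever aligned reads differ."""
    variants = set()
    for read in reads:
        pos, _ = best_alignment(read, reference)
        for offset, base in enumerate(read):
            if base != reference[pos + offset]:
                variants.add((pos + offset, reference[pos + offset], base))
    return sorted(variants)


# Toy data: two of the three reads carry a C where the reference has a T (0-based position 12).
reference = "ACGTTGCAAGCTTAGGCTAC"
reads = ["GTTGCAAG", "CAAGCTCAGG", "CTCAGGCTAC"]
variants = find_variants(reads, reference)
print(variants)  # [(12, 'T', 'C')]
```

Note that the same difference is seen in two independent reads. In practice, this kind of agreement between overlapping reads is what separates a likely variant from a sequencing error.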
What are genetic variants?
Genetic variants are regions of genetic material that differ from the reference genome. These variants, also called mutations, can occur throughout the genome. That is, they can occur in coding and non-coding regions.
The alterations can be germline variants, that is, genetic alterations that have been inherited and therefore affect all the cells of an individual, or somatic variants, that is, mutations that arose at some point in life and are present in one or more specific tissues.
The most frequent variations in the human genome are single nucleotide polymorphisms, better known as SNPs (Single Nucleotide Polymorphisms). As the name implies, this type of alteration is the substitution of a single nucleotide in the DNA chain.
When such a substitution occurs in a gene, the SNP can change the code used for protein synthesis and consequently cause pathologies.
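A small sketch can show why the same kind of single-nucleotide change may or may not alter a protein. It uses a deliberately partial codon table with only three entries from the standard genetic code (GAG and GAA both code for glutamate, GTG codes for valine); the function name is hypothetical.

```python
# Partial codon table: only the three codons needed for this illustration.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val"}


def snp_effect(reference_codon, variant_codon):
    """Describe whether a single-base change alters the encoded amino acid."""
    ref_aa = CODON_TO_AMINO_ACID[reference_codon]
    var_aa = CODON_TO_AMINO_ACID[variant_codon]
    if ref_aa == var_aa:
        return f"synonymous ({ref_aa} unchanged)"
    return f"missense ({ref_aa} -> {var_aa})"


print(snp_effect("GAG", "GAA"))  # synonymous (Glu unchanged)
print(snp_effect("GAG", "GTG"))  # missense (Glu -> Val)
```

Both variants differ from the reference codon by one base, but only the second changes the amino acid, which is the kind of change that can affect protein structure.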
Another source of genomic variation is known as copy number variation, or CNV. CNVs are genomic fragments larger than 50 base pairs that differ in number of copies.
CNVs are characterized by duplications, also known as gain or increase in genomic content, or deletions, also known as loss or decrease in genomic content. Because CNVs involve a larger portion of the DNA, the variation may lead to relevant clinical manifestations depending on the region involved.
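One simple way to picture how gains and losses show up in sequencing data is to compare how many reads cover a region in a sample against a control. The sketch below assumes this read-depth approach; the thresholds, region names and depth values are invented for illustration and are not taken from any particular tool.

```python
def classify_copy_number(sample_depth, control_depth, gain=1.25, loss=0.75):
    """Label a region from the ratio of sample read depth to control read depth.

    The gain and loss thresholds are illustrative assumptions.
    """
    ratio = sample_depth / control_depth
    if ratio >= gain:
        return "duplication"
    if ratio <= loss:
        return "deletion"
    return "normal"


# Hypothetical mean read depths per region: (sample, control).
depths = {
    "region_A": (100, 98),
    "region_B": (152, 101),
    "region_C": (49, 100),
}
calls = {name: classify_copy_number(s, c) for name, (s, c) in depths.items()}
print(calls)
```

A region with about one and a half times the expected depth suggests an extra copy, while about half the expected depth suggests a lost copy.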
In this context, there are several online platforms that allow identifying variants and estimating the possible damage that a specific alteration can cause to the protein it encodes.
Each platform makes this prediction using its own metrics and classifies whether the variation is more likely to have caused no change in the protein (i.e., it is a benign mutation) or to have caused a mild or severe change (a possibly pathogenic or pathogenic mutation).
What are the applications of Next-Generation Sequencing (NGS)?
The arrival of NGS technologies on the market has changed the way we think about scientific approaches in basic, applied and clinical research and in genetic diagnosis. Find out the main applications of Next-Generation Sequencing below.
Next-Generation Sequencing (NGS) in clinical diagnosis
Many genetic disorders take time to present clinical manifestations and, therefore, to be diagnosed. In this scenario, an early diagnosis of disorders may be essential to prevent or delay the development of symptoms and thus significantly improve quality of life.
NGS is highly efficient in the molecular diagnosis of genetic diseases and of hereditary or non-hereditary cancers. Through sequencing, genetic testing seeks to identify variants of genes known to be associated with a predisposition to developing disorders.
An example of this is breast cancer, which is associated with variants of the BRCA1, BRCA2 and PALB2 genes. The chance of curing breast cancer when it is identified early is 95%, which demonstrates the importance of genetic diagnosis.
In addition to cancer diagnosis, sequencing is also recommended for the diagnosis of congenital diseases and other hereditary disorders. However, as there are different types of Next-Generation Sequencing, it is necessary to consult a genetic counseling professional, who can recommend the most suitable test.
The identification of variants in genetic tests is based on genomic databases, as previously mentioned. As these data are constantly updated as research advances, in some cases test results need to be reassessed.
GWAS: Genome-Wide Association Studies
The sequencing of the genetic material of several organisms allowed the identification of several coding regions and their role in the characteristics of these organisms.
Genome-Wide Association Studies (GWAS) perform genome-wide sequencing of groups of organisms to identify associations between genomic regions and phenotypic traits.
Individuals are selected according to the trait of interest, then the DNA of individuals who have the trait of interest is compared to individuals who do not, called controls, in search of variants.
GWAS have identified variants associated with several disorders of clinical importance and have also helped trace genetic similarities between populations.
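The comparison between individuals with the trait and controls can be sketched for a single variant. The example below computes an odds ratio and a chi-square statistic from a 2x2 table of allele counts, two standard measures of association. The counts are invented, the function names are hypothetical, and a real study repeats such a test across a very large number of variants with correction for multiple testing.

```python
def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the variant allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical allele counts for one variant.
case_alt, case_ref = 60, 140        # individuals with the trait
control_alt, control_ref = 30, 170  # controls

ratio = odds_ratio(case_alt, case_ref, control_alt, control_ref)
chi2 = chi_square_2x2(case_alt, case_ref, control_alt, control_ref)
print(f"Odds ratio: {ratio:.2f}, chi-square: {chi2:.2f}")
```

An odds ratio well above 1 together with a large chi-square value suggests that the variant allele is more common among individuals with the trait than among controls.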
Next-Generation Sequencing (NGS) in Metagenomics
The use of sequencing goes far beyond human studies. Metagenomics, for example, makes it possible to identify microorganisms that have high biotechnological potential, as well as to study emerging pathologies and their transmission routes.
Metagenomics is the study of the diversity, taxonomy and functional potential of a microbial community coexisting in an environment. In other words, this study is based on the analysis of the microbial composition in different ecosystems, through the extraction of the total DNA of a sample.
Previously, new microorganisms were identified on the bench through isolation and cultivation in a culture medium. With next-generation sequencing, it became possible to sequence billions of DNA fragments simultaneously, which makes it possible to obtain the genetic information of several microorganisms at once.
Metagenomics can be used in a wide range of applications, such as analyzing the human gut microbiome in disease association studies or studying soil to identify microorganisms that favor plant growth.
NGS for viruses
Virus treatments are often generalized, since identifying the virus causing an infection requires several tests. For this reason, a strategy is gaining prominence in the field of medical diagnostics: sequencing of viral genomes, called the Virome.
The Virome makes it possible to identify the specific virus causing the infection with greater precision and in a single test. This is possible because next-generation sequencing can sequence thousands of virus fragments at once and thus identify what is present in the sample.
The Virome can also be used to identify new virus variants, as is the case with the new coronavirus.
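As a rough illustration of how reads from a sample could be matched to known viruses, the sketch below assigns each read to the reference genome with which it shares the most short subsequences (k-mers). This is one simple strategy among many; the function names are hypothetical, and the reference sequences are invented placeholders, not real viral genomes.

```python
def kmers(sequence, k):
    """Set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify_read(read, references, k=5):
    """Return the reference sharing the most k-mers with the read, or None if none match."""
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(genome, k)) for name, genome in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Invented placeholder sequences standing in for known viral genomes.
references = {
    "virus_A": "ATGGCGTACGTTAGCCTAGGA",
    "virus_B": "TTGACCGGATATCCGGTTAAC",
}
reads = ["GCGTACGTTAG", "CGGATATCCGG", "AAAAAAAAAAA"]
assignments = [classify_read(read, references) for read in reads]
print(assignments)  # ['virus_A', 'virus_B', None]
```

Reads that match no known reference, like the last one here, are the ones that would need further analysis when searching for a new virus or variant.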
Finally, the constant evolution of genetic sequencing technologies allows for increasingly in-depth studies of the genetic material of the most diverse organisms. Thus, it will be possible to progressively improve quality of life through precision medicine, obtain new biotechnological products and develop crops with greater productivity.
The history of genomic sequencing is just beginning.