Scientific research generates data, and given how quickly science and technology evolve, it is hard to grasp how much is produced every day. Analyzing and interpreting results is often the most challenging part of a study, especially when the data are complex. Bioinformatics, the union of computational technology and biological science, emerged in this setting.
What is bioinformatics?
Bioinformatics is a multidisciplinary science that uses technology and computational tools in genetics to organize, store, interpret, visualize, and disseminate biological data.
Most often, bioinformatics is used to determine gene and protein function, establish evolutionary relationships, and predict the three-dimensional structures of proteins.
When did bioinformatics emerge?
The term bioinformatics was first used by Paulien Hogeweg and Ben Hesper in 1970 to mean “the study of informatic processes in biotic systems.” However, researcher Margaret Dayhoff is known as the mother of bioinformatics.
She was a pioneer in developing computer programs capable of determining peptide sequences, programs to display three-dimensional protein structures, and computational methods for sequence comparison.
In 1965, Dr. Dayhoff participated in the creation of the “Atlas of Protein Sequence and Structure,” the first computerized public database of protein sequences.
Beyond the contributions of these and other researchers, the growth of bioinformatics was only possible because of the development of computers.
The advancement of computer technology resulted in increased processing power and memory capacity, which are essential for analyzing the vast amount of data generated in research.
Science and technology side by side
Science and computing evolved largely in step. In the 1970s, Fred Sanger and Walter Gilbert developed the first DNA sequencing methods.
Then, in 1977, Sanger's group determined the DNA sequence of the bacteriophage Phi X 174 (ΦX174), the first complete DNA genome to be sequenced, at about 5,000 nucleotides. At that time, computers still had limited storage capacity.
In the late 1980s, the first automated DNA sequencing machine emerged. Shortly thereafter, in the 1990s, computers became more popular, with increased storage and processing capacity, as well as internet access.
In the same period, the first algorithms for genome assembly began to be developed. Assembly means reconstructing the full nucleotide sequence of a genome from many short, overlapping sequenced fragments. The first free-living organism to have its genome sequenced was the bacterium Haemophilus influenzae, in work published in 1995.
In these years, bioinformatics became better known, and investment increased in new algorithms capable of analyzing more complex and voluminous sequences.
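To make the idea of assembly concrete, here is a minimal sketch in Python of a greedy overlap merge: it repeatedly joins the two fragments whose ends overlap the most. It is a toy illustration under simplifying assumptions (error-free reads, a single strand, no repeats), not the method used for any particular genome. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest suffix/prefix overlap.

    Returns the contigs left once no pair overlaps by at least `min_len`.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the largest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Real assemblers must also handle sequencing errors, reverse-complement reads, and repeated regions, which is why the greedy strategy here should be read only as an illustration of the overlap idea.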
With the advancement of science, technology, and data accessibility, it became possible to initiate the Human Genome Project in 1990. Researchers from 18 countries collaborated to map the human genome, which has about 3 billion base pairs.
Around 2005, Next-Generation Sequencing (NGS) emerged and revolutionized science. This technology sequences millions to billions of DNA fragments in parallel. Besides being faster, NGS made sequencing far cheaper and, therefore, more accessible.
As a result, the list of complete genomes and the volume of biological data began to increase dramatically. Therefore, it was also necessary to develop new bioinformatics tools to quickly and effectively process the enormous volumes of generated data.
What are the main applications of bioinformatics?
According to Luscombe and colleagues (2001), bioinformatics has three main objectives:
I- Organize data so that researchers can access information and create new information:
Due to the large volume of generated data, organization, storage, and access have become necessary. Therefore, the increase in data quantity has been accompanied by the emergence of various genetic databases.
One of the first platforms developed, and still one of the most widely used, is GenBank, established in 1982 with support from the National Institutes of Health (NIH). The database hosts a large share of the publicly available sequence data from thousands of organisms.
In addition to GenBank, the DNA Databank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) are considered primary databases. Today, according to Nucleic Acids Research, there are about 1700 biological databases available.
II- Develop tools and resources that assist in data analysis:
The construction of tools for data processing is one of the most important objectives of bioinformatics. As scientific knowledge advances, it is often necessary to develop new computational resources or tools to perform analyses of new discoveries.
An example of this is Systems Biology, which conducts a comprehensive quantitative analysis of how the components of a biological system interact functionally over time.
Another very common example is the construction of phylogenetic trees, a statistical analysis of the evolutionary relationships among different species and organisms.
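As an illustration of how a phylogenetic tree can be derived from sequences, the sketch below computes pairwise distances between aligned sequences and clusters them with UPGMA, one of the simplest distance-based methods. This is a minimal sketch, not what any specific phylogenetics package does: the function names, the species labels, and the short sequences are all invented for the example, and the sequences are assumed to be already aligned to the same length.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster sequences by UPGMA; return a Newick-style topology string.

    Branch lengths are omitted to keep the sketch short.
    """
    # Each cluster is keyed by its label and stores its number of leaves.
    sizes = {name: 1 for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance to the new cluster: size-weighted average of old distances.
        for other in sizes:
            if other not in pair:
                d = (
                    dist[frozenset((a, other))] * sizes[a]
                    + dist[frozenset((b, other))] * sizes[b]
                ) / (sizes[a] + sizes[b])
                dist[frozenset((merged, other))] = d
        # Discard every distance that still refers to the joined clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    toy = {  # made-up sequences, for illustration only
        "human": "ACGTACGTAC",
        "chimp": "ACGTACGTAT",
        "mouse": "ACGAACCTAT",
        "chicken": "TCGAAGCTTT",
    }
    print(upgma(toy))
```

With these toy sequences, the two most similar entries are grouped first and the most divergent one joins last. UPGMA assumes a constant rate of change across lineages, so studies that cannot make that assumption usually turn to other methods.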
III- Use these tools to analyze and interpret the data in a meaningful way:
Bioinformatics tools can be used, for example, to:
- Process NGS data through a series of data conversions;
- Assemble genomes;
- Align sequences;
- Compare DNA fragments with a reference genome;
- Perform modeling assays.
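Sequence alignment, one of the tasks listed above, can be sketched with the classic Needleman-Wunsch dynamic programming algorithm for global alignment. The version below is a minimal illustration: the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions chosen for the example, and production tools use more elaborate scoring schemes and heuristics.

```python
def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> tuple[int, str, str]:
    """Globally align `a` and `b`; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. That is fine for short sequences, but it is one reason comparing NGS reads against a whole reference genome relies on indexed, heuristic approaches instead.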
NGS and Bioinformatics
The development of NGS technologies associated with bioinformatics has opened up a range of new possibilities, such as the study of:
- The role of genetic variations;
- Evolutionary processes;
- Functional mechanisms of organisms;
- Global gene expression;
- Patterns of methylation;
- Epigenetic markers, among others.
The relationship between these two tools has also enabled the beginning of a new era for human health: precision medicine. And although the union between science and computational technology is still recent, the advances in knowledge are enormous, and much more is yet to come!