Research on diseases caused by genetic variation has focused mainly on protein-coding regions. However, a large non-coding part of the human genome is functional and can be related to disease.
Some genetic variants affect splicing, the process of mRNA formation in which introns (non-coding regions) are removed and exons (coding regions) are joined to form mature mRNA, which is exported to the cytoplasm and translated into protein. Abnormalities in this process, such as exon skipping or intron retention, are responsible for up to 15% of all inherited diseases.
Non-coding regions of the human genome are organized in Topologically Associating Domains (TADs), which are chromosomal domains whose sequences interact frequently with one another. Structural rearrangements that alter these interactions can therefore have an important impact on gene expression. The figure below shows the DNA regions present in these domains.
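The effect of exon skipping and intron retention on the mature transcript can be illustrated with a toy model. The sketch below is only an illustration: the sequence, the exon coordinates and the `splice` helper are hypothetical, and real splicing depends on splice-site signals that are not modelled here.

```python
def splice(pre_mrna, exons, skip=()):
    """Join the exon segments of a pre-mRNA, dropping everything else.

    exons: ordered list of (start, end) intervals, 0-based and half-open.
    skip:  indices of exons to leave out, mimicking exon skipping.
    """
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i not in skip
    )


# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccccacag" + "AAGUAA"
exons = [(0, 6), (16, 22), (32, 38)]

normal = splice(pre_mrna, exons)                    # all three exons joined
skipped = splice(pre_mrna, exons, skip={1})         # second exon skipped
retained = splice(pre_mrna, [(0, 22), (32, 38)])    # first intron retained

print(normal)    # AUGGCCGAUUACAAGUAA
print(skipped)   # AUGGCCAAGUAA
print(retained)  # AUGGCCguaaguuuagGAUUACAAGUAA
```

In this toy case the skipped transcript loses two codons, and the retained intron inserts extra sequence between the first two exons; either change can alter the resulting protein.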
Image source: FRENCH, J. D.; EDWARDS, S. L. (2020).
5’UTR and 3’UTR
The untranslated regions (UTRs) are non-coding regions of mRNA (messenger RNA) that flank the coding sequence at its beginning (5’UTR) and at its end (3’UTR). UTR sequences are transcribed into mRNA but are not translated into protein, and they have post-transcriptional roles in mRNA stability, secondary structure and translation. Together with introns, they represent about 35% of the human genome, and variants in them can result in an altered mRNA or altered protein levels.
Along with enhancers, promoters are considered cis-regulatory elements, which are spread across non-coding regions and regulate gene transcription with the help of transcription factors (TFs). Promoters are regions of DNA that recruit TFs and RNA polymerase to initiate the transcription of a particular gene. Few Mendelian diseases have been linked to pathogenic variants in promoters, but some such changes have been identified in cancer-predisposing genes.
Enhancers are sequences that regulate promoters up to 1 Mb away and can bind proteins that alter the level of gene transcription. Some studies of variants in these regions show altered TF binding, resulting in altered expression of the target gene. Furthermore, because an enhancer can interact with more than one gene promoter, enhancer variants can affect the expression of more than one gene.
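The layout of an mRNA into 5’UTR, coding sequence and 3’UTR can be sketched in a few lines. This is a simplified illustration under stated assumptions: the coding sequence is taken to start at the first AUG and end at the first in-frame stop codon, the example sequence is hypothetical, and the `split_mrna` helper is not part of any library. Real start-site selection depends on sequence context that is ignored here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split an mRNA into (five_utr, cds, three_utr).

    Assumes the coding sequence runs from the first AUG to the first
    in-frame stop codon (inclusive). If no AUG is found, the whole
    sequence is returned as 5'UTR.
    """
    start = mrna.find("AUG")
    if start == -1:
        return mrna, "", ""
    # Walk codon by codon from the start codon looking for a stop.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    # No in-frame stop: treat the rest as coding sequence.
    return mrna[:start], mrna[start:], ""


# Hypothetical transcript: 5'UTR + coding sequence + 3'UTR.
five_utr, cds, three_utr = split_mrna("GGCACC" + "AUGGCCGAUUAA" + "CUUAAUAAA")
print(five_utr, cds, three_utr)  # GGCACC AUGGCCGAUUAA CUUAAUAAA
```

A variant in either flanking segment leaves the coding sequence unchanged in this model, which is why UTR variants act through mRNA stability or translation rather than through the protein sequence itself.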
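Because these regulatory sequences can act on promoters up to 1 Mb away, a common first step when interpreting a variant is to list every promoter within that window. The sketch below shows the idea; the `Promoter` class, the `candidate_targets` helper, the gene names and the coordinates are all hypothetical, and proximity alone does not show that a gene is actually regulated by the element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promoter:
    gene: str
    chrom: str
    tss: int  # transcription start site position, in base pairs


def candidate_targets(chrom, pos, promoters, max_dist=1_000_000):
    """Return genes whose promoter lies within max_dist bp of a variant.

    Only promoters on the same chromosome are considered; results are
    ordered from nearest to farthest.
    """
    hits = [
        (abs(p.tss - pos), p.gene)
        for p in promoters
        if p.chrom == chrom and abs(p.tss - pos) <= max_dist
    ]
    return [gene for _, gene in sorted(hits)]


# Hypothetical promoter annotations.
promoters = [
    Promoter("GENE_A", "chr1", 1_200_000),
    Promoter("GENE_B", "chr1", 1_900_000),
    Promoter("GENE_C", "chr1", 3_500_000),
    Promoter("GENE_D", "chr2", 1_250_000),
]

targets = candidate_targets("chr1", 1_500_000, promoters)
print(targets)  # ['GENE_A', 'GENE_B']
```

The example returns two genes for a single variant position, which mirrors the point that one regulatory variant can affect the expression of more than one gene.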
Non-coding RNA (ncRNA)
RNA molecules that are not translated into protein are called non-coding RNAs (ncRNAs). Studies have reported that changes in the expression of these molecules are involved in disease. New ncRNAs are still being identified, and their diverse functions make them difficult to study.
NcRNAs can be:
- miRNA: microRNAs with a length between 21 and 25 nucleotides that act as post-transcriptional silencers, regulating translation.
- sRNA: small non-coding RNAs with fewer than 200 nucleotides that act as RNA silencers.
- lncRNA: long non-coding RNAs with more than 200 nucleotides. They are involved in various cellular processes such as transcription, chromatin organization and RNA processing.
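The length ranges in the list above can be written as a small helper. This is only a sketch of the size thresholds: the `ncrna_size_class` function is hypothetical, grouping a transcript of exactly 200 nucleotides with the long class is an assumption made here, and length alone is not enough to establish what an ncRNA does.

```python
def ncrna_size_class(length):
    """Assign a size class to a non-coding RNA from its length in nucleotides.

    The thresholds follow the list above. The miRNA range is checked first
    because it falls inside the small ncRNA range.
    """
    if length <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if 21 <= length <= 25:
        return "miRNA-sized"
    if length < 200:
        return "small ncRNA"
    # Assumption: exactly 200 nucleotides is grouped with the long class.
    return "lncRNA"


for n in (22, 150, 5000):
    print(n, ncrna_size_class(n))
```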
Tandem repeats are regions in which copies of a short DNA sequence are positioned one after the other. Changes in these regions can affect processes such as transcription and cell architecture. Changes in tandem repeats have been identified in several Mendelian diseases, such as fragile X syndrome, Huntington’s disease and spinocerebellar ataxia. Detection of variants within tandem repeats is currently limited by the difficulty of sequencing these regions, a problem that advances in sequencing may overcome.
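Counting how many copies of a repeat unit sit back to back is the basic measurement behind repeat-expansion analysis. The sketch below does this on an error-free string with a regular expression; the sequence and the `longest_tandem_run` helper are hypothetical, CAG is used only as an example unit, and real data requires handling sequencing errors and reads that do not span the whole repeat.

```python
import re


def longest_tandem_run(seq, unit):
    """Return the largest number of consecutive copies of unit in seq."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # Match one or more back-to-back copies of the unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)


# Hypothetical sequence with two separate CAG runs (5 copies and 2 copies).
seq = "ACGT" + "CAG" * 5 + "TTC" + "CAG" * 2 + "GA"

print(longest_tandem_run(seq, "CAG"))  # 5
print(longest_tandem_run(seq, "GGC"))  # 0
```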
Challenges and progress
Many functional elements are specific to particular tissues and cell types. Testing the cell types that are relevant to the phenotype is therefore essential for this type of study. For a large proportion of variants, the associated traits are likely to be mediated by several cell types.
For many types of cancer, the immune system has a key role in tumor development. In this case, it is possible that some cancer risk variants act specifically in immune cell types, and not in the type of cell from which the tumor was derived.
Non-coding regions were ignored for many years, and further functional elements are likely to be discovered. In addition, the impact of genetic variants has yet to be tested in vivo, which poses new challenges. Because non-coding elements are generally not well conserved between species, it is often not possible to use model organisms. Functional studies are therefore often limited to human-derived models, such as cultured cells and organoids.
CRISPR-based technology can help to overcome some of these technical challenges. It has successfully identified functional elements and variants, and editing platforms are already being designed to modify RNA transcripts.
Emerging methods may help to uncover the role of disease-associated variants in selected cells. The goal of these studies is to translate this information to the clinic. As the cost of sequencing falls, whole-genome sequencing (WGS) is becoming the next clinical option, which will require resources for interpreting the resulting data.
Variants in non-coding regions can also be important for directing new drug therapies or repurposing existing ones. Several sequencing projects, such as the 100,000 Genomes Project, fuel the discovery of variants and enable a more accurate catalog of the genetic variants of the human genome. More studies, and stronger links between variants and phenotypes, will transform our ability to understand disease mechanisms, improve diagnosis and discover better treatments for rare and common diseases.
References:
CHATTERJEE, Sumantra; AHITUV, Nadav. Gene regulatory elements, major drivers of human disease. Annual Review of Genomics and Human Genetics, v. 18, p. 45-63, 2017.
FRENCH, J. D.; EDWARDS, S. L. The Role of Noncoding Variants in Heritable Disease. Trends in Genetics, 2020.
HANNAN, Anthony J. Tandem repeats mediating genetic plasticity in health and disease. Nature Reviews Genetics, v. 19, n. 5, p. 286, 2018.
KARNUTA, Jaret M.; SCACHERI, Peter C. Enhancers: bridging the gap between gene control and human disease. Human Molecular Genetics, v. 27, n. R2, p. R219-R227, 2018.
LAPPALAINEN, Tuuli et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature, v. 501, n. 7468, p. 506-511, 2013.
SAKHARKAR, Meena Kishore; CHOW, Vincent TK; KANGUEANE, Pandjassarame. Distributions of exons and introns in the human genome. In Silico Biology, v. 4, n. 4, p. 387-393, 2004.
STERI, Maristella et al. Genetic variants in mRNA untranslated regions. Wiley Interdisciplinary Reviews: RNA, v. 9, n. 4, p. e1474, 2018.
TURNBULL, Clare et al. The 100 000 Genomes Project: bringing whole genome sequencing to the NHS. BMJ, v. 361, 2018.