Next-Generation Sequencing and Personalised Medicine


by Hoi Kiu Wong


Creator: ktsimage | Credit: Getty Images/iStockphoto 

Copyright: ktsimage


Introduction

DNA sequencing has revolutionised genomic medicine and has opened up new potential treatments for cancer patients, as well as for patients with other genetic diseases. With continuous development in the field of bioinformatics, the accuracy and precision of DNA sequencing have improved drastically, moving from Sanger Sequencing to a range of Next-Generation Sequencing (NGS) technologies running on platforms such as Illumina and the SOLiD System. Having looked at Sanger Sequencing in the previous article, we will now outline what Next-Generation Sequencing is and how it will play a pivotal role in medicine.


Next-Generation Sequencing 


Sanger Sequencing changed genomic medicine as it was the first commercialised method of DNA sequencing. It led to the success of the Human Genome Project, completed in 2003, in which the entire human genome was analysed [1]; however, it took researchers and scientists approximately 13 years to complete this task [2]. Speed is not the only limitation. Sanger Sequencing can only sequence short pieces of DNA (300 to 1000 base pairs), and the quality is poor for the first 15 to 40 bases because that is where the primer binds. Moreover, the sequence quality degrades after 700 to 900 bases. If the DNA fragment being sequenced has been cloned, there is also a possibility that part of the cloning vector sequence is carried over into the final sequence, thus creating a duplication in the electropherogram [5].


With continuous advancements in bioinformatics and DNA sequencing, Next-Generation Sequencing (NGS) began to emerge. NGS uses array-based sequencing, in which millions of sequencing reactions run in parallel, so DNA can be sequenced at very high speed and at reduced cost [2]. The first NGS platform to emerge was Pyrosequencing, which in turn led to the birth of other NGS platforms. These platforms differ in their technical details (especially in the sequencing step), but they share some common steps: preparation of the sample (library preparation), amplification of the sequences and data output [2] [4].


Library Preparation 


Library Preparation is an essential step in NGS, as the DNA strand needs to be fragmented before it can be analysed. The strand is broken into smaller pieces using enzymes or by sonication (excitation using ultrasound). DNA ligase then helps to stick (‘ligate’) these smaller strands to adaptors (short, double-stranded pieces of synthetic DNA). The adaptors need to be attached to the DNA fragment because they allow the fragment to bind to a complementary counterpart [2].


So that the adaptor can attach to the fragment, it has one ‘sticky’ end and one ‘blunt’ end, and its ‘blunt’ end attaches to the ‘blunt’ end of the DNA fragment. However, this can cause a problem: base pairing between two ends can occur, resulting in ligation and the formation of a dimer. Ligation takes place between a 3’-OH end and a 5’-P end, so removing the phosphate from the sticky end of the adaptor, leaving a 5’-OH end instead, prevents dimerisation because DNA ligase can no longer link the two termini (See Fig. 1) [2].


Fig. 1 Library Preparation. Note that the second and third rows of the diagram show how dimerisation between two DNA fragments can be prevented.

Source: https://www.atdbio.com/content/58/Next-generation-sequencing [2]
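The fragment-then-ligate idea in library preparation can be sketched in a few lines of Python. This is only a toy model: the adaptor sequences, fragment sizes and the helper names `fragment` and `ligate_adaptors` are made up for illustration, and real fragmentation and ligation chemistry is far messier than cutting and joining text strings.

```python
import random

# Hypothetical adaptor sequences, chosen only for illustration.
ADAPTOR_5 = "ACGTACGT"
ADAPTOR_3 = "TGCATGCA"


def fragment(dna, min_len, max_len, rng):
    """Cut dna into consecutive pieces of random length (mimics shearing)."""
    pieces = []
    start = 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[start:start + size])  # last piece may be shorter
        start += size
    return pieces


def ligate_adaptors(pieces):
    """Attach one adaptor to each end of every fragment."""
    return [ADAPTOR_5 + piece + ADAPTOR_3 for piece in pieces]


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(60))
library = ligate_adaptors(fragment(genome, 8, 15, rng))
for molecule in library:
    print(molecule)
```

Every printed molecule starts and ends with a known adaptor sequence, which is what later lets it bind to a complementary sequence on a bead or flow cell.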


Once the adaptors are attached to the DNA fragments, the fragments are spatially clustered in PCR (Polymerase Chain Reaction) colonies, which are also known as ‘polonies’ [2] (if you are not sure about PCR, please read the section ‘PCR’ in my previous article “Sanger Sequencing”). The polonies are attached to a planar surface, and the whole array can be manipulated enzymatically in parallel. This means that constructing the library is much faster than the original procedure of colony picking and E. coli cloning used to isolate and amplify DNA in Sanger Sequencing. However, these polonies limit the read length of fragments sequenced using NGS [2].


Amplification


Library amplification is required so that the signal received by the sequencer is strong enough (it must pass a threshold) to be detected accurately. A variety of amplification techniques use PCR to create many DNA clusters in NGS, and we will look into two of them: Emulsion PCR and Bridge PCR [2].


Emulsion PCR 


Emulsion PCR (ePCR) allows simultaneous amplification of each DNA sequence without the risk of contamination. This type of PCR requires emulsion oil, water, PCR mix, library DNA (fragments with adaptors ligated onto them) and beads. Each microcell that forms needs to contain one bead with one strand of DNA [2] - each bead acts like a ‘microreactor’ for PCR [3]. Note that the adaptors attached to the DNA fragments allow them to bind to the emulsion beads [3].


Fig. 2 Emulsion PCR 

Source: https://www.goldbio.com/articles/article/Types-of-PCR-used-for-Genetic-Research


For amplification to occur, PCR first denatures the library fragment, which means that its two DNA strands are separated (one of them, the reverse strand, anneals to the bead). The annealed DNA is copied by polymerase, which starts from the bead and works towards the primer site. The strand released from the bead after the next denaturation re-anneals to the bead, leaving two separate strands attached to it. Both of them are then amplified, and this process repeats for approximately 30-60 cycles, which leads to clusters of DNA forming (See Fig. 2) [2].
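To get a feel for why a few dozen cycles are enough to lift the signal above a detection threshold, here is a small worked calculation. It assumes an idealised reaction in which every cycle multiplies the number of copies by (1 + efficiency); the function name `copies_after` and the efficiency parameter are illustrative assumptions, and real reactions level off as reagents and space on the bead run out.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealised PCR yield: each cycle multiplies copies by (1 + efficiency).

    efficiency=1.0 means perfect doubling; this is an assumption, not a
    measured value.
    """
    return start_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles -> about {copies_after(n):,.0f} copies")
```

With perfect doubling, one starting strand becomes roughly a thousand copies after 10 cycles and roughly a billion after 30, which is why amplification is treated as a prerequisite for reliable detection.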


Bridge PCR 


Bridge PCR is another method used to amplify the DNA fragments. The surface of the flow cell used in Bridge PCR is densely coated with primers; these primers are complementary to the adaptor sequences attached to the DNA library fragments, which allows each DNA fragment to attach to the surface of the flow cell. The flow cell is then supplied with reagents for polymerase-based extension, and in the next step the free ends of the DNA strands attach to complementary primers on the flow cell, creating bridged structures. Enzymes and nucleotides then act on the bridged structures so that double strands are formed. Afterwards, the two strands are denatured and the whole process repeats. This results in ‘clonal clusters of localised identical structures’. During Bridge PCR, it is important to monitor the reaction to prevent overcrowding of the DNA strands on the flow cell, which could lead to errors [2].



Fig. 3 Bridge PCR 

Source: https://www.atdbio.com/content/58/Next-generation-sequencing [2]


Sequencing 


Now, we will delve into each of the different NGS techniques (Pyrosequencing, Reversible terminator Sequencing, Sequencing by Ligation and Ion semiconductor sequencing) and compare them to Sanger Sequencing [4]. 


Pyrosequencing 


Pyrosequencing is a method of DNA sequencing that was first discovered in 2003 [7]. It differs from Sanger Sequencing in that a light signal is detected each time a dNTP is incorporated into the growing DNA strand. The light is emitted as the result of a series of chemical reactions involving the released pyrophosphate. The inorganic pyrophosphate (PPi) is used in a sulfurylase reaction (it reacts with APS), which releases adenosine triphosphate (ATP); the ATP is subsequently used by the enzyme luciferase to produce light while luciferin is oxidised [6] [7]. The light emission is detected by a photodiode (alternatively, a photomultiplier tube or charge-coupled device camera [8]) and the data are sent to a computer, which records the corresponding sequence for the cluster. Sequencing continues by adding one type of base at a time, measuring the light emission, removing the unincorporated bases (known as the ‘washing step’ [8]) and then adding the next base [4] (See Fig. 4).


Pyrosequencing is a good method of DNA sequencing as it can achieve long read lengths [4], which means that it can generate large amounts of sequence data. Moreover, it is cheap compared to other NGS platforms and relatively quick: it can analyse up to 96 bisulfite-converted DNA samples in approximately four hours [7] [8]. However, pyrosequencing has a high error rate when reading homopolymer runs of more than six bases, and the reagent cost is high as well [4]. Therefore, it is only used in the validation step for DNA methylation biomarkers, and is not recommended for the discovery of new biomarkers [7].
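The one-base-at-a-time cycle of pyrosequencing can be mimicked with a short simulation. It is a sketch under simplifying assumptions: the function takes the sequence of the strand being synthesised (not its template), the order in which bases are dispensed (`flow_order`) is a made-up choice, and the light signal is taken to be exactly equal to the number of bases incorporated in that flow, with no noise.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=40):
    """Return (nucleotide, signal) for each flow.

    signal is the number of bases incorporated in that flow, standing in
    for the amount of light emitted.
    """
    flows = []
    pos = 0
    for i in range(max_flows):
        if pos >= len(new_strand):
            break
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        # A run of identical bases is incorporated within a single flow.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
    return flows


def call_bases(flows):
    """Rebuild the sequence from the per-flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = flowgram("TTAGGGC")
print(flows)
print(call_bases(flows))
```

A run such as GGG shows up as a single flow with a signal of 3 rather than as three separate events, so the read depends on judging signal height correctly, which is where long homopolymer runs become error-prone.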


Fig. 4 Pyrosequencing 

Source: Wikimedia Commons 

https://commons.wikimedia.org/wiki/File:How_Pyrosequencing_Works.svg


Reversible Terminator Sequencing 


Reversible Terminator Sequencing is similar to Sanger Sequencing in that it relies on chain-terminating nucleotides. All four fluorescently labelled nucleotides are added together, and after one nucleotide has been incorporated into the growing DNA strand, the remaining bases are washed away (the ‘washing step’) and the fluorescent signal is read at each cluster and recorded [4]. The two methods differ, however: instead of terminating primer extension irreversibly with a ddNTP (which prevents any further chain elongation), reversible terminator sequencing uses modified nucleotides whose block can be removed. The fluorescent molecule and terminator group of the modified nucleotide are cleaved and washed away [4], and another modified nucleotide is added to the growing strand. This process repeats until the sequencing reaction is complete [4]. It is important to note that reversible terminator sequencing uses bridge PCR instead of the ePCR that many other sequencing techniques use, which improves efficiency (See Fig. 5). Reversible Terminator Sequencing is used in Illumina [2].



Fig. 5 Reversible Terminator Sequencing 

Source: https://www.atdbio.com/content/58/Next-generation-sequencing [2]
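Because only one labelled nucleotide is incorporated per cycle, reading a cluster amounts to asking which of four colours is brightest in each image. The sketch below illustrates that idea with made-up intensity values; the channel order in `CHANNELS` and the function names are assumptions for illustration, and real base callers also correct for effects such as overlap between dye spectra.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels


def call_cycle(intensities):
    """Pick the base whose channel is brightest in one imaging cycle."""
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def call_read(cycles):
    """Call one base per cycle and join them into a read."""
    return "".join(call_cycle(intensities) for intensities in cycles)


# Made-up intensities for one cluster over four cycles (A, C, G, T channels).
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.2, 0.7, 0.1, 0.1),
    (0.1, 0.2, 0.2, 0.6),
]
print(call_read(cluster))
```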


There are two categories of reversible terminators: 3’-O-blocked reversible terminators and 3’ unblocked reversible terminators [2]. We will not delve more into reversible terminators as it is beyond the scope of this article but if you are interested in learning more about them, please refer to the further reading list at the end of the article. 


As time-efficient as reversible terminator sequencing is, the method has limitations and disadvantages. As sequencing continues, incomplete removal of the fluorescent signal leads to higher background noise levels, so the error rate increases as the read length increases [4].


Sequencing by Ligation 


Sequencing by Ligation is the sequencing method used by SOLiD. It uses ePCR to amplify the DNA fragment of interest, with ssDNA primer-binding regions (the adapters) conjugated to the bead. The beads are then deposited onto a glass surface, and the resulting high density of beads increases the throughput of the technique [2]. The sequencing reaction starts with a primer of length N binding to the adapter, followed by hybridisation of the appropriate probe [4]. To achieve this, the beads are exposed to a library of 8-mer oligonucleotide probes; each probe has one of four fluorescent dyes attached to its 5’ end and a hydroxyl group at its 3’ end [4]. Each 8-mer probe consists of two probe-specific bases at the start (complementary to the nucleotides to be sequenced), followed by three degenerate bases and three inosine bases. Only the probe complementary to the target sequence hybridises, and it does so adjacent to the primer. There is a phosphorothioate linkage between bases 5 and 6, which allows the fluorescent dye to be cleaved from the fragment using silver ions [2]. Cleavage is important as it enables fluorescence of a specific wavelength (colour) to be measured; the four fluorescent dyes used all have different emission spectra.


Once the 8-mer probe has hybridised and the unbound oligonucleotides have been washed away, the fluorescent signal is detected and recorded. The dye is then cleaved off along with the last three bases of the 8-mer probe, and the cycle repeats [4]. The whole process is then performed again with a primer of length N-1, and the rounds after that use progressively shorter primers (N-2, N-3 and so on) [2]. This continues until the DNA fragment is completely read. Overall, five sequencing primers are used (See Fig. 6) [4].




Fig. 6 Sequencing by Ligation 

Source: Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009;55(4):641-658. doi:10.1373/clinchem.2008.112789 [9]
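The reason five primers of decreasing length are needed can be checked with a little bookkeeping. The sketch below rests on a simplified numbering of my own: position 1 is the first base after the adapter, each ligation cycle reads two bases and then moves on five (two read bases plus the three degenerate bases left behind after cleavage), and a primer shortened by `offset` bases shifts every reading frame back by that amount. The function names are illustrative only.

```python
from collections import Counter


def interrogated_positions(offset, cycles):
    """Pairs of positions read using a primer shortened by `offset` bases.

    Position 1 is the first base after the adapter; positions <= 0 lie
    inside the adapter. Each ligation cycle advances by 5 bases.
    """
    return [(5 * c + 1 - offset, 5 * c + 2 - offset) for c in range(cycles)]


def coverage(read_length, cycles, primers=5):
    """Count how often each position 1..read_length is interrogated."""
    counts = Counter()
    for offset in range(primers):
        for first, second in interrogated_positions(offset, cycles):
            counts[first] += 1
            counts[second] += 1
    return [counts[p] for p in range(1, read_length + 1)]


print(interrogated_positions(0, 3))  # primer of length N
print(interrogated_positions(1, 3))  # primer of length N-1
print(coverage(20, 5))               # every position is read twice
```

With a single primer, three out of every five bases would never be interrogated; shifting the start by one base for each of five primers fills those gaps and, in this model, reads every position exactly twice.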


Sequencing by Ligation is highly accurate compared to other sequencing techniques as it uses the two-base sequencing method (which means that each base is effectively sequenced twice). This is why the SOLiD technique is 99.999% accurate with a sixth primer. Another advantage of this method is that it is inexpensive. However, its main disadvantage is that it only allows short read lengths, which makes Sequencing by Ligation unsuitable for many applications [2].
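Two-base sequencing means that each colour reports on a pair of neighbouring bases rather than on a single base. The sketch below uses one commonly described colour scheme, in which bases are numbered A=0, C=1, G=2, T=3 and the colour of a pair is the XOR of the two numbers; treat that scheme, and the function names, as assumptions brought in for illustration rather than something stated in the sources above.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq):
    """Turn a base sequence into one colour (0-3) per adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colours):
    """Rebuild the bases from a known first base plus the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ colour])
    return "".join(bases)


colours = encode("ATGGC")
print(colours)
print(decode("A", colours))
```

Since every base sits in two overlapping pairs, a genuine single-base change alters two adjacent colours, whereas an isolated measurement error alters only one, which is the basis for the accuracy claim.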


Ion Semiconductor sequencing 


Ion Semiconductor sequencing is a ‘sequencing by synthesis’ method [2] that uses the release of hydrogen ions to detect the sequence of the DNA cluster [4]. Following ePCR [2], each cluster is located above a semiconductor transistor, which detects changes in the pH of the solution. When a nucleotide is incorporated into the growing strand, a hydrogen ion is released into the solution as a byproduct [10]. This makes the solution more acidic, which is detected by the semiconductor. Hence the degree of acidity is proportional to the number of nucleotides being added [4]: if more than one of the same nucleotide is added, the change in pH, and therefore the signal intensity, becomes larger (See Fig. 7).



Fig. 7 Ion Semiconductor sequencing 

Source: https://www.atdbio.com/content/58/Next-generation-sequencing [2]


Ion Semiconductor sequencing is advantageous as it is much cheaper than pyrosequencing and is time-efficient as well [4]. However, it can be challenging to count the number of identical bases added consecutively; for instance, it is hard to distinguish the change in pH for a homorepeat of length 9 from that of one of length 10, so repetitive sequences cannot be resolved accurately [2].
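A short calculation shows why long homorepeats are hard to count. It assumes, as a simplification, that the signal is exactly proportional to the number of identical bases incorporated; the function name `relative_step` is made up for this example.

```python
def relative_step(n):
    """Fractional signal increase when a run grows from n to n + 1 bases.

    Assumes signal is exactly proportional to run length: ((n + 1) - n) / n.
    """
    return 1 / n


for n in (1, 2, 5, 9):
    print(f"{n} -> {n + 1} bases: signal rises by {relative_step(n):.0%}")
```

Going from one base to two doubles the signal, but going from nine to ten raises it by only about a ninth, so a modest amount of measurement noise is enough to blur the two.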


NGS in Clinical Practice 


Next-Generation Sequencing allows scientists and researchers worldwide to analyse the genome of living organisms with high accuracy and precision in a shorter period of time - it can sequence an entire human genome in just a few hours [2], and it is able to capture a broader spectrum of mutations than Sanger Sequencing can [11]. This reveals its huge potential in the field of medicine. Not only can these DNA sequencing tests be done on tumour cells, which may lead to more effective (or even personalised) treatments for patients under oncological care, but they can also improve the diagnosis and our understanding of complex genetic diseases, some of which still cannot be diagnosed with confidence.


Sanger Sequencing is limited to the discovery of substitutions and small insertions or deletions in the DNA base sequence; hence other mutation-dedicated assays are used alongside it, such as fluorescence in situ hybridisation (FISH) for conventional karyotyping and comparative genomic hybridisation (CGH) microarrays, which allow the detection of submicroscopic chromosomal copy number changes, e.g. microdeletions [11]. However, NGS platforms are able to derive these directly as well, so there is no longer a need for mutation-dedicated assays, as NGS can obtain the entire spectrum of genomic variation in a single quick experiment [11]. Moreover, Sanger Sequencing requires prior knowledge of the gene or locus under investigation (de novo sequencing takes much longer); NGS, on the other hand, can be used to sequence and analyse complete genomes and to discover novel mutations and disease-causing genes. Hence, it has potential use in paediatrics, as it could be used to find the mutations that cause many rare genetic diseases, as well as the genetic causes of developmental delay in affected children - which is what a UK nationwide project, Deciphering Developmental Disorders, is doing [11]. Therefore, NGS can provide physicians with detailed clinical phenotypic information about the patient, as well as making it easier to identify the novel mutated genes behind disorders.


NGS can also be used to detect mosaic mutations in the body (if you are unsure about mosaic mutations, please read my article on ‘Somatic Mosaicism and Rare Diseases’). Sanger Sequencing may miss these variants, as their signals fall below the sensitivity threshold of the technology, so they go undetected; NGS provides a more detailed and sensitive analysis that can identify such variants in the genetic information of the cells. In recent years, NGS has been used for many sensitive investigations, such as searching for traces of foetal DNA in maternal blood or tracking the level of circulating tumour cells (CTCs) in the blood of a cancer patient [11]. These liquid biopsies can be done in cases where tissue collection for molecular testing is unsafe, e.g. for brain, lung or peritoneal lesions [1]. NGS also has a significant impact in the field of microbiology. One example is an incident in which NGS was used to trace an outbreak of methicillin-resistant Staphylococcus aureus (MRSA) in a neonatal intensive care unit in the UK, where conventional routine microbiological surveillance had not been able to show the increase in MRSA cases over many months [11].
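The sensitivity argument can be made concrete with the variant allele fraction: the share of reads covering a position that carry the variant. In the sketch below, the read counts and both detection limits are placeholder numbers chosen purely for illustration, not measured properties of either technology, and the function names are hypothetical.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def detectable(vaf, detection_limit):
    """True if the variant fraction reaches the method's detection limit."""
    return vaf >= detection_limit


# Placeholder detection limits, assumed for illustration only.
SANGER_LIMIT = 0.20
NGS_LIMIT = 0.02

vaf = variant_allele_fraction(alt_reads=30, total_reads=600)
print(f"variant allele fraction: {vaf:.1%}")
print("seen by Sanger-like limit:", detectable(vaf, SANGER_LIMIT))
print("seen by NGS-like limit:", detectable(vaf, NGS_LIMIT))
```

A mosaic variant present in only a small share of cells produces a low fraction like this one, which a method with a coarse detection limit would not report.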


NGS is also central to precision medicine, which seeks to use genomic data to tailor medical treatment to the patient’s unique genetic composition. Of all the specialties in medicine, oncology is likely to feel the greatest impact, as NGS is able to change cancer patient outcomes drastically [12]. In the pre-NGS era, Sanger Sequencing and PCR-based techniques provided limited information about a cancer’s mutational status, whereas NGS can now sequence broad sets of genes and identify mutations in scarce biopsy tissue [1]. This means that a patient’s tumour can be sequenced and matched with therapies designed to target the specific genetic alterations driving its growth. Doing so helps avoid unnecessary medication and reduces treatment time, making treatment more effective, and the patient may experience fewer or weaker side effects [12]. This is shown in Tsimberidou et al.’s study, in which cancer patients given a treatment matched to their tumour mutations showed an improved overall response rate (27% vs 5%), time to treatment failure (median 5.2 vs 2.2 months) and survival (median 13.4 vs 9.0 months) compared with patients who did not receive sequencing-matched therapy. Further evidence comes from Radovich et al.'s study, which reported that progression-free survival (defined as ‘The length of time during and after the treatment of a disease, such as cancer, that a patient lives with the disease but it does not get worse’ [13]) was longer for patients whose treatments were matched to their DNA mutations, copy number variations or mRNA levels than for patients receiving non-matched therapy (86 vs 49 days) [12]. Furthermore, there are potential gene therapy treatments informed by NGS, such as introducing an antisense RNA against the oncogene (which specifically prevents the synthesis of a targeted protein), and another method that can kill cancer cells selectively [2]. However, the challenge is to ensure a very precise delivery system so that only cancer cells are damaged and not healthy cells [2].


There are also limitations and disadvantages to the clinical use of NGS. For instance, NGS may sequence some regions of the DNA poorly because of extreme guanine-cytosine (GC) content, and ‘repeat architecture’ may be detected erroneously, e.g. the repeat expansions of Fragile X Syndrome or Huntington’s disease [11]. Moreover, if NGS becomes more widespread, it will raise practical issues such as the challenge of storing large quantities of sequencing data [2]. It can also pose a risk to safety and security and raise ethical issues, such as the ownership of one’s DNA; there are concerns that insurance groups, mortgage brokers and employers may use this data to modify insurance quotes or to distinguish between candidates [2]. Finally, NGS can reveal whether an individual has an increased risk of a certain disease, which raises the question of whether to inform the patient or not (the patient may have had their DNA sequenced for purposes other than medical treatment) [2].
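GC content is simple to compute, which makes it a common first check for regions that may sequence poorly. The sketch below scans a sequence in fixed windows and flags those whose GC fraction is unusually low or high; the window size and the 25% and 75% cut-offs are arbitrary values picked for illustration, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def extreme_gc_windows(seq, window, low=0.25, high=0.75):
    """Return (start, gc_fraction) for non-overlapping windows outside [low, high].

    The cut-offs are illustrative assumptions, not platform specifications.
    """
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        frac = gc_fraction(seq[start:start + window])
        if frac < low or frac > high:
            flagged.append((start, frac))
    return flagged


example = "GGCCGGCC" + "ATGCATGC" + "ATATATAT"
print(extreme_gc_windows(example, window=8))
```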


Bibliography 


[1] Morganti S. et al. (2020) Role of Next-Generation Sequencing Technologies in Personalized Medicine. In: Pravettoni G., Triberti S. (eds) P5 eHealth: An Agenda for the Health Technologies of the Future. Springer, Cham. https://doi.org/10.1007/978-3-030-27994-3_8


[2] “Next Generation Sequencing.” ATDBio, www.atdbio.com/content/58/Next-generation-sequencing. 


[3] https://www.goldbio.com/articles/article/Types-of-PCR-used-for-Genetic-Research


[4] Applied Biological Materials, director. 1) Next Generation Sequencing (NGS) - An Introduction. YouTube, 22 June 2015, www.youtube.com/watch?v=jFCD8Q6qSTM. 


[5] Shaffer, Dr. Catherine. “Challenges with Sanger Sequencing.” News, 26 Feb. 2019, www.news-medical.net/life-sciences/Challenges-with-Sanger-Sequencing.aspx. 


[6] Chen, Yi-Ping Phoebe, et al. “9.15 - Bioinformatics.” Comprehensive Natural Products II: Chemistry and Biology, edited by Hung-Wen (Ben) Liu and Lewis N. Mander, vol. 9, Elsevier, 2010, pp. 569–593. 


[7] Zhao, Fang, and Bharati Bapat. “The Role of Methylation-Specific PCR and Associated Techniques in Clinical Diagnostics.” Epigenetic Biomarkers and Diagnostics, 2016, pp. 155–173., doi:10.1016/b978-0-12-801899-6.00008-5. 


[8] Simner, Patricia J., et al. “Rapidly Growing Mycobacteria.” Molecular Medical Microbiology, 2015, pp. 1679–1690., doi:10.1016/b978-0-12-397169-2.00095-0.


[9] Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009;55(4):641-658. doi:10.1373/clinchem.2008.112789  


[10] “Semiconductor Sequencing Technology: Thermo Fisher Scientific - US.” Semiconductor Sequencing Technology | Thermo Fisher Scientific - US, www.thermofisher.com/hk/en/home/life-science/sequencing/next-generation-sequencing/ion-torrent-next-generation-sequencing-technology.html.


[11] Behjati, Sam, and Patrick S Tarpey. “What Is next Generation Sequencing?” Archives of Disease in Childhood - Education & Practice Edition, vol. 98, no. 6, 2013, pp. 236–238., doi:10.1136/archdischild-2013-304340. 


[12] Morash, Margaret, et al. “The Role of Next-Generation Sequencing in Precision Medicine: A Review of Outcomes in Oncology.” Journal of Personalized Medicine, vol. 8, no. 3, 2018, p. 30., doi:10.3390/jpm8030030. 


[13] “NCI Dictionary of Cancer Terms.” National Cancer Institute, www.cancer.gov/publications/dictionaries/cancer-terms/def/progression-free-survival.
