A deep dive into Sanger sequencing and Next-Generation Sequencing (NGS) technologies and their impact on modern biology.
If you were to type out your entire genetic code at a rate of one letter per second (60 letters per minute), it would take you about 95 years of non-stop typing. How did scientists manage to read all 3 billion letters in just over a decade, and why can we now do it in less than a day?
Developed in 1977, Sanger sequencing (also called dideoxy chain termination) relies on a clever chemical 'stop sign.' The process uses standard deoxynucleotides (dNTPs) and modified dideoxynucleotides (ddNTPs). Unlike a dNTP, a ddNTP lacks the 3'-OH (hydroxyl) group necessary for forming a phosphodiester bond with the next nucleotide. When DNA polymerase randomly incorporates a fluorescently labeled ddNTP, elongation halts. Across many copies of the template, this creates a collection of DNA fragments of every possible length, each ending in a labeled ddNTP. The fragments are separated by size via capillary electrophoresis, and a laser detects the terminal fluorescent tag on each one, allowing us to 'read' the sequence one base at a time.
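The chain-termination idea can be sketched in a few lines of Python. This is a toy model rather than a lab protocol: the function names `simulate_termination` and `read_by_length`, the termination probability `p_stop`, and the example strand are all assumptions made for illustration.

```python
import random

def simulate_termination(product, n_molecules, p_stop=0.2, seed=0):
    """Toy model: each growing strand copies `product` base by base and,
    at every position, stops with probability `p_stop` (a labeled ddNTP
    was incorporated instead of a dNTP). Returns the terminated fragments."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for length in range(1, len(product) + 1):
            if rng.random() < p_stop:
                # The last base of this fragment is the labeled ddNTP.
                fragments.append(product[:length])
                break
        # Strands that never pick up a ddNTP carry no label and are ignored.
    return fragments

def read_by_length(fragments):
    """Order fragments by size (as electrophoresis does) and read the
    terminal, labeled base of each distinct length."""
    terminal = {len(f): f[-1] for f in fragments}
    return "".join(terminal[n] for n in sorted(terminal))

# Hypothetical newly synthesized strand, used only for this demonstration.
fragments = simulate_termination("CTAGGATCCA", n_molecules=5000)
print(read_by_length(fragments))
```

With enough molecules, every fragment length is represented, which is why reading the terminal base of each size class in order recovers the full strand.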
Imagine a sequencing reaction where you see four peaks of light passing a detector in this order: Blue (C), Red (T), Green (A), Yellow (G). 1. The shortest fragment ended with a Blue ddNTP, so the first base is C. 2. The next fragment, one base longer, ended with Red, so the sequence so far is CT. 3. Following the sequence of peaks, the final read is CTAG.
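The worked example amounts to a lookup from dye color to base, applied in order of fragment size. Here is a minimal sketch using the color assignments from the example. Real instruments use their own dye sets, and the names `DYE_TO_BASE` and `call_bases` are hypothetical.

```python
# Dye colors as assigned in the example; actual dye chemistry varies by instrument.
DYE_TO_BASE = {"blue": "C", "red": "T", "green": "A", "yellow": "G"}

def call_bases(peaks):
    """Translate peaks, listed shortest fragment first, into a base sequence."""
    return "".join(DYE_TO_BASE[color.lower()] for color in peaks)

print(call_bases(["Blue", "Red", "Green", "Yellow"]))  # prints CTAG
```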
Quick Check
What specific chemical group is missing from a ddNTP that prevents further DNA chain elongation?
Answer
The 3'-OH (hydroxyl) group.
While Sanger sequencing reads one DNA fragment at a time, Next-Generation Sequencing (NGS) reads millions simultaneously. This is known as high-throughput sequencing. In a common method called sequencing by synthesis, genomic DNA is fragmented and attached to a flow cell. Each fragment is amplified into a 'cluster.' As fluorescently labeled nucleotides are added, a high-resolution camera records the light emitted by millions of clusters at once. This shift from 'serial' to 'parallel' processing reduced the cost of sequencing a human genome from roughly $\$100$ million in 2001 to under $\$1{,}000$ today.
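The 'parallel' part of sequencing by synthesis can be illustrated with a toy model in which every imaging cycle adds one base to every cluster's read. The function name `sequence_by_synthesis`, the three short templates, and the simplification of ignoring strand orientation are assumptions for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(templates, n_cycles):
    """Toy flow cell: in each cycle one labeled nucleotide is incorporated by
    every cluster, and a single 'image' records all clusters at once.
    Strand orientation is ignored to keep the sketch short."""
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        # One image per cycle: the base incorporated by each cluster.
        image = [COMPLEMENT[t[cycle]] if cycle < len(t) else "" for t in templates]
        reads = [read + base for read, base in zip(reads, image)]
    return reads

clusters = ["ACGT", "TTGA", "GGCC"]  # one made-up template per cluster
print(sequence_by_synthesis(clusters, n_cycles=4))  # ['TGCA', 'AACT', 'CCGG']
```

The number of cycles depends on the read length, not on the number of clusters, so adding more clusters adds data without adding imaging steps.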
Compare the efficiency of Sanger vs. NGS: 1. A Sanger machine reads about $10^3$ bases per reaction. 2. An NGS platform can generate $3 \times 10^{12}$ bases (3 Terabases) in a single run. 3. Calculation: $\frac{3 \times 10^{12}}{10^3} = 3 \times 10^9$. One NGS run can produce the equivalent of 3 billion Sanger reactions.
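The same comparison can be checked numerically. The sketch assumes roughly 1,000 bases per Sanger reaction, a value chosen because it is consistent with the 3 Terabase and 3 billion figures in the comparison. The helper name `sanger_equivalents` is hypothetical.

```python
def sanger_equivalents(run_bases, bases_per_reaction):
    """How many Sanger reactions would yield as many bases as one NGS run."""
    return run_bases // bases_per_reaction

sanger_bases_per_reaction = 1_000   # assumed: about 1 kb per Sanger reaction
ngs_bases_per_run = 3 * 10**12      # 3 Terabases, from the comparison

ratio = sanger_equivalents(ngs_bases_per_run, sanger_bases_per_reaction)
print(f"{ratio:,}")  # prints 3,000,000,000
```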
Quick Check
Why is NGS often referred to as 'massively parallel' sequencing?
Answer
Because it sequences millions of different DNA fragments simultaneously on a single chip or flow cell.
Raw sequence data is useless without bioinformatics, the marriage of biology and computer science. Tools like BLAST (Basic Local Alignment Search Tool) allow scientists to compare a query sequence against a massive database to find similarities. Such comparisons support annotation, the process of identifying genes, regulatory elements, and SNPs (Single Nucleotide Polymorphisms). SNPs are a foundation of personalized medicine, where a patient's specific genetic profile helps determine which medications will be most effective and least toxic. The study of how genes affect drug response is known as pharmacogenomics.
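At its simplest, finding a SNP means comparing a sample to a reference position by position. The sketch below is not how BLAST works (BLAST uses heuristic local alignment against a database). It assumes the two sequences are already aligned, equal in length, and free of insertions or deletions. The function name `find_snps` and both sequences are made up for illustration.

```python
def find_snps(reference, query):
    """Return (position, reference_base, query_base) for every mismatch.
    Positions are 1-based. Assumes pre-aligned sequences of equal length
    with no insertions or deletions."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, query_base)
        for pos, (ref_base, query_base) in enumerate(zip(reference, query), start=1)
        if ref_base != query_base
    ]

reference = "ATGGCCTTAGCA"  # made-up sequences, not real gene data
patient = "ATGGCCTCAGCA"
print(find_snps(reference, patient))  # [(8, 'T', 'C')]
```

Real pipelines first align reads to the reference and then use dedicated variant callers that account for sequencing errors and coverage.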
A patient requires the drug warfarin (a blood thinner). 1. The doctor sequences the patient's CYP2C9 gene. 2. Bioinformatics tools compare the sequence to a reference genome and find a SNP that results in slow metabolism of the drug. 3. Instead of the standard dose ($D$), the doctor applies a reduction formula, perhaps $D_{\text{adjusted}} = f \times D$ with a reduction factor $0 < f < 1$, to prevent toxic buildup. This is personalized medicine.
In Sanger sequencing, what is the result of adding a fluorescently labeled ddNTP to the reaction mix?
Which technology was primarily responsible for the rapid decrease in the cost of genome sequencing after the year 2005?
Pharmacogenomics is the study of how an individual's entire genome affects their response to drugs.
Review Tomorrow
In 24 hours, try to explain the difference between a dNTP and a ddNTP to a friend, and why that 'one missing oxygen' changed biology.
Practice Activity
Visit the NCBI website and use the 'BLAST' tool with a sample DNA sequence to see how bioinformatics identifies unknown organisms.