From Organism to DNA Sequence Data
An Overview
Obtaining molecular data from individual organisms can be divided into five major steps: DNA extraction, amplification, purification, cycle sequencing, and direct sequencing. In addition, it is customary to use gel electrophoresis to monitor the results at each stage. Although these are common practices, the protocols and products employed in each step vary widely. The discussion below provides a general overview of these steps. For more details on the sequencing protocols, click here.
DNA Extraction
The initial step is the extraction, or isolation, of cellular DNA from other components of the organism. A variety of techniques make it possible to quickly isolate DNA from other organic molecules (e.g., lipids, proteins, saccharides). The choice of the most suitable technique often depends on the type of tissue sample or specimen under study.
Currently, we use the DNeasy Tissue Kit, produced by Qiagen. The procedure is straightforward. Most of the time it requires is devoted to the digestion, or lysis, of the tissue sample, and the duration of this step varies greatly with the type of sample. For instance, the "digestion" of tissue from a rodent's tail will require considerably more time than the lysis of blood cells. In our laboratory, we most often isolate DNA from pieces of muscle tissue (e.g., from crabs, fishes, molluscs) or small whole organisms (e.g., phyllocarid or peracarid crustaceans). Generally, the lysis of such tissues with proteinase K requires between two and five hours.
Once tissue digestion is complete, the lysate is added to a plastic tube containing a silica-gel membrane and forced through the membrane by centrifugation. The silica-gel membrane selectively binds DNA, permitting other components of the lysate (e.g., proteins, lipids, divalent cations) to pass through it. Any remaining contaminants are removed by briefly washing the silica-gel membrane with buffers provided in the kit. The contaminant-free nucleic acids are then eluted (released) from the silica-gel membrane using water or an "elution buffer."
The product is a solution that contains only the water (or elution buffer) and cellular DNA from the original sample. Naturally, the volume of product depends on the volume used for the elution. We usually perform two 100-µL elutions per sample and store each elution separately in a 1.5-mL tube in a -20°C or -80°C freezer until needed.
Click here for a pdf of an alternative, Chelex protocol (size, 40kb).
Amplification (PCR)
Once the total cellular DNA has been isolated, the gene (or gene fragment) of interest must be amplified. The procedure used to amplify the fragment is the polymerase chain reaction (PCR). Typically, the reaction mixture includes the following ingredients: water, buffer, magnesium chloride (MgCl2), forward and reverse primers, deoxynucleotides (dNTPs), polymerase, and the sample DNA.

Results of DNA amplification reactions using primers designed to amplify a fragment of the large subunit ribosomal DNA (28S) for various leptostracan specimens (April 2004).
Purification
The product of the amplification reaction will include more than the desired gene fragment; primers, unincorporated nucleotides, salts, enzymes, and other molecules will also be present in the solution, so the product must be purified before it is sequenced. We use the Rapid PCR Purification kit produced by Marligen. The technique employed for purification is quite similar to that described for DNA extractions. The PCR product is placed into plastic tubes, or "spin cartridges," containing silica membranes. The solution is forced through the membrane by centrifugation. The membrane selectively binds the DNA fragment and permits the unwanted components of the solution to pass through it. The membrane is washed with buffers, and the waste is discarded. In the final step, the DNA is eluted from the cartridge with water or TE buffer. At this point, the desired gene fragment has been successfully extracted, amplified, and purified.
Cycle Sequencing
The nucleotide sequence of the purified DNA template must ultimately be determined, a process referred to as sequencing. First, there must be some mechanism by which the automated sequencer is able to "read" the sequence. A process called "cycle sequencing" is used to label the ends of nucleotide fragments with a fluorescent dye that can be read by an automated sequencer. There are four different fluorescent dyes, one for each of the four bases (A, C, T, G).
The cycle sequencing reaction itself involves the replication of DNA by primer extension during thermal cycling reactions. For this reason, the recipe for the cycle sequencing reaction is very similar to that used for the earlier amplification (PCR) step. The ingredients generally include buffer and/or water, a primer, polymerase, dNTPs, and the DNA template.
A typical thermal cycler protocol might include the following steps:
Step 1: 96°C for 3 minutes (hot start)
Step 2: 96°C for 10 seconds (denaturation)
Step 3: 50°C for 15 seconds (annealing)
Step 4: 60°C for 4 minutes (primer extension)
Step 5: Return to step 2 and repeat 24 times (cycling)
Step 6: End and hold at 4°C
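As a rough check on how long such a program occupies the thermal cycler, the hold times in the protocol above can simply be added up. The short Python sketch below does this under two assumptions that are not stated in the protocol: that step 5 means 24 repeats in addition to the first pass (25 cycles in total), and that the time the machine spends ramping between temperatures can be ignored. The variable and function names are illustrative only.

```python
# Hold times taken from the protocol above, in seconds.
HOT_START = 3 * 60                  # step 1: 96 °C
CYCLE_STEPS = {
    "denaturation": 10,             # step 2: 96 °C
    "annealing": 15,                # step 3: 50 °C
    "primer extension": 4 * 60,     # step 4: 60 °C
}
REPEATS = 24                        # step 5

# Assumption: the 24 repeats follow an initial pass through steps 2-4.
total_cycles = 1 + REPEATS


def programmed_hold_seconds(hot_start, cycle_steps, cycles):
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    return hot_start + cycles * sum(cycle_steps.values())


total = programmed_hold_seconds(HOT_START, CYCLE_STEPS, total_cycles)
minutes, seconds = divmod(total, 60)
print(f"{total_cycles} cycles, {minutes} min {seconds} s of programmed holds")
# prints: 25 cycles, 113 min 25 s of programmed holds
```

The actual run will take somewhat longer, because heating and cooling between steps add time that depends on the instrument.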
In the amplification reactions mentioned earlier, the polymerase extends the new strand by adding those nucleotides (dNTPs) that are complementary to the original template. Recall that there are four kinds of nucleotides, bearing the bases adenine (A), cytosine (C), guanine (G), and thymine (T). These are the common building blocks used by the polymerase in the synthesis of the new DNA strand; the cycle sequencing reaction mixture includes these dNTPs for the same reason.
However, the cycle sequencing mixture also includes dideoxynucleotides, or ddNTPs. Dideoxynucleotides are referred to as such because at the 3' position of their pentose sugar they have a hydrogen atom instead of the hydroxyl group present in normal nucleotides. Normal DNA strands are essentially chains of deoxynucleotides that are linked together by phosphodiester bonds. These bonds form between the phosphate group at the 5' position of one nucleotide and the hydroxyl group at the 3' position of the next nucleotide (see a biology text for review). Because dideoxynucleotides lack the hydroxyl group at the 3' position and therefore prevent the formation of a phosphodiester bond, the addition of a dideoxynucleotide to a DNA strand terminates extension of the strand.
In cycle sequencing reactions, the DNA polymerase randomly adds dNTPs and ddNTPs, both of which are available in the reaction mixture. Whenever a dNTP is added, extension of the growing strand may continue; when a ddNTP is added, chain termination occurs. Because the incorporation of the ddNTPs (and, hence, chain termination) is random, the reactions produce a variety of DNA fragment sizes that range from only two nucleotides to strands that are the full length of the target fragment (e.g., 500-800 base pairs). Each strand present in the final solution, whether just a few nucleotides or 500 nucleotides long, will end with a dideoxynucleotide.
Because each dideoxynucleotide is fluorescently labeled, each fragment is terminated with a dye that can be read by the automated sequencer. An on-line tutorial for this process can be viewed at http://www.dnalc.org.
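The random chain termination described above can be illustrated with a small simulation. The Python sketch below is a toy model, not a description of the actual chemistry: the template sequence, the primer length, the number of strands, and the 5% chance of incorporating a ddNTP at each position are all hypothetical values chosen for illustration. For simplicity, the template is written in the order the polymerase reads it, and strands that reach the end of the template without incorporating a ddNTP are ignored because they would carry no dye.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, primer_len, dd_fraction, rng):
    """Extend one primed strand until a ddNTP happens to be incorporated.

    Returns (fragment_length, terminal_base), or None if the end of the
    template is reached without a ddNTP being added.
    """
    for pos in range(primer_len, len(template)):
        base = COMPLEMENT[template[pos]]
        if rng.random() < dd_fraction:   # a ddNTP was added: chain terminates
            return pos + 1, base
        # otherwise a normal dNTP was added and extension continues
    return None


def cycle_sequence(template, primer_len, n_strands=10_000,
                   dd_fraction=0.05, seed=0):
    """Simulate many extensions; map each fragment length to its terminal base(s)."""
    rng = random.Random(seed)
    fragments = {}
    for _ in range(n_strands):
        result = extend_once(template, primer_len, dd_fraction, rng)
        if result is not None:
            length, base = result
            fragments.setdefault(length, set()).add(base)
    return fragments


# Hypothetical 30-base template with an 8-base primer site.
template = "TTGACCGTAAGCTAGGCTTACGATCGGATC"
PRIMER_LEN = 8
fragments = cycle_sequence(template, PRIMER_LEN)
for length in sorted(fragments):
    print(length, "".join(fragments[length]))
```

With enough strands, every length from the primer plus one base up to the full template is represented, and all fragments of a given length end in the same labeled base. That property is what allows the sequencer to read the sequence one position at a time.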
Direct Sequencing
The product of the cycle sequencing reaction is a solution that contains billions of ddNTP-labeled DNA fragments, with all possible fragment lengths represented, as well as other components of the reaction mixture. The solution is purified (e.g., via precipitation), and formamide and a loading dye are then added to the purified DNA sample. The loading dye simply makes the sample visible during the loading step.
Each DNA sample is then loaded into a well of the polyacrylamide sequencing gel, so that the fragments can be separated by size with electrophoresis. The fragments are drawn downward through the gel. An ultraviolet laser scans the bottom of the gel (side-to-side) and interprets the fluorescent signal of the ddNTP-labeled fragments as they pass. Because the smallest fragments migrate through the gel matrix most quickly, the first fragment to be read is the length of the primer plus one nucleotide; the next fragment to be scanned is the length of the primer plus two nucleotides, and so on. The sequencer interprets the identity of the nucleotide at the end of each passing fragment according to the terminal fluorescent dideoxynucleotide (ddNTP) with which the fragment is labeled. Peaks of fluorescence are correlated with the presence of a particular base (A, T, C, or G). An electropherogram is generated, one nucleotide at a time, as the presence of the different bases is recorded. The sequencing process may run as long as 8 to 12 hours and produce sequence reads of up to 700 or 800 nucleotides (base pairs).
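The logic of reading the sequence from size-sorted fragments can be sketched in a few lines of Python. This is only an illustration of the principle, not the sequencer's actual base-calling software: the dye-to-base assignment, the primer length, and the list of detections are hypothetical, and real signal processing must also handle noise and overlapping peaks.

```python
# Hypothetical dye-to-base assignment; real chemistries define their own.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_bases(detections, primer_len):
    """Return the read implied by (fragment_length, dye) detections.

    Fragments are ordered from shortest to longest, mirroring the order in
    which they pass the detector. Reading stops at the first missing length.
    """
    by_length = {}
    for length, dye in detections:
        by_length[length] = DYE_TO_BASE[dye]

    read = []
    for offset, length in enumerate(sorted(by_length), start=1):
        if length != primer_len + offset:   # a fragment length is missing
            break
        read.append(by_length[length])
    return "".join(read)


PRIMER_LEN = 20  # hypothetical primer length
detections = [
    (23, "red"), (21, "yellow"), (24, "green"),
    (22, "blue"), (21, "yellow"), (25, "blue"),
]
print(call_bases(detections, PRIMER_LEN))  # prints: GCTAC
```

The fragment of length 21 (primer plus one) supplies the first base of the read, the fragment of length 22 supplies the second, and so on, which is the order described in the paragraph above.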
Analysis of DNA Sequence Data
After a run with the ABI Sequencer, data are extracted from the gel image. Electropherograms are checked using ABI's Sequencing Analysis program.

In what can be considered a critical step of the process, the sequences are then imported into a program such as CLUSTAL X or MacClade 4.0 to be aligned (see below).

The MacClade file is then imported directly into the phylogenetic inference program PAUP for analysis of the dataset. For more information about various options and assumptions that can be employed with this software, see the PAUP website.
References of note
Goloboff, P. A. 1993. NONA, Version 1.8. Program and Documentation. Available from J. M. Carpenter, American Museum of Natural History, New York.
Goloboff, P. A. 1997. Pee-Wee, Version 2.8. Program and Documentation. Available from J. M. Carpenter, American Museum of Natural History, New York.
Maddison, D. R. and W. P. Maddison. MacClade 4.0. Sinauer Associates, Sunderland, MA. (see http://macclade.org/index.html)
Swofford, D. L. 1998. PAUP*, 4.0. Sinauer Associates, Sunderland, MA.
Prepared by Todd Haney