
Conversational Sequencing



Next Generation Sequencing
A high-throughput sequencing method which parallelizes the sequencing process, producing thousands or millions of sequences at once.
Deep Sequencing
Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced.
Paired-End Sequencing
Sequence both ends of the same fragment and keep track of the paired data.
Adapters
Short oligonucleotides that are attached to the DNA to be sequenced. An adapter can provide a priming site for both amplification and sequencing of the adjoining, unknown nucleic acid. It can also be used to attach the oligo to a surface and, because it can carry a fluorophore, for quantitative purposes.
Library
A collection of DNA fragments with adapters ligated to each end.
Bridge Amplification
Generation of in situ copies of a specific DNA molecule on an oligo-decorated solid support.
Emulsion PCR
A method for bead-based amplification of a library. A single adapter-bound fragment is attached to the surface of a bead, and an oil emulsion micelle containing necessary amplification reagents is formed around the bead/fragment component. Parallel amplification of millions of beads with millions of single strand fragments produces a sequencer-ready library.
Mapping of sequence reads to a known reference sequence
Reference sequence/genome
A fully assembled version of a genome that can be used for mapping short DNA sequence reads for comparisons of genomes from various individuals
Coverage Depth
The number of nucleotides from reads that are mapped to a given position on a reference genome.
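As a rough illustration of this definition, the sketch below counts how many reads cover each position of a reference. The function name and the (start, end) interval convention (0-based, end exclusive) are assumptions made for the example, not part of the original card; real pipelines compute depth from alignment files.

```python
def depth_per_position(reads, ref_length):
    """Return a list with the number of reads covering each reference position.

    reads: list of (start, end) mapped intervals, 0-based, end exclusive
           (an assumed convention for this sketch).
    """
    depth = [0] * ref_length
    for start, end in reads:
        # Clip each read to the reference bounds before counting.
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth


# Three overlapping hypothetical reads on a 10-base reference.
example_reads = [(0, 5), (2, 7), (4, 9)]
print(depth_per_position(example_reads, 10))
```

Position 4 is covered by all three example reads, so its coverage depth is 3, while the last position is not covered at all.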
The percentage of sequenced bases that map to the intended targets, out of the total bases per run.
The variability in sequence coverage across target regions.
Homopolymer
Uninterrupted stretch of a single nucleotide type (e.g., TTT or GGGGGG).
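A minimal sketch of how such stretches might be located in a sequence string. The function name and the minimum run length of 3 are assumptions chosen for the example; the card itself does not set a threshold.

```python
from itertools import groupby


def homopolymer_runs(seq, min_length=3):
    """Return (start, base, length) for each run of one repeated base.

    min_length is an assumed cutoff for this sketch.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_length:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACTTTGGGGGGA"))
```

For the example string this reports the TTT run starting at index 2 and the GGGGGG run starting at index 5.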
InDel
Short for insertion or deletion. A form of structural variation in which a DNA segment is either deleted or inserted.
SNP
Short for Single Nucleotide Polymorphism. A single base difference found when comparing the same DNA sequence from two different individuals.
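The comparison described here can be sketched as a position-by-position check of two sequences. This assumes the two sequences are already aligned and of equal length (no insertions or deletions); the function name is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) wherever two aligned sequences differ.

    Assumes equal-length, pre-aligned sequences; indels are out of scope here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Two hypothetical individuals differing at one base.
print(find_snps("ACGTACGT", "ACGTTCGT"))
```

A single difference at index 4 (A in the first sequence, T in the second) is the only entry reported for this pair.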
Shotgun sequencing
A sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes. This method requires the target DNA to be broken into random fragments. After the individual fragments are sequenced, the sequences can be reassembled on the basis of their overlapping regions.
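The reassembly step can be illustrated with a toy greedy merge: repeatedly join the two fragments with the longest suffix-to-prefix overlap. This is only a sketch under strong assumptions (error-free fragments, one strand, no repeats); the function names and the minimum overlap of 3 bases are hypothetical, and real assemblers use far more robust methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise by best overlap until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three hypothetical fragments of the sequence ATGCGTACGTTAGC.
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Each merged product corresponds to a contiguous stretch reconstructed from overlapping fragments; fragments with no sufficient overlap are returned unmerged.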
Chromosome Walking
A technique for cloning everything in the genome around a known piece of DNA (the starting probe). You screen a genomic library for all clones hybridizing with the probe, and then figure out which one extends furthest into the surrounding DNA. The most distal piece of this most distal clone is then used as a probe, so that ever more distal regions can be cloned. This has been used to move as much as 200 kb away from a given starting point (an immense undertaking). Typically used to "walk" from a starting point towards some nearby gene in order to clone that gene. Also used to obtain the remainder of a gene when you have isolated a part of it.
Contig
Noun. The term comes from a shortening of the word 'contiguous'.

More often, the term 'contig' is used to refer to the final product of a shotgun sequencing project. When individual lanes of sequence information are merged to infer the sequence of the larger DNA piece, the product consensus sequence is called a 'contig'.
Read Length
Some sequencing instruments give you flexibility in choosing the number of base pairs (cycles) you can read at one time. The number of cycles corresponds to the output read length. While longer read lengths give you more accurate information on the relative positions of your bases in a genome, they are more expensive than shorter ones. 50 cycles are typically sufficient for simple mapping of reads to a reference genome, and RNA-Seq profiling or counting experiments. Read lengths greater than or equal to 100 are typically chosen for genome or transcriptome studies that require high amounts of output.
Number of Reads
During a DNA sequencing run, the instrument generates sequenced stretches of bases, or "reads". Each sequencing platform and instrument yields a different number of reads.
Depth of Coverage (DNA)
A sequencing run generates reads that sample a genome randomly and independently [1]. These reads are not distributed equally across the genome; some bases are covered by fewer reads than the average coverage and some by more. Coverage refers to the average number of times a single base is read during a sequencing run. If the coverage is 100X, then on average each base was sequenced 100 times. The more often a base is sequenced, the more reliably it can be called, which improves the quality of your data. The Lander/Waterman equation is one method for determining coverage: C = LN/G, where C is coverage, L is read length, N is the number of reads and G is the haploid genome length.
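The Lander/Waterman relation C = LN/G is simple enough to compute directly, and rearranging it as N = CG/L estimates how many reads a target coverage requires. The function names and the example numbers below are hypothetical and chosen only to show the arithmetic.

```python
import math


def coverage(read_length, num_reads, genome_length):
    """C = L * N / G: average number of times each base is read."""
    return read_length * num_reads / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """N = C * G / L, rounded up to a whole number of reads."""
    return math.ceil(target_coverage * genome_length / read_length)


# Hypothetical run: 1,000,000 reads of 100 bases on a 5,000,000-base genome.
print(coverage(100, 1_000_000, 5_000_000))

# Hypothetical target: 30X on a 3,000,000-base genome with 150-base reads.
print(reads_needed(30, 150, 3_000_000))
```

Because this is an average, individual bases will still be covered more or less often than the computed value.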
Replication, Randomization and Multiplexing
Replicates are essential in any biological experiment, and high-throughput sequencing is no exception. Samples are subject to variation, which makes biological replicates important for establishing statistical significance and identifying sources of variation. Despite the desire to cut back on replicates to reduce cost, it's important to remember that many factors may cause a sequencing run or sample to fail. If you don't have sufficient replicates, you may have to repeat your sequencing run. In general we recommend at least 4 biological replicates for every experiment.
Replication, Randomization and Multiplexing
Randomization is the process of assigning biological samples at random to the different groups within an experiment. This reduces bias by equalizing independent variables that have not been accounted for in the experimental design. Randomization reduces instrument effects, systematic bias and the occurrence and effect of confounding factors (operational, procedural and person-related). The two main sources of variation that contribute to confounding factors are 1) library effects, which arise during reverse transcription and amplification, and 2) unit effects, such as poor base calling or bad sequencing cycles, which arise within a sequencing unit (lanes [Illumina and SOLiD], chips [Ion], plates [Roche 454]). We recommend randomizing your samples by making sure each sequencing unit contains samples from both control and experimental groups. This can be done by barcoding or indexing your samples to allow for multiplexing.
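One way to follow the recommendation above is to shuffle each group separately and then deal its samples across the sequencing units in turn, so that every unit receives both control and experimental samples. The sketch below assumes each group has at least as many samples as there are units; the function name, sample labels and seed are hypothetical.

```python
import random


def randomize_to_units(control, treated, n_units, seed=None):
    """Randomly assign samples to sequencing units, balancing the two groups.

    Assumes len(control) >= n_units and len(treated) >= n_units so that
    every unit ends up with samples from both groups.
    """
    rng = random.Random(seed)
    units = [[] for _ in range(n_units)]
    for group in (control, treated):
        shuffled = list(group)
        rng.shuffle(shuffled)  # random order within the group
        for i, sample in enumerate(shuffled):
            units[i % n_units].append(sample)  # deal round-robin across units
    return units


lanes = randomize_to_units(["C1", "C2", "C3", "C4"], ["T1", "T2", "T3", "T4"],
                           n_units=2, seed=0)
for number, lane in enumerate(lanes, start=1):
    print(f"lane {number}: {lane}")
```

Passing a seed makes the assignment reproducible, which can be useful when the design needs to be documented.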
Replication, Randomization and Multiplexing
DNA (or cDNA fragments made from RNA) can be labelled with sample-specific sequences, or barcodes, that allow multiple samples to be included in the same sequencing reaction. The barcodes allow each read to be assigned to the correct sample after the sequencing run is complete. Multiplexing can be used to create balanced, pooled experimental designs. If you have 8 samples that require the sequencing output obtained from 3 Illumina lanes, unit effects can be eliminated by multiplexing all 8 samples and loading the same 8-sample pool into every one of those lanes. All unit (lane) effects will then be the same for each sample. Pooling many samples also avoids the phasing issues associated with low-plex pools. A low-plex pool can produce no signal in one of the color channels during an index read, in which case image registration might fail and no base will be called for that cycle. If a base isn't called, the samples cannot be demultiplexed.
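The sample identification step (demultiplexing) can be sketched as a lookup of each read's barcode. For simplicity this example assumes the barcode is the first few bases of the read, that all barcodes have the same length, and that only exact matches count; on some platforms the index is read separately. The function name, barcodes and reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample using an exact-match barcode at the start of each read.

    barcodes: dict mapping barcode sequence -> sample name (equal lengths assumed).
    Reads with an unknown barcode are collected under "undetermined".
    """
    barcode_length = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:barcode_length], "undetermined")
        by_sample[sample].append(read[barcode_length:])  # strip the barcode
    return dict(by_sample)


barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
pooled_reads = ["ACGTAAAA", "TGCACCCC", "ACGTGGGG", "NNNNTTTT"]
print(demultiplex(pooled_reads, barcodes))
```

The last example read has uncalled bases where its barcode should be, so it cannot be assigned to a sample, which mirrors the failure described above.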
Multiplexing
A sequencing approach that uses several pooled samples simultaneously, greatly increasing sequencing speed.