Biopolymers & Proteomics Laboratory

The David H. Koch Institute for Integrative Cancer Research at MIT

Bldg 76 Room 181

Telephone: 617-253-7038

Massively Parallel Genome Sequencing Analysis

aka Next Generation DNA Sequencing or Solexa DNA Sequencing



In October 2007, the Biopolymers Facility obtained a “next generation” DNA sequencer. The Illumina Genome Analyzer (aka Solexa) was purchased through the combined efforts of six MIT HHMI faculty: Robert Horvitz, Tyler Jacks, Angelika Amon, Stephen Bell, Susumu Tonegawa and Richard Hynes. This instrument currently generates 2 gigabases of DNA sequence per week and, according to the manufacturer, will generate 95 gigabases (the equivalent of 30 human genomes) per 10-day run by the end of 2009. The cost per base makes sequencing entire genomes practical for any research laboratory. Multiple bacterial genomes can be barcoded and sequenced simultaneously with 20-fold coverage in a single sample lane. Companies such as Nimblegen and Agilent are developing targeted DNA methods that enrich specific parts of a genome, e.g. specific chromosomes or exons. This strategy will allow multi-thousand-fold sequence coverage of specific genome regions, providing definitive answers to biological questions.
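As a rough illustration of the coverage arithmetic behind multiplexed bacterial sequencing, the Python sketch below estimates fold coverage and the number of equally represented barcoded genomes that fit in one lane at a target coverage. The lane yield of 5 million 40-base reads is the typical single-read figure quoted on this page. The 4.6 Mb genome size (roughly that of E. coli K-12) and the function names are illustrative assumptions, and the estimate ignores reads that fail to align and any read bases consumed by the barcode.

```python
def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average fold coverage = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def genomes_per_lane(n_reads, read_length_bp, genome_size_bp, target_coverage):
    """Number of equally sized, equally represented genomes one lane can
    cover at the target fold coverage (rounded down)."""
    lane_bases = n_reads * read_length_bp
    bases_needed_per_genome = genome_size_bp * target_coverage
    return int(lane_bases // bases_needed_per_genome)


# Typical single-read lane quoted on this page.
LANE_READS = 5_000_000
READ_LENGTH_BP = 40

# Assumption for illustration only: a ~4.6 Mb bacterial genome.
GENOME_SIZE_BP = 4_600_000

if __name__ == "__main__":
    cov = fold_coverage(LANE_READS, READ_LENGTH_BP, GENOME_SIZE_BP)
    n = genomes_per_lane(LANE_READS, READ_LENGTH_BP, GENOME_SIZE_BP, 20)
    print(f"One genome per lane: about {cov:.1f}-fold coverage")
    print(f"Genomes per lane at 20-fold coverage: {n}")
```

Smaller genomes or higher lane yields raise the number of samples that can share a lane; uneven barcode representation lowers the coverage of the least abundant sample.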

Upgraded to the GA2 model in 2008, this high-throughput sequencing system enables a wide range of genetic analysis applications, including small genome de novo sequencing, SNP analysis, ChIP-Seq, small RNA discovery and analysis, digital RNA expression analysis, paired end (aka mate pair) analysis, copy number variation analysis, multiplexing of samples via barcoding, structural rearrangement analysis, deletion and insertion polymorphism analysis (DIP-Seq), cross-linking immunoprecipitation sequencing (CLIP-Seq) and epigenomics, e.g. methylation analysis (BS-Seq).

The instrument system consists of a cluster station, paired end module, IPAR image analysis computer and genome analyzer module, allowing the analysis of eight samples (lanes) simultaneously. The paired end module is employed when researchers wish to multiplex samples via barcoding or when sequence is desired from both ends of the library template. Using an Illumina sample preparation kit or one's own favorite protocol, DNA or cDNA is fragmented into 120 bp fragments, and adapters are ligated to both ends of each fragment. The adapters allow the fragment library to anneal to a glass flow cell and also provide a priming site immediately before the insert to be sequenced. Using the cluster station, Biopolymers hybridizes the library to a flow cell at a concentration of 4 pM, which allows the library to anneal randomly to the flow cell such that the individual templates are approximately 1 µm apart. A sequencing primer is added in the cluster station. The flow cell is then inserted in the genome analyzer, where “sequencing by synthesis” using polymerase and dye-labeled nucleotides occurs. After each sequencing cycle, template clusters are illuminated by laser and imaged; approximately 160,000 .tif images (one terabyte of data) are acquired from the flow cell. These images are transferred to the IPAR computer and decoded into base calls. A typical single-read sample lane currently generates 5 million individual 40-base reads at Q20 Phred quality.
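To put the read count and quality figures in perspective, the following Python sketch converts a Phred score to a base-call error probability using the standard definition Q = -10 log10(p), and totals the bases produced by a lane and by a full eight-lane flow cell. The function names are hypothetical, and the flow cell total assumes every lane performs like the typical lane described above.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call is wrong, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Inverse conversion: Phred score for a given error probability."""
    return -10 * math.log10(p)


def lane_yield_bases(n_reads, read_length):
    """Total bases in a lane = number of reads * read length."""
    return n_reads * read_length


if __name__ == "__main__":
    # Figures quoted for a typical single-read lane on this page.
    reads, length, q = 5_000_000, 40, 20

    p_err = phred_to_error_probability(q)
    per_lane = lane_yield_bases(reads, length)

    print(f"Q{q} error probability per base: {p_err:.4f}")
    print(f"Bases per lane: {per_lane:,}")
    # Assumption: all eight lanes yield the same as the typical lane.
    print(f"Bases per eight-lane flow cell: {8 * per_lane:,}")
```

At Q20 the chance of a wrong call is 1 in 100 per base, so a 40-base read whose bases all sit exactly at Q20 would be expected to carry about 0.4 errors.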

Samples are analyzed and processed in the Biopolymers Lab; consultation regarding biological interpretation, however, is provided by the KI Bioinformatics Facility. The Biopolymers Facility is responsible for generating the flow cell .tif images and base calling. Alignment of the sequences to a reference genome and subsequent biological interpretation require massive computing hardware and special expertise, and are handled by Charles Whittaker in KI Bioinformatics.

For bioinformatics (BI) support or pre-run consultation, please contact:

Charles Whittaker, Ph.D.
(617) 324-0337

Note: In-depth BI consultation is available only to KI members. Non-KI labs receive primary data, which include a FASTA file containing the reads (e.g. 5 million 40-base reads), information on alignment to the reference genome, and Illumina's Solexa Pipeline standard quality control biometrics files.

Sample Submission

Please submit sample libraries frozen in Eppendorf tubes, along with a Solexa Request Form.

The first base called will be the first base after the adapter sequence, i.e. the first base of your template, reading in the 5’ to 3’ direction.

Check the web site for up-to-date chargeback fees.

Please submit the following documentation along with your samples to assist us in calculating molarity for sample loading:

  1. OD readings from a NanoDrop or similar spectrophotometer: 260, 260/280 or 260/230
    (for a double-stranded library, use 657 Da per ds base pair and 50 µg/mL per OD260 unit)
  2. Indication of MW, e.g. an Agilent Model 2100 Bioanalyzer trace or an analytical gel image of the purified library
  3. Species for sequence alignment
  4. Sample prep kit used
  5. Brief description of project
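The molarity calculation that these readings feed into can be sketched in Python as follows. This is a minimal illustration, not the facility's actual worksheet: it uses the 657 Da per base pair and 50 µg/mL per OD260 unit constants given above and the 4 pM loading concentration mentioned on this page. The function names and the example values (an OD260 of 0.2 and a 200 bp average library size) are hypothetical, and the OD260 reading is assumed to be normalized to a 1 cm path length.

```python
DA_PER_BP = 657.0            # average mass of one ds base pair (g/mol)
UG_PER_ML_PER_OD260 = 50.0   # dsDNA concentration at OD260 = 1 (1 cm path)


def dsdna_conc_ng_per_ul(od260, dilution_factor=1.0):
    """dsDNA concentration in ng/uL (numerically equal to ug/mL)."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def library_molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert mass concentration to molarity.

    ng/uL equals 1e-3 g/L; dividing by the fragment mass (g/mol) gives
    mol/L, and multiplying by 1e9 gives nM, so the net factor is 1e6.
    """
    return conc_ng_per_ul * 1e6 / (DA_PER_BP * mean_length_bp)


def fold_dilution_to_target(stock_nM, target_pM=4.0):
    """Fold dilution needed to bring the stock to the loading concentration."""
    return stock_nM * 1000.0 / target_pM


if __name__ == "__main__":
    # Hypothetical example: OD260 = 0.2, mean library size 200 bp
    # (insert plus adapters, as read from a Bioanalyzer trace).
    conc = dsdna_conc_ng_per_ul(0.2)
    nM = library_molarity_nM(conc, 200)
    print(f"Concentration: {conc:.1f} ng/uL")
    print(f"Molarity: {nM:.1f} nM")
    print(f"Fold dilution to 4 pM: {fold_dilution_to_target(nM):,.0f}")
```

The average library size matters as much as the OD reading: for the same mass concentration, halving the fragment length doubles the molarity, which is why a Bioanalyzer trace or gel image is requested alongside the spectrophotometer values.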

NanoDrop and Bioanalyzer services are available.

Sample Preparation

Important: If you plan to use any sample prep kit other than the Genomic DNA kit (see below), please contact Richard Cook or Alla Leshinsky (617-253-7038) as soon as possible, because we may have to order a special cluster generation kit to accommodate those samples.

We always run a PhiX control on each flow cell, so seven is the maximum number of samples that can be run at once. PhiX data are needed to generate the matrix and phasing files used to analyze the rest of the flow cell.

Please contact Richard Cook or Alla Leshinsky for our copies of the Illumina .pdf sample preparation files or the Sample Prep Kit Price List. Each kit comes with the latest sample prep instructions, so use those instructions to create the library. These .pdf files can also be obtained from the Illumina Website.

  • 1003806 Genomic DNA Sample Prep
  • Paired End DNA Sample Prep Kit
  • 1003801 SmallRNA Sample Prep
  • 1003802 GeneExpression NlaIII Sample Prep
  • 1003803 GeneExpression DpnII Sample Prep
  • 11257047 ChIP Sample Prep

New sample prep kits were added in March 2009. Please visit the Illumina Website for the latest kit list.