Sixteen genome centers around the world, including the Whitehead Institute/MIT Center for Genome Research, have launched the final phase of the Human Genome Project. The milestone marks the transition from the initial phase of generating a "working draft" of the human DNA to the final phase of producing the complete "finished" sequence.
Phase One was launched in March 1999 and has produced coverage of the vast majority of the human chromosomes in 14 months, at a total cost of about $300 million. The last remaining DNA from this first phase is already in the centers' sequencing pipelines and is flowing into public databases at a rate of 10,000 DNA letters per minute; all of it will be deposited by mid-June.
Phase Two will involve producing a "finished" sequence of the human genome, by filling the gaps in the sequence and by increasing the overall sequence accuracy to 99.99 percent.
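To give a sense of what the 99.99 percent accuracy target means in practice, the short Python sketch below converts an accuracy figure into an expected number of wrong letters in a stretch of sequence. The function name and the stretch lengths are illustrative assumptions, not figures from the project; only the 99.99 percent target comes from the text above.

```python
from fractions import Fraction


def expected_errors(num_letters: int, accuracy_percent: str) -> Fraction:
    """Expected number of incorrect DNA letters at a given accuracy.

    accuracy_percent is passed as a string (e.g. "99.99") so the
    arithmetic stays exact instead of picking up floating-point error.
    """
    accuracy = Fraction(accuracy_percent) / 100
    return num_letters * (1 - accuracy)


# At 99.99 percent accuracy, roughly one letter in 10,000 is wrong.
per_ten_thousand = expected_errors(10_000, "99.99")
per_million = expected_errors(1_000_000, "99.99")
print(per_ten_thousand, per_million)
```

In other words, the finished-sequence standard corresponds to about one error per 10,000 DNA letters, or about 100 per million.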
"The progress in human DNA sequencing has been stunning," said Professor Eric S. Lander, director of the Whitehead Institute Center for Genome Research, who attributed the acceleration to advances in automation, informatics and organization at the various centers. "The early projections have been left in the dust. The result has been an information explosion that is fueling a revolution in biomedical research."
The goal of the first phase was to create a "working draft" covering 90 percent of the euchromatic portion of the human DNA, by sequencing large "clones" representing segments from the genome. Draft sequence allows scientists to directly identify the vast majority of the human genes, although the sequence itself still contains gaps and uncertainties.
The "working draft" is assembled in a two-step fashion. Each clone is first "assembled" from its sequence information. The various clones can then be "assembled" together into a "layout" on the human genome, based on their chromosomal location.
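The second step, placing clones into a layout by chromosomal location, can be pictured with a toy Python sketch. It is only an illustration under simplifying assumptions: the clone names, coordinates and chromosome length are invented, each clone is treated as already assembled, and the consortium's actual software is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    chromosome: str
    start: int  # first position covered (inclusive)
    end: int    # position just past the last one covered (exclusive)


def build_layout(clones):
    """Group clones by chromosome and order them by position."""
    layout = {}
    for clone in clones:
        layout.setdefault(clone.chromosome, []).append(clone)
    for placed in layout.values():
        placed.sort(key=lambda c: (c.start, c.end))
    return layout


def uncovered_segments(placed, chromosome_length):
    """Return (start, end) stretches that no clone in the layout covers."""
    gaps = []
    covered_to = 0
    for clone in placed:
        if clone.start > covered_to:
            gaps.append((covered_to, clone.start))
        covered_to = max(covered_to, clone.end)
    if covered_to < chromosome_length:
        gaps.append((covered_to, chromosome_length))
    return gaps


# Hypothetical clones on a hypothetical 700-letter chromosome.
clones = [
    Clone("clone_C", "chr1", 450, 600),
    Clone("clone_A", "chr1", 0, 200),
    Clone("clone_B", "chr1", 150, 400),
]
layout = build_layout(clones)
gaps = uncovered_segments(layout["chr1"], 700)
print([c.name for c in layout["chr1"]], gaps)
```

Ordering the clones this way also makes the segments still to be covered fall out directly, as the stretches between one clone's end and the next clone's start.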
The first comprehensive "layout" of the human genome was constructed in mid-April by scientists in the Human Genome Project international consortium. The layout shows the chromosomal positions and the detailed relationships among the more than 20,000 large clones used to sequence the genome; it also spotlights the remaining segments to be covered. The clones in the layout also have immense value beyond their immediate role as an aid in sequencing: they provide a permanent resource for human genetics because they can be used for direct biological studies of gene function.
The sequence information from the "working draft" has been immediately and freely released to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies providing information services to biotechnologists.
The "working draft" has allowed human geneticists to find genes responsible for dozens of inherited diseases -- including breast cancer, hereditary deafness, stroke, epilepsy, diabetes and various skeletal disorders. The draft sequence is also being used as a resource by the SNP Consortium, an industry-academia collaboration, to identify sites of DNA sequence variation in the human population. Finally, it has propelled many basic biological studies. For example, researchers have recently used it to discover the molecular basis of the sense of taste.
Finishing the sequence in Phase Two involves two activities: (1) performing additional sequencing from the clones used in Phase One and (2) selecting and sequencing some additional clones from chromosomal segments not covered in Phase One.
Although the "working draft" sequence allows for the recognition of genes themselves, the higher accuracy and completeness of the "finished sequence" make it a gold-standard reference that can be readily compared to individual patients' DNA to identify specific single-letter mutations causing hereditary diseases.
A version of this article appeared in MIT Tech Talk on May 31, 2000.