What is the Human Genome Project?
April 25, 1953, marked the publication of the double-helix model of DNA by James Watson and Francis Crick, based on the experimental data of Rosalind Franklin and others. It was fitting, then, that fifty years later, in April 2003, the complete sequence of the human genome was published, marking one of the greatest achievements not only in genetics but in all of science. In the years since, thousands of scientists have mined these data for information about the human body, how its genes shape development and behavior, and the role mutations play in disease.
The Human Genome Project (HGP) traces its origins to the catastrophic events that ended World War II: the dropping of atomic bombs on the Japanese cities of Hiroshima and Nagasaki. Many survivors had been exposed to high levels of radiation, which is known to cause mutations. These survivors were stigmatized by society and were considered poor marriage prospects because of potential genetic damage. The US Atomic Energy Commission, a predecessor of the US Department of Energy (DOE), established the Atomic Bomb Casualty Commission in 1947 to assess mutations in such survivors. However, there were no suitable methods for measuring these mutations, and it would be many years before such techniques were developed. Knowing the sequence of the human genome would be the greatest tool for identifying human mutations.
As in all areas of science, progress in molecular biology was limited by available technology, and a series of advances made the HGP feasible. Starting in the 1970s, techniques were developed to isolate and clone individual genes. By 1977, Walter Gilbert and Frederick Sanger had independently developed methods for sequencing DNA, and that same year Sanger’s group published the sequence of the first genome, that of the small bacterial virus Phi X174. In 1985, Kary B. Mullis and colleagues developed the polymerase chain reaction (PCR), a method by which extremely small amounts of DNA can be amplified billions of times, providing significant amounts of specific DNA for analysis. Finally, in 1986, Leroy Hood and Applied Biosystems developed an automated DNA sequencer that could sequence DNA hundreds of times faster than was previously possible. Together with advances in computer technology, these developments made it possible to sequence the human genome.
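The "billions of times" figure for PCR follows from simple doubling: under ideal conditions, each cycle copies every template once, so the number of copies doubles per cycle. The following Python sketch illustrates that arithmetic only. The function names and the efficiency parameter are assumptions made for illustration, and perfect doubling is an idealization that real reactions do not reach.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of DNA copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle
    (1.0 means perfect doubling, which is an assumption, not a measurement).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(fold: float) -> int:
    """Smallest number of perfect doubling cycles that reaches `fold` amplification."""
    cycles = 0
    copies = 1
    while copies < fold:
        copies *= 2
        cycles += 1
    return cycles


if __name__ == "__main__":
    # One starting molecule, thirty ideal cycles.
    print(pcr_copies(1, 30))
    # How many ideal cycles are needed for a billion-fold amplification?
    print(cycles_needed(1e9))
```

Under these idealized assumptions, about thirty cycles are enough to turn a single molecule into roughly a billion copies.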
In 1985 a conference of leading scientists was held at the University of California, Santa Cruz, to discuss the feasibility of sequencing the entire human genome. Biologists were looking for the equivalent of a Manhattan Project for biology. The Manhattan Project was the concerted effort of physicists to develop atomic weapons during World War II and resulted in huge increases of government funding for physics research. Walter Gilbert called the HGP the Holy Grail of molecular biology. With impetus from the DOE and the National Research Council, the Human Genome Project was launched in 1990 with James Watson as head. The goal of this project was to completely sequence the human genome of three billion base pairs by 2005 at a cost of $1.00 per base pair. In 1992, Watson resigned over a controversy surrounding the patenting of human sequences. Francis Collins took over as head of the HGP at the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH). The sequencing of genetic model organisms, in addition to the human genome, was another of the goals of the NHGRI. This included genomes of the bacterium Escherichia coli, yeast, the fruit fly Drosophila melanogaster, the roundworm Caenorhabditis elegans, and other organisms. Moreover, 10 percent of the funding was to be directed toward studies of the social, ethical, and legal implications of learning the human genome.
J. Craig Venter, a researcher at the National Institutes of Health, left the NIH and formed a private research organization, The Institute for Genomic Research (TIGR). Using a different approach, known as the shotgun method, TIGR was able to sequence the 1.8 million-base-pair genome of the first free-living organism, the bacterium Haemophilus influenzae, in less than a year. In 1998, Venter, along with Perkin-Elmer Corporation, formed the biotech company Celera Genomics to sequence the human genome privately. Celera had more than three hundred of the world’s fastest automated sequencers and a supercomputer to analyze the data. Meanwhile, public funds supported scientists in the United States, the United Kingdom, Japan, Canada, Sweden, and fourteen other countries working on HGP sequencing. The public sector was now in competition with Celera. To assure free access, new sequence data from the public projects were made available on the Internet each day.
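The general idea behind shotgun sequencing is to break DNA into many random, overlapping fragments, sequence each fragment, and let a computer reassemble the original sequence from the overlaps. The Python sketch below is a toy illustration of that reassembly step using a simple greedy merge. It is not the assembler used by TIGR or Celera, the function names are hypothetical, and it ignores sequencing errors, repeated sequences, and reads on the opposite strand, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Returns the remaining contigs; a single contig means every read was joined.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping fragments of a made-up 14-base sequence.
    fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(fragments))
```

The greedy strategy is chosen here only because it is short and easy to follow; it can give wrong answers when a sequence contains repeats, which is one reason repeated regions of the human genome were hard to finish.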
In 2001, the first draft of the human genome sequence was published in the February 15 issue of Nature and the February 16 issue of Science. Because the genome contains many short, repeated sequences of DNA, certain regions were difficult to sequence and had to be sequenced again for accuracy, and the entire sequence had to be proofread for errors. In April 2003, the final sequence of the human genome was completed. It is remarkable that a government-funded project was completed two and a half years ahead of schedule and under budget, thanks to continual improvements in the speed and accuracy of DNA sequencing technology. April 25, 2003, was designated National DNA Day, and the day has been observed annually since then to educate the public, especially school-age children, about DNA and genetics in general.
Perhaps the most surprising finding from the HGP is the relatively small number of genes in the human genome. Scientists had predicted that the human genome would contain about one hundred thousand functional genes, yet the actual number of protein-coding sequences is approximately twenty-five thousand, representing only about 1 percent of the entire genome. In comparison, yeast has about six thousand genes, the fruit fly about thirteen thousand, and the roundworm Caenorhabditis elegans about eighteen thousand. It was surprising that an organism as complex as a human has fewer than twice as many genes as the roundworm. The human genome also contains 740 genes that encode stable RNAs. The genome of the mouse, another model genetic organism, has provided interesting comparisons to the human genome.
Although more than 99.9 percent of the DNA sequence is identical among all humans, the remaining 0.1 percent difference amounts to approximately 3 million base pairs in a genome of 3 billion. One important question, then, is whose genome was sequenced. Venter has acknowledged that Celera sequenced mostly his own DNA. However, the final sequence database is an “average” or “consensus” genome, a composite to which many individuals contributed. Every human carries many, perhaps even hundreds, of DNA variations. Even before the HGP was completed, databases listing single nucleotide polymorphisms were being established. These databases list the types of genetic variations that occur at individual nucleotides in the genome. For example, a cancer gene database lists the types of mutations that have been identified in specific cancer-causing genes and the frequency of such mutations. Mutations in genes such as BRCA1 and BRCA2 are responsible for breast and ovarian cancers, while mutations in the tumor-suppressor gene p53 have been found in the majority of human tumors.
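A single-nucleotide difference is simply a position at which two otherwise matching sequences carry different bases. The Python sketch below shows one way such positions could be listed when a sample is compared with a reference. It assumes the two sequences are already aligned and of equal length and considers substitutions only; the function name and the example sequences are made up for illustration, and real variant databases are built with far more elaborate methods.

```python
def single_nucleotide_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List positions where two aligned sequences differ.

    Returns (position, reference_base, sample_base) tuples with 1-based positions.
    Assumes equal-length, pre-aligned sequences (substitutions only).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(pairs, start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ATGCGTACGT"  # made-up reference fragment
    sample = "ATGCGTTCGA"     # made-up sample with two substitutions
    differences = single_nucleotide_differences(reference, sample)
    print(differences)
    print(len(differences) / len(reference))  # fraction of positions that differ
```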
The Human Genome Project has given rise to two new fields of study. The first, genomics, is the study of genomes. Such study requires databases, along with search engines to seek out information from the sequences they hold, and hundreds of such databases have already been established. Scientists can search for a complete gene sequence even if they know only a short segment of the gene. They can also look for related sequences within the same genome or among different species, and from such information they can study the evolution of particular genes.
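The simplest form of such a search is an exact match: given a short known segment, report every stored sequence that contains it and where. The Python sketch below illustrates this with a small in-memory dictionary standing in for a database. The function name, the dictionary, and the sequence names are all hypothetical, and real search tools such as BLAST also find approximate and related matches, which this sketch does not attempt.

```python
def find_sequences_containing(segment: str, database: dict[str, str]) -> dict[str, list[int]]:
    """Find every stored sequence that contains `segment` exactly.

    Returns a mapping from sequence name to the 0-based start positions
    of each occurrence (overlapping occurrences included).
    """
    segment = segment.upper()
    if not segment:
        raise ValueError("segment must not be empty")
    hits: dict[str, list[int]] = {}
    for name, sequence in database.items():
        sequence = sequence.upper()
        positions = []
        start = sequence.find(segment)
        while start != -1:
            positions.append(start)
            start = sequence.find(segment, start + 1)  # step by one to allow overlaps
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # A made-up, three-entry stand-in for a sequence database.
    toy_database = {
        "gene_a": "ATGCGTACGTTAGC",
        "gene_b": "TTTTACGTACGT",
        "gene_c": "GGGGCCCC",
    }
    print(find_sequences_containing("ACGT", toy_database))
```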
The next step is to define the human proteome, giving rise to the field of proteomics. Proteomics seeks to determine the expression patterns of genes, the functions of the proteins produced, and the structure of specific proteins derived from their DNA sequence. If a particular protein is involved in a disease process, specific drugs to interfere with it may be designed.
Since 2003, many projects have been developed to enhance our knowledge of the human genome. Two notable projects are the Cancer Genome Atlas and the Cancer Genome Anatomy Project. The goals of both projects are to determine the genes that underlie cancers, to find targeted gene therapy treatments, and to prevent those diseases. To date, several outcomes have proved important to further progress in understanding the human genome, including the identification of 350 cancer-related genes and the establishment of publicly accessible databases of expressed sequence tags found throughout the genome.
The successful sequencing of the human genome has been followed by the completion of the genomes of many more organisms, including the cow and dog genomes in 2004, those of five different domesticated pig breeds in 2005, the domesticated cat in 2007, and the gorilla in 2012. The genomes of other strategically selected organisms have also been sequenced. The view of the National Human Genome Research Institute (NHGRI) is that the most effective way to study the essential functional and structural components of the human genome is to compare it with the genomes of other organisms. Many of the selected organisms are mammals, such as the giant panda, rabbit, and elephant. Also chosen, however, are nonmammalian organisms representing positions on the evolutionary time line that have been marked by important changes in anatomy, physiology, development, or behavior. These organisms include slime mold, a ciliate, a choanoflagellate, a placozoan, a cnidarian (hydra), snails, roundworms, and lamprey eels.
Another great achievement of the HGP has been the acceleration of innovative technologies that make use of sequence data. For example, copy number variants and single nucleotide polymorphisms (SNPs) are now being analyzed and used to develop genetic tests that were previously unavailable. Another technology, microarray analysis, uses the human genome sequence to examine large numbers of small segments of DNA that, if mutated, may cause disease. Direct results of the Human Genome Project also include the International HapMap Project, the 1000 Genomes Project, and commercial “personal genotype sequencing.” The study of the human genome has allowed scientists to make breakthroughs not only in the basic understanding of DNA and the genome but also in how the genome changes over time and among individuals, contributing to disease and evolution. Humanity is just beginning to reap the benefits of the Human Genome Project.