What is genomics?
Genomics grew out of the field of genetics, the study of heredity. Until the late twentieth century, it had not been possible to study the complete set of hereditary information in a living organism. Thus, while the field of genetics traces its roots to the 1860s, when the Austrian monk Gregor Mendel performed experiments on the mechanism of heredity in pea plants, the field of genomics is much younger, dating from the 1980s. It was in this decade that American geneticist Thomas Roderick used the term genomics to name a new scientific journal that dealt with the analysis of genomic information. In Mendel’s time, while organisms were seen to exhibit certain traits, it was not known how these traits were determined. By the early twentieth century, it was recognized that traits are inherited in units of information called genes, although the chemical nature of the gene was still unknown. It took until the middle of that century to recognize that genes were made up of deoxyribonucleic acid (DNA), the structure of which was first identified by American biologist James Watson and British biophysicist Francis Crick in 1953.
DNA is made up of four different deoxyribonucleotides, commonly referred to as bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Together, they spell out a chemical code that the cell uses to make proteins. Because the set of proteins contained within a cell gives that cell its unique properties, determining the order of the bases in a given stretch of DNA, a procedure known as DNA sequencing, reveals which proteins that stretch encodes. While a gene has been defined as the amount of DNA needed to encode one protein, a genome is the entire set of genes found in an organism, including any noncoding DNA found between genes. The number of genes found in an organism varies from fewer than two hundred in some obligately parasitic bacteria to about twenty-three thousand in the simple flowering plant Arabidopsis; humans were found to have slightly fewer than this number.
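The idea that DNA bases spell out a code for proteins can be made concrete. Cells read a coding sequence three bases at a time, and each three-base codon specifies one amino acid or a stop signal. The Python sketch below translates a DNA sequence with the standard genetic code. It is an illustration under simplifying assumptions, not a model of the cell: it reads the DNA coding strand directly (skipping the RNA intermediate), assumes the sequence starts exactly at the first codon and contains only A, C, G, and T, and uses an illustrative function name, `translate`, and a made-up example sequence.

```python
# Standard genetic code (NCBI translation table 1). Codons are ordered by
# first, second, and third base, each running through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate coding-strand DNA into one-letter amino acid codes.

    Assumes only A/C/G/T. Reading starts at the first base and stops at the
    first stop codon ('*') or when fewer than three bases remain.
    """
    seq = coding_dna.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon: the protein ends here
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Building the 64-entry table from a compact string is only a convenience; writing the table out codon by codon would work equally well.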
During the 1980s, a public consortium, the International Human Genome Sequencing Consortium, was formed with the goal of sequencing the human genome by 2005. The Human Genome Project, as this effort was called, also aimed to sequence the genomes of a number of model organisms that scientists use to help understand biological complexity. These model organisms, which included the bacterium Escherichia coli, yeast, Caenorhabditis elegans (a roundworm), Drosophila (the fruit fly), and the mouse, also served as steps by which the efficiency of DNA sequencing could be improved over time. Genome sizes are usually given in base pairs, abbreviated bp, since each base in DNA is paired with its complementary base, A with T and G with C. E. coli, like most bacteria, has a genome of a few million bp, and yeast has a genome of approximately ten million bp, while the roundworm and the fruit fly have genomes of roughly one hundred million to two hundred million bp. Mice, like humans, have genomes of roughly three billion bp. Around the turn of the twenty-first century, steady progress was made on these genome efforts as the sequence of each genome was determined and made publicly available via computer databases. The completion of the human genome was announced in April 2003, the month of the fiftieth anniversary of Watson and Crick’s first description of the structure of DNA.
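The pairing rule described above (A with T, G with C) means that either strand of a DNA molecule determines the other, which is also why DNA length is counted in base pairs. A minimal Python sketch of deriving the partner strand follows. It assumes a sequence of A, C, G, and T only, and it reverses the result because the two strands of the double helix run in opposite directions and sequences are conventionally written 5' to 3'.

```python
# Pairing rules: A <-> T and G <-> C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of a DNA sequence, written 5' to 3'."""
    # Complement each base, then reverse, because the strands are antiparallel.
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original sequence, reflecting the fact that each strand fully specifies its partner.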
Does the field of genomics represent a new scientific discipline in and of itself, or is it just an extension of genetics? Beyond the sheer amount of hereditary information that genomics analyzes, another major difference supports the former view. Ever since Mendel’s time, genetics has taken a reductionist approach. Because Mendel could not have hoped to understand the pea plant as a whole, he limited his investigation to a few easily characterized traits, such as plant height and seed color. Since then, many scientists have sought to understand complex biological processes by breaking them into manageable pieces. Genomics takes a different, expansionist approach. In the postgenomics era (as some scientists have called the twenty-first century), the questions being posed are holistic: they attempt, for the first time, to understand an organism as a whole, using its complete set of genetic information as a guide. In fact, genomics has spawned a new set of fields whose names end in “-omics,” denoting that they study the complete set of a particular class of molecules in an organism or cell type. Proteomics is the study of the complete set of proteins in a cell type, while metabolomics is the study of the complete set of metabolites, the small molecules produced and consumed by metabolic reactions, in a cell.
New approaches to science call for new tools as well. The tools on which geneticists have relied over the years have largely proved to be insufficient for studying entire genomes. Bioinformatics is a subdiscipline of computational biology that has arisen to provide such tools. Bioinformatics includes the computational methods required to find patterns in the huge genomic databases that have been produced, to track the expression of genes using DNA microarrays, to identify all the proteins in a cell, and to model the protein interactions involved in cell metabolism, among other things.
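As a small illustration of the pattern-finding side of bioinformatics, the sketch below counts every overlapping substring of length k (a k-mer) in a sequence and lists the positions where a chosen motif occurs. The functions, the example sequence, and the example motif are hypothetical choices for illustration; programs that search whole genomic databases rely on indexing and alignment methods far more efficient than this simple linear scan.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every match, overlapping matches included."""
    seq, motif = sequence.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


print(find_motif("GAATTCAGAATTCTTGAATTC", "GAATTC"))  # [0, 7, 15]
print(count_kmers("ACGTACGT", 4).most_common(1))      # [('ACGT', 2)]
```

Short words that occur far more often than chance would predict are one simple signal of repeated or otherwise notable sequence.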
In addition to producing the “-omics” disciplines, the field of genomics can itself be subdivided. Although these divisions are somewhat artificial, they do help illustrate the different goals of genomics research. The main divisions of genomics are structural genomics, functional genomics, and comparative genomics.
Structural genomics is concerned with the structure of hereditary information. One pursuit of this field is determining the number, location, and order of genes on a particular chromosome. While bacterial genomes are typically contained within a single circular chromosome, humans have twenty-three pairs of chromosomes, and some organisms have even more. The study of regions of DNA between genes, known as intergenic regions, also falls within structural genomics. Intergenic regions are often composed of highly repetitive DNA sequences that do not code for any protein. In the early twenty-first century, the precise function of these regions still eluded scientists. Although noncoding sequence is relatively rare in bacteria, it makes up a major portion of the genomes of many multicellular organisms, including about 98 percent of the human genome. A separate goal of structural genomics is the determination of the three-dimensional structure of all the proteins encoded by a genome, an endeavor called structural proteomics.
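Part of structural genomics is bookkeeping about where genes lie. The Python sketch below, using hypothetical gene coordinates, merges overlapping gene spans on a chromosome, lists the intergenic regions between them, and reports what fraction of the chromosome those regions occupy. It assumes a linear chromosome with 0-based, end-exclusive coordinates (a circular bacterial chromosome would need its wrap-around gap joined), and it treats each gene's whole span as genic, so it measures intergenic DNA only.

```python
def merge_intervals(genes):
    """Merge overlapping or adjacent (start, end) gene spans; end is exclusive."""
    merged = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            # This gene overlaps the previous span, so extend that span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_regions(chrom_length, genes):
    """Return the (start, end) stretches of a linear chromosome covered by no gene."""
    regions, cursor = [], 0
    for start, end in merge_intervals(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = end
    if cursor < chrom_length:
        regions.append((cursor, chrom_length))
    return regions


def intergenic_fraction(chrom_length, genes):
    """Fraction of the chromosome that lies between genes."""
    gap_total = sum(end - start for start, end in intergenic_regions(chrom_length, genes))
    return gap_total / chrom_length


# Hypothetical 100-bp chromosome with three genes, two of which overlap.
genes = [(10, 30), (25, 40), (70, 90)]
print(intergenic_regions(100, genes))   # [(0, 10), (40, 70), (90, 100)]
print(intergenic_fraction(100, genes))  # 0.5
```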
Functional genomics is less concerned with the structure of a genome and more concerned with its function. This division of genomics tends to be of greater interest to pharmaceutical companies and the medical community as a whole. Functional genomics asks questions such as “What do the products of individual genes do?” and “How does the perturbation of gene function lead to disease states?” Determining the function of a gene, however, is not as straightforward as it might appear. By the early twenty-first century, determining the structure of a given genome was the relatively easy part, but the function of up to half the genes in a typical sequenced genome remained undetermined. Even determining the three-dimensional structure of the protein produced by a specific gene does not guarantee that its function can be discerned, although scientists hope that such a structure will at least suggest a plausible function.
Another clue to gene function comes from determining where and when a particular gene is expressed (activated to produce its corresponding protein). While gene expression has traditionally been monitored one gene at a time using molecular genetic techniques, around the turn of the twenty-first century researchers began experimenting with DNA microarrays, or “gene chips,” which carry DNA from many, if not all, of the genes in a given genome, attached as individual spots to a solid support such as a glass slide. To monitor the expression of these genes at a given time, the messenger RNA present in the cells of interest is copied into fluorescently tagged DNA, which binds to the spots carrying complementary sequences. An automated scanner then measures the fluorescence of each spot and records the data in a computer.
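Microarray measurements are often summarized as ratios: how much brighter a gene's spot is for one sample than for another. The sketch below assumes a two-sample comparison with background-corrected, positive spot intensities (the gene names and values are made up), computes a log2 ratio for each gene, and flags genes whose signal changed at least twofold. The twofold cutoff is an arbitrary illustrative choice; real analyses also normalize the data and assess statistical significance.

```python
import math


def log2_ratios(test, reference):
    """Per-gene log2(test / reference); intensities must be positive."""
    return {gene: math.log2(test[gene] / reference[gene]) for gene in test if gene in reference}


def call_changes(ratios, threshold=1.0):
    """Label each gene up, down, or unchanged; threshold 1.0 means twofold."""
    calls = {}
    for gene, ratio in ratios.items():
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical scanner readings for three genes in two samples.
test = {"geneA": 800.0, "geneB": 100.0, "geneC": 410.0}
reference = {"geneA": 200.0, "geneB": 400.0, "geneC": 400.0}
print(call_changes(log2_ratios(test, reference)))
# {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Using log2 ratios makes increases and decreases symmetric: a fourfold rise gives +2 and a fourfold drop gives -2.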
The final division of genomics, comparative genomics, pursues the goals of the other two divisions but achieves them by comparing two or more genomes. For example, the structure of a given genome may not appear significant on its own until the same basic structure is detected in another species. Regions of chromosomes from two different species that are similar in structure are said to display synteny. Comparative genomics as a discipline was not possible until the mid-1990s, when computing technology had developed to the point where huge databases of genomic information could be stored and compared quickly and accurately. Comparative genomics has also aided the quest to determine gene function. One of the most common ways to infer a gene’s function is to look for orthologues in related species. Orthologues are genes in different species that are thought to be related by evolution, descending from a single gene in the species’ common ancestor; they often encode similar, but not identical, proteins with related functions.
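A widely used heuristic for flagging candidate orthologues is the reciprocal best hit: gene a in one species and gene b in another are paired if b is a's most similar gene in the second genome and a is b's most similar gene in the first. The sketch below assumes similarity scores have already been computed by some sequence-comparison tool; the gene names and scores are hypothetical. It is a first-pass filter rather than proof of orthology, since gene duplications and losses can mislead it.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score (higher is more
    similar). On ties, the pair seen first is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each gene is the other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical scores comparing genes of species A and species B.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 70,
          ("a2", "b1"): 60, ("a3", "b1"): 80}
b_vs_a = {("b1", "a1"): 88, ("b1", "a3"): 75, ("b2", "a2"): 65, ("b2", "a1"): 30}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 is excluded: its best hit is b1, but b1's best hit is a1.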
Genomics can trace its origins to the development of techniques for determining the sequence of DNA. In 1977, British biochemist Frederick Sanger and colleagues published a sequencing method based on the principle of chain termination. In this method, the sequence of a target DNA is determined by enzymatically synthesizing a complementary strand of DNA. In Sanger sequencing, as it is now called, a molecular “poison” (a modified base that cannot be extended further) is included in the reaction mixture, so that each newly synthesized complementary chain stops at a specific base. Sanger’s method was later modified to include fluorescent dyes on the chain-terminating bases, so that the DNA sequence could be read by a scanner and recorded directly into a computer. Some have claimed that Sanger and colleagues were actually the first group to sequence a genome, since they published the sequence of a viral genome in the same year that they described their revolutionary technique. Viruses, however, are not free-living organisms, and their genomes are usually hundreds of times smaller than those of bacteria.
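The logic of chain termination can be simulated in a few lines. In the sketch below, each of four hypothetical reactions stops synthesis wherever its terminating base is added, so the lengths of the stopped fragments mark where that base occurs; sorting all fragments by length, as size separation does in the laboratory, and reading their final bases recovers the new strand's sequence. The simulation is idealized: it yields exactly one fragment per position, makes no errors, and ignores strand direction by writing the template in the order it is copied. With a different fluorescent dye on each terminating base, the four reactions can share one tube, because the color of each fragment identifies its final base.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template):
    """Simulate four chain-termination reactions on a template strand.

    Returns (length, final_base) for every fragment that stops at a
    terminating base. The new strand is the base-by-base complement of the
    template (strand direction is deliberately ignored in this sketch).
    """
    new_strand = "".join(PAIRS[base] for base in template.upper())
    fragments = []
    for terminator in "ACGT":  # one reaction per terminating base
        for pos, base in enumerate(new_strand):
            if base == terminator:
                fragments.append((pos + 1, base))  # chain stops here
    return fragments


def read_sequence(fragments):
    """Order fragments by length, as size separation does, and read final bases."""
    return "".join(base for _, base in sorted(fragments))


print(read_sequence(terminated_fragments("TACGGTCA")))  # ATGCCAGT
```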
The Human Genome Project was first proposed in 1986 and was funded two years later at an expected cost of three billion dollars. The project officially got under way in 1990, as sequencing began in earnest on some of the smaller model genomes. In 1995, as some of these sequencing efforts were nearing completion, American pharmacologist Craig Venter and his colleagues at a private not-for-profit institute, The Institute for Genomic Research (TIGR), published the genome sequence of the bacterium Haemophilus influenzae, the first free-living organism to have its genome sequenced.
While the public consortium had been working on sequences using established techniques, Venter and colleagues had developed a faster technique for determining the sequence of whole genomes. This technique still used the basic Sanger-style chain-termination procedure, but it eliminated a time-consuming preliminary step. In the established approach, the genome was first broken into large cloned fragments whose positions on the chromosomes were mapped before each fragment was sequenced. Venter instead broke the entire genome into small random fragments, sequenced them all, and used computers to reassemble the genome from the overlaps between fragments; he called this approach whole-genome shotgun sequencing. During the next two years, the public consortium published the sequences of yeast and E. coli, respectively, and in 1998 it announced that the sequence of C. elegans was complete. That same year, Venter announced that he was starting a for-profit company, Celera Genomics, which would complete the human genome within three years using shotgun sequencing. Up to that time, however, Venter had demonstrated this approach only on bacterial genomes. To demonstrate the validity of the shotgun approach on large genomes, and to gear up for sequencing the human genome, Celera sequenced the 170-million-bp genome of the fruit fly in 2000, at that time the largest genome ever sequenced.
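Reassembling a genome from shotgun fragments amounts to working out how the fragments overlap. The Python sketch below uses a simple greedy strategy, repeatedly merging the two fragments with the longest suffix-prefix overlap, on error-free, made-up reads. It is a textbook illustration and should not be taken as the algorithm Celera or any production assembler used; real assemblers must cope with sequencing errors, repeated sequences that create misleading overlaps, and reads drawn from both DNA strands. The `min_length` parameter is an illustrative guard against chance overlaps of only a base or two.

```python
from itertools import permutations


def overlap(left, right, min_length):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads pairwise by largest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads that lie entirely inside another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for left, right in permutations(contigs, 2):
            length = overlap(left, right, min_length)
            if length > best_len:
                best_len, best_pair = length, (left, right)
        if best_pair is None:
            break  # no overlaps left; remaining contigs stay separate
        left, right = best_pair
        contigs.remove(left)
        contigs.remove(right)
        contigs.append(left + right[best_len:])
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```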
During the final years of the twentieth century, spurred on by competition from the private sector, the public consortium redoubled its efforts on the human genome. In February 2001, the race to sequence the human genome ended in a tie: both efforts, public and private, published draft sequences of the human genome that month, and in April 2003 the two efforts together announced the final completed sequence. A draft sequence of the mouse genome had been published in December 2002. In fact, by mid-2003 about 150 genome sequences had been determined (the vast majority of them bacterial), and almost 600 more were under way, including those of many more multicellular organisms.
Are the time, effort, and money that have been spent on various genome-sequencing projects really worth it? One promise that genomics may hold for the future is the identification of all human disease genes. While this has been one of the main justifications for the Human Genome Project, one should keep in mind that identifying the gene that causes a particular disease is not always equivalent to finding a cure for that disease. Another potential benefit of genomic research is the development of better treatments for bacterial and parasitic infections. A number of disease-causing bacteria have already been the subject of genome sequencing efforts, including the causative agents of bubonic plague, anthrax, and tuberculosis, to name a few. Some indirect benefits of genomics (which may, in time, prove just as valuable) include a better understanding of evolutionary relationships between species as well as a firmer grasp on basic cellular function. In all, the field of genomics promises to be a powerful means of scientific inquiry well into the future.