The Need for Bioinformatics (Genetics & Inherited Conditions)
While the discovery and identification of genome sequences improve understanding of biological systems, the need to organize, categorize, and analyze these sequences has driven the development of bioinformatic tools. Information derived from bioinformatics is becoming increasingly important for biological research in proteomics, microarray technology, oncology, pharmacogenomics, and other disciplines. Applications of bioinformatics include identifying the genetic contributions to an illness, which may be accomplished by cloning the gene associated with a particular disease. Once the contributing genes and their disease-predisposing variants have been identified, diagnostic tests can be created to assess future risk.
Today, sequencing cloned DNA molecules has become a routine, automated task in the modern molecular genetics laboratory, and large, publicly funded genome projects have determined the complete genomic sequences for humans, mice, fruit flies, dozens of bacteria, and many other species of interest to geneticists. All of this information is now freely available in online databases. Computational molecular biology tools allow for the design of polymerase chain reaction (PCR) primers, restriction enzyme cloning strategies, and even entire in silico experiments. This greatly accelerates the work of researchers but also changes the daily lives of many biologists so that...
Database Design (Genetics & Inherited Conditions)
The DNA sequence data collected by automated sequencing equipment can be represented as a simple sequence of letters: G, A, T, and C—which stand for the four nucleotide bases on one strand of the DNA molecule (guanine, adenine, thymine, and cytosine). These letters can easily be stored as plain text files on a computer. Similarly, protein sequences can also be stored as text files using the twenty single-letter abbreviations for the amino acids.
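As a minimal sketch of the point above, a sequence stored as plain text can be handled with ordinary string operations. The function names `is_valid_sequence` and `base_counts` are hypothetical helpers introduced only for this illustration, and the protein alphabet is assumed to be the twenty standard single-letter amino acid codes.

```python
# Alphabets for plain-text sequences. The protein set assumes the
# twenty standard single-letter amino acid codes.
DNA_ALPHABET = set("GATC")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(seq: str, alphabet: set) -> bool:
    """Return True if seq is non-empty and uses only letters from alphabet."""
    return bool(seq) and set(seq.upper()) <= alphabet


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in a DNA string."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "GATC"}


if __name__ == "__main__":
    dna = "GATTACA"  # illustrative sequence, not real data
    print(is_valid_sequence(dna, DNA_ALPHABET))
    print(base_counts(dna))
```

Because the sequence is just a string, it can be written to and read from a text file without any special encoding.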
There is a significant advantage to storing DNA and protein sequences as plain text files, also known as flat files. Text files take up minimal amounts of hard-drive space, can be used on any type of computer and operating system, and can easily be moved across the Internet. However, a text file containing nothing but a string of letters representing a DNA or protein sequence is essentially meaningless without some basic descriptive information, such as the organism from which it comes, its location on the genome, the person or organization that produced the sequence, and a unique identification number (accession number) so that it can be referenced in scientific literature. This additional annotation can also be stored as text, even in the same file as the sequence, but there must be a consistent format, a standard.
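One widely used convention for keeping annotation and sequence in the same flat file is a FASTA-style layout, in which a header line beginning with ">" carries an identifier and a free-text description, and the lines that follow carry the sequence. The sketch below parses such text. The function name `parse_fasta`, the accession numbers, and the descriptions are all hypothetical and chosen only for illustration.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA-style text into a list of record dictionaries.

    Each record has an accession (first word of the header), a
    description (rest of the header), and the joined sequence lines.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            accession, _, description = line[1:].partition(" ")
            current = {
                "accession": accession,
                "description": description,
                "sequence": "",
            }
            records.append(current)
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            current["sequence"] += line.upper()
    return records


# Made-up records; the accession numbers are not real database entries.
EXAMPLE = """>DEMO0001 hypothetical organism, chromosome 1
GATTACAGATT
ACA
>DEMO0002 second hypothetical record
ccgg
"""

if __name__ == "__main__":
    for record in parse_fasta(EXAMPLE):
        print(record["accession"], len(record["sequence"]))
```

The fixed header convention is what lets a program separate annotation from sequence without any knowledge of the particular record.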
In addition to maintaining basic flat-file structures for text data, it is useful to maintain sequence data in relational databases,...
Key Algorithms (Genetics & Inherited Conditions)
Some of the key algorithms used in bioinformatics include sequence alignment (dynamic programming), sequence similarity (word matching from hash tables), assembly of overlapping fragments, clustering (hierarchical, self-organizing maps, principal components, and the like), pattern recognition, and protein three-dimensional structure prediction. Bioinformatics is both eclectic and pragmatic: Algorithms are adopted from many different disciplines, including linguistics, statistics, artificial intelligence and machine learning, remote sensing, and information theory. There is no consistent set of theoretical rules at the core of bioinformatics; it is simply a collection of whatever algorithms and data structures have been found to work for the current data-management problems being faced by biologists. As new types of data become important in the work of molecular geneticists, new algorithms for bioinformatics will be invented or adopted.
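Two of the techniques named above can be sketched briefly. The first function computes a global alignment score by dynamic programming (the Needleman-Wunsch recurrence). The second builds a hash table of fixed-length words for fast similarity lookup. The scoring values (match +1, mismatch -1, gap -1), the word length, and the function names are assumptions chosen for illustration, not parameters taken from any particular tool.

```python
from collections import defaultdict


def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the best global alignment score of a and b.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


def build_word_index(seq: str, k: int = 3) -> dict:
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
    print(dict(build_word_index("ATATAT", k=2)))
```

The dynamic programming table guarantees the best score under the chosen scoring scheme but takes time proportional to the product of the sequence lengths. The word index trades that guarantee for speed by looking only for exact short matches, which is why the two approaches are often used together.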
New Types of Data (Genetics & Inherited Conditions)
In addition to DNA and protein sequences, bioinformatics is being called upon to organize many other types of biological information that are being collected in ever greater amounts. Gene expression microarrays collect information on the amounts of mRNA produced from tens of thousands of different genes in a single tissue sample. Researchers have found that microarray analysis may identify new subclasses of disease states and establish biological markers (biomarkers) associated with diseases such as cancer. Studies are under way to examine how patients will respond to therapy when standard clinical predictors are inadequate. DNA microarrays, or biochips, can now be used to study the functions of genes and proteins. A DNA microarray is a microscope slide carrying cDNA or oligonucleotide probes, to which a fluorescently labeled sample is hybridized so that the abundance of transcripts (mRNAs) can be monitored quantitatively. Laser scanners then read the arrays, translating fluorescent emission into a numerical matrix of expression profiles. A number of clinical trials are exploring the use of microarrays for prognosis or therapeutic guidance, and pharmaceutical firms have begun using microarray data to determine the success of their clinical trials for new drugs. Additionally, the technology is finding application in both forensics and food science.
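The numerical matrix of expression profiles can be pictured as a table with one row per gene and one column per array. A common first step, sketched here under the assumption that each gene has an intensity from a test sample and from a reference sample, is to express the comparison as a log2 ratio. The function name and the intensity values are hypothetical.

```python
import math


def log2_ratios(sample: list, reference: list) -> list:
    """Return the element-wise log2(sample / reference) of two matrices.

    Both arguments are lists of rows (genes) of positive intensities.
    A value of 1.0 means a two-fold increase; -1.0 means a two-fold decrease.
    """
    return [
        [math.log2(s / r) for s, r in zip(sample_row, reference_row)]
        for sample_row, reference_row in zip(sample, reference)
    ]


if __name__ == "__main__":
    # Made-up intensities: 2 genes (rows) measured on 2 arrays (columns).
    sample = [[200.0, 50.0], [100.0, 400.0]]
    reference = [[100.0, 100.0], [100.0, 100.0]]
    for row in log2_ratios(sample, reference):
        print(row)
```

Working in log ratios makes increases and decreases of the same magnitude symmetric around zero, which simplifies later clustering or statistical testing.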
Proteomics technologies are automating the process of mass...
Integration (Genetics & Inherited Conditions)
To solve many biological problems, data from a variety of sources must be combined. Thus, despite advances in bioinformatics, a large challenge facing the discipline is the integration of various types of data in a form that allows scientists to extract meaningful insights into biology from the masses of information in molecular genetic databases. Successfully using multiple data sources remains complicated, however, and a lack of standardized file formats is probably one reason why. These difficulties have prompted the development of the European Molecular Biology Open Software Suite (EMBOSS), a software package for multipurpose sequence analysis. EMBOSS automatically copes with data in a variety of formats, which has alleviated some of the challenges.
Genome browsers are yet another challenge. For example, it is extremely difficult to provide a display that allows someone to view all the relevant information about a gene or a chromosomal region, including the identity of encoded proteins; protein structure and functional information; involvement in metabolic and regulatory pathways; developmental and tissue-specific gene expression; evolutionary relationships to proteins in other organisms; DNA motifs bound by regulatory proteins; genetic synteny with other species (that is, having genes with loci on the same chromosome); phenotypes of mutations; and known alleles and SNPs and their frequency in various...
Further Reading (Genetics & Inherited Conditions)
Baxevanis, Andreas D., and B. F. Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. 2d ed. Hoboken, N.J.: John Wiley & Sons, 2003. This book provides a sound foundation of basic concepts of bioinformatics, with practical discussions and comparisons of both computational tools and databases relevant to biological research. The standard text for most graduate-level bioinformatics courses.
Bujnicki, J. Practical Bioinformatics: Nucleic Acids and Molecular Biology. New York: Springer Verlag, 2005. Bridges the gap between bioinformatics and molecular biology and provides numerous practical examples of the discipline that have led to scientific advances.
Claverie, Jean-Michel, and Cedric Notredame. Bioinformatics for Dummies. Hoboken, N.J.: John Wiley & Sons, 2003. A practical introduction to bioinformatics: computer technologies that biochemical and pharmaceutical researchers use to analyze genetic and biological data. This reference addresses common biological questions, problems, and projects while providing a UNIX/Linux overview and tips on tweaking bioinformatic applications using Perl.
Krawetz, Stephen A., and David D. Womble. Introduction to Bioinformatics: A Theoretical and Practical Approach. Totowa, N.J.: Humana Press, 2003. Aimed at undergraduates, graduate students, and researchers. Includes four sections:...
Web Sites of Interest (Genetics & Inherited Conditions)
Bioinformatics Organization. http://www.bioinformatics.org. Provides a helpful tutorial on bioinformatics.
European Bioinformatics Institute. http://www.ebi.ac.uk. Maintains databases concerning nucleic acids, protein sequences, and macromolecular structures, as well as postings of news and events and descriptions of ongoing scientific projects.
Human Genome Project Information: Bioinformatics. http://www.ornl.gov/sci/techresources/Human_Genome/research/informatics.shtml. Details Human Genome Project bioinformatics research.
International Society of Intelligent Biological Medicine. http://www.isibm.org. Promotes research aimed at improving human health.
National Center for Biotechnology Information BLAST. http://blast.ncbi.nlm.nih.gov/Blast.cgi. Provides easy access to the most widely used sequence similarity search tool.
Introduction (Magill’s Medical Guide, Sixth Edition)
The answer to the question “What is bioinformatics?” is not straightforward, yet in addressing this question the richness and extent of the field become clear. Part of the reason that it is difficult to give a concise definition of bioinformatics is that, as researchers publishing in the field realize, the definition is somewhat artificial and its boundaries are still expanding. This is not surprising, as bioinformatics might also be called mathematical/computational molecular biology, which points to large parts of biology taking on the aspects of a “hard” science such as physics or chemistry.
The creation of bioinformatics was triggered by a combination of factors in the 1990s. Key elements were progress in computing power, the existence of much larger data sets, and increasingly quantitative approaches to molecular biology, including molecular evolutionary studies. The large data sets came from a number of sources, including long individual DNA sequences (for example, genomes), large between-species comparative or evolutionary alignments, microarray-generated gene expression data, proteomics data from two-dimensional gel electrophoresis and mass spectrometry techniques, and structural information; broadly speaking, these are the fields of comparative, functional, and structural genomics. It was also increasingly recognized that quantitative molecular biology required vast amounts of computer power not only to assemble...
The Scope of Research (Magill’s Medical Guide, Sixth Edition)
Bioinformatics itself touches on other areas of science such as biomedical informatics, computer science, statistical analysis, molecular biology, and mathematical modeling. In turn, each of these fields contributes uniquely to the progress of bioinformatics toward a mature science. Bioinformatics is equally well defined by the areas it wholly or partly subsumes, that is, areas in which computing power is combined with mathematical and statistical modeling to answer biological questions from molecular data. These areas include genomics, evolutionary biology, population genetics, structural biology, microarray gene expression analysis, proteomics, and the modeling of cellular processes plus systems biology (for example, modeling a neurological pathway in which individual neurons respond to molecular events).
In bioinformatics, as in chemistry and physics, there is a fundamental split between empirical/experimental and theoretical science. At one extreme may be a laboratory focusing on generating large amounts of microarray data with relatively little analysis, and at the other extreme may be a mathematician working alone to solve a theorem with an application to better analyze that microarray data. It is clear that both approaches are needed for science to develop. However, it is not uncommon to find researchers actively tackling both problems (for example, gathering large data sets and seeking better methods to analyze...
Perspective and Prospects (Magill’s Medical Guide, Sixth Edition)
The implications of bioinformatics for medicine are enormous. The strictly informatics side is already central to medical genetics. Databases of human characteristics, including detailed medical histories and biochemical profiles, are matched up with millions of genetic markers within each individual. Only through such enormous databases can statistical sleuths uncover the basis of most diseases that are caused by multiple genes. This is the population genetics of humans on a vast scale. Elsewhere, medical research such as cancer modeling is rapidly becoming a branch of bioinformatics, driven by the fact that cancer is caused by many interacting genes.
The overall prospect is that bioinformatics will make possible a different sort of medicine in the twenty-first century in which fundamental research leads to pharmaceutical intervention, which leads to treating a disease at its root cause in a way that avoids the need for surgical intervention. Treatments of tomorrow, from diagnosis to cure, will involve processing large amounts of data via computers, with doctors remaining the key to ensuring an appropriate treatment regime with the consent and comfort of the patient foremost.
In short, one answer to the question “What is bioinformatics?” is the development of virtual molecular biology. As time passes, this scientific endeavor will propagate upward and outward to meet other major areas of biology, such as...
For Further Information: (Magill’s Medical Guide, Sixth Edition)
Campbell, A. Malcolm, and Laurie J. Heyer. Discovering Genomics, Proteomics, and Bioinformatics. 2d ed. San Francisco: Pearson/Benjamin Cummings, 2007. An introductory book that gives basics in an accessible manner.
Davidson, Eric H. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. Boston: Academic Press/Elsevier, 2006. One of the areas where bioinformatics has a huge future, in elucidating gene regulatory networks.
International Human Genome Sequencing Consortium. “Initial Sequencing and Analysis of the Human Genome.” Nature 409, no. 6822 (2001): 860-921. Example of a major bioinformatics collaboration in the area of genomics.