DNA Sequences, Unique

An increasingly important facet of forensic science is the use of techniques that detect and determine the structure of deoxyribonucleic acid (DNA). When the aim of the investigation is to identify an unknown person, the exploitation of unique portions of DNA can be very useful.

DNA contains genetic information that is unique to each organism. The entire DNA content of any organism (bacterium, plant, virus, or animal) constitutes its genome. A DNA sequence is considered unique if it is present in only one copy in a haploid genome (the portion of DNA that contains a single copy of each chromosome). In humans, for example, the haploid number of chromosomes is 23.

Not all of the DNA contained in the genome is unique; there are also various repetitive sequences present.

A DNA strand is composed of a chain of nucleotides (the nitrogen-containing building blocks of DNA and of ribonucleic acid, RNA). Each nucleotide contains a phosphate group attached to a sugar molecule (deoxyribose) and one of four bases: guanine (G), cytosine (C), adenine (A), or thymine (T).

It is the order of the bases in a sequence, for example ATTGCCAT, that determines the information a gene encodes. Knowing this sequence allows scientists to identify organisms, genes, or fragments of genes.

One of the main characteristics of DNA is that it forms double-stranded molecules (helices): the two complementary strands are held together by hydrogen bonds between their bases on the inside of the helix, while the sugar-phosphate backbones lie on the outside. This pairing is not random. A always pairs with T, and C always pairs with G; therefore the sequence complementary to ATTCCGAT is TAAGGCTA.
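The pairing rule can be expressed as a simple lookup table. The following is a minimal sketch; the function names are illustrative only. It also includes a reverse complement because the two strands of the helix run in opposite directions (they are antiparallel). When the partner strand is written in the same reading direction as the original, it is therefore the complement read backwards.

```python
# Base-pairing rules: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    try:
        return "".join(PAIRS[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None

def reverse_complement(seq: str) -> str:
    """Complement read in the opposite direction, i.e. the partner strand
    written in the same direction as the input (strands are antiparallel)."""
    return complement(seq)[::-1]

print(complement("ATTCCGAT"))          # TAAGGCTA
print(reverse_complement("ATTCCGAT"))  # ATCGGAAT
```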

Genes are the sequences that encode proteins. Together with their surrounding regulatory sequences, they are considered unique genomic sequences, since they are present as single copies in a haploid genome. In contrast, some sequences are present in multiple copies; these make up the repetitive fraction of the genome. The simplest genomes, those of viruses and bacteria, contain mostly unique sequences with only a few repetitive regions. The proportion of repetitive DNA increases in higher organisms, however: sea urchins have only 38% unique sequences, and humans just over 50%.
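The distinction between single-copy and repetitive sequences can be illustrated with a toy calculation. The following sketch counts every substring of a fixed length k (a "k-mer") in a short made-up sequence and treats a k-mer as unique if it occurs once and as repetitive if it occurs more than once. This is only an analogy. The choice of k, the toy sequence, and the function names are assumptions. Real estimates of unique DNA are made over an entire haploid genome using laboratory or genome-wide methods, not on a short string.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count how many times each length-k substring (k-mer) occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def unique_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs only once in seq."""
    positions = len(seq) - k + 1
    if k <= 0 or positions <= 0:
        raise ValueError("k must be between 1 and len(seq)")
    counts = kmer_counts(seq, k)
    single_copy = sum(1 for i in range(positions) if counts[seq[i:i + k]] == 1)
    return single_copy / positions

# Hypothetical toy "genome": a non-repetitive stretch followed by three GATA repeats.
toy = "GCTTAC" + "GATA" * 3
counts = kmer_counts(toy, 4)
print(sorted(m for m, c in counts.items() if c > 1))  # ['AGAT', 'ATAG', 'GATA', 'TAGA']
print(f"{unique_fraction(toy, 4):.2f}")                # 0.40
```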

Genes encoding the same protein in bacteria, plants, and humans often have similar sequences and perform the same or similar functions across the spectrum of organisms. Such sequence homology allows scientists to identify human genes by using fragments of mouse or yeast genes to search for similar DNA fragments. Although most genes show some species-dependent differences, only a few can be used to discriminate between organisms. The two main groups are ribosomal RNA genes (16S in bacteria and 18S in animals) and mitochondrial genes.

Ribosomal genes are useful for tracing evolution and relationships, especially in bacteria. Mitochondrial genes, however, have an advantage over ribosomal genes. They are not part of the nuclear DNA but are carried on small circular molecules inside the mitochondria, which are present in many copies per cell. As a result, they are more likely to survive degradation over time. This is advantageous for the forensic scientist, since genetic identification may be possible from bones, teeth, or tissue fragments even long after death.

The presence of unique DNA sequences allows forensic scientists to identify signature sequences that can later be used as probes to detect individual organisms or a particular gene. A change of even a single base pair can readily be detected by most hybridization techniques and by sequencing.
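To show why a single-base change is detectable, the following sketch slides a signature sequence along a target and counts mismatches at each position. It is a string-matching simplification. It searches one strand only, allows no insertions or deletions, and ignores the physical factors (probe length, temperature, salt) that govern real hybridization. The probe, the sample sequences, and the function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def probe_hits(target: str, probe: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Slide the probe along the target (one strand, no gaps) and return
    (0-based position, mismatch count) for every window within the limit."""
    hits = []
    for i in range(len(target) - len(probe) + 1):
        d = mismatches(target[i:i + len(probe)], probe)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits

probe = "GATTACA"            # hypothetical signature sequence
sample_a = "CCGATTACATT"     # carries the signature exactly
sample_b = "CCGATTGCATT"     # same region with one A->G substitution
print(probe_hits(sample_a, probe))                    # [(2, 0)]
print(probe_hits(sample_b, probe))                    # []
print(probe_hits(sample_b, probe, max_mismatches=1))  # [(2, 1)]
```

An exact search finds the signature in the first sample but not in the second. Relaxing the search to one mismatch shows that the difference comes from a single substituted base.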

Signature sequences are particularly important for diagnosing viruses, which, unlike cellular pathogens, lack ribosomal and mitochondrial genes. Using these sequences greatly simplifies viral detection and identification, whereas traditional methods can take up to several weeks.

Unique DNA sequences can also be used to design primers (short DNA fragments needed to initiate DNA amplification) for the polymerase chain reaction (PCR). Different genes within one organism, and the genes of different species, usually differ enough that carefully selected primers will amplify only the target sequence, even when a mixture of different DNA molecules is present. This allows forensic scientists to design diagnostic and identification tests for common pathogens and diseases that target specific parts of a pathogen's genome.
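Primer specificity is often checked by simulating PCR on a known sequence. The following is a minimal "in silico PCR" sketch. A product is reported wherever the forward primer occurs and the reverse complement of the reverse primer occurs further downstream on the same strand. The reverse complement is used because the reverse primer binds the opposite, antiparallel strand. The sketch assumes exact matches on one strand only, whereas real primers can tolerate some mismatches. It should therefore be taken as an illustration of the idea, not a validated design tool. All sequences and function names are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in the same direction."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))

def find_all(text: str, pattern: str) -> list[int]:
    """All start positions of pattern in text (overlaps allowed)."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions

def in_silico_pcr(template: str, forward: str, reverse: str) -> list[str]:
    """Predicted products: from each forward-primer site to each downstream
    site matching the reverse complement of the reverse primer."""
    rc_reverse = reverse_complement(reverse)
    products = []
    for f in find_all(template, forward):
        for r in find_all(template, rc_reverse):
            if r >= f + len(forward):  # reverse site must lie downstream
                products.append(template[f:r + len(rc_reverse)])
    return products

target = "TTTACGGATCAAAAGCTTGCAACCC"   # hypothetical target region
other = "TTTACGGTTCAAAAGCTTGCAACCC"    # hypothetical non-target: forward site differs
fwd, rev = "ACGGATC", "TTGCAAGC"
print(in_silico_pcr(target, fwd, rev))  # ['ACGGATCAAAAGCTTGCAA']
print(in_silico_pcr(other, fwd, rev))   # []
```

A primer pair that yields exactly one product from the target and none from other sequences in the mixture is the in-silico counterpart of the specificity described above.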

Although everyone except identical twins has unique DNA, identifying an individual does not rely on sequencing that person's entire genome. Instead, forensic identification uses analysis of the displacement-loop region of mitochondrial DNA (D-loop, or control region) or of short tandem repeats (STRs).
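STR alleles are commonly described by the number of times a short motif is repeated. The following sketch counts the longest uninterrupted run of a motif in a sequence. The motif GATA and the allele sequence are hypothetical examples, and the function name is illustrative. In casework the repeat number is usually inferred from the length of a PCR product rather than by reading the sequence directly.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Step forward one motif length at a time while the motif keeps repeating.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best

allele = "CCTA" + "GATA" * 11 + "TCTG"     # hypothetical STR allele
print(longest_tandem_run(allele, "GATA"))  # 11
```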

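D-loop analysis is used for identification in forensic casework. This is possible because the region is highly polymorphic: its sequence varies between people as a result of base substitutions that arise during DNA replication (for example, DNA polymerase incorporates T instead of A). Because mitochondrial DNA is inherited from the mother, however, a D-loop sequence is normally shared by maternal relatives.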
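Differences of this kind are often listed as substitutions relative to a reference sequence. The following sketch does this for two sequences that are assumed to be already aligned to the same length, with no insertions or deletions. Real D-loop comparisons must also handle those cases. The sequences, the 1-based numbering, and the function name are hypothetical.

```python
def substitutions(reference: str, sample: str,
                  first_position: int = 1) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) for every mismatch between
    two aligned, equal-length sequences. Positions start at first_position."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (first_position + i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt
    ]

reference = "ACCTTAGCAT"   # hypothetical reference fragment
sample = "ACCTTGGCAC"      # hypothetical questioned sample
diffs = substitutions(reference, sample)
print(diffs)                                        # [(6, 'A', 'G'), (10, 'T', 'C')]
print(" ".join(f"{r}{p}{a}" for p, r, a in diffs))  # A6G T10C
```

Comparing such lists between a questioned sample and a known sample is one simple way to see whether, and where, the two sequences differ.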

The D-loop region is 1274 base pairs long and lies between the genes encoding the transfer RNAs (tRNAs) for proline and for phenylalanine. It contains the regulatory regions that control replication of the mitochondrial genome and expression of its other genes.

The main method for identifying changes in this region is PCR amplification followed by sequencing. However, new microarray approaches, which analyze many DNA sequences at once on miniature surfaces such as glass slides or silicon wafers, are also being developed.

SEE ALSO Biodetectors; DNA profiling; RFLP (restriction fragment length polymorphism); STR (short tandem repeat) analysis.