If the amino acid sequences in the proteins of two organisms are similar, why will their DNA also be similar?
Amino acid sequences are determined by nucleotide sequences in DNA: the DNA is transcribed into mRNA, which is then translated, codon after codon, according to what is called the Genetic Code. In the examples that follow, similar will be understood as having a certain degree of colinearity. To understand this, compare:
PRINCECHARLES
PRINCESSDIANA
You will promptly recognise colinearity of the first 6 characters. Comparisons between sequences (be it of amino acids or nucleotides) can only be made if they are aligned together. Identity can be continuous or not. In this case, you might ask also whether the first A in each sequence is "conserved" or a coincidence. Any biologist would tend to argue for coincidence, but this is not always the case. It's a tough decision (see final part of this answer).
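A minimal sketch of this kind of comparison, assuming the two sequences are PRINCECHARLES and PRINCESSDIANA (the names this answer uses further down). The helper names `match_line` and `shared_prefix` are made up for illustration, and the comparison is ungapped, so it is much cruder than a real alignment program:

```python
def match_line(a: str, b: str) -> str:
    """Mark with '|' every position where a and b carry the same character."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


def shared_prefix(a: str, b: str) -> str:
    """Return the leading stretch that is identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


seq1 = "PRINCECHARLES"  # assumed example sequence
seq2 = "PRINCESSDIANA"  # assumed example sequence

print(seq1)
print(match_line(seq1, seq2))
print(seq2)
print("colinear start:", shared_prefix(seq1, seq2))
```

Without gaps, only the first 6 positions match. The A of CHARLES and the A's of DIANA sit at different positions, which is exactly why deciding whether they are "conserved" needs an alignment and some judgement.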
So, at least the first 6 "amino acids" are "similar" because they can be aligned. Let us now use the one-letter code for each amino acid (clarifying that PRINCE = Pro-Arg-Ile-Asn-Cys-Glu). Reverse-translating this, using the standard genetic code, will give you
CCNCGNAUHAAYUGYGAR (or CCNAGRAUHAAYUGYGAR)
(in RNA sequences, N = A/G/C/U, H = A/C/U, Y = C/U, R = A/G)
This shows that, to produce PRINCE, some positions in the mRNA (CC..G.AU.AA.UG.GA.) are invariable and others are variable. The code is said to be degenerate, in the sense that most amino acids are specified by more than one codon (only two exceptions in the standard code: methionine and tryptophan). But for a given amino acid the alternative codons are nearly always related, sharing at least one position (serine, with UCN and AGY, is the one exception). This is why the DNA has to keep at least those positions unchanged in order to specify the conserved tract.
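The invariable positions can be checked mechanically. The following is only a sketch: the names `CODONS_FOR` and `invariant_mask` are invented here, and the 64-letter string is the standard genetic code written in the usual U, C, A, G order for each codon position.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code, '*' = stop) to the list of its codons.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def invariant_mask(peptide: str) -> str:
    """For each codon position, keep the base if every synonymous codon
    agrees on it, otherwise write '.'."""
    mask = []
    for aa in peptide:
        codons = CODONS_FOR[aa]
        for pos in range(3):
            bases_here = {c[pos] for c in codons}
            mask.append(bases_here.pop() if len(bases_here) == 1 else ".")
    return "".join(mask)


print(invariant_mask("PRINCE"))  # CC..G.AU.AA.UG.GA.
```

The mask takes all six arginine codons into account (CGN and AGR), which is why only the middle G of that codon survives.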
Now consider another sequence
You will easily recognise that some degree of colinearity remains with PRINCECHARLES, and note that .R.NC.CHAR-LES- are shared between the two. It would be a good exercise to verify, considering only CGN for Arg, CUN for Leu and UCN for Ser (CTN and TCN in the DNA), that the mRNA sequences are, respectively,
The characters in bold represent nucleotides that must be identical between both DNAs because they code for an identical amino acid (others are not emphasised, because they are identical for other reasons).
These examples should answer the question.
Now for just a final point: why did I choose "readable" amino acid sequences for this example? It can be a fun game to play, but the real reason is that, in all this comparison work, sequences are information. Bioinformatics is a discipline that struggles to "make sense" of amino acid sequences by defining what are called protein domains, that is, segments of the polypeptides to which a function can be assigned. So, in the first pair of examples, PRINCE and PRINCESS are obviously related, not only in structure but semantically. Indeed, two different proteins can share a related domain and be otherwise unrelated. If one has reason to decide that the common A between CHARLES and DIANA is a remnant of shared function, then its DNA sequence counterpart is said to be conserved phylogenetically (at least the constant GC in the codons that specify alanine). Of course this decision is not arbitrary: it relies on many comparisons and on consensus evolutionary interpretations.
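The DNA will be similar because DNA acts as a direct blueprint for the proteins. The sequence of bases in DNA codes, in sets of 3, for specific amino acids. The DNA is transcribed into mRNA (messenger RNA), which is then translated into a chain of amino acids at the ribosome, with tRNAs bringing in the amino acid that matches each codon. The order of the amino acids determines how the chain will fold in on itself, what shape/function it will have and therefore what protein it is. So, except when mutations happen, if the DNA sequence is the same, the proteins will be the same. Read the other way round, two proteins with similar amino acid sequences must come from DNA that is similar at least at the codon positions that fix each amino acid.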
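As a rough illustration, here is a sketch that translates two made-up coding sequences. The names `gene_a`, `gene_b`, `translate` and `identical_positions` are hypothetical, and reading the coding strand directly (T in place of U) skips the mRNA step for brevity. Both sequences were chosen to differ wherever the genetic code allows it for this peptide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, three bases at a time."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identical_positions(a: str, b: str) -> int:
    """Count positions at which two equally long sequences agree."""
    return sum(x == y for x, y in zip(a, b))


gene_a = "CCTCGTATTAATTGTGAA"  # hypothetical sequence
gene_b = "CCGAGAATCAACTGCGAG"  # hypothetical sequence, different synonymous codons

print(translate(gene_a), translate(gene_b))
print(identical_positions(gene_a, gene_b), "of", len(gene_a), "bases identical")
```

Both sequences give the same six amino acids, and even in this worst case 11 of the 18 bases are forced to be identical. So the same protein does not require exactly the same DNA, but it does require DNA that is largely similar.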