tranalign is a re-implementation in EMBOSS of the program mrtrans by Bill Pearson. It reads a set of (unaligned) nucleotide sequences and a corresponding set of aligned protein sequences which are their translations, and writes the coding regions to file as a nucleotide sequence alignment. The sequences must be in the same order in the two input sets. Each nucleotide sequence is translated in all three forward frames using the specified genetic code, and the translations are compared to the corresponding protein sequence from the input alignment. The contiguous nucleotide sequence that codes for the protein is written to file (it will not splice together different exons to produce a coding sequence).
The protein sequences will typically include gap (-) characters. These are ignored during sequence comparison but replaced by --- in the nucleotide sequence alignment output.
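The matching and gap-threading steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not tranalign's actual code: the names CODON_TABLE, translate and thread_gaps are hypothetical, only the standard genetic code is built in, and translating unrecognised codons as X is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nuc, frame):
    """Translate one forward frame (0, 1 or 2); unknown codons become X (assumption)."""
    nuc = nuc.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(nuc[i:i + 3], "X") for i in range(frame, len(nuc) - 2, 3)
    )


def thread_gaps(nuc, aligned_protein):
    """Return the coding region of nuc, gapped like aligned_protein, or None."""
    # Gap characters are ignored when comparing protein and translation.
    ungapped = aligned_protein.replace("-", "").upper()
    for frame in range(3):
        pos = translate(nuc, frame).find(ungapped)
        if pos >= 0:
            start = frame + 3 * pos
            coding = nuc[start:start + 3 * len(ungapped)]
            codons = (coding[i:i + 3] for i in range(0, len(coding), 3))
            # Each protein gap becomes '---'; each residue consumes one codon.
            return "".join(
                "---" if ch == "-" else next(codons) for ch in aligned_protein
            )
    return None  # contiguous coding region not found in any forward frame


print(thread_gaps("ccATGAAATTTGGGtaa", "MK-FG"))  # ATGAAA---TTTGGG
```

Because the search looks for one unbroken match, a protein encoded by several exons is not found, which mirrors the contiguity restriction stated above.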
The ID names of the nucleic acid and protein sequences are NOT checked to see if they correspond to each other. They can have any names.
There must be at least as many protein sequences as nucleic acid sequences; extra protein sequences are ignored.
Each of the nucleic acid sequences must have a corresponding protein sequence which is derived from the coding region of that nucleic acid sequence. The two sets of sequences must be in the same order.
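The pairing rules above (matching by position only, surplus proteins ignored) can be sketched in Python as follows. The function name pair_inputs is hypothetical, and raising an error when there are too few proteins is an assumption; how tranalign itself reports that case is not described here.

```python
def pair_inputs(nucleotides, proteins):
    """Pair sequences by position only; names are never compared."""
    if len(proteins) < len(nucleotides):
        # Assumption: treat a shortfall of protein sequences as an error.
        raise ValueError(
            f"{len(nucleotides)} nucleic acid sequences but only "
            f"{len(proteins)} protein sequences"
        )
    # zip stops at the shorter input, so extra proteins are ignored.
    return list(zip(nucleotides, proteins))


pairs = pair_inputs(["nuc1", "nuc2"], ["prot1", "prot2", "prot3"])
print(pairs)  # [('nuc1', 'prot1'), ('nuc2', 'prot2')]
```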
The output is the regions of the nucleic acid sequences which code for the corresponding protein sequence, with gap characters ('-') introduced so that they have the same alignment as the corresponding protein sequences.
In general, it is better to use protein sequences for multiple alignment, but to use DNA sequences for phylogeny, for example, when using the programs dnadist, dnapars, dnaml, etc in the PHYLIP package. Where one has a protein sequence alignment, it would be time consuming to remove gap characters before back-translating the proteins. tranalign helps by generating aligned cDNA sequences from a protein sequence alignment.
tranalign finds the coding regions for contiguous sequences only. It will not splice together different exons to produce a coding sequence. You should therefore use either mRNA sequences, or nucleic acid sequences which you have constructed to hold a contiguous coding region (for example, using extractseq, or yank and union).
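The sequences must be in the same order in both input sets of sequences. Some alignment programs (including clustalw/emma) will re-order their input sequences so as to group similar sequences together. If that has happened, restore the original order in the protein alignment before running tranalign.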
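One way to undo such re-ordering is to sort the aligned proteins back into the order of the nucleotide input. The Python sketch below assumes that each protein carries the same ID as its nucleic acid sequence, which tranalign itself does not require; the function name reorder_like and the (id, sequence) tuple layout are hypothetical.

```python
def reorder_like(nucleotide_ids, aligned_proteins):
    """Return aligned (id, sequence) pairs in the order given by nucleotide_ids.

    Assumes matching IDs in both sets; this is a convention of this
    sketch, not something tranalign checks.
    """
    by_id = dict(aligned_proteins)
    missing = [seq_id for seq_id in nucleotide_ids if seq_id not in by_id]
    if missing:
        raise KeyError(f"no aligned protein for: {', '.join(missing)}")
    return [(seq_id, by_id[seq_id]) for seq_id in nucleotide_ids]


# The aligner grouped similar sequences and moved 'seqC' to the front.
aligned = [("seqC", "MK-FG"), ("seqA", "MKAFG"), ("seqB", "MK-FG")]
print(reorder_like(["seqA", "seqB", "seqC"], aligned))
```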
"Guide protein sequence xxx not found in nucleic sequence xxx" - the region of the nucleic acid sequence which codes for the protein was not found. The coding region in the nucleic acid sequence must be a single contiguous sequence. If the two input sets are out of order, the protein sequence may not be the one that corresponds to this nucleic acid sequence.
tranalign was written in EMBOSS code using the description of mrtrans as a guide by