Three search options in GENESEQ, USGENE, and PATGENE translate the query and/or the database sequence before the search: /TSQN, /TSQP, and /TSQNX. These options are run in the same way as /SQN or /SQP.
When using TSQN or TSQP, both strands are searched. Unlike with SQN or SQP, a single-strand or complementary-strand search cannot be specified. By default, each of the three options therefore covers all six possible translations, i.e., the three reading frames of both the single and the complementary nucleotide strand. For TSQNX, the strand can be specified with -S SIN, -S COM, or -S BOTH (default).
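The six translations mentioned above can be illustrated with a short Python sketch. It is not the code used by the search service. It assumes the standard genetic code and unambiguous A/C/G/T input, and the function names are chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation tables are needed for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to the translated sequences."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGAACAAAACT").items():
        print(label, protein)
```

Stop codons appear as asterisks in the translated frames, matching the convention used in the alignment displays.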
For all three options, the provided alignment with D ALIGN or D ALIGNG is an amino acid alignment.
A vertical line marks a full match, a plus sign marks a protein family match, and a blank marks non-matching residues. Gaps, marked with a horizontal line (-), are introduced in the query or the subject sequence to improve the alignment of the two sequences.
Q: 118 EDCLYLNVWIPAPKPKNAT--VLIWIYGGGFQTGTSSLHVYDGKFL--ARVER-- …
|||| +++ || ++ ||+||+||||+ |+ ++ ||| + + +++
S: 298 EDCLNIDIRRPAGTTADSKLPVLVWIFGGGFELGSKAM--YDGTTMVSSSIDKNM …
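How the line between Q and S is built can be sketched in Python as follows. The residue groups used for the plus sign are a hypothetical stand-in; the scoring scheme actually used to decide on a protein family match is not described here, so the grouping below is an assumption for illustration only.

```python
# Hypothetical groups of similar residues; a real search would use a
# substitution matrix rather than this simplified grouping.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def match_line(query, subject):
    """Build the middle line of an alignment display.

    '|' marks identical residues, '+' marks residues from the same
    (hypothetical) group, and a blank marks a mismatch or a gap.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            symbols.append(" ")   # gap in query or subject
        elif q == s:
            symbols.append("|")   # full match
        elif any(q in group and s in group for group in SIMILAR_GROUPS):
            symbols.append("+")   # similar residues
        else:
            symbols.append(" ")   # non-matching residues
    return "".join(symbols)


if __name__ == "__main__":
    q = "EDCLY--VLIW"
    s = "EDCLNKLVLVW"
    print("Q:", q)
    print("  ", match_line(q, s))
    print("S:", s)
```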
Stop codons are marked with an asterisk.
Q: 2 S*EQFRAPACSSEVHLLHLLLLDNFQRQQGKIRWQKVVFQGKIPTIFRDPL*LIG …
|||||||||||||||||||||||||||||||||||||||||||||| |||||||
S: 63 S*EQFRAPACSSEVHLLHLLLLDNFQRQQGKIRWQKVVFQGKIPTILRDPL*LIG …
With the default option -f t, amino acids derived from low-complexity sections are displayed as lower-case letters.
Q: 2 S*EQFRAPACSSEVhllhllllDNFQRQQGKIRWQKVVFQGKIPTIFRDPL*LIG …
|||||||||||||||||||||||||||||||||||||||||||||| ||||||||
S: 63 S*EQFRAPACSSEVHLLHLLLLDNFQRQQGKIRWQKVVFQGKIPTILRDPL*LIG …
TSQN Search Option
/TSQN is available with both homology (similarity) search options, RUN GETSIM and RUN BLAST. It searches a peptide query sequence against nucleotide sequences that have been translated into all potential derived protein sequences. For both search options, the algorithm itself handles the translation and uses the nucleotide sequence database. The alignment after a TSQN search shows the similarity between the query peptide sequence and the translated subject sequence of the answer set. The TSQN search procedure is therefore based on the peptide homology search algorithm, but the answers retrieved for display are the original nucleotide sequence records of the databases.
TSQN searches make it possible to find homologous protein-coding regions in unannotated, error-prone nucleotide sequences such as expressed sequence tags (ESTs; short, single-read cDNA sequences) and draft sequences (HTG; High-Throughput Genomic). Since ESTs have no annotated coding sequences, no corresponding protein translations are available. Hence a TSQN search is the only way to find these potential coding regions at the protein level.
The example shows the peptide query, the alignment of the query amino acids with the translated nucleotide sequence, and, below it, the original nucleotide sequence. The position numbers in the alignment refer to the peptide query and to the retrieved nucleotide sequence.
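The relationship between a peptide hit and the positions in the original nucleotide record can be sketched in Python. This is a simplified illustration, not the GETSIM or BLAST algorithm: it looks only for exact peptide matches, assumes the standard genetic code, and uses a hypothetical function name, find_peptide.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_peptide(peptide, nucleotide):
    """Return (strand, frame, start, end) for each exact peptide hit.

    start and end are 1-based positions in the original nucleotide
    sequence; for hits on the complementary strand, start > end.
    """
    nucleotide = nucleotide.upper()
    length = len(nucleotide)
    strands = (("+", nucleotide), ("-", nucleotide.translate(COMPLEMENT)[::-1]))
    hits = []
    for strand, seq in strands:
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                # Positions on the strand that was translated (1-based).
                first = frame + 3 * pos + 1
                last = frame + 3 * (pos + len(peptide))
                if strand == "-":
                    # Convert back to coordinates of the original record.
                    first, last = length - first + 1, length - last + 1
                hits.append((strand, frame + 1, first, last))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    print(find_peptide("NK", "ATGAACAAAACT"))
    print(find_peptide("VH", "ATGAACAAAACT"))
```

Each matched amino acid corresponds to three nucleotides, which is why the subject positions in a TSQN alignment advance in steps of three while the query positions advance in steps of one.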
TSQP Search Option
TSQP is available with the RUN BLAST command only. It searches a nucleotide query, translated in all six reading frames, against a protein database. The algorithm is particularly useful when the reading frame of the query sequence is unknown or when the query contains errors that may lead to frame shifts or other coding errors. TSQP might be the first analysis performed with a newly determined nucleotide sequence and is used in analyzing short sub-sequences of a cDNA sequence (EST sequences) and transcripts assembled from RNA sequence data. This search is more sensitive than nucleotide searches with BLAST since the sequences are compared at the protein level.
The example shows the nucleotide query, the alignment of the translated query with the amino acids of the answer record, and the original sequence. The position numbers in the alignment refer to the nucleotide query and to the retrieved peptide sequence.
=> RUN BLAST ATGAACAAAACTTCCCGTACCCTGCTCTCTCTGGGCCTGCTGAGCGCGGCCATGTTCGGCGTTTCGCAACAGGCG
AATGCCCACGGTTATGTCGAATCGCCGGCCAGCCGCGCCTATCAGTGCAAACTGCAGCTCAACACGCAGTGCGGC
GCGTGCAGTACGAACCGCAGAGCGTCGAGGGCCTGAAAGGCTTCCCG/TSQP
TSQNX Search Option
The search option TSQNX is available with the RUN BLAST command only. It translates a nucleotide query sequence in all six frames and compares those translations to the nucleotide database sequences, which are dynamically translated in all six frames. TSQNX avoids the potential frame shifts and ambiguities that may prevent certain open reading frames from being detected. This is very useful for identifying potential proteins encoded by single-pass-read expressed sequence tags (ESTs). In addition, TSQNX searches can be helpful for identifying novel genes.
The example shows a nucleotide query and the alignment of the translated query with the translated nucleotide sequence of the answer record. The position numbers in the alignment refer to the nucleotide query and to the retrieved nucleotide sequence. In this example, three results are retrieved, each with a different combination of reading frames for the query and the subject sequence.
=> RUN BLAST ATGAACAAAACTTCCCGTACCCTGCTCTCTCTGGGCCTGCTGAGCGCGGCCATGTTCGGCGTTTCGCAACAGGCGAATG
CCCACGGTTATGTCGAATCGCCGGCCAGCCGCGCCTATCAGTGCAAACTGCAGCTCAACACGCAGTGCGGCAGCGTGCAGTACGAACCGC
AGAGCGTCGAGGGCCTGAAAGGCTTCCCG/TSQNX