GENESEQ, USGENE, PATGENE: Alignment Information


The alignment of two sequences gives the basic information about their similarity: the two sequences are compared directly, residue by residue, over the region in which they are similar. The command D ALIGN displays the alignment as text characters; the command D ALIGNG displays it as an image.

Alignment Values

Above each alignment, several values describing the sequence search and the specific alignment are shown:

  • the L-number of the original sequence search, if the answer set of that search has been used for an additional search or sort
  • the lengths of the query sequence and the retrieved sequence
  • the raw score and bit score of the sequence search, and the score as a percentage of the highest possible score (based on the bit score for BLAST and on the raw score for GETSIM). The percentage score value can also be displayed with D SCORE and used for sorting with SORT SCORE.
  • the expectation value for the specific alignment
  • the alignment identity, i.e., the number of exactly matching residues in the alignment relative to the alignment length. The percentage identity value can also be displayed with D IDENT and used for sorting with SORT IDENT. For peptide searches, a second value (Positives) is displayed, which also counts the amino acid family matches marked by a plus sign in the alignment.
  • four percentage values:
    1. query identity: number of exact matches in the alignment relative to the query length
    2. query coverage: alignment length (including mismatches) relative to the query length (maximum 100%)
    3. subject coverage: alignment length (including mismatches) relative to the subject length
    4. subject identity: number of exact matches in the alignment (identity) relative to the subject length (maximum 100%).
  • for nucleotide searches, the strand of the retrieved sequence
  • for all searches, the length of the alignment
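The percentages in the list above are simple ratios of counts that appear in the display. As a minimal sketch, assuming the counts are already known, the following Python function recomputes them. The function and field names are hypothetical and not part of STN; the 100% caps follow the list above. For the score percentage, pass bit scores for BLAST results and raw scores for GETSIM results.

```python
from dataclasses import dataclass


@dataclass
class AlignmentPercentages:
    """Percentages derived from the counts shown above an alignment."""
    score: float             # score / highest possible score
    identity: float          # identities / alignment length
    query_identity: float    # identities / query length
    query_coverage: float    # alignment length / query length (max 100%)
    subject_coverage: float  # alignment length / subject length
    subject_identity: float  # identities / subject length (max 100%)


def alignment_percentages(score: float, max_score: float, identities: int,
                          alignment_length: int, query_length: int,
                          subject_length: int) -> AlignmentPercentages:
    """Recompute the displayed percentages (unrounded) from raw counts."""
    def pct(numerator: float, denominator: float) -> float:
        return 100.0 * numerator / denominator

    return AlignmentPercentages(
        score=pct(score, max_score),
        identity=pct(identities, alignment_length),
        query_identity=pct(identities, query_length),
        query_coverage=min(100.0, pct(alignment_length, query_length)),
        subject_coverage=pct(alignment_length, subject_length),
        subject_identity=min(100.0, pct(identities, subject_length)),
    )


# Values taken from the first example display in this section.
p = alignment_percentages(277.2, 547.7, 158, 160, 303, 591)
print(f"{p.score:.1f}% {p.identity:.1f}% {p.query_coverage:.1f}%")
```

With the counts of that first example, the results agree with the displayed values once rounded to one decimal place.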

Nucleotide search with displayed values

ALIGNMENT FROM L-NUMBER L1

Query Length: 303; Sequence Length: 591;
Score: 277.2 bits (306), 50.6% of highest possible score 547.7;
Expect value: 1.877e-71;
Identities: 158 / 160 (98.8%);
Query Identity: 52.1%; Query Coverage: 52.8%;
Subject Identity: 26.7%; Subject Coverage: 27.1%;
Strand: Plus / Plus; Alignment Length: 160;

Peptide search with displayed values

ALIGNMENT FROM L-NUMBER L1

Query Length: 232; Sequence Length: 383;
Score: 269.6 bits (688), 58.2% of highest possible score 463.4;
Expect value: 9.408e-71;
Identities: 135 / 231 (58.4%); Positives: 177 / 231 (76.6%);
Query Identity: 58.2%; Query Coverage: 99.6%;
Subject Identity: 35.2%; Subject Coverage: 60.3%;
Alignment Length: 231;
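When alignment displays are saved as text, the header values can be read back and checked against each other. The following Python sketch assumes the field layout shown in the two examples above; the regular expressions and dictionary keys are hypothetical and would need adjusting if the actual display format differs. It recomputes the four query and subject percentages from the counts and reports any that disagree with the displayed one-decimal values.

```python
import re

# Patterns for the header fields shown in the examples above; the keys
# are hypothetical names chosen for this sketch.
FIELDS = {
    "query_length": r"Query Length: (\d+)",
    "subject_length": r"Sequence Length: (\d+)",
    "bit_score": r"Score: ([\d.]+) bits",
    "raw_score": r"bits \((\d+)\)",
    "max_score": r"highest possible score ([\d.]+)",
    "expect": r"Expect value: ([0-9.eE+-]+)",
    "identities": r"Identities: (\d+) /",
    "positives": r"Positives: (\d+) /",
    "query_identity": r"Query Identity: ([\d.]+)%",
    "query_coverage": r"Query Coverage: ([\d.]+)%",
    "subject_identity": r"Subject Identity: ([\d.]+)%",
    "subject_coverage": r"Subject Coverage: ([\d.]+)%",
    "strand": r"Strand: (\w+ / \w+)",
    "alignment_length": r"Alignment Length: (\d+)",
}


def parse_alignment_values(text: str) -> dict:
    """Extract header values; fields missing from the text are omitted."""
    values = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            continue  # e.g. Strand (nucleotides) or Positives (peptides)
        raw = match.group(1)
        if key == "strand":
            values[key] = raw
        elif raw.isdigit():
            values[key] = int(raw)
        else:
            values[key] = float(raw)
    return values


def check_percentages(values: dict, tolerance: float = 0.051) -> list:
    """Names of displayed percentages that disagree with the counts.

    The tolerance allows for rounding to one decimal place.
    """
    ident, length = values["identities"], values["alignment_length"]
    expected = {
        "query_identity": 100 * ident / values["query_length"],
        "query_coverage": min(100.0, 100 * length / values["query_length"]),
        "subject_identity": min(100.0, 100 * ident / values["subject_length"]),
        "subject_coverage": 100 * length / values["subject_length"],
    }
    return [name for name, value in expected.items()
            if abs(values[name] - value) > tolerance]


EXAMPLE = """\
Query Length: 303; Sequence Length: 591;
Score: 277.2 bits (306), 50.6% of highest possible score 547.7;
Expect value: 1.877e-71;
Identities: 158 / 160 (98.8%);
Query Identity: 52.1%; Query Coverage: 52.8%;
Subject Identity: 26.7%; Subject Coverage: 27.1%;
Strand: Plus / Plus; Alignment Length: 160;
"""

values = parse_alignment_values(EXAMPLE)
print(values["strand"], check_percentages(values))
```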

BLAST and GETSIM Alignments of Nucleic Acid Sequences

In BLAST and GETSIM alignments of nucleic acid sequences, the query sequence is shown in the upper line and the hit (subject) sequence in the lower line. The line between them marks similarity: a bar indicates a full match between two nucleic acid residues, and a blank indicates non-matching residues. Gaps may be introduced in the query or the subject sequence to improve the alignment of both sequences.
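The middle line of a nucleic acid alignment can be derived mechanically from the two aligned sequences. The Python sketch below assumes both sequences are given already aligned, with gaps written as '-' (an assumption; the gap symbol is not specified in this text); the function name match_line is hypothetical.

```python
def match_line(query: str, subject: str) -> str:
    """Middle line of a nucleic acid alignment.

    Both inputs are already aligned (equal length, '-' marks a gap).
    A bar marks identical residues; everything else becomes a blank.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query.upper(), subject.upper())
    )


query = "ACGT-ACGTA"    # upper line: query, with one gap
subject = "ACCTTACG-A"  # lower line: hit (subject), with one gap
print(query, match_line(query, subject), subject, sep="\n")
```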

Example of an alignment of a nucleic acid sequence with D ALIGN and D ALIGNG:

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-1.png

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-2.png

BLAST and GETSIM Alignments of Amino Acid Sequences

In BLAST and GETSIM alignments of amino acid sequences, the query sequence is shown in the upper line and the hit (subject) sequence in the lower line. In the line between them, a bar marks a full match, a plus sign marks an amino acid family match, and a blank marks non-matching residues. Gaps may be introduced in the query or the subject sequence to improve the alignment of both sequences.
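For amino acid sequences, the middle line adds a plus sign for family matches, and the Identities and Positives counts in the header follow from that line. The Python sketch below illustrates this under a clearly hypothetical family grouping; the criterion actually used for the plus sign in STN displays is not described in this text, and the function names are also hypothetical. Gaps are assumed to be written as '-'.

```python
# Hypothetical amino acid families for illustration only; the grouping
# behind the "+" sign in STN displays is not specified in this text.
FAMILIES = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
FAMILY_OF = {aa: index for index, group in enumerate(FAMILIES) for aa in group}


def protein_match_line(query: str, subject: str) -> str:
    """Middle line for two aligned protein sequences ('-' = gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for q, s in zip(query.upper(), subject.upper()):
        if q == s and q != "-":
            symbols.append("|")  # identical residues
        elif q in FAMILY_OF and FAMILY_OF[q] == FAMILY_OF.get(s):
            symbols.append("+")  # different residues of the same family
        else:
            symbols.append(" ")  # mismatch or gap
    return "".join(symbols)


def identities_and_positives(line: str) -> tuple[int, int]:
    """Identities count bars; positives count bars plus '+' signs."""
    identities = line.count("|")
    return identities, identities + line.count("+")


query, subject = "MKVLIDE-S", "MRILVEEAT"
line = protein_match_line(query, subject)
print(query, line, subject, sep="\n")
print(identities_and_positives(line))
```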

Example of an alignment of an amino acid sequence with D ALIGN and D ALIGNG:

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-3.png

ALIGN Display After GETSEQ Sequence Searches

The exact sequence search with RUN GETSEQ yields direct matches, which can be visualized with the ALIGN or ALIGNG display. After GETSEQ, the ALIGN display format shows:

  • the subject sequence length
  • for nucleotide searches, the retrieved strand
  • under HITS AT, the residue numbers of the start and end points of the matching part(s) of the hit sequence
  • the subject sequence, with the matching part(s) highlighted by double underlining

This makes it possible to use the standard display D TRIAL ALIGN after a RUN GETSEQ search in the same way as after the similarity searches RUN GETSIM or RUN BLAST.
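The start and end positions reported for an exact match, and the highlighted region of the subject sequence, can be illustrated with a short Python sketch. It assumes 1-based, inclusive residue numbering and reports overlapping occurrences; both are assumptions, since this text does not say how GETSEQ numbers positions or treats overlaps. The function names and the "start-end" text form are hypothetical, and '=' characters stand in for the double underlining of the display.

```python
def exact_hits(query: str, subject: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each exact match.

    Overlapping occurrences are reported as well (an assumption of this
    sketch; how GETSEQ treats overlaps is not described here).
    """
    q, s = query.upper(), subject.upper()
    hits = []
    if not q:
        return hits
    start = s.find(q)
    while start != -1:
        hits.append((start + 1, start + len(q)))
        start = s.find(q, start + 1)
    return hits


def format_hits(hits: list[tuple[int, int]]) -> str:
    """Hypothetical text form of the match positions."""
    return ", ".join(f"{start}-{end}" for start, end in hits)


def underline(subject: str, hits: list[tuple[int, int]]) -> str:
    """Mark matching residues with '=' as a plain-text stand-in for
    the double underlining used in the display."""
    marks = [" "] * len(subject)
    for start, end in hits:
        for i in range(start - 1, end):
            marks[i] = "="
    return "".join(marks).rstrip()


subject = "GATTACAGATTACA"
hits = exact_hits("ttac", subject)
print(format_hits(hits))
print(subject)
print(underline(subject, hits))
```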

Example:

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-4.png

Display of Multiple Alignments

After BLAST searches (but not GETSIM searches), up to three alignments may be displayed for a sequence if the algorithm detects several matching regions.

Example:

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-5.png

You can also combine the results of a BLAST search and a GETSIM search, for example with =>S L1 AND L2. D ALIGN then displays two alignments: ALIGNMENT FROM L-NUMBER L1 is the alignment from the BLAST search, and ALIGNMENT FROM L-NUMBER L2 is the alignment from the GETSIM search.

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-6.png

For the same sequence, the BLAST and GETSIM alignments can differ, which may provide additional information.

STNext-GENESEQ-USGENE-PATGENE-AlignmentInformation-7.png