This article documents motif sequence searching capabilities in the STNext databases CAS REGISTRY, GENESEQ, PATGENE, and USGENE.
To initiate a sequence code match in CAS REGISTRY, use SEARCH or S, e.g., S DSDGP/SQEP. To initiate a sequence code match in GENESEQ, PATGENE, and USGENE, use RUN GETSEQ, e.g., RUN GETSEQ DSDGP/SQEP.
For more documentation on motif sequence search capabilities, see the STNext help file.
CAS STNext Motif Symbols and Characters
Use this symbol: | When you want to: | Examples | Sample Answers |
^ | Search at the beginning or the end of a sequence | => S ^MCGIL/SQSP => S VCDS^/SQSFP | "MCGIL…" "…VCDS" |
[ ] | Specify alternate residues | => S LGP[VL]/SQSP | LGPV LGPL |
[-] or [~] | Exclude one or more residues | => S PTGK[-H]/SQSP => S PTGK[~H]/SQSP | PTGKACCD |
{#,#} {# - #} {#} | Repeat preceding residue(s) | => S GG(FL){1,3}/SQSP => S GG(FL){1-3}/SQSP => S GG(FL){3}/SQSP | GGFL GGFLFL GGFLFLFL |
. | Specify gap(s) in the sequence | => S SY.RPG/SQSP => S SY...RPG/SQSP | SYARPG SYAAARPG |
| | Specify alternate residues | => S ACD|KLM/SQSP => S A(CD|KL)M/SQSP | ACD KLM ACDM AKLM |
? | Repeat residue(s) zero or one time | => S FLRR(RP)?K/SQSP | FLRRK FLRRRPK |
* | Repeat residue(s) zero or more times | => S KLK(WD)*N/SQSP | KLKN KLKWDN KLKWDWDN |
+ | Repeat residue(s) one or more times | => S AQP+/SQSP => S (AQP)+/SQSP | AQPP AQPPP AQPPPP AQPAQP AQPAQPAQP AQPAQPAQPAQP |
& | Join multiple sequence fragments, represented as L-numbers (L#), into one sequence (CAS REGISTRY only) | => S KLKWD/SQSP => S ..KRYG/SQSP => S L1 & L2/SQSP | KLKWDKRYG KLKWNQDKRYG |
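Most of the motif symbols above have close analogues in ordinary regular expressions, so a query can be previewed locally against a few known sequences before it is run. The following is a minimal Python sketch under that assumption; it is not how STNext evaluates motifs. The function names `stn_motif_to_regex` and `motif_matches` are hypothetical, the `&` operator is not covered because it works on L-numbers rather than on text, and an exclusion is assumed to name at least one residue, as in `[~H]`.

```python
import re


def stn_motif_to_regex(motif: str) -> str:
    """Translate a subset of the motif symbols above to Python regex syntax."""
    pattern = motif
    # [-H] or [~H] (exclude residues) becomes the negated character class [^H].
    pattern = re.sub(r"\[[-~]", "[^", pattern)
    # {1-3} (range written with a hyphen) becomes {1,3}.
    pattern = re.sub(r"\{\s*(\d+)\s*-\s*(\d+)\s*\}", r"{\1,\2}", pattern)
    # A trailing ^ marks the end of the sequence; regex writes that as $.
    if len(pattern) > 1 and pattern.endswith("^"):
        pattern = pattern[:-1] + "$"
    # The symbols . | ? * + ( ) [ ] and {m,n} are passed through unchanged.
    return pattern


def motif_matches(motif: str, sequence: str) -> bool:
    """Return True if the motif occurs anywhere in the sequence (subsequence-style)."""
    return re.search(stn_motif_to_regex(motif), sequence) is not None


if __name__ == "__main__":
    checks = [
        ("^MCGIL", "MCGILAAA"),
        ("VCDS^", "AAVCDS"),
        ("LGP[VL]", "LGPL"),
        ("PTGK[~H]", "PTGKACCD"),
        ("GG(FL){1-3}", "GGFLFL"),
        ("SY...RPG", "SYAAARPG"),
        ("KLK(WD)*N", "KLKWDWDN"),
    ]
    for motif, sequence in checks:
        print(motif, sequence, motif_matches(motif, sequence))
```

Each pair in `checks` comes from the examples and sample answers in the table, and each is expected to print `True`. The sketch only mirrors the symbol semantics described in the table; actual STNext results may differ.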
CAS STNext Sequence Code Match Types
Search Type | Amino Acids | Nucleic Acids | Examples |
EXACT | /SQEP | /SQEN | => S DSDGP/SQEP => S GGAATT/SQEN |
EXACT FAMILY | /SQEFP | ---------- | => S DSDGP/SQEFP |
SUBSEQUENCE | /SQSP | /SQSN | => S DSDGP/SQSP => S GGAATT/SQSN |
SUBSEQUENCE FAMILY | /SQSFP | ---------- | => S DSDGP/SQSFP |
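When queries are assembled by a script, the field codes in the table can be kept in a small lookup. The sketch below is illustrative only: the function `build_query`, the dictionary `FIELD_CODES`, and the labels used as keys (including "exact family" for /SQEFP) are hypothetical names, and the command choice simply follows the rule stated at the top of this article (S for CAS REGISTRY, RUN GETSEQ for GENESEQ, PATGENE, and USGENE).

```python
# Field codes from the match-type table; the family codes exist for proteins only.
FIELD_CODES = {
    ("exact", "protein"): "/SQEP",
    ("exact", "nucleic"): "/SQEN",
    ("exact family", "protein"): "/SQEFP",
    ("subsequence", "protein"): "/SQSP",
    ("subsequence", "nucleic"): "/SQSN",
    ("subsequence family", "protein"): "/SQSFP",
}

GETSEQ_DATABASES = {"GENESEQ", "PATGENE", "USGENE"}


def build_query(sequence: str, search_type: str, molecule: str,
                database: str = "REGISTRY") -> str:
    """Build a query string such as 'S DSDGP/SQEP' (hypothetical helper)."""
    key = (search_type.lower(), molecule.lower())
    if key not in FIELD_CODES:
        raise ValueError(f"No field code for {search_type!r} with {molecule!r}")
    db = database.upper()
    if db == "REGISTRY":
        command = "S"
    elif db in GETSEQ_DATABASES:
        command = "RUN GETSEQ"
    else:
        raise ValueError(f"Unknown database: {database!r}")
    return f"{command} {sequence}{FIELD_CODES[key]}"


if __name__ == "__main__":
    print(build_query("DSDGP", "exact", "protein"))
    print(build_query("DSDGP", "exact", "protein", database="GENESEQ"))
    print(build_query("GGAATT", "subsequence", "nucleic"))
```

The first two calls reproduce the example queries given at the top of this article, `S DSDGP/SQEP` and `RUN GETSEQ DSDGP/SQEP`. Requesting a family search for nucleic acids raises `ValueError`, which mirrors the empty cells in the table.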
CAS STNext Amino Acid Family Substitution Definitions
In CAS REGISTRY, GENESEQ, PATGENE, and USGENE, Subsequence Family searches of proteins (/SQSFP) retrieve exact sequences, subsequences, and answers in which query amino acids are replaced by family-equivalent amino acids. For example, the query ADHIFC/SQSFP retrieves the equivalent fragment "PQKLYC". The table below lists the family-equivalent amino acids.
Groups | Amino Acids |
Neutral Weak Hydrophobic | Alanine (Ala, A) Glycine (Gly, G) Proline (Pro, P) Serine (Ser, S) Threonine (Thr, T) |
Acid-Amines Hydrophilic | Aspartic Acid (Asp, D) Asparagine (Asn, N) Glutamic Acid (Glu, E) Glutamine (Gln, Q) |
Basic Hydrophilic | Arginine (Arg, R) Histidine (His, H) Lysine (Lys, K) |
Hydrophobic | Isoleucine (Ile, I) Leucine (Leu, L) Methionine (Met, M) Valine (Val, V) |
Aromatics | Phenylalanine (Phe, F) Tryptophan (Trp, W) Tyrosine (Tyr, Y) |
Cross Linking | Cysteine (Cys, C) |
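The family table can be used to check by hand why a given answer was retrieved. The Python sketch below encodes the six groups and tests whether two fragments of equal length are family-equivalent position by position. It is an illustration of the table, not the STNext matching algorithm; the names `FAMILIES`, `family_equivalent`, and `find_family_matches` are hypothetical, and treating a residue code outside the table as equivalent only to itself is an assumption.

```python
# One-letter codes grouped as in the family table above.
FAMILIES = {
    "neutral weak hydrophobic": "AGPST",
    "acid-amines hydrophilic": "DNEQ",
    "basic": "RHK",
    "hydrophobic": "ILMV",
    "aromatics": "FWY",
    "cross linking": "C",
}

RESIDUE_TO_FAMILY = {
    residue: name for name, residues in FAMILIES.items() for residue in residues
}


def family_of(residue: str) -> str:
    """Return the family name, or the residue itself if it is not in the table."""
    return RESIDUE_TO_FAMILY.get(residue, residue)


def family_equivalent(query: str, candidate: str) -> bool:
    """True if both fragments have equal length and matching families at each position."""
    if len(query) != len(candidate):
        return False
    return all(family_of(q) == family_of(c) for q, c in zip(query, candidate))


def find_family_matches(query: str, sequence: str) -> list[int]:
    """Return 0-based start positions where a window of the sequence is family-equivalent."""
    n = len(query)
    return [
        i
        for i in range(len(sequence) - n + 1)
        if family_equivalent(query, sequence[i:i + n])
    ]


if __name__ == "__main__":
    print(family_equivalent("ADHIFC", "PQKLYC"))        # True
    print(find_family_matches("ADHIFC", "MMPQKLYCGG"))  # [2]
```

For the example in the text, A and P are both in the neutral group, D and Q in the acid-amines group, H and K in the basic group, I and L in the hydrophobic group, F and Y in the aromatics group, and C matches itself, so ADHIFC and PQKLYC are family-equivalent.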