Motif Sequence Searching

This article documents motif sequence searching capabilities in the STNext databases CAS REGISTRY, GENESEQ, PATGENE, and USGENE.

To initiate a sequence code match in CAS REGISTRY, use SEARCH or S, e.g., S DSDGP/SQEP. To initiate a sequence code match in GENESEQ, PATGENE, and USGENE, use RUN GETSEQ, e.g., RUN GETSEQ DSDGP/SQEP.
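
As a minimal sketch of the rule above, the following Python helper assembles the command text for a given database. The name build_search_command and the COMMAND_PREFIX dictionary are illustrative assumptions, not part of STNext; the helper only formats a string and does not connect to STN.

```python
# Hypothetical helper: formats the command text described above.
# It does not talk to STNext; it only builds the string you would type.
COMMAND_PREFIX = {
    "CAS REGISTRY": "S",
    "GENESEQ": "RUN GETSEQ",
    "PATGENE": "RUN GETSEQ",
    "USGENE": "RUN GETSEQ",
}


def build_search_command(database: str, query: str, field_code: str) -> str:
    """Return the command line for a sequence code match in `database`."""
    key = database.strip().upper()
    if key not in COMMAND_PREFIX:
        raise ValueError(f"Unsupported database: {database!r}")
    # Accept the field code with or without its leading slash.
    return f"{COMMAND_PREFIX[key]} {query}/{field_code.lstrip('/')}"


print(build_search_command("CAS REGISTRY", "DSDGP", "SQEP"))  # S DSDGP/SQEP
print(build_search_command("USGENE", "DSDGP", "/SQEP"))       # RUN GETSEQ DSDGP/SQEP
```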

For more documentation on motif sequence search capabilities, see the STNext help file.

CAS STNext Motif Symbols and Characters

Each entry below lists the symbol, when to use it, examples, and sample answers.

Use this symbol: ^
When you want to: Search at the beginning or the end of a sequence
Examples:
=> S ^MCGIL/SQSP
=> S VCDS^/SQSFP
Sample Answers:
"MCGIL……"
"………VCDS"

Use this symbol: [ ]
When you want to: Specify alternate residues
Examples:
=> S LGP[VL]/SQSP
Sample Answers:
LGPV
LGPL

Use this symbol: [-] or [~]
When you want to: Exclude one or more residues
Examples:
=> S PTGK[-H]/SQSP
=> S PTGK[~H]/SQSP
Sample Answers:
PTGKACCD

Use this symbol: {#,#} or {# - #} or {#}
When you want to: Repeat preceding residue(s)
Examples:
=> S GG(FL){1,3}/SQSP
=> S GG(FL){1-3}/SQSP
=> S GG(FL){3}/SQSP
Sample Answers:
GGFL
GGFLFL
GGFLFLF

Use this symbol: .
When you want to: Specify gap(s) in the sequence
Examples:
=> S SY.RPG/SQSP
=> S SY...RPG/SQSP
Sample Answers:
SYARPG
SYAAARPG

Use this symbol: |
When you want to: Specify alternate residues
Examples:
=> S ACD|KLM/SQSP
=> S A(CD|KL)M/SQSP
Sample Answers:
ACD
KLM
ACDM
AKLM

Use this symbol: ?
When you want to: Repeat residue(s) zero or one time
Examples:
=> S FLRR(RP)?K/SQSP
Sample Answers:
FLRRK
FLRRRPK

Use this symbol: *
When you want to: Repeat residue(s) zero or more times
Examples:
=> S KLK(WD)*N/SQSP
Sample Answers:
KLKN
KLKWDN
KLKWDWDN

Use this symbol: +
When you want to: Repeat residue(s) one or more times
Examples:
=> S AQP+/SQSP
=> S (AQP)+/SQSP
Sample Answers:
AQPP
AQPPP
AQPPPP
AQPAQP
AQPAQPAQP
AQPAQPAQPAQP

Use this symbol: &
When you want to: Join multiple sequence fragments, represented as L#'s, together as one (note: CAS REGISTRY only)
Examples:
=> S KLKWD/SQSP
L1 702 KLKWD/SQSP

=> S ..KRYG/SQSP
L2 137 ..KRYG/SQSP

=> S L1 & L2/SQSP
L3 113 (KLKWD)(..KRYG)/SQSP
Sample Answers:
KLKWDKRYG
KLKWNQDKRYG
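
Most of the symbols above resemble regular-expression operators, so a motif can be previewed offline against a few known sequences before it is run. The sketch below translates a motif into an approximate Python regular expression and applies it as a subsequence search. The names motif_to_regex and motif_matches are hypothetical, and the translation is an assumption about equivalent behavior rather than a description of how STNext evaluates motifs. The & operator and family matching are not modeled.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate an STNext-style motif into an approximate Python regex."""
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "^":
            # A ^ at the start (or right after "(" or "|") anchors the
            # beginning; any other ^ is treated as the end-of-sequence anchor.
            at_start = i == 0 or motif[i - 1] in "(|"
            out.append("^" if at_start else "$")
            i += 1
        elif ch == "[":
            end = motif.index("]", i)  # raises ValueError if unclosed
            body = motif[i + 1:end]
            if body[:1] in ("-", "~"):
                if len(body) == 1:
                    raise ValueError("exclusion needs at least one residue")
                out.append("[^" + body[1:] + "]")  # [-H] or [~H] -> [^H]
            else:
                out.append("[" + body + "]")
            i = end + 1
        elif ch == "{":
            end = motif.index("}", i)  # raises ValueError if unclosed
            # {1-3} and {1, 3} are normalized to the regex form {1,3}.
            body = motif[i + 1:end].replace(" ", "").replace("-", ",")
            out.append("{" + body + "}")
            i = end + 1
        else:
            # Residue letters and . | ( ) ? * + carry over unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


def motif_matches(motif: str, sequence: str) -> bool:
    """Subsequence-style test: the motif may occur anywhere in `sequence`."""
    return re.search(motif_to_regex(motif), sequence) is not None


print(motif_to_regex("GG(FL){1-3}"))           # GG(FL){1,3}
print(motif_matches("PTGK[~H]", "PTGKACCD"))   # True
print(motif_matches("SY...RPG", "SYARPG"))     # False
```

Because the check uses re.search, it behaves like the subsequence codes (for example /SQSP); an exact-match preview would use re.fullmatch instead.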

CAS STNext Sequence Code Match Types

Search Type         Amino Acids  Nucleic Acids  Examples
EXACT               /SQEP        /SQEN          => S DSDGP/SQEP
                                                => S GGAATT/SQEN
EXACT FAMILY        /SQEFP       ----------     => S DSDGP/SQEFP
SUBSEQUENCE         /SQSP        /SQSN          => S DSDGP/SQSP
                                                => S GGAATT/SQSN
SUBSEQUENCE FAMILY  /SQSFP       ----------     => S DSDGP/SQSFP
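
The match types above can be captured in a small lookup, which makes it easy to reject combinations that have no code, such as a family search on nucleic acids. This is an illustrative sketch only: FIELD_CODES and field_code are hypothetical names, and labeling the /SQEFP row "EXACT FAMILY" is an assumption made by analogy with SUBSEQUENCE FAMILY.

```python
# Hypothetical lookup built from the match-type table above.
# Family searches are listed for amino acids (proteins) only.
FIELD_CODES = {
    ("EXACT", "protein"): "SQEP",
    ("EXACT", "nucleic"): "SQEN",
    ("EXACT FAMILY", "protein"): "SQEFP",  # row label is an assumption
    ("SUBSEQUENCE", "protein"): "SQSP",
    ("SUBSEQUENCE", "nucleic"): "SQSN",
    ("SUBSEQUENCE FAMILY", "protein"): "SQSFP",
}


def field_code(search_type: str, residue_type: str) -> str:
    """Return the field code for a search type and residue type."""
    key = (search_type.strip().upper(), residue_type.strip().lower())
    try:
        return FIELD_CODES[key]
    except KeyError:
        raise ValueError(
            f"No field code for {search_type!r} on {residue_type!r} sequences"
        ) from None


print(field_code("subsequence", "protein"))  # SQSP
print(field_code("Exact", "nucleic"))        # SQEN
```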

CAS STNext Amino Acid Family Substitution Definitions

In CAS REGISTRY, GENESEQ, PATGENE, and USGENE, a Subsequence Family Search of Proteins (/SQSFP) retrieves exact sequences, subsequences, and answers in which query amino acids are replaced by family-equivalent amino acids. For example, the query ADHIFC/SQSFP retrieves the equivalent fragment "PQKLYC". The table below documents the family-equivalent amino acids.

Groups                    Amino Acids
Neutral Weak Hydrophobic  Alanine (Ala, A)
                          Glycine (Gly, G)
                          Proline (Pro, P)
                          Serine (Ser, S)
                          Threonine (Thr, T)
Acid-Amines Hydrophilic   Aspartic Acid (Asp, D)
                          Asparagine (Asn, N)
                          Glutamic Acid (Glu, E)
                          Glutamine (Gln, Q)
Basic Hydrophobic         Arginine (Arg, R)
                          Histidine (His, H)
                          Lysine (Lys, K)
Hydrophobic               Isoleucine (Ile, I)
                          Leucine (Leu, L)
                          Methionine (Met, M)
                          Valine (Val, V)
Aromatics                 Phenylalanine (Phe, F)
                          Tryptophan (Trp, W)
                          Tyrosine (Tyr, Y)
Cross Linking             Cysteine (Cys, C)
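
To make the substitution rule concrete, the sketch below encodes the groups listed above and checks whether a fragment is family-equivalent to a query, position by position. The function names are hypothetical, and treating residues outside the listed groups as matching only themselves is an assumption; the actual STNext behavior for such residues is not documented here.

```python
# Family groups as one-letter codes, taken from the table above.
FAMILIES = {
    "Neutral Weak Hydrophobic": "AGPST",
    "Acid-Amines Hydrophilic": "DNEQ",
    "Basic Hydrophobic": "RHK",
    "Hydrophobic": "ILMV",
    "Aromatics": "FWY",
    "Cross Linking": "C",
}
RESIDUE_TO_FAMILY = {
    residue: name for name, residues in FAMILIES.items() for residue in residues
}


def same_family(a: str, b: str) -> bool:
    """True if two residues fall in the same group.

    Assumption: a residue outside the listed groups matches only itself.
    """
    a, b = a.upper(), b.upper()
    return RESIDUE_TO_FAMILY.get(a, a) == RESIDUE_TO_FAMILY.get(b, b)


def family_equivalent(query: str, fragment: str) -> bool:
    """True if `fragment` equals `query` up to same-family substitutions."""
    return len(query) == len(fragment) and all(
        same_family(q, f) for q, f in zip(query, fragment)
    )


def family_subsequence_hit(query: str, sequence: str) -> bool:
    """True if any window of `sequence` is family-equivalent to `query`."""
    n = len(query)
    return any(
        family_equivalent(query, sequence[i:i + n])
        for i in range(len(sequence) - n + 1)
    )


print(family_equivalent("ADHIFC", "PQKLYC"))           # True
print(family_subsequence_hit("ADHIFC", "MMPQKLYCGG"))  # True
```

In the ADHIFC example, A and P are both Neutral Weak Hydrophobic, D and Q are both Acid-Amines Hydrophilic, H and K are both Basic, I and L are both Hydrophobic, F and Y are both Aromatics, and C matches itself.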