Sequence Motif

Searches for protein and nucleic acid sequences that match a sequence motif. A sequence motif can be an exact sequence or a sequence pattern expressed in regular expression syntax. Regular expressions are a powerful notation for defining complex sequence patterns.

Examples

  1. The sequence motif search, unlike BLAST or FASTA, allows searching for arbitrarily short sequence fragments, for example:

    NPPTP

  2. The motif search supports wildcard queries by placing an 'X' at the variable residue position. A query for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is Proline) can be expressed as:

    XPPXP

  3. A fixed number of consecutive variable residues is specified by the X{n} notation, where n is the number of variable residues. To query a motif with seven variable residues between W and G and twenty variable residues between G and L, use the following notation:

    WX{7}GX{20}L

  4. Variable-length ranges are expressed by the {n,m} notation, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as:

    CX{2,4}CX{12}HX{3,5}H

  5. The '^' operator searches for sequence motifs at the beginning of a protein sequence. The following two queries find sequences with N-terminal Histidine tags:

    ^HHHHHH or ^H{6}

  6. Square brackets specify alternative residues at a particular position. The Walker (P loop) motif that binds ATP or GTP can be expressed as:

    [AG]XXXXGK[ST]

    A or G is followed by four variable residues, then G and K, and finally S or T.
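
The motif notation in the examples above maps closely onto ordinary regular expressions, so the queries can be tried locally against a sequence string. The following is a minimal sketch in Python, not the search service's own implementation. It assumes that 'X' is the only symbol that needs translating (to '.', meaning any single residue) and that everything else ('^', {n}, {n,m}, square brackets) is already valid syntax for Python's re module. The helper names motif_to_regex and find_motif, and the short test sequences, are hypothetical and chosen only for illustration.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif string, treating 'X' as 'any single residue'.

    Assumption: 'X' is the only non-regex symbol in the motif. An 'X'
    placed inside square brackets is not handled by this simple sketch.
    """
    return re.compile(motif.replace("X", "."))


def find_motif(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Toy sequences, invented for illustration only.
print(find_motif("NPPTP", "MKNPPTPLA"))            # [(2, 'NPPTP')]
print(find_motif("XPPXP", "MKNPPTPLA"))            # [(2, 'NPPTP')]
print(find_motif("^H{6}", "HHHHHHMKT"))            # [(0, 'HHHHHH')]
print(find_motif("^H{6}", "MKTHHHHHH"))            # []
print(find_motif("[AG]XXXXGK[ST]", "MGAAAAGKSL"))  # [(1, 'GAAAAGKS')]
```

Because finditer reports non-overlapping matches only, overlapping occurrences of a motif are not all listed; whether the search service counts overlapping hits is not stated here.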