Sequence Motif
Searches protein and nucleic acid sequences for matches to a sequence motif. A sequence motif can be an exact sequence or a pattern written in regular expression syntax. Regular expressions can describe complex sequence patterns, including wildcard positions, gaps of fixed or variable length, anchoring to the start of a sequence, and alternative residues at a position.
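To make the distinction between an exact sequence and a pattern concrete, here is a minimal Python sketch that treats the motif as a regular expression and reports which sequences contain it. The helper `search_sequences` and the toy sequences are hypothetical and use Python's standard `re` module, not the search service itself.

```python
import re


def search_sequences(motif: str, sequences: dict[str, str]) -> list[str]:
    """Return the IDs of sequences containing at least one match of `motif`.

    Hypothetical helper: the motif is treated as a Python regular expression.
    Residue letters are not regex metacharacters, so an exact sequence such
    as 'AYIAK' is also a valid pattern that matches only itself.
    """
    pattern = re.compile(motif)
    return [seq_id for seq_id, seq in sequences.items() if pattern.search(seq)]


# Toy sequences, invented for illustration.
toy = {"seq1": "MKTAYIAKQR", "seq2": "GGSGGSGG", "seq3": "MAYIAK"}
print(search_sequences("AYIAK", toy))  # exact fragment: ['seq1', 'seq3']
print(search_sequences("G.G", toy))    # pattern, '.' = any residue: ['seq2']
```

An exact fragment is simply the special case of a pattern with no wildcards, which is why one search mechanism can serve both kinds of query.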
Examples
- Unlike BLAST or FASTA, the sequence motif search can find arbitrarily short sequence fragments, for example:
- The motif search supports wildcards: place an 'X' at each variable residue position. A query for SH3 domain binding sites with the consensus sequence -X-P-P-X-P (where X is any residue and P is proline) can be expressed as:
- A run of a fixed number of variable residues is specified with the {n} notation, where n is the number of variable residues. To query a motif with seven variable residues between W and G and twenty variable residues between G and L, use the following notation:
- Variable-length runs are expressed with the {n,m} notation, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif, which binds Zn in DNA-binding domains, can be expressed as:
- The '^' operator anchors a motif to the beginning (N-terminus) of a protein sequence. The following two queries find sequences with N-terminal histidine tags:
- Square brackets specify alternative residues at a single position. The Walker A (P-loop) motif, which binds ATP or GTP, can be expressed as:
A or G is followed by four variable residues, then G and K, and finally S or T.
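The notation in these examples maps closely onto standard regular expressions. As a minimal sketch, assuming that X stands for any single residue, that dashes are only separators, and that {n}, {n,m}, ^ and [...] behave as in ordinary regex syntax, the hypothetical helpers `motif_to_regex` and `find_motif` below translate a motif into a Python `re` pattern and list its matches in a sequence. This is not the search service's own implementation, and the sequences are invented toy examples.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as '[AG]X{4}GK[ST]' into a Python regex.

    Hypothetical helper. Assumes 'X' means "any one residue", dashes are
    only separators, and '{n}', '{n,m}', '^' and '[...]' keep their usual
    regex meaning.
    """
    out = []
    in_brackets = False
    for ch in motif.upper():
        if ch == "-":
            continue  # separator, as in X-P-P-X-P
        if ch == "[":
            in_brackets = True
        elif ch == "]":
            in_brackets = False
        # Outside brackets, the wildcard X becomes the regex "any character".
        out.append("." if ch == "X" and not in_brackets else ch)
    return "".join(out)


def find_motif(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched fragment) for each non-overlapping match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Toy sequences, invented for illustration.
print(motif_to_regex("X-P-P-X-P"))                       # .PP.P
print(motif_to_regex("WX{7}GX{20}L"))                    # W.{7}G.{20}L
print(find_motif("X-P-P-X-P", "AAPPLPAA"))               # [(1, 'APPLP')]
print(find_motif("[AG]X{4}GK[ST]", "MKKLVGPSGSGKSTLL"))  # [(5, 'GPSGSGKS')]
print(find_motif("^HHHHHH", "HHHHHHGSMA"))               # [(0, 'HHHHHH')]
print(find_motif("^HHHHHH", "MHHHHHHGSA"))               # [] (tag not at position 0)
```

Because `finditer` returns non-overlapping matches, overlapping occurrences of a motif are reported only once; listing every overlapping hit would need a lookahead pattern or a position-by-position scan.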