Protein motif scanner
Scan protein sequences for glycosylation, phosphorylation, localization signals, and other biologically important motifs.
A protein motif scanner searches amino acid sequences for short, conserved patterns that are associated with specific biological functions. These motifs often correspond to post-translational modification sites, subcellular localization signals, or structural features that are critical for protein activity and regulation.
Motif scanning is a common first step in functional annotation. By identifying known patterns within an uncharacterized sequence, researchers can generate hypotheses about where a protein localizes, how it is modified, and what regulatory networks it participates in. This tool provides a code-free alternative to tools like ScanProsite for rapid pattern-based sequence annotation.
The scanner applies a library of regular expression patterns against each input sequence. Each pattern corresponds to a well-characterized biological motif defined by consensus sequences from the literature. When a match is found, the tool records the motif type, position, matched residues, and surrounding sequence context.
Motif patterns are expressed as sequence rules. For example, the N-glycosylation consensus is N-X-S/T, where X is any amino acid except proline. The scanner converts these biochemical rules into regular expressions and executes them against the full length of each protein sequence.
All matches are reported regardless of overlap. A single residue can participate in multiple motif matches if it satisfies the consensus for more than one pattern.
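As a minimal sketch of how the N-X-S/T rule might be turned into a regular expression that also reports overlapping matches: the lookahead wrapper and the helper name `find_overlapping` are assumptions made for illustration, not the tool's actual implementation.

```python
import re

# N-X-[ST] where X is any residue except proline.
# Wrapping the pattern in a zero-width lookahead (?=(...)) is an assumption
# made here so that overlapping sequons are all reported.
N_GLYC = re.compile(r"(?=(N[^P][ST]))")

def find_overlapping(pattern, sequence):
    """Return (1-based start, matched residues) for every match, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

hits = find_overlapping(N_GLYC, "MNNTSAPNPSQ")
print(hits)  # [(2, 'NNT'), (3, 'NTS')]
```

The two hits share residues 3 and 4, and the `NPS` near the end is skipped because proline occupies the X position.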
The scanner searches for motifs across several functional categories.
| Motif | Pattern | Description |
|---|---|---|
| N-glycosylation | N-X-[ST] (X is not P) | N-linked glycosylation sequon |
| GAG attachment | S-G-X-G | Glycosaminoglycan attachment site |
N-linked glycosylation is one of the most common post-translational modifications, affecting protein folding, stability, and cell-surface recognition. The N-X-S/T sequon is necessary but not sufficient for glycosylation; not all sites matching this pattern are actually modified.
| Motif | Pattern | Description |
|---|---|---|
| PKA/PKG | [RK]-[RK]-X-S | Protein kinase A/G substrate |
| CK2 | S-X-X-[DE] | Casein kinase II substrate |
| PKC | [ST]-X-[RK] | Protein kinase C substrate |
Phosphorylation is a reversible modification that acts as a molecular switch in signaling pathways. Each kinase family recognizes a distinct consensus sequence around the target serine, threonine, or tyrosine. The patterns detected here represent simplified consensus motifs and should be considered candidate sites rather than confirmed phosphorylation events.
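The kinase patterns in the table above can be sketched as a small pattern library. The dictionary name, the helper `candidate_phosphosites`, and the acceptor offsets (which residue of each match is treated as the phosphorylated one) are assumptions for illustration only.

```python
import re

# name -> (regex, 0-based offset of the assumed phospho-acceptor within the match)
KINASE_MOTIFS = {
    "PKA/PKG": (r"[RK][RK].S", 3),  # serine is the last residue of the match
    "CK2": (r"S..[DE]", 0),         # serine is the first residue of the match
    "PKC": (r"[ST].[RK]", 0),       # S/T is the first residue of the match
}

def candidate_phosphosites(sequence):
    """Return (kinase, 1-based acceptor position, matched residues) per hit."""
    hits = []
    for name, (regex, offset) in KINASE_MOTIFS.items():
        # Lookahead keeps overlapping matches.
        for m in re.finditer(f"(?=({regex}))", sequence):
            hits.append((name, m.start() + offset + 1, m.group(1)))
    return hits

sites = candidate_phosphosites("MKRASLEED")
print(sites)  # [('PKA/PKG', 5, 'KRAS'), ('CK2', 5, 'SLEE')]
```

In this toy sequence, serine 5 satisfies both the PKA/PKG and the CK2 consensus, so it is reported once per pattern as a candidate site.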
| Motif | Pattern | Description |
|---|---|---|
| N-myristoylation | G-X-X-X-[STAGCN]-X (with exclusions) | N-terminal myristoylation site |
| CAAX prenylation | C-X-X-[ASLIMVT] (C-terminal) | Prenylation box at protein C-terminus |
Lipid modifications anchor proteins to membranes. Myristoylation occurs at N-terminal glycine residues, while prenylation targets the cysteine of a C-terminal CAAX motif. The CAAX pattern is only searched at the end of the sequence since it must be C-terminal to be functional.
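One way to restrict the CAAX search to the C-terminus is to anchor the expression at the end of the string. This is a sketch under that assumption; the function name `find_caax` is hypothetical.

```python
import re

# C-X-X-[ASLIMVT], anchored with \Z so that only the final four residues can match.
CAAX = re.compile(r"C..[ASLIMVT]\Z")

def find_caax(sequence):
    """Return (1-based start, matched residues) or None if there is no C-terminal CAAX box."""
    m = CAAX.search(sequence)
    return (m.start() + 1, m.group(0)) if m else None

print(find_caax("MKKCVLS"))  # (4, 'CVLS')
print(find_caax("MCVLSKK"))  # None: the same four residues are internal
```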
| Motif | Pattern | Description |
|---|---|---|
| Nuclear localization (monopartite) | 4+ consecutive [KR] | Monopartite nuclear localization signal |
| ER retention | [KH]-D-E-L (C-terminal) | ER retention/retrieval signal |
Nuclear localization signals (NLS) direct proteins to the nucleus through importin-mediated transport. The classical monopartite NLS consists of a stretch of basic residues (lysine and arginine). ER retention signals like KDEL ensure that resident ER proteins are retrieved from the Golgi apparatus back to the ER.
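The two localization patterns could be expressed roughly as follows. Reporting each basic stretch as a single maximal run, rather than every 4-residue window inside it, is an assumption of this sketch, as are the names used.

```python
import re

NLS = re.compile(r"[KR]{4,}")            # 4 or more consecutive K/R, greedy
ER_RETENTION = re.compile(r"[KH]DEL\Z")  # [KH]-D-E-L at the C-terminus only

def localization_hits(sequence):
    """Return (motif, 1-based start, matched residues) for NLS and ER retention hits."""
    hits = [("NLS", m.start() + 1, m.group(0)) for m in NLS.finditer(sequence)]
    er = ER_RETENTION.search(sequence)
    if er:
        hits.append(("ER retention", er.start() + 1, er.group(0)))
    return hits

loc = localization_hits("MAPKKKRKVEDKDEL")
print(loc)  # [('NLS', 4, 'KKKRK'), ('ER retention', 12, 'KDEL')]
```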
| Motif | Pattern | Description |
|---|---|---|
| SUMOylation | [VILMAFP]-K-X-E | SUMOylation consensus |
| Tyrosine sulfation | [ED]-X-X-X-Y | Tyrosine sulfation site |
| Amidation | X-[RK]-G-[RK] | C-terminal amidation processing site |
| Motif | Pattern | Description |
|---|---|---|
| RGD | R-G-D | Integrin-binding cell attachment sequence |
The RGD tripeptide is the minimal recognition sequence for many integrin receptors. It mediates cell attachment in extracellular matrix proteins like fibronectin and vitronectin.
The output table shows one row per input sequence with the following columns:
A high number of total hits does not necessarily indicate biological significance. Short, degenerate consensus patterns like CK2 phosphorylation (S-X-X-[DE]) will match frequently in any protein simply by chance. Longer and more specific patterns like KDEL are more likely to represent true functional sites when found.
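A rough back-of-the-envelope estimate illustrates the difference. The sketch below assumes uniform, independent residue frequencies (1/20 per amino acid), which real proteins do not follow, so the numbers are only indicative; the function name is hypothetical.

```python
def match_probability(allowed_counts, alphabet_size=20):
    """Chance that a random window matches, given the number of allowed residues per position."""
    p = 1.0
    for n in allowed_counts:
        p *= n / alphabet_size
    return p

# CK2, S-X-X-[DE]: 1, 20, 20, and 2 allowed residues per position.
ck2_p = match_probability([1, 20, 20, 2])   # about 0.005 per window
ck2_expected = ck2_p * (500 - 4 + 1)        # about 2.5 chance hits in a 500-residue protein

# [KH]-D-E-L at the C-terminus: a single window per sequence.
kdel_p = match_probability([2, 1, 1, 1])    # about 1.25e-5

print(round(ck2_expected, 2), kdel_p)
```

Under these assumptions, a 500-residue protein is expected to contain two or three CK2 matches by chance alone, while a chance C-terminal KDEL-type match is expected in roughly one of every 80,000 sequences.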
Context matters: a CAAX box match is only meaningful if it occurs at the C-terminus. An NLS is most significant in a protein that is known or expected to be nuclear. Cross-reference motif hits with experimental data or conservation analysis when possible.
Each hit includes the surrounding amino acid context (5 residues on each side of the match) to help evaluate whether the site is in a plausible structural context. Motifs buried in hydrophobic cores or within secondary structure elements may be less accessible for modification.
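Extracting the flanking context can be sketched as simple slicing around the match. The function name `with_context` and the choice to truncate at the sequence termini rather than pad are assumptions.

```python
import re

def with_context(sequence, start, end, flank=5):
    """Return (left flank, match, right flank); start/end are 0-based, end exclusive.

    Flanks are truncated when the match lies near either terminus.
    """
    left = sequence[max(0, start - flank):start]
    right = sequence[end:end + flank]
    return left, sequence[start:end], right

seq = "MKTAYIAKQRNGSLVPEHHH"
m = re.search(r"N[^P][ST]", seq)
context = with_context(seq, m.start(), m.end())
print(context)  # ('IAKQR', 'NGS', 'LVPEH')
```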
Accepted inputs: .fasta, .fa, .fas, .pdb, and .txt file uploads
The motifs detected by this tool are based on simplified consensus patterns. Not every sequence match corresponds to a biologically active site. Key limitations include:
For higher-confidence predictions, consider combining motif scanning results with conservation analysis, structural data, or specialized prediction tools for specific modification types.