
Protein motif scanner
Scan protein sequences for glycosylation, phosphorylation, localization signals, and other biologically important motifs.
What is a protein motif scanner?
A protein motif scanner searches amino acid sequences for short, conserved patterns that are associated with specific biological functions. These motifs often correspond to post-translational modification sites, subcellular localization signals, or structural features that are critical for protein activity and regulation.
Motif scanning is a common first step in functional annotation. By identifying known patterns within an uncharacterized sequence, researchers can generate hypotheses about where a protein localizes, how it is modified, and what regulatory networks it participates in. This tool provides a code-free alternative to tools like ScanProsite for rapid pattern-based sequence annotation.
How does motif scanning work?
The scanner applies a library of regular expression patterns against each input sequence. Each pattern corresponds to a well-characterized biological motif defined by consensus sequences from the literature. When a match is found, the tool records the motif type, position, matched residues, and surrounding sequence context.
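The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the two-entry `MOTIF_LIBRARY`, the `Hit` record, and the `scan` function are hypothetical names, and surrounding sequence context is left out to keep the sketch short.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    motif: str    # motif type
    start: int    # 1-based position of the first matched residue
    matched: str  # matched residues


# Hypothetical two-entry library; a real pattern set would be larger.
MOTIF_LIBRARY = {
    "N-glycosylation": r"N[^P][ST]",
    "RGD": r"RGD",
}


def scan(sequence):
    """Apply every pattern in the library to one sequence."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIF_LIBRARY.items():
        # The lookahead wrapper lets overlapping matches be found too.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append(Hit(name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h.start)
```

Calling `scan("MKNGSARGDLL")` yields an N-glycosylation hit at position 3 (`NGS`) and an RGD hit at position 7.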
Pattern matching
Motif patterns are expressed as sequence rules. For example, the N-glycosylation consensus is N-X-S/T, where X is any amino acid except proline. The scanner converts these biochemical rules into regular expressions and executes them against the full length of each protein sequence.
All matches are reported regardless of overlap. A single residue can participate in multiple motif matches if it satisfies the consensus for more than one pattern.
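As a rough sketch of that conversion, the snippet below turns a dash-separated consensus into a regular expression and reports overlapping matches. The notation it accepts (`X` for any residue, `{P}` for "anything except P") is a PROSITE-like assumption, and `consensus_to_regex` and `find_all` are illustrative names rather than part of the tool.

```python
import re


def consensus_to_regex(consensus):
    """Translate e.g. 'N-{P}-[ST]' into the regex 'N[^P][ST]'."""
    parts = []
    for token in consensus.split("-"):
        if token.upper() == "X":
            parts.append(".")                   # any residue
        elif token.startswith("{") and token.endswith("}"):
            parts.append(f"[^{token[1:-1]}]")   # excluded residues
        else:
            parts.append(token)                 # literal or [..] class
    return "".join(parts)


def find_all(pattern, sequence):
    """Return (1-based start, matched residues) for every match,
    including overlapping ones, via a zero-width lookahead."""
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence)]
```

The lookahead matters for overlap: in `NNSS`, a plain `re.finditer` would consume `NNS` and skip the overlapping `NSS`, while `find_all(consensus_to_regex("N-{P}-[ST]"), "NNSS")` reports both.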
Supported motifs
The scanner searches for motifs across several functional categories.
Glycosylation
| Motif | Pattern | Description |
|---|---|---|
| N-glycosylation | N-X-[ST] (X is not P) | N-linked glycosylation sequon |
| GAG attachment | S-G-X-G | Glycosaminoglycan attachment site |
N-linked glycosylation is one of the most common post-translational modifications, affecting protein folding, stability, and cell-surface recognition. The N-X-S/T sequon is necessary but not sufficient for glycosylation; not all sites matching this pattern are actually modified.
Phosphorylation
| Motif | Pattern | Description |
|---|---|---|
| PKA/PKG | [RK]-[RK]-X-S | Protein kinase A/G substrate |
| CK2 | S-X-X-[DE] | Casein kinase II substrate |
| PKC | [ST]-X-[RK] | Protein kinase C substrate |
Phosphorylation is a reversible modification that acts as a molecular switch in signaling pathways. Each kinase family recognizes a distinct consensus sequence around the target serine, threonine, or tyrosine. The patterns detected here represent simplified consensus motifs and should be considered candidate sites rather than confirmed phosphorylation events.
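One way to work with these three patterns is to report the phospho-acceptor residue rather than the start of the match, since the acceptor sits at a different offset in each consensus. The sketch below assumes that convention; the `KINASE_MOTIFS` table and `candidate_phosphosites` function are hypothetical, and the tool itself may report positions differently.

```python
import re

# Pattern plus the 0-based offset of the acceptor residue within a match.
KINASE_MOTIFS = {
    "PKA/PKG": (r"[RK][RK].S", 3),
    "CK2": (r"S..[DE]", 0),
    "PKC": (r"[ST].[RK]", 0),
}


def candidate_phosphosites(sequence):
    """Return (kinase, acceptor residue, 1-based acceptor position)."""
    seq = sequence.upper()
    sites = []
    for kinase, (pattern, offset) in KINASE_MOTIFS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            pos = m.start() + offset
            sites.append((kinase, seq[pos], pos + 1))
    # Sort by position, then kinase name, for a stable listing.
    return sorted(sites, key=lambda s: (s[2], s[0]))
```

For `AKRASLEEK`, serine 5 is reported as a candidate for both PKA/PKG (`KRAS`) and CK2 (`SLEE`), which illustrates how one residue can satisfy several simplified consensus motifs.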
Lipid modifications
| Motif | Pattern | Description |
|---|---|---|
| N-myristoylation | G-X-X-X-[STAGCN]-X (with exclusions) | N-terminal myristoylation site |
| CAAX prenylation | C-X-X-[ASLIMVT] (C-terminal) | Prenylation box at protein C-terminus |
Lipid modifications anchor proteins to membranes. Myristoylation occurs at N-terminal glycine residues, while prenylation targets the cysteine of a C-terminal CAAX motif. The CAAX pattern is only searched at the end of the sequence since it must be C-terminal to be functional.
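Restricting a pattern to the C-terminus amounts to anchoring the regular expression at the end of the string. A minimal sketch, assuming the pattern from the table above and a hypothetical `find_caax` helper:

```python
import re


def find_caax(sequence):
    """Return (1-based position of the Cys, matched box) if the last
    four residues form a CAAX box, otherwise None."""
    seq = sequence.strip().upper()
    # \Z anchors the match to the very end of the sequence.
    m = re.search(r"C..[ASLIMVT]\Z", seq)
    return (m.start() + 1, m.group()) if m else None
```

With this anchor, `find_caax("MKKCVLS")` finds `CVLS` at position 4, while the same four residues in the middle of `MCVLSKK` are ignored.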
Localization signals
| Motif | Pattern | Description |
|---|---|---|
| Nuclear localization (monopartite) | 4+ consecutive [KR] | Monopartite nuclear localization signal |
| ER retention | [KH]-D-E-L (C-terminal) | ER retention/retrieval signal |
Nuclear localization signals (NLS) direct proteins to the nucleus through importin-mediated transport. The classical monopartite NLS consists of a stretch of basic residues (lysine and arginine). ER retention signals like KDEL ensure that resident ER proteins are retrieved from the Golgi apparatus back to the ER.
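Both localization patterns map directly onto regular expressions. The sketch below is illustrative only: `find_localization_signals` is a hypothetical name, and reporting each run of basic residues once (rather than every four-residue window inside it) is an assumption about how such hits might be counted.

```python
import re


def find_localization_signals(sequence):
    """Return (signal name, 1-based start, matched residues)."""
    seq = sequence.strip().upper()
    hits = []
    # Greedy quantifier: each run of 4 or more K/R is reported once.
    for m in re.finditer(r"[KR]{4,}", seq):
        hits.append(("NLS (monopartite)", m.start() + 1, m.group()))
    # ER retention signal must be the last four residues.
    m = re.search(r"[KH]DEL\Z", seq)
    if m:
        hits.append(("ER retention", m.start() + 1, m.group()))
    return hits
```

For `MAPKKKRKVGSKDEL`, this reports the basic run `KKKRK` at position 4 and the terminal `KDEL` at position 12. An internal `KDEL`, as in `MKDELAAA`, is not reported.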
Other post-translational modifications
| Motif | Pattern | Description |
|---|---|---|
| SUMOylation | [VILMAFP]-K-X-E | SUMOylation consensus |
| Tyrosine sulfation | [ED]-X-X-X-Y | Tyrosine sulfation site |
| Amidation | X-G-[RK]-[RK] | C-terminal amidation processing site |
Cell adhesion
| Motif | Pattern | Description |
|---|---|---|
| RGD | R-G-D | Integrin-binding cell attachment sequence |
The RGD tripeptide is the minimal recognition sequence for many integrin receptors. It mediates cell attachment in extracellular matrix proteins like fibronectin and vitronectin.
Understanding the results
The output table shows one row per input sequence with the following columns:
- Protein ID: The identifier extracted from the FASTA header
- Amino Acids: Total sequence length
- Total Hits: Number of motif matches found across all categories
- Unique Motifs: Number of distinct motif types detected
- Hits Summary: A semicolon-separated list of each hit showing the motif name and position (e.g., "N-glycosylation @45; PKC phosphorylation @102")
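To make the column definitions concrete, here is a sketch of how one output row could be assembled from a list of hits. The `summarize` function and the `(motif name, position)` tuple layout are assumptions for illustration; only the column names and the summary format come from the description above.

```python
def summarize(protein_id, sequence, hits):
    """Build one result row from (motif name, 1-based position) hits."""
    ordered = sorted(hits, key=lambda h: h[1])  # order by position
    return {
        "Protein ID": protein_id,
        "Amino Acids": len(sequence),
        "Total Hits": len(ordered),
        "Unique Motifs": len({name for name, _ in ordered}),
        "Hits Summary": "; ".join(f"{name} @{pos}" for name, pos in ordered),
    }
```

Two N-glycosylation hits and one PKC hit would give Total Hits of 3 but Unique Motifs of 2, since the latter counts distinct motif types.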
Interpreting hit counts
A high total hit count does not by itself indicate biological significance. Short consensus patterns such as CK2 phosphorylation (S-X-X-[DE]) match frequently in any protein by chance alone. Longer, more specific patterns such as KDEL are more likely to mark true functional sites when found.
Context matters: a CAAX box match is only meaningful if it occurs at the C-terminus. An NLS is most significant in a protein that is known or expected to be nuclear. Cross-reference motif hits with experimental data or conservation analysis when possible.
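The chance-match argument can be put into numbers with a back-of-the-envelope estimate. The sketch below assumes all 20 amino acids are equally likely and independent at each position, which real proteins do not obey, so the figures are only an order-of-magnitude guide. The function names are hypothetical.

```python
from fractions import Fraction


def match_probability(allowed_counts):
    """Probability that one window matches, given how many of the 20
    residues are allowed at each pattern position."""
    p = Fraction(1)
    for n in allowed_counts:
        p *= Fraction(n, 20)
    return p


def expected_chance_hits(allowed_counts, seq_len, anchored=False):
    """Expected number of matches in a random sequence of seq_len.
    An anchored (e.g. C-terminal) pattern has a single window."""
    k = len(allowed_counts)
    if seq_len < k:
        return Fraction(0)
    windows = 1 if anchored else seq_len - k + 1
    return windows * match_probability(allowed_counts)


# CK2, S-X-X-[DE]: 1, 20, 20 and 2 allowed residues per position.
ck2 = expected_chance_hits([1, 20, 20, 2], 500)
# ER retention, [KH]-D-E-L, C-terminal only.
kdel = expected_chance_hits([2, 1, 1, 1], 500, anchored=True)
```

Under these assumptions a 500-residue protein is expected to contain about 2.5 CK2 matches by chance (497/200), whereas a chance C-terminal KDEL-type match has a probability of 1 in 80,000.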
Sequence context
Each hit includes the surrounding amino acid context (5 residues on each side of the match) to help evaluate whether the site is in a plausible structural context. Motifs buried in hydrophobic cores or within secondary structure elements may be less accessible for modification.
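Extracting that window is a matter of slicing with bounds clamped at the sequence ends. In the sketch below, `hit_context` is a hypothetical helper, and rendering the flanks in lowercase with the match in uppercase is a display choice made here for readability, not something the tool is known to do.

```python
def hit_context(sequence, start, length, flank=5):
    """Return the match with up to `flank` residues on each side.

    `start` is the 0-based index of the first matched residue.
    """
    end = start + length
    left = sequence[max(0, start - flank):start]   # clamped at N-terminus
    right = sequence[end:end + flank]              # slicing clamps at C-terminus
    return f"{left.lower()}{sequence[start:end].upper()}{right.lower()}"
```

For a match near either terminus the window is simply shorter on that side: `hit_context("NGSAAAAAAA", 0, 3)` has no left flank at all.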
Input requirements
- Protein sequences in FASTA format (single or multiple)
- Plain text amino acid sequences (without headers) are also accepted
- PDB files are automatically converted to FASTA before scanning
- Supports batch processing of multiple sequences
- Accepts .fasta, .fa, .fas, .pdb, and .txt file uploads
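For readers preparing input programmatically, the sketch below shows one way to read either FASTA records or a header-less plain sequence into `(id, sequence)` pairs. The `parse_sequences` function and the `sequence_N` fallback identifier are assumptions for illustration. PDB conversion is not covered here.

```python
def parse_sequences(text):
    """Parse FASTA or plain-text input into (id, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far.
            if header is not None or chunks:
                records.append((header, "".join(chunks)))
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""  # ID = first token
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None or chunks:
        records.append((header, "".join(chunks)))
    # Header-less or empty-ID records get a placeholder identifier.
    return [(pid or f"sequence_{i}", seq)
            for i, (pid, seq) in enumerate(records, 1)]
```

Multi-line sequences are joined, so a record wrapped at a fixed column width parses to a single contiguous string.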
Use cases
- Functional annotation: Predict post-translational modification sites, localization signals, and structural motifs in newly sequenced proteins
- Comparative analysis: Compare motif content across protein families or orthologs to identify conserved or lineage-specific regulatory sites
- Protein engineering: Identify modification sites to remove or introduce when designing recombinant proteins
- Glycoprotein characterization: Map potential N-glycosylation sites before mass spectrometry analysis
- Localization signal analysis: Check for nuclear localization and ER retention signals that influence subcellular targeting
Limitations
The motifs detected by this tool are based on simplified consensus patterns. Not every sequence match corresponds to a biologically active site. Key limitations include:
- False positives: Short consensus sequences will match frequently by chance. Experimental validation is needed to confirm functional sites.
- False negatives: Many post-translational modifications lack strict consensus sequences or depend on structural context not captured by linear patterns.
- No structural context: The scanner treats the sequence as a linear string and cannot account for whether a site is surface-exposed or accessible to modifying enzymes.
- Simplified patterns: Real kinase specificity, for example, depends on residues beyond the core consensus and on three-dimensional substrate recognition. Dedicated phosphorylation predictors use machine learning for higher accuracy.
- No quantitative scoring: All matches are reported equally without confidence scores or statistical significance estimates.
For higher-confidence predictions, consider combining motif scanning results with conservation analysis, structural data, or specialized prediction tools for specific modification types.