Protein motif scanner

Scan protein sequences for glycosylation, phosphorylation, localization signals, and other biologically important motifs.

What is a protein motif scanner?

A protein motif scanner searches amino acid sequences for short, conserved patterns that are associated with specific biological functions. These motifs often correspond to post-translational modification sites, subcellular localization signals, or structural features that are critical for protein activity and regulation.

Motif scanning is a common first step in functional annotation. By identifying known patterns within an uncharacterized sequence, researchers can generate hypotheses about where a protein localizes, how it is modified, and what regulatory networks it participates in. This tool provides a code-free alternative to services such as ScanProsite for rapid pattern-based sequence annotation.

How does motif scanning work?

The scanner applies a library of regular expression patterns against each input sequence. Each pattern corresponds to a well-characterized biological motif defined by consensus sequences from the literature. When a match is found, the tool records the motif type, position, matched residues, and surrounding sequence context.

Pattern matching

Motif patterns are expressed as sequence rules. For example, the N-glycosylation consensus is N-X-S/T, where X is any amino acid except proline. The scanner converts these biochemical rules into regular expressions and executes them against the full length of each protein sequence.

All matches are reported regardless of overlap. A single residue can participate in multiple motif matches if it satisfies the consensus for more than one pattern.
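The conversion from consensus rule to regular expression, and the reporting of overlapping matches, can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: the `MOTIFS` dictionary, the `scan` function, and the 1-based position convention are assumptions made for the example. A zero-width lookahead is one common way to let a regex engine return matches that overlap.

```python
import re

# Hypothetical two-entry motif library, written from the consensus rules
# described in the text. Names and regexes are illustrative only.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST]",  # N-X-[ST], X is not P
    "PKC": r"[ST].[RK]",              # [ST]-X-[RK]
}

def scan(sequence, motifs=MOTIFS):
    """Return (motif name, 1-based start, matched residues) for every match."""
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead makes each match zero-width,
        # so the engine tries every start position and overlaps are kept.
        overlapping = re.compile(f"(?=({pattern}))")
        for m in overlapping.finditer(sequence):
            hits.append((name, m.start() + 1, m.group(1)))
    # Sort by position so hits read left to right along the sequence.
    return sorted(hits, key=lambda h: h[1])

print(scan("MNASTRK"))
```

In the toy sequence `MNASTRK`, the two PKC matches (`STR` at position 4 and `TRK` at position 5) share residues, which a plain non-overlapping search would not report.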

Supported motifs

The scanner searches for motifs across several functional categories.

Glycosylation

Motif	Pattern	Description
N-glycosylation	N-X-[ST] (X is not P)	N-linked glycosylation sequon
GAG attachment	SG-X-G	Glycosaminoglycan attachment site

N-linked glycosylation is one of the most common post-translational modifications, affecting protein folding, stability, and cell-surface recognition. The N-X-S/T sequon is necessary but not sufficient for glycosylation; not all sites matching this pattern are actually modified.

Phosphorylation

Motif	Pattern	Description
PKA/PKG	[RK]-[RK]-X-S	Protein kinase A/G substrate
CK2	S-X-X-[DE]	Casein kinase II substrate
PKC	[ST]-X-[RK]	Protein kinase C substrate

Phosphorylation is a reversible modification that acts as a molecular switch in signaling pathways. Each kinase family recognizes a distinct consensus sequence around the target serine, threonine, or tyrosine. The patterns detected here represent simplified consensus motifs and should be considered candidate sites rather than confirmed phosphorylation events.

Lipid modifications

Motif	Pattern	Description
N-myristoylation	G-X-X-X-[STAGCN]-X (with exclusions)	N-terminal myristoylation site
CAAX prenylation	C-X-X-[ASLIMVT] (C-terminal)	Prenylation box at protein C-terminus

Lipid modifications anchor proteins to membranes. Myristoylation occurs at N-terminal glycine residues, while prenylation targets the cysteine of a C-terminal CAAX motif. The CAAX pattern is only searched at the end of the sequence since it must be C-terminal to be functional.
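Restricting a pattern to the C-terminus can be expressed with an end-of-string anchor. The sketch below assumes the CAAX rule from the table (C-X-X-[ASLIMVT]); the `find_caax` helper and its 1-based return value are hypothetical and only illustrate the anchoring idea.

```python
import re

# Illustrative regex for C-X-X-[ASLIMVT]; "$" pins the match to the
# final four residues of the sequence.
CAAX = re.compile(r"C..[ASLIMVT]$")

def find_caax(sequence):
    """Return the 1-based position of the CAAX cysteine, or None."""
    # Strip whitespace first so a trailing newline cannot shift the anchor.
    m = CAAX.search(sequence.strip().upper())
    return m.start() + 1 if m else None

print(find_caax("MKTLLCVLS"))  # CVLS sits at the C-terminus
print(find_caax("MCVLSKK"))    # CVLS is internal, so no hit is reported
```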

Localization signals

Motif	Pattern	Description
Nuclear localization (monopartite)	4+ consecutive [KR]	Monopartite nuclear localization signal
ER retention	[KH]-D-E-L (C-terminal)	ER retention/retrieval signal

Nuclear localization signals (NLS) direct proteins to the nucleus through importin-mediated transport. The classical monopartite NLS consists of a stretch of basic residues (lysine and arginine). ER retention signals like KDEL ensure that resident ER proteins are retrieved from the Golgi apparatus back to the ER.

Other post-translational modifications

Motif	Pattern	Description
SUMOylation	[VILMAFP]-K-X-E	SUMOylation consensus
Tyrosine sulfation	[ED]-X-X-X-Y	Tyrosine sulfation site
Amidation	X-[RK]-G-[RK]	C-terminal amidation processing site

Cell adhesion

Motif	Pattern	Description
RGD	R-G-D	Integrin-binding cell attachment sequence

The RGD tripeptide is the minimal recognition sequence for many integrin receptors. It mediates cell attachment in extracellular matrix proteins like fibronectin and vitronectin.

Understanding the results

The output table shows one row per input sequence with the following columns:

Protein ID: The identifier extracted from the FASTA header
Amino Acids: Total sequence length
Total Hits: Number of motif matches found across all categories
Unique Motifs: Number of distinct motif types detected
Hits Summary: A semicolon-separated list of each hit showing the motif name and position (e.g., "N-glycosylation @45; PKC phosphorylation @102")
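As a rough illustration of how these columns relate to the underlying hits, the following sketch builds one output row from a list of (motif name, position) pairs. The `summarize` function and the dictionary layout are assumptions for the example; only the column names and the "name @position" format come from the description above.

```python
def summarize(protein_id, sequence, hits):
    """Build one output row from (motif name, position) pairs."""
    return {
        "Protein ID": protein_id,
        "Amino Acids": len(sequence),
        "Total Hits": len(hits),
        # A set comprehension collapses repeated motif names.
        "Unique Motifs": len({name for name, _ in hits}),
        "Hits Summary": "; ".join(f"{name} @{pos}" for name, pos in hits),
    }

row = summarize(
    "P1",
    "MNASTRK",
    [("N-glycosylation", 2), ("PKC phosphorylation", 4), ("PKC phosphorylation", 5)],
)
print(row["Hits Summary"])
```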

Interpreting hit counts

A high number of total hits does not necessarily indicate biological significance. Short consensus patterns like CK2 phosphorylation (S-X-X-[DE]) will match frequently in any protein simply by chance. Longer and more specific patterns like KDEL are more likely to represent true functional sites when found.
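To get a feel for how often a short pattern matches by chance, a back-of-the-envelope estimate multiplies the per-position match probabilities by the number of windows in the sequence. The sketch below makes the simplifying assumption that all 20 amino acids are equally likely and independent, which real proteins do not satisfy; `expected_hits` is a hypothetical helper, not part of the tool.

```python
def expected_hits(allowed_per_position, length, alphabet_size=20):
    """Expected chance matches for a fixed-width pattern.

    allowed_per_position lists how many residues are accepted at each
    pattern position (20 for a wildcard X). Assumes uniform, independent
    residue frequencies.
    """
    p = 1.0
    for allowed in allowed_per_position:
        p *= allowed / alphabet_size
    # Number of start positions where the pattern fits in the sequence.
    windows = max(length - len(allowed_per_position) + 1, 0)
    return windows * p

# CK2, S-X-X-[DE]: 1 residue, two wildcards, then 2 residues.
print(expected_hits([1, 20, 20, 2], 500))
# [KH]-D-E-L scanned at every position, for comparison.
print(expected_hits([2, 1, 1, 1], 500))
```

Under these assumptions the CK2 pattern is expected about 2.5 times in a 500-residue sequence, while the KDEL-type pattern is expected far less than once even when every position is scanned.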

Context matters: a CAAX box match is only meaningful if it occurs at the C-terminus. An NLS is most significant in a protein that is known or expected to be nuclear. Cross-reference motif hits with experimental data or conservation analysis when possible.

Sequence context

Each hit includes the surrounding amino acid context (5 residues on each side of the match) to help evaluate whether the site is in a plausible structural context. Motifs buried in hydrophobic cores or within secondary structure elements may be less accessible for modification.
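Extracting the flanking residues amounts to slicing the sequence around the match while clamping at the termini. The following is a minimal sketch; the `context` function and its 0-based, half-open span convention are assumptions for the example, while the default of 5 flanking residues follows the description above.

```python
def context(sequence, start, end, flank=5):
    """Return (left flank, match, right flank) for a 0-based half-open span."""
    # max() keeps the left slice from wrapping around near the N-terminus;
    # slicing past the end of a string is already safe in Python.
    left = sequence[max(start - flank, 0):start]
    right = sequence[end:end + flank]
    return left, sequence[start:end], right

print(context("MKTAYIAKQRNASLLGRK", 10, 13))
print(context("NASTK", 0, 3))  # flanks are shorter near the termini
```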

Input requirements

Protein sequences in FASTA format (single or multiple)
Plain text amino acid sequences (without headers) are also accepted
PDB files are automatically converted to FASTA before scanning
Supports batch processing of multiple sequences
Accepts .fasta, .fa, .fas, .pdb, and .txt file uploads
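For readers preparing input programmatically, the sketch below shows one way FASTA text and headerless plain-text sequences could be normalized into (identifier, sequence) pairs. It is an assumption-laden illustration rather than the tool's parser: `parse_sequences` and the `sequence_N` fallback names are hypothetical, and PDB conversion is not covered.

```python
def parse_sequences(text):
    """Parse FASTA text into (identifier, sequence) pairs.

    Text with no ">" header is treated as a single unnamed sequence.
    """
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None or chunks:
                records.append((header or "sequence_1", "".join(chunks)))
            # Use the first whitespace-delimited token as the identifier.
            fields = line[1:].split()
            header = fields[0] if fields else f"sequence_{len(records) + 1}"
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None or chunks:
        records.append((header or "sequence_1", "".join(chunks)))
    return records

print(parse_sequences(">sp|P1 example\nMNAS\nTRK\n>P2\nMKDEL\n"))
print(parse_sequences("mnas\ntrk"))
```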

Use cases

Functional annotation: Predict post-translational modification sites, localization signals, and structural motifs in newly sequenced proteins
Comparative analysis: Compare motif content across protein families or orthologs to identify conserved or lineage-specific regulatory sites
Protein engineering: Identify modification sites to remove or introduce when designing recombinant proteins
Glycoprotein characterization: Map potential N-glycosylation sites before mass spectrometry analysis
Localization signal analysis: Check for nuclear localization and ER retention signals that influence subcellular targeting

Limitations

The motifs detected by this tool are based on simplified consensus patterns. Not every sequence match corresponds to a biologically active site. Key limitations include:

False positives: Short consensus sequences will match frequently by chance. Experimental validation is needed to confirm functional sites.
False negatives: Many post-translational modifications lack strict consensus sequences or depend on structural context not captured by linear patterns.
No structural context: The scanner treats the sequence as a linear string and cannot account for whether a site is surface-exposed or accessible to modifying enzymes.
Simplified patterns: Real kinase specificity, for example, depends on residues beyond the core consensus and on three-dimensional substrate recognition. Dedicated phosphorylation predictors use machine learning for higher accuracy.
No quantitative scoring: All matches are reported equally without confidence scores or statistical significance estimates.

For higher-confidence predictions, consider combining motif scanning results with conservation analysis, structural data, or specialized prediction tools for specific modification types.

Related tools

Amino Acid Composition -- Analyze the overall residue composition of your protein
Protein Parameters -- Calculate molecular weight, pI, GRAVY, and other physicochemical properties
Isoelectric Point Calculator -- Predict the pI from charged residue content
Hydropathy Plot -- Visualize hydrophobic and hydrophilic regions along the sequence
HMMER -- Search for protein domains and remote homologs using profile HMMs
