Glycosylation site finder

Find potential N-linked glycosylation sites (NX[S/T] sequons) in protein sequences.

What is N-linked glycosylation?

N-linked glycosylation is one of the most common and important post-translational modifications (PTMs) of proteins. It involves the attachment of oligosaccharide chains (glycans) to the side-chain amide nitrogen of asparagine (Asn, N) at specific sequence motifs called sequons. This modification occurs co-translationally in the endoplasmic reticulum and plays critical roles in protein folding, stability, cell signaling, and immune recognition.

The consensus motif for N-linked glycosylation is N-X-S/T, where:

N is asparagine (the glycosylation site)
X is any amino acid except proline
S/T is serine or threonine

The exclusion of proline at the X position is due to its rigid cyclic structure, which prevents the asparagine from adopting the conformation required for glycan attachment by the oligosaccharyltransferase (OST) enzyme.
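The motif can be written as a regular expression. The sketch below is a minimal Python illustration, not the tool's own code. The function name `find_sequons` is hypothetical, and so is the choice to report the 1-based position of the asparagine, which is assumed to match the `42:NVT` style in the Results table. A zero-width lookahead is used so that overlapping sequons such as `NNST` are all reported.

```python
import re

# N, then any residue except proline, then serine or threonine.
# The lookahead (?=...) makes each match zero-width, so the scan advances
# one residue at a time and overlapping sequons are all found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, sequon triplet) pairs (assumed convention)."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


print(find_sequons("MKNVTLNPSANST"))
```

For `"MKNVTLNPSANST"` this returns `[(3, 'NVT'), (11, 'NST')]`. The `NPS` at position 7 is skipped because proline occupies the X position.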

Biological significance

Protein folding: Glycans act as quality control signals in the calnexin/calreticulin chaperone cycle
Protein stability: N-glycans shield surface regions from proteolysis and increase thermal stability
Cell signaling: Glycosylation modulates receptor-ligand interactions and signal transduction
Immune evasion: Viral envelope glycoproteins use dense glycan shields to evade host antibodies
Biopharmaceuticals: Glycosylation patterns affect the efficacy, half-life, and immunogenicity of therapeutic proteins

How to use the glycosylation site finder

This tool scans protein sequences for potential N-linked glycosylation sites by identifying all N-X-S/T sequons where X is not proline. It accepts input in FASTA, plain text, or PDB format and supports batch processing of multiple sequences.
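Batch input mixes header lines and sequence lines, so the first step is splitting it into records. The following is a minimal sketch, not the tool's parser. The name `parse_sequences` is hypothetical. It assumes that the identifier is the first whitespace-delimited token after `>`, and that header-less plain text is a single sequence with the made-up ID `sequence_1`.

```python
def parse_sequences(text: str) -> list[tuple[str, str]]:
    """Split FASTA or plain-text input into (identifier, sequence) pairs.

    Hypothetical helper: ID conventions here are assumptions.
    """
    records: list[tuple[str, list[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # Assumed: the ID is the first token of the FASTA header.
            tokens = line[1:].split()
            rid = tokens[0] if tokens else f"sequence_{len(records) + 1}"
            records.append((rid, []))
        else:
            if not records:
                # Plain text with no header: treat it as one unnamed sequence.
                records.append(("sequence_1", []))
            # Keep letters only, dropping digits, spaces, and '*' stop symbols.
            records[-1][1].append("".join(c for c in line.upper() if c.isalpha()))
    return [(rid, "".join(chunks)) for rid, chunks in records]
```

Multi-line FASTA records are joined, so a sequon split across a line break (for example `...N` at the end of one line and `VT...` at the start of the next) is still detected by the later scan.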

Inputs

Input	Description
`Protein Sequences`	One or more amino acid sequences in FASTA format, plain text, or PDB file. Files can be uploaded as `.fasta`, `.fa`, `.txt`, or `.pdb` formats.
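A PDB file has no plain sequence line, so one must be rebuilt from coordinate records. Below is a minimal sketch of reading ATOM/HETATM records by their fixed PDB columns. It is an illustration, not the tool's converter. The name `pdb_to_sequences`, the per-chain output, the decision to stop at the first `ENDMDL`, and mapping selenomethionine (`MSE`) to `M` are all assumptions.

```python
# Standard residues plus MSE (selenomethionine), mapped to one-letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "MSE": "M",  # assumption: treat selenomethionine as methionine
}


def pdb_to_sequences(pdb_text: str) -> dict[str, str]:
    """Return one sequence per chain, built from ATOM/HETATM records."""
    chains: dict[str, list[str]] = {}
    seen: set[tuple[str, str, str]] = set()
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record == "ENDMDL":
            break  # multi-model files (e.g. NMR): keep the first model only
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name not in THREE_TO_ONE:
            continue  # skip waters, ligands, and ions
        chain = line[21:22]
        # Chain ID, residue number (cols 23-26), insertion code (col 27).
        res_id = (chain, line[22:26], line[26:27])
        if res_id in seen:
            continue  # each atom repeats the residue; count the residue once
        seen.add(res_id)
        chains.setdefault(chain, []).append(THREE_TO_ONE[res_name])
    return {chain: "".join(res) for chain, res in chains.items()}
```

Residues that are not resolved in the structure have no coordinate records. A sequence rebuilt this way can therefore miss sequons that lie in disordered loops, or join flanking residues into sequons that do not exist in the full-length protein.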

Results

Output is a spreadsheet with glycosylation site data for each input sequence.

Column	Description
`Protein ID`	Sequence identifier extracted from the FASTA header.
`Amino Acids`	Total number of amino acid residues in the sequence.
`Total Sites`	Number of N-X-S/T sequons identified in the sequence.
`Density`	Number of sequons per 100 residues (sites / sequence length x 100). Each site counts once, so this is not the fraction of residues covered by sequons.
`Sites`	List of identified sites showing position number and sequon triplet (e.g., `42:NVT, 158:NAS`).
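One output row can be assembled from the column definitions above. This is a hedged sketch, not the tool's export code. The names `sequon_sites` and `result_row` are hypothetical. Rounding density to two decimals and using 1-based Asn positions in the `Sites` string are assumptions.

```python
def sequon_sites(seq: str) -> list[tuple[int, str]]:
    """All N-X-S/T sequons (X != P) as (1-based Asn position, triplet)."""
    return [
        (i + 1, seq[i:i + 3])
        for i in range(len(seq) - 2)
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"
    ]


def result_row(protein_id: str, seq: str) -> dict[str, object]:
    """Build one spreadsheet row with the columns described in the table."""
    seq = seq.upper()
    sites = sequon_sites(seq)
    # Density: sites per 100 residues (sites / length x 100).
    density = 100 * len(sites) / len(seq) if seq else 0.0
    return {
        "Protein ID": protein_id,
        "Amino Acids": len(seq),
        "Total Sites": len(sites),
        "Density": round(density, 2),  # two decimals is an assumption
        "Sites": ", ".join(f"{pos}:{triplet}" for pos, triplet in sites),
    }
```

As a worked example of the density formula, a 500-residue protein with 5 sequons has a density of 5 / 500 x 100 = 1.0%.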

How does the glycosylation site finder work?

The tool performs a linear scan of each protein sequence, examining every tripeptide window for the N-X-S/T consensus motif. The algorithm works as follows:

Input parsing: Sequences are extracted from FASTA, plain text, or PDB input. PDB files are first converted to FASTA by reading ATOM/HETATM records.
Motif scanning: For each position i in the sequence, the tool checks whether residue i is asparagine (N), residue i+1 is not proline (P), and residue i+2 is serine (S) or threonine (T).
Context extraction: For each identified site, a context window of up to 5 residues on each side of the sequon is captured to show the local sequence environment.
Density calculation: The number of sites is divided by the total sequence length and multiplied by 100 to give a density percentage.
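Steps 2 to 4 can be sketched as a single pass over the sequence. This is a minimal illustration under stated assumptions, not the tool's source. The `Site` dataclass and the `scan` function are hypothetical names, the 5-residue flank is taken from step 3, and the context window is clipped at the sequence ends.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int  # 1-based position of the asparagine (assumed convention)
    sequon: str    # the N-X-S/T triplet
    context: str   # up to `flank` residues on each side of the sequon


def scan(seq: str, flank: int = 5) -> tuple[list[Site], float]:
    """Return all sequons with context, plus density (sites per 100 residues)."""
    seq = seq.upper()
    sites: list[Site] = []
    # Motif scanning: test every tripeptide window i, i+1, i+2.
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            # Context extraction: clip the window at the sequence ends.
            start = max(0, i - flank)
            end = min(len(seq), i + 3 + flank)
            sites.append(Site(i + 1, seq[i:i + 3], seq[start:end]))
    # Density calculation: number of sites / length x 100.
    density = 100 * len(sites) / len(seq) if seq else 0.0
    return sites, density
```

Because every window is tested, the scan is linear in sequence length, and overlapping sequons (for example `NNST`) are each reported.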

This is a motif-based prediction, not a machine learning model: it reports every position in the sequence that matches the consensus motif. Not every sequon is glycosylated in vivo; actual glycosylation depends on additional factors, including protein folding, accessibility to OST, and cellular context.

Interpreting the results

Site density

The density of glycosylation sites varies substantially across protein types:

< 0.5%: Low glycosylation potential, typical for intracellular proteins
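0.5-2.0%: Moderate glycosylation density, common for secreted proteins
> 2.0%: High glycosylation density, often seen in heavily N-glycosylated proteins such as viral envelope glycoproteins (mucins are also heavily glycosylated, but mainly through O-linked glycans, which this tool does not count)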
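These bands can be applied mechanically to the `Density` column. The sketch below is hypothetical: the name `density_class` is illustrative, and treating exactly 0.5% and exactly 2.0% as "moderate" is an assumption about how the boundaries are meant. The bands are rough guides, not a classifier.

```python
def density_class(density_percent: float) -> str:
    """Map a sequon density (%) to the rough bands listed above."""
    if density_percent < 0.5:
        return "low"       # typical for intracellular proteins
    if density_percent <= 2.0:
        return "moderate"  # assumed inclusive at both 0.5 and 2.0
    return "high"          # e.g. heavily N-glycosylated viral envelope proteins


# A 500-residue protein with 5 sequons: 5 / 500 x 100 = 1.0%
print(density_class(100 * 5 / 500))
```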

Context considerations

The presence of an N-X-S/T sequon is necessary but not sufficient for glycosylation. Additional factors that influence whether a sequon is actually glycosylated include:

Structural accessibility: Sequons buried in the protein core are unlikely to be glycosylated
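Distance from the C-terminus: Sequons very close to the C-terminus (within ~60-70 residues of the last residue) are often glycosylated inefficiently, because the chain can finish translation and leave the translocon before the co-translationally acting OST reaches the site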
N-X-T vs N-X-S preference: Threonine-containing sequons (N-X-T) are generally glycosylated more efficiently than serine-containing ones (N-X-S)
Flanking residues: Aromatic and charged residues near the sequon can influence glycosylation efficiency
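Two of these factors can be read directly from the sequence and attached to each site as flags. This is a hedged sketch with a hypothetical function name, `annotate_sites`. The 60-residue cutoff is an assumed value taken from the low end of the range above. The flags are heuristics only, and structural accessibility and flanking-residue effects are not modelled.

```python
def annotate_sites(seq: str, c_term_window: int = 60) -> list[dict[str, object]]:
    """Flag each N-X-S/T sequon with simple sequence-level heuristics.

    Hypothetical helper; c_term_window=60 is an assumed cutoff.
    """
    seq = seq.upper()
    out: list[dict[str, object]] = []
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            residues_after_asn = len(seq) - (i + 1)
            out.append({
                "position": i + 1,
                "sequon": seq[i:i + 3],
                # N-X-T is generally a more efficient acceptor than N-X-S.
                "thr_acceptor": seq[i + 2] == "T",
                # Few residues after the Asn: may be skipped co-translationally.
                "near_c_terminus": residues_after_asn < c_term_window,
            })
    return out
```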

Limitations

Prediction vs. experimental validation: This tool identifies potential sites based on the consensus motif only. Experimental techniques such as mass spectrometry or site-directed mutagenesis are needed to confirm actual glycosylation.
N-linked only: The tool does not detect O-linked glycosylation (glycans attached directly to serine or threonine side chains, with no N-X-S/T requirement), C-mannosylation, or other glycosylation types.
No structural context: The analysis is sequence-based and does not account for three-dimensional protein structure or surface accessibility.
No species/tissue specificity: Glycosylation patterns can vary between organisms and cell types. The tool does not model organism-specific glycosylation machinery.

Related tools

Motif Scanner: Scan for multiple protein motifs simultaneously, including glycosylation sites, phosphorylation sites, and nuclear localization signals
Protein Parameters: Comprehensive physicochemical property analysis including molecular weight, pI, and amino acid composition
Amino Acid Composition: Analyze the residue distribution of protein sequences
Peptide Cutter: Predict protease cleavage sites in protein sequences