What is N-linked glycosylation?
N-linked glycosylation is one of the most common and important post-translational modifications (PTMs) of proteins. It involves the attachment of oligosaccharide chains (glycans) to the side-chain amide nitrogen of asparagine (Asn, N) residues at specific sequence motifs called sequons. This modification occurs mainly co-translationally in the endoplasmic reticulum and plays critical roles in protein folding, stability, cell signaling, and immune recognition.
The consensus motif for N-linked glycosylation is N-X-S/T, where:
- N is asparagine (the glycosylation site)
- X is any amino acid except proline
- S/T is serine or threonine
The exclusion of proline at the X position is due to its rigid cyclic structure, which prevents the asparagine from adopting the conformation required for glycan attachment by the oligosaccharyltransferase (OST) enzyme.
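As an illustration, the consensus motif can be written as a regular expression. The sketch below is a minimal example, not the tool's actual code; the names `SEQUON` and `sequon_positions` are hypothetical. A lookahead is used so that overlapping sequons (for example the two asparagines in `NNSS`) are both reported.

```python
import re

# N, followed by any residue except P, followed by S or T.
# The lookahead keeps the match zero-width after N, so overlapping
# sequons are not skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(seq):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```

For the sequence `MNVTANPSNNSS`, this returns `[2, 9, 10]`: the asparagine at position 6 is skipped because it is followed by proline.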
Biological significance
- Protein folding: Glycans act as quality control signals in the calnexin/calreticulin chaperone cycle
- Protein stability: N-glycans shield surface regions from proteolysis and increase thermal stability
- Cell signaling: Glycosylation modulates receptor-ligand interactions and signal transduction
- Immune evasion: Viral envelope glycoproteins use dense glycan shields to evade host antibodies
- Biopharmaceuticals: Glycosylation patterns affect the efficacy, half-life, and immunogenicity of therapeutic proteins
How to use the glycosylation site finder
This tool scans protein sequences for potential N-linked glycosylation sites by identifying all N-X-S/T sequons where X is not proline. It accepts input in FASTA, plain text, or PDB format and supports batch processing of multiple sequences.
Inputs
| Input | Description |
|---|---|
| Protein Sequences | One or more amino acid sequences in FASTA format, plain text, or PDB file. Files can be uploaded as .fasta, .fa, .txt, or .pdb formats. |
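To make the input handling concrete, here is a rough sketch of how FASTA or plain-text input could be split into records. It is an assumption about the behavior, not the tool's implementation: the function name `parse_sequences` and the fallback identifier `sequence_N` for input without a header are hypothetical, and PDB input is not covered.

```python
def parse_sequences(text):
    """Parse FASTA or plain-text input into a list of (protein_id, sequence)."""
    records = []  # each entry is [identifier, list of sequence chunks]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            records.append([fields[0] if fields else "", []])
        else:
            if not records:
                records.append(["", []])  # plain text with no FASTA header
            records[-1][1].append(line.upper())
    # Hypothetical fallback ID for records that have no usable header.
    return [(pid or f"sequence_{i}", "".join(chunks))
            for i, (pid, chunks) in enumerate(records, 1)]
```

Multi-line sequences are joined into a single string, so line wrapping in the input does not affect reported positions.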
Results
Output is a spreadsheet with glycosylation site data for each input sequence.
| Column | Description |
|---|---|
| Protein ID | Sequence identifier extracted from the FASTA header. |
| Amino Acids | Total number of amino acid residues in the sequence. |
| Total Sites | Number of N-X-S/T sequons identified in the sequence. |
| Density | Number of sequons per 100 residues, reported as a percentage (Total Sites / Amino Acids × 100). |
| Sites | List of identified sites showing position number and sequon triplet (e.g., 42:NVT, 158:NAS). |
How does the glycosylation site finder work?
The tool performs a linear scan of each protein sequence, examining every tripeptide window for the N-X-S/T consensus motif. The algorithm works as follows:
- Input parsing: Sequences are extracted from FASTA, plain text, or PDB input. PDB files are first converted to FASTA by reading ATOM/HETATM records.
- Motif scanning: For each position i in the sequence, the tool checks whether residue i is asparagine (N), residue i+1 is not proline (P), and residue i+2 is serine (S) or threonine (T).
- Context extraction: For each identified site, a context window of up to 5 residues on each side of the sequon is captured to show the local sequence environment.
- Density calculation: The number of sites is divided by the total sequence length and multiplied by 100 to give a density percentage.
This is a motif-based prediction, not a machine learning model. It identifies all positions in the sequence that match the consensus motif. Not all sequons are necessarily glycosylated in vivo; actual glycosylation depends on additional factors including protein folding, accessibility to OST, and cellular context.
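The scanning, context extraction, and density steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's source: `find_sequons`, `summarize`, and the `flank` parameter are hypothetical names, positions are assumed to be 1-based, and the density is rounded to two decimals for display.

```python
def find_sequons(seq, flank=5):
    """Linear scan of every tripeptide window for N-X-S/T (X != P)."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            # Context: up to `flank` residues on each side of the sequon,
            # truncated at the sequence ends.
            start = max(0, i - flank)
            end = min(len(seq), i + 3 + flank)
            sites.append({
                "position": i + 1,          # 1-based position of the asparagine
                "sequon": seq[i:i + 3],
                "context": seq[start:end],
            })
    return sites

def summarize(protein_id, seq):
    """Build one output row with the columns listed under Results."""
    sites = find_sequons(seq)
    density = 100.0 * len(sites) / len(seq) if seq else 0.0
    return {
        "Protein ID": protein_id,
        "Amino Acids": len(seq),
        "Total Sites": len(sites),
        "Density": round(density, 2),
        "Sites": ", ".join(f"{s['position']}:{s['sequon']}" for s in sites),
    }
```

For the 27-residue test sequence `MKTAYIAKQRNVTGSLNPSQWENASDD`, `summarize` reports two sites (`11:NVT, 23:NAS`) and a density of 7.41; the `NPS` triplet at position 17 is excluded because X is proline.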
Interpreting the results
Site density
The density of glycosylation sites varies substantially across protein types:
- < 0.5%: Low glycosylation potential, typical for intracellular proteins
- 0.5-2.0%: Moderate glycosylation density, common for secreted proteins
- > 2.0%: High glycosylation density, often seen in heavily N-glycosylated proteins such as viral envelope glycoproteins
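The density bands above can be applied programmatically when screening many sequences. The helper below is a small sketch; the function name `classify_density` and the returned labels are hypothetical, and the thresholds are simply the ones listed above.

```python
def classify_density(density_percent):
    """Map a density value (sites per 100 residues) to a coarse category."""
    if density_percent < 0.5:
        return "low"
    if density_percent <= 2.0:
        return "moderate"
    return "high"
```

These categories are rules of thumb for triage, not a prediction of how many sites are actually occupied.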
Context considerations
The presence of an N-X-S/T sequon is necessary but not sufficient for glycosylation. Additional factors that influence whether a sequon is actually glycosylated include:
- Structural accessibility: Sequons buried in the protein core are unlikely to be glycosylated
- Distance from termini: Sequons very close to the C-terminus (within ~60-70 residues of the stop codon) are often skipped, because translation can terminate before the sequon reaches the OST active site, leaving only less efficient post-translational glycosylation
- N-X-T vs N-X-S preference: Threonine-containing sequons (N-X-T) are generally glycosylated more efficiently than serine-containing ones (N-X-S)
- Flanking residues: Aromatic and charged residues near the sequon can influence glycosylation efficiency
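Two of these factors can be checked from sequence alone, so it can be useful to annotate each reported site with them. The sketch below is illustrative only: `annotate_site` is a hypothetical helper, and the default cutoff of 65 residues is an assumed midpoint of the ~60-70 range mentioned above, not a value taken from the tool.

```python
def annotate_site(position, sequon, seq_length, c_term_window=65):
    """Add simple sequence-derived flags to a sequon.

    position      -- 1-based position of the asparagine
    sequon        -- the three-residue sequon, e.g. "NVT"
    seq_length    -- total length of the protein sequence
    c_term_window -- assumed cutoff (in residues) for "close to the C-terminus"
    """
    residues_after = seq_length - position
    return {
        "position": position,
        "sequon": sequon,
        # N-X-T sequons are generally used more efficiently than N-X-S.
        "type": "N-X-T" if sequon[2] == "T" else "N-X-S",
        "near_c_terminus": residues_after < c_term_window,
    }
```

For example, `annotate_site(11, "NVT", 27)` flags the site as `N-X-T` and as near the C-terminus, while `annotate_site(42, "NAS", 500)` yields `N-X-S` with `near_c_terminus` set to `False`. Structural accessibility and flanking-residue effects are not captured by these flags.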
Limitations
- Prediction vs. experimental validation: This tool identifies potential sites based on the consensus motif only. Experimental techniques such as mass spectrometry or site-directed mutagenesis are needed to confirm actual glycosylation.
- N-linked only: The tool does not detect O-linked glycosylation (on serine/threonine without the N-X-S/T motif), C-mannosylation, or other glycosylation types.
- No structural context: The analysis is sequence-based and does not account for three-dimensional protein structure or surface accessibility.
- No species/tissue specificity: Glycosylation patterns can vary between organisms and cell types. The tool does not model organism-specific glycosylation machinery.
Related tools
- Motif Scanner: Scan for multiple protein motifs simultaneously, including glycosylation sites, phosphorylation sites, and nuclear localization signals
- Protein Parameters: Comprehensive physicochemical property analysis including molecular weight, pI, and amino acid composition
- Amino Acid Composition: Analyze the residue distribution of protein sequences
- Peptide Cutter: Predict protease cleavage sites in protein sequences