What's the most common amino acid?

Dr. Matic Broz · April 28, 2026

Key takeaways

  • Most common amino acid in the UniProtKB/Swiss-Prot composition scale: leucine at 9.66% (Expasy ProtScale, based on Swiss-Prot release 2013_04)
  • Least common standard amino acid in the same scale: tryptophan at 1.08% (Expasy ProtScale)
  • Current Swiss-Prot context: release 2026_01 contains 574,627 reviewed sequence entries and 208,482,574 amino acids (UniProtKB/Swiss-Prot release statistics, January 2026)
  • Collagen is a special case: glycine occupies every third position in the collagen Gly-X-Y repeat, so it accounts for about one-third of collagen residues (NCBI Bookshelf, updated 2023)
  • Across 5,590 proteomes, a 2024 Scientific Reports study found that only a few amino acids consistently occupy the most-used and least-used ranks across domains of life

Leucine is the most common amino acid in the UniProtKB/Swiss-Prot amino acid composition scale, where it accounts for 9.66% of residues. That answer is useful because Swiss-Prot is a manually curated reference set, but it should be read as a database composition statistic, not a universal constant for every protein, organism, tissue, or protein family.

Leucine is the most common amino acid in Swiss-Prot

Leucine accounts for 9.66% of amino acid residues in the Expasy ProtScale composition scale derived from UniProtKB/Swiss-Prot release 2013_04.[1]

The same scale puts alanine second at 8.25%, glycine third at 7.07%, and tryptophan last at 1.08%. The ranking is a broad, database-wide reference: individual proteins can differ sharply depending on structure, organism, localization, and function.

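| Rank | Amino acid | Three-letter code | Frequency in Swiss-Prot scale |
|------|---------------|-----|-------|
| 1 | Leucine | Leu | 9.66% |
| 2 | Alanine | Ala | 8.25% |
| 3 | Glycine | Gly | 7.07% |
| 4 | Valine | Val | 6.87% |
| 5 | Glutamic acid | Glu | 6.75% |
| 6 | Serine | Ser | 6.56% |
| 7 | Isoleucine | Ile | 5.96% |
| 8 | Lysine | Lys | 5.84% |
| 9 | Arginine | Arg | 5.53% |
| 10 | Aspartic acid | Asp | 5.45% |
| 11 | Threonine | Thr | 5.34% |
| 12 | Proline | Pro | 4.70% |
| 13 | Asparagine | Asn | 4.06% |
| 14 | Glutamine | Gln | 3.93% |
| 15 | Phenylalanine | Phe | 3.86% |
| 16 | Tyrosine | Tyr | 2.92% |
| 17 | Methionine | Met | 2.42% |
| 18 | Histidine | His | 2.27% |
| 19 | Cysteine | Cys | 1.37% |
| 20 | Tryptophan | Trp | 1.08% |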

Source: Expasy ProtScale, amino acid composition (%) in UniProtKB/Swiss-Prot release 2013_04.[1]
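
In principle, a composition scale like this one is a residue count divided by the total number of residues. The Python sketch below shows that calculation on two made-up sequences. The function name `composition_percent` and the toy sequences are assumptions for illustration only; this is not Expasy's actual pipeline, and it will not reproduce the 9.66% figure.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequences):
    """Return {one-letter code: percent of all standard residues}, pooled over sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only the 20 standard residues; skip ambiguity codes such as X, B, Z.
        counts.update(res for res in seq.upper() if res in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


# Hypothetical toy sequences, not real Swiss-Prot entries.
toy_sequences = ["MLLAGLLK", "MALWLG"]
composition = composition_percent(toy_sequences)

# Rank residues from most to least frequent, as in the table above.
ranked = sorted(composition.items(), key=lambda item: item[1], reverse=True)
for aa, pct in ranked[:3]:
    print(f"{aa}: {pct:.2f}%")
```

Pooling counts across all sequences, rather than averaging per-protein percentages, weights long proteins more heavily. Which of the two a published scale uses is worth checking before comparing numbers.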

Current Swiss-Prot has 574,627 reviewed sequence entries

UniProtKB/Swiss-Prot release 2026_01 contains 574,627 reviewed sequence entries comprising 208,482,574 amino acids.[2]

These release statistics are not the source of the 9.66% leucine figure, which Expasy attributes to Swiss-Prot release 2013_04. They give the current database context instead: Swiss-Prot remains a curated protein knowledgebase, and its composition scale is best interpreted as a reference distribution from a curated protein set rather than a live recalculation for every release.

| Swiss-Prot release statistic | Value |
|------------------------------|-------|
| Release | 2026_01 |
| Release date | January 28, 2026 |
| Reviewed sequence entries | 574,627 |
| Amino acids in entries | 208,482,574 |
| Average sequence length | 362 amino acids |
| Species represented | 14,846 |

Source: UniProtKB/Swiss-Prot release 2026_01 statistics.[2]
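
The average sequence length follows from the two totals in the release statistics. A quick Python check, using only the figures quoted above, shows how they relate. The variable names are illustrative, and the idea that the published value is truncated rather than rounded is an assumption, not something the release notes are quoted as saying here.

```python
# Figures quoted from the release 2026_01 statistics above.
reviewed_entries = 574_627
total_residues = 208_482_574

mean_length = total_residues / reviewed_entries        # exact mean
truncated_length = total_residues // reviewed_entries  # whole residues only

print(f"{mean_length:.2f}")  # 362.81
print(truncated_length)      # 362
```

The exact mean is slightly under 363, so the reported 362 is consistent with dropping the fractional part.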

Glycine is the most common amino acid in collagen

Glycine accounts for about one-third of collagen residues because collagen follows a repeating Gly-X-Y sequence pattern.

NCBI Bookshelf's StatPearls chapter on collagen synthesis states that collagen's primary amino acid sequence is glycine-proline-X or glycine-X-hydroxyproline, and that every third amino acid is glycine. The one-third figure is therefore a direct consequence of the reported repeat pattern: one glycine in every three residues is about 33.3% glycine.[3]

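| Context | Residue and frequency | Reported or derived basis |
|---------|-----------------------|---------------------------|
| Broad Swiss-Prot composition scale (most common) | Leucine, 9.66% | Reported by Expasy ProtScale |
| Collagen triple-helix repeat (most common) | Glycine, about 33.3% | Derived from the reported Gly-X-Y repeat |
| Swiss-Prot composition scale (least common standard amino acid) | Tryptophan, 1.08% | Reported by Expasy ProtScale |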

This is why the answer changes if the question is narrowed from "proteins overall" to "collagen." Collagen is not compositionally average; its triple helix depends on glycine being small enough to fit into the crowded center of the structure.
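
The one-third figure can be checked on an idealized repeat. In the minimal sketch below, the helper `glycine_fraction` is a hypothetical name, and `P` and `A` are arbitrary stand-ins for the X and Y positions. A real collagen chain need not be a perfect repeat from end to end, which is why the figure is stated as "about" one-third.

```python
def glycine_fraction(sequence):
    """Fraction of residues in a one-letter sequence that are glycine (G)."""
    if not sequence:
        raise ValueError("sequence is empty")
    return sequence.count("G") / len(sequence)


# Idealized Gly-X-Y repeat; "P" and "A" are placeholder choices for X and Y.
ideal_repeat = "GPA" * 100

# Every third position (0, 3, 6, ...) is glycine in this idealized chain.
every_third_is_glycine = all(res == "G" for res in ideal_repeat[::3])

fraction = glycine_fraction(ideal_repeat)
print(every_third_is_glycine, f"{fraction:.1%}")  # True 33.3%
```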

Tryptophan is the least common standard amino acid in Swiss-Prot

Tryptophan is the least common standard amino acid in the Swiss-Prot composition scale at 1.08%.[1]

A 2020 review in the International Journal of Molecular Sciences makes the same biological point from a different angle: tryptophan is one of the rarest amino acids in proteomes, is encoded by the single codon UGG, and is described as the most energetically expensive amino acid to synthesize. The review reports an energy cost equivalent to 74 high-energy phosphate bonds for tryptophan biosynthesis, compared with 52 for phenylalanine and 50 for tyrosine.[4]

Those numbers help explain why tryptophan is rare, but they do not replace the database frequency. The 1.08% figure comes from the Swiss-Prot composition scale; the biosynthetic-cost and single-codon facts come from the 2020 review.[1][4]

Amino acid rankings vary, but the extremes are constrained

A 2024 Scientific Reports study analyzed 5,590 proteomes across bacteria, archaea, eukaryotes, and viruses and found that only a few amino acids consistently occupy the most-used and least-used ranks.[5]

The study called this an amino acid usage "edge effect": diversity is lower at the high-frequency and low-frequency ends than in the middle of the ranking. In practical terms, leucine often appears near the high-frequency end, while tryptophan and cysteine often appear near the low-frequency end, but the exact rank order can still vary by domain, organism, protein class, and environmental pressure.

| Study dataset | Value |
|---------------|-------|
| Proteomes analyzed | 5,590 |
| Archaea species | 328 |
| Bacteria species | 4,107 |
| Eukaryote species | 1,118 |
| Virus proteomes | 37 |
| PDB entries used for secondary-structure analysis | 40,885 |

Source: Morimoto and Pietras, Scientific Reports, 2024.[5]
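
One simple way to picture the "edge effect" is to count how many different amino acids appear at each rank across a set of proteomes. The sketch below does that on three invented rankings over six residues. The data are made up to show the shape of the idea, the function name is hypothetical, and the study's own diversity measure may be defined differently.

```python
from collections import defaultdict


def distinct_residues_per_rank(rankings):
    """Map each rank (1 = most used) to the number of distinct residues seen there."""
    seen = defaultdict(set)
    for ranking in rankings:
        for rank, aa in enumerate(ranking, start=1):
            seen[rank].add(aa)
    return {rank: len(residues) for rank, residues in sorted(seen.items())}


# Invented rankings for three hypothetical proteomes (most-used residue first).
toy_rankings = [
    "LAGSCW",
    "LSAGCW",
    "LGSACW",
]

diversity = distinct_residues_per_rank(toy_rankings)
print(diversity)  # {1: 1, 2: 3, 3: 3, 4: 3, 5: 1, 6: 1}
```

In this toy data the first and last ranks are each held by a single residue while the middle ranks rotate, which is the qualitative pattern the study describes.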

Why leucine is common

Leucine has six synonymous codons, tied with serine and arginine for the largest codon set among the 20 standard amino acids.[5]
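
The codon counts can be read directly off the standard genetic code. The sketch below builds the 64-codon table from its conventional compact form (bases ordered U, C, A, G) and counts the codons per amino acid. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering:
# the first base changes slowest, the third base fastest. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, ignoring the three stop codons.
codon_counts = Counter(aa for aa in codon_table.values() if aa != "*")

largest = max(codon_counts.values())
largest_sets = sorted(aa for aa, n in codon_counts.items() if n == largest)
single_codon = sorted(aa for aa, n in codon_counts.items() if n == 1)

print(largest, largest_sets)  # 6 ['L', 'R', 'S']
print(single_codon)           # ['M', 'W']
print(codon_table["UGG"])     # W
```

Leucine, arginine, and serine share the largest codon set, while tryptophan (UGG) and methionine are the only amino acids with a single codon.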

That codon count is one reason leucine is common, but it is not the whole explanation. The 2024 cross-proteome study found a positive correlation between the number of synonymous codons and amino acid frequency, while also arguing that amino acid usage patterns are shaped by protein secondary-structure constraints. In other words, codon degeneracy helps set expectations, but folded proteins still select residues for structural and functional reasons.[5]
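
The direction of that relationship can be illustrated with the Swiss-Prot scale values from this article paired with codon counts from the standard genetic code. The Python sketch below computes a plain Pearson correlation over those 20 data points. It is a rough illustration derived here, not the study's analysis, which covered 5,590 proteomes and may have used different statistics.

```python
from math import sqrt

# Three-letter code: (codons in the standard genetic code, Swiss-Prot scale %).
# Percentages are the Expasy ProtScale values quoted in this article.
DATA = {
    "Leu": (6, 9.66), "Ala": (4, 8.25), "Gly": (4, 7.07), "Val": (4, 6.87),
    "Glu": (2, 6.75), "Ser": (6, 6.56), "Ile": (3, 5.96), "Lys": (2, 5.84),
    "Arg": (6, 5.53), "Asp": (2, 5.45), "Thr": (4, 5.34), "Pro": (4, 4.70),
    "Asn": (2, 4.06), "Gln": (2, 3.93), "Phe": (2, 3.86), "Tyr": (2, 2.92),
    "Met": (1, 2.42), "His": (2, 2.27), "Cys": (2, 1.37), "Trp": (1, 1.08),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


codons = [n for n, _ in DATA.values()]
frequencies = [pct for _, pct in DATA.values()]
r = pearson(codons, frequencies)
print(f"r = {r:.2f}")
```

The coefficient comes out positive but well short of 1, which fits the point above: codon count tracks frequency without fully determining it. Glutamic acid and lysine, for example, rank well above what two codons alone would suggest.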

Leucine is also a hydrophobic residue, so it is well suited to buried protein cores and other nonpolar environments. That structural role is consistent with its high database frequency, but the most defensible figure to cite remains the Swiss-Prot composition value: 9.66%.[1]

Methodology

This article separates reported database values from derived or interpretive statements.

  1. The amino acid frequency ranking uses the Expasy ProtScale amino acid composition scale for UniProtKB/Swiss-Prot. Those values are reported by Expasy and are not recalculated here.[1]
  2. Current Swiss-Prot database size uses UniProtKB/Swiss-Prot release 2026_01 statistics, which report 574,627 sequence entries and 208,482,574 amino acids as of January 28, 2026.[2]
  3. The collagen glycine figure is derived from NCBI Bookshelf's reported collagen repeat pattern. If every third residue is glycine, glycine accounts for 1/3 of the sequence, or about 33.3%.[3]
  4. Tryptophan scarcity uses two different sources: the 1.08% frequency comes from Expasy ProtScale, while the single-codon and biosynthetic-cost explanations come from Barik 2020.[1][4]
  5. Cross-species variation uses Morimoto and Pietras 2024, which analyzed 5,590 proteomes and 40,885 PDB entries for secondary-structure comparisons.[5]

Sources

  1. Amino acid composition (%) in the UniProtKB/Swiss-Prot data bank. Expasy ProtScale. https://web.expasy.org/protscale/pscale/A.A.Swiss-Prot.html
  2. Release 2026_01 statistics. UniProtKB/Swiss-Prot, 2026. https://web.expasy.org/docs/relnotes/relstat.html
  3. Biochemistry, Collagen Synthesis. StatPearls, NCBI Bookshelf, updated September 4, 2023. https://www.ncbi.nlm.nih.gov/books/NBK507709/
  4. The Uniqueness of Tryptophan in Biology: Properties, Metabolism, Interactions and Localization in Proteins. International Journal of Molecular Sciences, 2020. https://pmc.ncbi.nlm.nih.gov/articles/PMC7699789/
  5. Differential amino acid usage leads to ubiquitous edge effect in proteomes across domains of life that can be explained by amino acid secondary structure propensities. Scientific Reports, 2024. https://www.nature.com/articles/s41598-024-77319-4

Matic Broz

Founder & CEO, ProteinIQ

Matic founded ProteinIQ to make computational biology accessible to every researcher. He builds code-free bioinformatics tools used by thousands of scientists worldwide for protein analysis, molecular docking, and drug discovery.