Key takeaways
- Most common amino acid in the UniProtKB/Swiss-Prot composition scale: leucine at 9.66% (Expasy ProtScale, Swiss-Prot release 2013_04 scale)
- Least common standard amino acid in the same scale: tryptophan at 1.08% (Expasy ProtScale)
- Current Swiss-Prot context: release 2026_01 contains 574,627 reviewed sequence entries and 208,482,574 amino acids (UniProtKB/Swiss-Prot release statistics, January 2026)
- Collagen is a special case: glycine occupies every third position in the collagen Gly-X-Y repeat, so it accounts for about one-third of collagen residues (NCBI Bookshelf, updated 2023)
- Across 5,590 proteomes, a 2024 Scientific Reports study found that only a few amino acids consistently occupy the most-used and least-used ranks across domains of life
Leucine is the most common amino acid in the UniProtKB/Swiss-Prot amino acid composition scale, where it accounts for 9.66% of residues. That answer is useful because Swiss-Prot is a manually curated reference set, but it should be read as a database composition statistic, not a universal constant for every protein, organism, tissue, or protein family.
Leucine is the most common amino acid in Swiss-Prot
Leucine accounts for 9.66% of amino acid residues in the Expasy ProtScale composition scale derived from UniProtKB/Swiss-Prot release 2013_04.[1]
The same scale puts alanine second at 8.25%, glycine third at 7.07%, and tryptophan last at 1.08%. The ranking is a broad, database-wide reference: individual proteins can differ sharply depending on structure, organism, localization, and function.
| Rank | Amino acid | Three-letter code | Frequency in Swiss-Prot scale |
|---|---|---|---|
| 1 | Leucine | Leu | 9.66% |
| 2 | Alanine | Ala | 8.25% |
| 3 | Glycine | Gly | 7.07% |
| 4 | Valine | Val | 6.87% |
| 5 | Glutamic acid | Glu | 6.75% |
| 6 | Serine | Ser | 6.56% |
| 7 | Isoleucine | Ile | 5.96% |
| 8 | Lysine | Lys | 5.84% |
| 9 | Arginine | Arg | 5.53% |
| 10 | Aspartic acid | Asp | 5.45% |
| 11 | Threonine | Thr | 5.34% |
| 12 | Proline | Pro | 4.70% |
| 13 | Asparagine | Asn | 4.06% |
| 14 | Glutamine | Gln | 3.93% |
| 15 | Phenylalanine | Phe | 3.86% |
| 16 | Tyrosine | Tyr | 2.92% |
| 17 | Methionine | Met | 2.42% |
| 18 | Histidine | His | 2.27% |
| 19 | Cysteine | Cys | 1.37% |
| 20 | Tryptophan | Trp | 1.08% |
Source: Expasy ProtScale, amino acid composition (%) in UniProtKB/Swiss-Prot release 2013_04.[1]
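As a rough illustration of what a composition scale measures, the sketch below pools residue counts across a set of sequences and ranks the 20 standard amino acids by percentage. The function names and the toy sequences are hypothetical, and pooling residues across all entries is an assumption about how such a scale could be computed, not a description of Expasy's procedure.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequences):
    """Return {one-letter code: percent of residues}, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only the 20 standard residues; skip ambiguity codes such as X or B.
        counts.update(ch for ch in seq.upper() if ch in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


def rank_by_frequency(percentages):
    """Sort (amino acid, percent) pairs from most to least frequent."""
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy input, not real Swiss-Prot data.
toy_sequences = ["MLLKA", "GLLWA"]
toy_percentages = composition_percent(toy_sequences)
toy_ranking = rank_by_frequency(toy_percentages)
print(toy_ranking[:2])
```

In the toy input, leucine makes up 4 of 10 residues and alanine 2 of 10, so they rank first and second. Run against a real sequence set, the same approach would give a composition for that set only, which is why the result depends on which database and release is used.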
Current Swiss-Prot has 574,627 reviewed sequence entries
UniProtKB/Swiss-Prot release 2026_01 contains 574,627 reviewed sequence entries comprising 208,482,574 amino acids.[2]
That release statistic is not the source of the 9.66% leucine scale itself, which Expasy labels as a Swiss-Prot release 2013_04 scale. It provides the current database context: Swiss-Prot remains a curated protein knowledgebase, and its composition scale is best interpreted as a reference distribution from a curated protein set rather than a live recalculation for every release.
| Swiss-Prot release statistic | Value |
|---|---|
| Release | 2026_01 |
| Release date | January 28, 2026 |
| Reviewed sequence entries | 574,627 |
| Amino acids in entries | 208,482,574 |
| Average sequence length | 362 amino acids |
| Species represented | 14,846 |
Source: UniProtKB/Swiss-Prot release 2026_01 statistics.[2]
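The average sequence length in the table can be cross-checked from the two reported totals. This is a small arithmetic sketch; the variable names are illustrative, and whether the published figure is truncated or rounded is an assumption.

```python
# Reported totals for UniProtKB/Swiss-Prot release 2026_01.
entries = 574_627
amino_acids = 208_482_574

# Mean residues per entry.
mean_length = amino_acids / entries
print(f"{mean_length:.2f}")
```

The quotient is roughly 362.8, which agrees with the reported 362 if the published value drops the fractional part.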
Glycine is the most common amino acid in collagen
Glycine accounts for about one-third of collagen residues because collagen follows a repeating Gly-X-Y sequence pattern.
NCBI Bookshelf's StatPearls chapter on collagen synthesis states that collagen's primary amino acid sequence is glycine-proline-X or glycine-X-hydroxyproline, and that every third amino acid is glycine. The one-third figure is therefore a direct consequence of the reported repeat pattern: one glycine in every three residues is about 33.3% glycine.[3]
| Context | Most common residue | Reported or derived basis |
|---|---|---|
| Broad Swiss-Prot composition scale | Leucine, 9.66% | Reported by Expasy ProtScale |
| Collagen triple-helix repeat | Glycine, about 33.3% | Derived from the reported Gly-X-Y repeat |
| Swiss-Prot least common standard amino acid | Tryptophan, 1.08% | Reported by Expasy ProtScale |
This is why the answer changes if the question is narrowed from "proteins overall" to "collagen." Collagen is not compositionally average; its triple helix depends on glycine being small enough to fit into the crowded center of the structure.
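The one-third figure can be reproduced with an idealized repeat. The sketch below builds a hypothetical 300-residue segment from made-up Gly-X-Y triplets (not a real collagen sequence), checks that every third position is glycine, and computes the glycine fraction. Hydroxyproline is written as P here because it is a modified proline, which is a simplification for one-letter notation.

```python
def is_gly_x_y(sequence):
    """True if the length is a multiple of 3 and positions 0, 3, 6, ... are all glycine."""
    return len(sequence) % 3 == 0 and all(
        sequence[i] == "G" for i in range(0, len(sequence), 3)
    )


def residue_fraction(sequence, residue):
    """Fraction of positions in the sequence occupied by the given residue."""
    return sequence.count(residue) / len(sequence)


# Hypothetical idealized triplets; the X and Y positions deliberately contain no glycine.
triplets = ["GPP", "GAP", "GEK", "GPQ"]
segment = "".join(triplets) * 25  # 300 residues

glycine_share = residue_fraction(segment, "G")
print(is_gly_x_y(segment), round(100 * glycine_share, 1))
```

A real collagen chain would only approximate this value, since the figure follows from the repeat pattern rather than from counting residues in a specific entry.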
Tryptophan is the least common standard amino acid in Swiss-Prot
Tryptophan is the least common standard amino acid in the Swiss-Prot composition scale at 1.08%.[1]
A 2020 review in the International Journal of Molecular Sciences makes the same biological point from a different angle: tryptophan is one of the rarest amino acids in proteomes, is encoded by the single codon UGG, and is described as the most energetically expensive amino acid to synthesize. The review reports an energy cost equivalent to 74 high-energy phosphate bonds for tryptophan biosynthesis, compared with 52 for phenylalanine and 50 for tyrosine.[4]
Those numbers help explain why tryptophan is rare, but they do not replace the database frequency. The 1.08% figure comes from the Swiss-Prot composition scale; the biosynthetic-cost and single-codon facts come from the 2020 review.[1][4]
Amino acid rankings vary, but the extremes are constrained
A 2024 Scientific Reports study analyzed 5,590 proteomes across bacteria, archaea, eukaryotes, and viruses and found that only a few amino acids consistently occupy the most-used and least-used ranks.[5]
The study called this an amino acid usage "edge effect": diversity is lower at the high-frequency and low-frequency ends than in the middle of the ranking. In practical terms, leucine often appears near the high-frequency end, while tryptophan and cysteine often appear near the low-frequency end, but the exact rank order can still vary by domain, organism, protein class, and environmental pressure.
| Study dataset | Value |
|---|---|
| Proteomes analyzed | 5,590 |
| Archaea species | 328 |
| Bacteria species | 4,107 |
| Eukaryote species | 1,118 |
| Virus proteomes | 37 |
| PDB entries used for secondary-structure analysis | 40,885 |
Source: Morimoto and Pietras, Scientific Reports, 2024.[5]
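One simple way to picture the edge effect described above is to count how many different amino acids occupy each rank position across a set of proteomes. The sketch below does this for made-up rankings over a five-letter subset. The data are constructed purely to show the shape of the pattern and are not evidence for it, and counting distinct residues per rank is an assumed stand-in for whatever diversity measure the study actually used.

```python
def distinct_per_rank(rankings):
    """For each rank position, count how many different amino acids occupy it."""
    n_ranks = len(rankings[0])
    if any(len(ranking) != n_ranks for ranking in rankings):
        raise ValueError("all rankings must have the same length")
    return [len({ranking[i] for ranking in rankings}) for i in range(n_ranks)]


# Hypothetical rankings (most-used residue first) for four imaginary proteomes.
toy_rankings = [
    "LAGSW",
    "LGASW",
    "LSGAW",
    "LASGW",
]

diversity = distinct_per_rank(toy_rankings)
print(diversity)
```

In this toy set the first and last ranks are each held by a single residue while the middle ranks are shared by three, which is the qualitative pattern the study reports at much larger scale.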
Why leucine is common
Leucine has six synonymous codons, tied with serine and arginine for the largest codon set among the 20 standard amino acids.[5]
That codon count is one reason leucine is common, but it is not the whole explanation. The 2024 cross-proteome study found a positive correlation between the number of synonymous codons and amino acid frequency, while also arguing that amino acid usage patterns are shaped by protein secondary-structure constraints. In other words, codon degeneracy helps set expectations, but folded proteins still select residues for structural and functional reasons.[5]
Leucine is also a hydrophobic residue, so it is frequently useful in buried protein cores and other nonpolar environments. That structural role is consistent with its high database frequency, but the safest number to quote remains the Swiss-Prot composition value: 9.66%.[1]
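The codon counts mentioned above can be checked directly against the standard genetic code. The sketch below rebuilds the 64-codon table from the conventional compact string (codons ordered UUU, UUC, UUA, UUG, UCU, and so on, with `*` marking stop codons) and counts codons per amino acid. The variable names are illustrative, and only the standard code is covered, not organism-specific variants.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon (e.g. "UGG") to its one-letter amino acid.
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid, ignoring stop codons.
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")

for aa in "LSRWM":
    print(aa, codons_per_amino_acid[aa])
```

Leucine, serine, and arginine each come out at six codons, while tryptophan and methionine have one each. Methionine's 2.42% in the Swiss-Prot scale against tryptophan's 1.08% is a reminder that codon count alone does not determine frequency.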
Methodology
This article separates reported database values from derived or interpretive statements.
- The amino acid frequency ranking uses the Expasy ProtScale amino acid composition scale for UniProtKB/Swiss-Prot. Those values are reported by Expasy and are not recalculated here.[1]
- Current Swiss-Prot database size uses UniProtKB/Swiss-Prot release 2026_01 statistics, which report 574,627 sequence entries and 208,482,574 amino acids as of January 28, 2026.[2]
- The collagen glycine figure is derived from NCBI Bookshelf's reported collagen repeat pattern. If every third residue is glycine, glycine accounts for 1/3 of the sequence, or about 33.3%.[3]
- Tryptophan scarcity uses two different sources: the 1.08% frequency comes from Expasy ProtScale, while the single-codon and biosynthetic-cost explanations come from Barik 2020.[1][4]
- Cross-species variation uses Morimoto and Pietras 2024, which analyzed 5,590 proteomes and 40,885 PDB entries for secondary-structure comparisons.[5]
Sources
- [1] Amino acid composition (%) in the UniProtKB/Swiss-Prot data bank. Expasy ProtScale. https://web.expasy.org/protscale/pscale/A.A.Swiss-Prot.html
- [2] Release 2026_01 statistics. UniProtKB/Swiss-Prot, 2026. https://web.expasy.org/docs/relnotes/relstat.html
- [3] Biochemistry, Collagen Synthesis. StatPearls, NCBI Bookshelf, updated September 4, 2023. https://www.ncbi.nlm.nih.gov/books/NBK507709/
- [4] The Uniqueness of Tryptophan in Biology: Properties, Metabolism, Interactions and Localization in Proteins. International Journal of Molecular Sciences, 2020. https://pmc.ncbi.nlm.nih.gov/articles/PMC7699789/
- [5] Differential amino acid usage leads to ubiquitous edge effect in proteomes across domains of life that can be explained by amino acid secondary structure propensities. Scientific Reports, 2024. https://www.nature.com/articles/s41598-024-77319-4





