How many proteins are there?

Dr. Matic Broz·April 23, 2026

Key takeaways

  • Catalogued protein sequence clusters: 475,217,233 in UniRef100 (UniProt REST API release 2026_01, accessed April 23, 2026)
  • Human protein-coding genes: 19,433 in current GENCODE v49
  • Distinct translated human products: 129,801 in current GENCODE v49
  • Human Proteome Project reference proteome: 19,435 proteins, with 93.6% confidently detected (2025 HUPO HPP report, published 2026)
  • Protein molecules in a typical mammalian cell: about 10 billion (2023 review)

As of April 23, 2026, the cleanest single answer is that UniProt's UniRef100 database contains 475,217,233 protein sequence clusters. That number matters because it describes how much sequence space biology has actually catalogued. But there is no single universal protein count: in humans, you can also count 19,433 protein-coding genes, 129,801 distinct translated products, a 19,435-protein Human Proteome Project reference proteome with 93.6% confident detection, or about 10 billion protein molecules in one typical mammalian cell.

There is no single protein count

Protein counts range from 19,433 human protein-coding genes to 475,217,233 UniRef100 sequence clusters because different sources count different biological objects.

If you mean known sequences, the relevant number is a database count such as UniRef100. If you mean human gene products, the relevant numbers come from gene annotation sets such as GENCODE and HPP reference lists. If you mean physical molecules inside cells, the number is much larger, because one protein type can exist in thousands to billions of copies.

That is why articles about protein counts often seem to disagree while all being partly correct.

UniRef currently indexes 475.2 million protein sequence clusters

UniProt's UniRef API returned 475,217,233 clusters at 100% identity, 188,848,220 clusters at 90% identity, and 60,315,044 clusters at 50% identity when queried on April 23, 2026.[1]

These are not three different answers to the same question. They are three levels of redundancy reduction. UniRef100 merges exact sequence matches, UniRef90 groups closely related sequences, and UniRef50 compresses the database further into broader protein families.

UniRef protein sequences by similarity threshold

| UniRef dataset | What it counts | Count |
| --- | --- | --- |
| UniRef100 | Exact-sequence clusters | 475,217,233 |
| UniRef90 | Clusters at 90% sequence identity | 188,848,220 |
| UniRef50 | Clusters at 50% sequence identity | 60,315,044 |

Source: UniProt REST API queries for UniRef100, UniRef90, and UniRef50, accessed April 23, 2026. Response headers reported UniProt release 2026_01.[1]
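
The query behind these counts can be sketched in Python as follows. This is a minimal sketch, not the script used for this article: the endpoint and the `identity:1.0` query come from source [1], and `X-Total-Results` is the header named in the Methodology. The function names and the `0.9` and `0.5` query values for UniRef90 and UniRef50 are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from source [1].
UNIREF_SEARCH = "https://rest.uniprot.org/uniref/search"

# Query value per dataset. "1.0" appears in source [1];
# "0.9" and "0.5" are assumed to follow the same pattern.
IDENTITY = {"UniRef100": "1.0", "UniRef90": "0.9", "UniRef50": "0.5"}


def uniref_count_url(identity: str) -> str:
    """Build a search URL that requests a single record at one identity level."""
    params = {"size": 1, "query": f"identity:{identity}"}
    return f"{UNIREF_SEARCH}?{urlencode(params)}"


def fetch_cluster_count(identity: str, timeout: float = 30.0) -> int:
    """Return the total hit count reported in the X-Total-Results header.

    Needs network access; the value changes with each UniProt release.
    """
    request = Request(
        uniref_count_url(identity),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return int(response.headers["X-Total-Results"])
```

With network access, `fetch_cluster_count(IDENTITY["UniRef100"])` would return the current UniRef100 total. Requesting `size=1` keeps the response small, since only the header is needed.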

The human proteome: 19,433 genes, 129,801 translated products, and 93.6% confidently detected in HPP

GENCODE v49 lists 19,433 human protein-coding genes, 211,446 protein-coding transcripts, and 129,801 distinct translations.[2]

Those are annotation counts: they describe what the current reference gene set says the human genome can encode. The Human Proteome Project asks a different question: how much of the reference proteome has confident evidence of expression?

The 2025 HUPO Human Proteome Project report describes an HPP reference proteome of 19,435 proteins based on GENCODE v48, UniProtKB 2025_03, Human Protein Atlas 24, MassIVE-KB 2023, and PeptideAtlas 2025-01. It reports that 93.6% of that proteome has been detected.[3] Human Protein Atlas summarized the same 2025 report as 19,435 protein-coding genes with 94% confident PE1 detection.[4]

| Human proteome level | Count | What it means |
| --- | --- | --- |
| Protein-coding genes | 19,433 | Genes annotated as protein-coding in GENCODE v49 |
| Protein-coding transcripts | 211,446 | Transcript isoforms annotated as protein-coding |
| Distinct translations | 129,801 | Distinct translated protein products in GENCODE v49 |
| HPP reference proteome | 19,435 proteins | 2025 HUPO HPP target list based on GENCODE v48 plus integrated protein resources |
| Confidently detected HPP proteins | 93.6% | Share of the 2025 HPP reference proteome detected with confident expression evidence |

This separation matters. A gene count is not a protein count, an annotated translation count is not the same thing as a protein that has been directly observed in experiments, and the current GENCODE v49 gene count does not have to match the 2025 HPP target list exactly because the HPP report used GENCODE v48 plus additional protein resources.
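
To make the gap between these levels concrete, the short sketch below derives per-gene ratios from the counts quoted above. The variable names are illustrative. The implied number of detected proteins is an assumption derived from the rounded 93.6% share, not a figure taken from the HPP report.

```python
# Counts quoted in this article: GENCODE v49 [2] and the 2025 HPP report [3].
GENCODE_V49 = {"genes": 19_433, "transcripts": 211_446, "translations": 129_801}
HPP_2025 = {"reference_proteins": 19_435, "detected_share": 0.936}


def per_gene(count: int, genes: int = GENCODE_V49["genes"]) -> float:
    """Average number of annotated items per protein-coding gene."""
    return count / genes


transcripts_per_gene = per_gene(GENCODE_V49["transcripts"])
translations_per_gene = per_gene(GENCODE_V49["translations"])

# Derived from a rounded percentage, so treat it as approximate.
implied_detected = HPP_2025["reference_proteins"] * HPP_2025["detected_share"]

print(f"Transcripts per gene:  {transcripts_per_gene:.1f}")
print(f"Translations per gene: {translations_per_gene:.1f}")
print(f"Implied detected HPP proteins: about {implied_detected:,.0f}")
```

On these inputs the averages come out near 10.9 protein-coding transcripts and 6.7 distinct translations per gene. These are means across the whole gene set and say nothing about how isoforms are distributed among individual genes.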

A typical mammalian cell contains about 10 billion protein molecules

A 2023 review on protein counting and single-molecule proteomics says a typical mammalian cell of roughly 3,000 µm³ contains about 10,000,000,000 protein molecules, with a typical density of about 3 million protein molecules per cubic micrometer.[5]

That is a molecule count, not a count of unique protein types. By comparison, one deep mass-spectrometry study identified 10,411 protein groups in a 30-minute human cell-line proteomics run, which shows the gap between counting protein copies and counting protein species.[6]

For a practical mental model:

| Cell-level measure | Typical value | Source |
| --- | --- | --- |
| Total protein molecules in a mammalian cell | ~10 billion | 2023 review |
| Protein density | ~3 million molecules per µm³ | 2023 review |
| Protein groups identified in a fast deep human proteome run | 10,411 | 2024 proteomics study |
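
The figures in this section can be cross-checked with simple arithmetic, sketched below. The constant names are illustrative. The mean-copies figure is a naive average that combines numbers from different sources and cell types, so it is an assumption-laden illustration rather than a measured abundance.

```python
# Values quoted in this section from the 2023 review [5] and the 2024 study [6].
CELL_VOLUME_UM3 = 3_000              # typical mammalian cell volume, cubic micrometers
DENSITY_PER_UM3 = 3_000_000          # protein molecules per cubic micrometer
QUOTED_TOTAL = 10_000_000_000        # "about 10 billion" molecules per cell
PROTEIN_GROUPS = 10_411              # protein groups in one deep proteomics run

# Volume times density gives the order of magnitude behind the quoted total.
estimated_total = CELL_VOLUME_UM3 * DENSITY_PER_UM3

# Naive mean copies per identified protein group. Real abundances span
# several orders of magnitude, so this only illustrates the scale gap.
mean_copies_per_group = QUOTED_TOTAL / PROTEIN_GROUPS

print(f"Volume x density:      {estimated_total:,} molecules")
print(f"Mean copies per group: about {mean_copies_per_group:,.0f}")
```

Volume times density gives 9 billion molecules, consistent with the rounded figure of about 10 billion. The naive average is close to one million copies per detected protein group.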

Protein sequence space is far larger than biology has sampled

For a protein just 100 amino acids long, the number of possible sequences is 20^100 (roughly 1.3 × 10^130), because each position can hold one of 20 standard amino acids.

That theoretical space is so large that the 475.2 million catalogued UniRef100 clusters represent only a tiny, biologically explored corner of what chemistry allows. This is one reason protein engineering and protein design still have so much open search space.
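
The scale of that gap can be checked directly, since Python integers have arbitrary precision. The sketch below is illustrative only: UniRef100 clusters cover proteins of all lengths, so dividing the cluster count by the 100-residue sequence space is an assumption made to convey the order of magnitude, not a like-for-like comparison.

```python
import math

AMINO_ACIDS = 20                   # standard amino acids
LENGTH = 100                       # residues in the example protein
UNIREF100_CLUSTERS = 475_217_233   # count quoted in this article [1]


def sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of possible sequences of a given length (exact integer)."""
    return alphabet ** length


space = sequence_space(LENGTH)
log10_space = LENGTH * math.log10(AMINO_ACIDS)

# Work in log10 because the ratio is too small to read as a plain float.
log10_fraction = math.log10(UNIREF100_CLUSTERS) - log10_space

print(f"20^100 has {len(str(space))} decimal digits")
print(f"log10(sequence space)    = {log10_space:.1f}")
print(f"log10(catalogued / space) = {log10_fraction:.1f}")
```

The result is a fraction on the order of 10^-121, which is why the catalogued clusters are described above as a tiny corner of the possible space.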

Methodology

This article uses four different counting frames, and they should not be merged into a single headline number.

  1. Known sequence clusters come from live UniProt UniRef REST API queries run on April 23, 2026. The counts are read from the X-Total-Results response header.
  2. Human annotated genes, transcripts, and translations come from the current GENCODE human statistics page, which reports release v49.[2]
  3. Human protein detection status comes from the 2025 HUPO Human Proteome Project report. Its HPP reference proteome count is not substituted for the current GENCODE v49 gene count because it was built from GENCODE v48 plus integrated proteomics resources.[3]
  4. Protein molecules per cell refer to molecules, not unique protein types, and come from a review that summarizes cell volume and molecule-density estimates for a typical mammalian cell.[5]

The theoretical 20^100 sequence-space figure is a simple combinatoric calculation, not a database count.

Sources
  1. UniRef REST API counts for UniRef100, UniRef90, and UniRef50. UniProt, April 23, 2026. https://rest.uniprot.org/uniref/search?size=1&query=identity%3A1.0
  2. Human release statistics (v49). GENCODE. https://www.gencodegenes.org/human/stats.html
  3. The 2025 Report on the Human Proteome from the HUPO Human Proteome Project. Journal of Proteome Research, 2026. https://pubs.acs.org/doi/full/10.1021/acs.jproteome.5c00759
  4. The 2025 HUPO HPP report on the human proteome. Human Protein Atlas, 2026. https://www.proteinatlas.org/news/2026-02-20/the-2025-hupo-hpp-report-on-the-human-proteome
  5. Sampling the proteome by emerging single-molecule and mass spectrometry methods. Nature Methods, 2023. https://www.nature.com/articles/s41592-023-01802-5
  6. The One Hour Human Proteome. PubMed, 2024. https://pubmed.ncbi.nlm.nih.gov/38579929/

Matic Broz

Founder & CEO, ProteinIQ

Matic founded ProteinIQ to make computational biology accessible to every researcher. He builds code-free bioinformatics tools used by thousands of scientists worldwide for protein analysis, molecular docking, and drug discovery.