AlphaFold Statistics [2025 data]

The AlphaFold database contains over 214 million predicted protein structures, covering nearly all known cataloged proteins.

Dr. Matic Broz
Computational chemist
Quick Answer
  • Total structures: >214 million (AlphaFold DB, 2024)
  • Training data: ~170,000 PDB structures (Nature, 2021)
  • Impact: >43,000 citations (Google Scholar, 2025)

How many structures are in the AlphaFold Database?

The AlphaFold Protein Structure Database (AlphaFold DB) currently contains over 214 million predicted protein structures. This dataset covers nearly all protein sequences cataloged in the UniProt database, up from roughly 360,000 structures in the initial 2021 release.

This near-complete coverage of the known protein universe allows researchers to instantly access structural models for organisms ranging from humans and mice to bacteria and malaria parasites. While AlphaFold 2 predictions constitute the bulk of this database, new updates continue to refine the structural models available to the scientific community.

How accurate is AlphaFold?

AlphaFold 3, released in 2024, achieves a 50% improvement in prediction accuracy for protein interactions with other molecule types compared to existing methods. For protein-protein interactions specifically, most AlphaFold 3 predictions are high confidence: 71.6% of modeled complexes have an interface predicted TM-score (ipTM) of at least 0.8.
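A statistic such as "71.6% of complexes have ipTM of at least 0.8" is a simple threshold fraction over per-complex scores. The sketch below shows that calculation on a handful of made-up ipTM values. The function name `fraction_at_or_above` and the example scores are illustrative assumptions, not data from the AlphaFold 3 paper.

```python
def fraction_at_or_above(scores, threshold=0.8):
    """Return the fraction of scores that are >= threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical ipTM values for five modeled complexes (illustrative only).
example_iptm = [0.91, 0.85, 0.62, 0.80, 0.47]

share = fraction_at_or_above(example_iptm)
print(f"{share:.1%} of complexes have ipTM >= 0.8")  # 3 of 5 pass the cutoff
```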

Confidence in each prediction is typically reported as the pLDDT score (predicted Local Distance Difference Test), a per-residue metric ranging from 0 to 100 that estimates how well the predicted local structure would agree with an experimental one:

| pLDDT Score | Confidence Level | Interpretation |
|---|---|---|
| 90–100 | Very High | Modeled with atomic accuracy; suitable for side-chain analysis |
| 70–90 | High | Correct backbone prediction; good for general structural characterization |
| 50–70 | Low | Low confidence; caution required for structural interpretation |
| < 50 | Very Low | Likely disordered region; may be unstructured in isolation |

In our Protein Viewer tool, we visualize these pLDDT scores directly on the structure, coloring regions blue (high confidence) to orange/red (low confidence) to help researchers assess usability.
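The confidence bands listed above can also be applied programmatically. The sketch below is a minimal illustration, not the Protein Viewer implementation: it relies on the fact that AlphaFold DB coordinate files store pLDDT in the B-factor field, and it assumes fixed-column PDB format. The function names and the synthetic four-residue input are hypothetical, and assigning the boundary values 90, 70, and 50 to the higher band is an assumption, since the table's ranges overlap at their edges.

```python
from collections import Counter

BANDS = ("Very High", "High", "Low", "Very Low")


def plddt_band(score):
    """Map a pLDDT score (0-100) to a confidence band.

    Assumption: boundary values (90, 70, 50) fall in the higher band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be in [0, 100], got {score}")
    if score >= 90:
        return "Very High"
    if score >= 70:
        return "High"
    if score >= 50:
        return "Low"
    return "Very Low"


def residue_plddts(pdb_text):
    """Read one pLDDT per residue from the B-factor field of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB columns 13-16 hold the atom name, columns 61-66 the B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def band_fractions(scores):
    """Return the fraction of residues that fall in each confidence band."""
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


def make_ca_line(serial, res_name, res_seq, plddt):
    """Hypothetical helper: build a synthetic fixed-column CA ATOM record."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Synthetic four-residue example (coordinates are placeholders).
example_pdb = "\n".join(
    make_ca_line(i, name, i, score)
    for i, (name, score) in enumerate(
        [("MET", 95.2), ("ALA", 88.0), ("GLY", 71.5), ("SER", 42.3)], start=1
    )
)

scores = residue_plddts(example_pdb)
for band, fraction in band_fractions(scores).items():
    print(f"{band:<9} {fraction:.0%}")
```

For a real model, `example_pdb` would be replaced by the text of a PDB file downloaded from AlphaFold DB. mmCIF files need a proper parser rather than fixed-column slicing.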

How much data was AlphaFold trained on?

AlphaFold 2 was trained on approximately 170,000 protein structures from the Protein Data Bank (PDB). The training set was limited to PDB entries released before April 30, 2018, so that the system could be tested fairly on structures released after that date, such as the CASP14 targets.

Despite training on a dataset of only ~170,000 experimentally determined structures, the model has successfully generalized to predict the structures of over 200 million proteins, effectively "filling in" the structural map of the protein universe.
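To put that generalization in numbers, the quick calculation below divides the database size by the training set size. Both inputs are the rounded figures quoted in this article (a lower bound and an approximation), so the result is an order-of-magnitude estimate only.

```python
training_structures = 170_000        # approximate PDB training set size
predicted_structures = 214_000_000   # lower bound for AlphaFold DB entries

ratio = predicted_structures / training_structures
print(f"About {ratio:,.0f} predicted structures per training structure")
# prints: About 1,259 predicted structures per training structure
```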

What is the impact of AlphaFold?

The AlphaFold 2 paper ("Highly accurate protein structure prediction with AlphaFold", Nature, 2021) has garnered over 43,000 citations as of late 2025, making it one of the most cited scientific papers of the decade. The AlphaFold system is now used by millions of researchers worldwide to accelerate drug discovery, understand disease mechanisms, and engineer new enzymes.

Sources

  • "Highly accurate protein structure prediction with AlphaFold" (Nature, 2021). https://doi.org/10.1038/s41586-021-03819-2
  • "Accurate structure prediction of biomolecular interactions with AlphaFold 3" (2024). https://doi.org/10.1038/s41586-024-07487-w