AlphaFold Statistics [2025 data]
The AlphaFold database contains over 214 million predicted protein structures, covering nearly all proteins cataloged in UniProt.
- Total structures: >214 million (AlphaFold DB, 2024)
- Training data: ~170,000 PDB structures (Nature, 2021)
- Impact: >43,000 citations (Google Scholar, 2025)
How many structures are in the AlphaFold Database?
The AlphaFold Protein Structure Database (AlphaFold DB) currently contains over 214 million predicted protein structures. This dataset covers nearly all protein sequences cataloged in the UniProt database, a large expansion from the initial 2021 release of just over 360,000 structures across 21 model-organism proteomes.
This near-complete coverage of the known protein universe allows researchers to instantly access structural models for organisms ranging from humans and mice to bacteria and malaria parasites. While AlphaFold 2 predictions constitute the bulk of this database, new updates continue to refine the structural models available to the scientific community.
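Because each model is keyed by its UniProt accession, retrieving one can be scripted. The sketch below is a minimal example, assuming the AlphaFold DB file naming pattern `AF-<accession>-F1-model_v<version>.<format>` under `https://alphafold.ebi.ac.uk/files/`. The model version number changes between database releases, so confirm the current value on the site before relying on it. The helper names `afdb_model_url` and `download_model` are hypothetical, and the accession check is a simplification of UniProt's real format.

```python
import re
from urllib.request import urlopen

# Simplified sanity check: UniProt accessions are 6 or 10 uppercase
# alphanumeric characters. The official pattern is stricter than this.
_ACCESSION_RE = re.compile(r"^(?:[A-Z0-9]{6}|[A-Z0-9]{10})$")

# Assumed base URL for AlphaFold DB model files.
AFDB_FILES = "https://alphafold.ebi.ac.uk/files"


def afdb_model_url(accession: str, version: int, fmt: str = "pdb") -> str:
    """Build a download URL for fragment F1 of an AlphaFold DB model.

    The AF-<acc>-F1-model_v<N> naming is an assumption based on the
    database's published file names; check the current version number.
    """
    accession = accession.strip().upper()
    if not _ACCESSION_RE.match(accession):
        raise ValueError(f"not a UniProt-style accession: {accession!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"{AFDB_FILES}/AF-{accession}-F1-model_v{version}.{fmt}"


def download_model(accession: str, version: int, fmt: str = "pdb") -> str:
    """Fetch the model file as text (requires network access)."""
    with urlopen(afdb_model_url(accession, version, fmt), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `afdb_model_url("P69905", version=4)` builds the URL for human hemoglobin subunit alpha. URL construction is kept separate from downloading so it can be checked offline.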
How accurate is AlphaFold?
AlphaFold 3, released in 2024, improves prediction accuracy by at least 50% over existing methods for interactions between proteins and other molecule types. For protein-protein interactions specifically, 71.6% of modeled complexes reach an interface predicted TM-score (ipTM) of at least 0.8, indicating high-confidence predictions.
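The 71.6% figure is a simple proportion: the share of modeled complexes whose ipTM is at or above 0.8. A minimal sketch of that calculation, using made-up ipTM values rather than data from the AlphaFold 3 paper. The function name is hypothetical, and counting a score of exactly 0.8 as passing follows the "at least 0.8" wording.

```python
from typing import Sequence


def fraction_at_or_above(iptm_scores: Sequence[float], threshold: float = 0.8) -> float:
    """Return the fraction of complexes whose ipTM meets the threshold."""
    if not iptm_scores:
        raise ValueError("need at least one ipTM score")
    passing = sum(1 for score in iptm_scores if score >= threshold)
    return passing / len(iptm_scores)


# Made-up ipTM values for five hypothetical complexes.
example = [0.92, 0.81, 0.80, 0.55, 0.30]
share = fraction_at_or_above(example)
print(f"{share:.1%} of complexes have ipTM >= 0.8")
```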
Confidence in a predicted structure is typically reported with the pLDDT score (predicted Local Distance Difference Test), a per-residue estimate of local accuracy ranging from 0 to 100:
| pLDDT Score | Confidence Level | Interpretation |
|---|---|---|
| 90–100 | Very High | Expected to be modeled with high accuracy; suitable for side-chain analysis |
| 70–90 | High | Generally good backbone prediction; suitable for general structural characterization |
| 50–70 | Low | Interpret structural details with caution |
| < 50 | Very Low | Often a disordered region that may lack a fixed structure in isolation |
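The table can be applied programmatically to summarize how much of a model falls in each band. A minimal sketch with hypothetical function names; because the table's ranges share their endpoints, treating each lower bound as inclusive (so exactly 90 counts as "Very High") is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

# Band labels from the table. Lower bounds are treated as inclusive,
# an assumption since the table's ranges share their endpoints.
BANDS = ("Very High", "High", "Low", "Very Low")


def plddt_band(score: float) -> str:
    """Map one per-residue pLDDT score (0-100) to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score >= 90:
        return "Very High"
    if score >= 70:
        return "High"
    if score >= 50:
        return "Low"
    return "Very Low"


def band_fractions(scores: Iterable[float]) -> Dict[str, float]:
    """Return the fraction of residues falling in each band."""
    counts = Counter(plddt_band(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pLDDT score")
    return {band: counts[band] / total for band in BANDS}


# Made-up per-residue scores for a five-residue example.
print(band_fractions([95.0, 88.2, 61.0, 42.5, 91.3]))
```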
In our Protein Viewer tool, we visualize these pLDDT scores directly on the structure, coloring regions from blue (high confidence) to orange/red (low confidence) so researchers can judge which parts of a model are reliable.
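Model files from AlphaFold DB store each residue's pLDDT in the B-factor column of the PDB format, which is what lets a viewer color a structure by confidence. A minimal sketch, assuming standard fixed-width PDB `ATOM` records and reading one value per residue from its C-alpha atom. The function names are hypothetical, and the color names are illustrative rather than the exact palette of any particular viewer.

```python
from typing import Dict, Tuple

ResidueKey = Tuple[str, int]  # (chain ID, residue number)


def plddt_from_pdb(pdb_text: str) -> Dict[ResidueKey, float]:
    """Read per-residue pLDDT from the B-factor column of C-alpha atoms."""
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-based).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]))
        scores[key] = float(line[60:66])
    return scores


def confidence_color(score: float) -> str:
    """Illustrative color ramp: blue for confident, orange/red for low confidence."""
    if score >= 90:
        return "dark blue"
    if score >= 70:
        return "light blue"
    if score >= 50:
        return "orange"
    return "red"


# Two C-alpha records in standard PDB column layout (placeholder coordinates).
PDB_FMT = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
demo = "\n".join([
    PDB_FMT.format("ATOM", 1, " CA", "", "MET", "A", 1, "", 0.0, 0.0, 0.0, 1.0, 93.4),
    PDB_FMT.format("ATOM", 2, " CA", "", "GLY", "A", 2, "", 3.8, 0.0, 0.0, 1.0, 41.7),
])
for (chain, resnum), plddt in plddt_from_pdb(demo).items():
    print(chain, resnum, plddt, confidence_color(plddt))
```

Reading only C-alpha atoms avoids counting a residue several times, since AlphaFold writes the same pLDDT value to every atom of a residue.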
How much data was AlphaFold trained on?
AlphaFold 2 was trained on approximately 170,000 protein structures from the Protein Data Bank (PDB). The training set was limited to structures released up to April 30, 2018, so the system could be fairly tested on structures released after that date, such as the CASP14 targets.
Despite training on a dataset of only ~170,000 experimentally determined structures, the model has successfully generalized to predict the structures of over 200 million proteins, effectively "filling in" the structural map of the protein universe.
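The release-date cutoff is a temporal train/test split. A minimal sketch with a hypothetical `PdbEntry` record and placeholder entries; treating structures released exactly on April 30, 2018 as training data is an assumption about how the boundary is handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PdbEntry:
    """Hypothetical minimal record: a PDB ID and its release date."""
    pdb_id: str
    release_date: date


# Cutoff from the text; entries released on this date count as training data
# here (assumed boundary handling).
TRAINING_CUTOFF = date(2018, 4, 30)


def temporal_split(
    entries: Iterable[PdbEntry], cutoff: date = TRAINING_CUTOFF
) -> Tuple[List[PdbEntry], List[PdbEntry]]:
    """Split entries into (training, held_out) by release date."""
    training: List[PdbEntry] = []
    held_out: List[PdbEntry] = []
    for entry in entries:
        (training if entry.release_date <= cutoff else held_out).append(entry)
    return training, held_out


# Placeholder IDs and dates for illustration, not real PDB metadata.
entries = [
    PdbEntry("1AAA", date(2015, 6, 1)),
    PdbEntry("2BBB", date(2018, 4, 30)),
    PdbEntry("3CCC", date(2018, 5, 1)),
    PdbEntry("4DDD", date(2020, 1, 15)),
]
train_set, held_out_set = temporal_split(entries)
print([e.pdb_id for e in train_set], [e.pdb_id for e in held_out_set])
```

Because nothing released after the cutoff enters training, later structures can serve as an unseen benchmark. Many benchmark pipelines also filter by sequence similarity, which this sketch omits.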
What is the impact of AlphaFold?
The original AlphaFold paper ("Highly accurate protein structure prediction with AlphaFold") has garnered over 43,000 citations as of late 2025, making it one of the most cited scientific papers of the decade. The AlphaFold system is now used by millions of researchers worldwide to accelerate drug discovery, understand disease mechanisms, and engineer new enzymes.
Sources
- Jumper, J., et al. "Highly accurate protein structure prediction with AlphaFold." Nature, 2021. https://doi.org/10.1038/s41586-021-03819-2
- Abramson, J., et al. "Accurate structure prediction of biomolecular interactions with AlphaFold 3." Nature, 2024. https://doi.org/10.1038/s41586-024-07487-w
- Varadi, M., et al. "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models." Nucleic Acids Research, 2022. https://doi.org/10.1093/nar/gkab1061
- EMBL-EBI and Google DeepMind. "AlphaFold Protein Structure Database." Accessed December 2025. https://alphafold.ebi.ac.uk/