Protenix

Open-source AlphaFold 3 implementation for biomolecular structure prediction

What is Protenix?

Protenix is an open-source, trainable PyTorch reproduction of AlphaFold 3 developed by ByteDance. Unlike the original AlphaFold 3, whose access is restricted, Protenix is released under the Apache 2.0 license, making advanced biomolecular structure prediction broadly accessible.

Protenix predicts 3D structures of biomolecular complexes containing proteins, DNA, RNA, small molecule ligands, and ions, all in a single prediction. Its authors report accuracy comparable to AlphaFold 3 across protein-ligand, protein-protein, and protein-nucleic acid benchmarks.

For simpler protein-only structure prediction, you can use ESMFold, which runs faster on single chains. For alternative approaches to multi-entity complex prediction, see Boltz-2, Chai-1, or OpenFold 3.

How does Protenix work?

Protenix follows the AlphaFold 3 architecture, which generates atom coordinates with a diffusion module instead of the structure module used in AlphaFold 2.

Diffusion-based structure generation

The model starts from random atom coordinates and progressively denoises them into a coherent structure over a series of diffusion steps. Before diffusion begins, the network refines its internal representation of the input over several recycling cycles. More cycles and steps generally produce higher-quality structures at the cost of longer runtime.
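The denoising idea can be illustrated with a toy loop. This is only a sketch under strong assumptions: the function name toy_denoise, the linear noise schedule, and the half-step update are all invented for illustration, and the known target stands in for the prediction a trained network would make. It is not the sampler Protenix uses.

```python
import math
import random


def toy_denoise(target, n_steps, seed=0):
    """Toy reverse-diffusion loop: start from noise and end near `target`.

    `target` plays the role of the denoiser's prediction. A real model has
    no access to the answer and predicts it from the noisy input instead.
    """
    rng = random.Random(seed)
    # Start from random coordinates with a large spread.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    for step in range(n_steps):
        # Noise scale shrinks linearly and reaches zero on the last step.
        sigma = 1.0 - (step + 1) / n_steps
        for atom, goal in zip(coords, target):
            for axis in range(3):
                # Move halfway toward the "denoised" estimate ...
                atom[axis] += 0.5 * (goal[axis] - atom[axis])
                # ... then re-inject a small amount of noise.
                atom[axis] += 0.1 * rng.gauss(0.0, sigma)
    return coords


def coordinate_rmsd(a, b):
    """Plain coordinate RMSD without superposition (enough for this toy)."""
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))


target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
for n_steps in (2, 10, 50):
    error = coordinate_rmsd(toy_denoise(target, n_steps), target)
    print(f"{n_steps:>3} steps: RMSD to target = {error:.4f}")
```

In this toy, running more steps leaves the coordinates closer to the target, which mirrors the quality-versus-runtime trade-off described above.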

Multiple Sequence Alignment (MSA)

The MSA step searches for sequences that are evolutionarily related to your input protein. Residues that are conserved across species often indicate structural or functional importance, and co-evolving residue pairs reveal spatial contacts. We recommend keeping MSA enabled for best accuracy: it adds a few minutes to the prediction but significantly improves results.
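To make the conservation signal concrete, here is a minimal sketch that scores each alignment column by how many sequences share its most common residue. The function name column_conservation and the tiny alignment are made up for illustration. Protenix consumes the MSA through learned network modules, not through a hand-written score like this one.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences that share the most common residue per column.

    Gap characters ("-") are ignored when picking the most common residue
    but still count in the denominator, so gappy columns score lower.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


# Hypothetical four-sequence alignment.
msa = ["MKTAY", "MKSAY", "MRTAF", "MK-AY"]
print(column_conservation(msa))
```

Columns 1 and 4 are fully conserved in this example, while column 3 is the most variable.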

Entity representation

Protenix treats each molecule as an "entity" with a chain ID. When you add proteins, ligands, DNA, or RNA, each gets assigned a chain identifier (A, B, C, etc.) that you use when defining constraints or interpreting output structures.
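The chain assignment can be pictured as pairing entities with letters in input order. The sketch below assumes exactly that ordering rule and a single-letter limit. Both are assumptions for illustration, as is the helper name assign_chain_ids.

```python
from string import ascii_uppercase


def assign_chain_ids(entities):
    """Pair each entity, in input order, with the next chain letter."""
    if len(entities) > len(ascii_uppercase):
        raise ValueError("this sketch only covers single-letter chain IDs (A-Z)")
    return dict(zip(ascii_uppercase, entities))


# Hypothetical job with three entities.
chains = assign_chain_ids(["protein: kinase", "ligand: ATP", "ion: MG"])
for chain_id, entity in chains.items():
    print(chain_id, "->", entity)
```

With this mapping in hand, a constraint that mentions chain B refers to the ATP ligand.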

Input options

Proteins

Provide protein sequences in FASTA format, upload PDB/CIF files, or fetch directly from RCSB using a 4-letter PDB ID. You can add up to 10 protein chains per prediction.
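Before submitting, it can help to check a FASTA input locally. The following sketch parses multi-record FASTA text and enforces the 10-chain limit mentioned above. The function name parse_protein_fasta is hypothetical, and restricting sequences to the 20 standard amino acid letters is an assumption of this sketch, not a documented rule.

```python
MAX_PROTEIN_CHAINS = 10  # limit stated in this guide
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def parse_protein_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    if len(records) > MAX_PROTEIN_CHAINS:
        raise ValueError(f"{len(records)} chains given, limit is {MAX_PROTEIN_CHAINS}")
    for name, sequence in records:
        unknown = sorted(set(sequence) - AMINO_ACIDS)
        if not sequence or unknown:
            raise ValueError(f"record {name!r} is empty or has unknown letters {unknown}")
    return records


example = """>chain_one
MKTAYIAKQR
QISFVKSHFS
>chain_two
GSHMLEDPVD
"""
for name, sequence in parse_protein_fasta(example):
    print(name, len(sequence))
```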

Ligands

Enter small molecules as SMILES strings, CCD codes (e.g., ATP), or concatenated CCD codes for oligosaccharides (e.g., NAG_BMA_BGC). You can also upload SDF or MOL files. We support fetching ligand structures from PubChem by compound ID.

DNA and RNA

Enter nucleic acid sequences in FASTA format. Use standard nucleotide codes: A, T, G, C for DNA and A, U, G, C for RNA.
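A quick local check of the nucleotide alphabet can catch a DNA sequence pasted into an RNA field, or the reverse. This is a minimal sketch. The function name classify_nucleic_acid and the "ambiguous" label for sequences with neither T nor U are choices made here for illustration.

```python
def classify_nucleic_acid(sequence):
    """Return "DNA", "RNA", or "ambiguous" (no T or U present).

    Raises ValueError for empty input, letters outside A/C/G/T/U,
    or a sequence that mixes T and U.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    letters = set(seq)
    unknown = letters - set("ACGTU")
    if unknown:
        raise ValueError(f"unsupported nucleotide codes: {sorted(unknown)}")
    if "T" in letters and "U" in letters:
        raise ValueError("sequence mixes T (DNA) and U (RNA)")
    if "U" in letters:
        return "RNA"
    if "T" in letters:
        return "DNA"
    return "ambiguous"


for seq in ("ATGCGT", "AUGCGU", "GGCC"):
    print(seq, classify_nucleic_acid(seq))
```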

Ions

Enter metal ions or cofactors using their CCD code (e.g., ZN, MG, CA, FE). These are placed automatically based on the predicted binding sites.

Model variants

Protenix offers two model variants to balance accuracy and speed.

The Base model provides the highest accuracy and is recommended for production predictions. The Mini model is a lightweight variant with fewer network blocks that runs faster with only a 1-5% drop in accuracy. Use Mini for rapid screening or when predicting many structures.

Constraints

Constraints allow you to guide the structure prediction toward specific configurations, useful when you have experimental or prior knowledge about the system.

Pocket constraints

Define where a ligand should bind by specifying which residues form the binding pocket and the maximum distance between those residues and the ligand. Format: binder_chain|contact_residues|max_distance. Example: B|A:45,A:46|6.0 keeps ligand B within 6 Å of residues 45 and 46 on chain A.
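The pocket constraint string is easy to validate before submission. The sketch below splits it into its three fields. The class and function names are hypothetical, and requiring a positive distance is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocketConstraint:
    binder_chain: str
    contacts: tuple  # tuple of (chain, residue_number) pairs
    max_distance: float  # in angstroms


def parse_pocket_constraint(text):
    """Parse 'binder_chain|contact_residues|max_distance'."""
    parts = text.strip().split("|")
    if len(parts) != 3:
        raise ValueError("expected three '|'-separated fields")
    binder, contact_field, distance_text = (part.strip() for part in parts)
    if not binder:
        raise ValueError("binder chain is missing")

    contacts = []
    for item in contact_field.split(","):
        chain, _, position = item.strip().partition(":")
        if not chain or not position.isdigit():
            raise ValueError(f"bad contact residue {item!r}, expected chain:number")
        contacts.append((chain, int(position)))

    max_distance = float(distance_text)
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return PocketConstraint(binder, tuple(contacts), max_distance)


print(parse_pocket_constraint("B|A:45,A:46|6.0"))
```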

Contact constraints

Force specific atoms or residues to be near each other. This is useful for known protein-protein interfaces or when docking data suggests specific contacts. Format: entity:copy:position:atom,entity:copy:position:atom|max_distance|min_distance.
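A contact constraint in this format can be unpacked as follows. The example string, the names AtomRef and parse_contact_constraint, and the requirement that 0 <= min_distance <= max_distance are all assumptions made for this sketch. It also assumes that every selector carries all four fields, including the atom name.

```python
from typing import NamedTuple


class AtomRef(NamedTuple):
    entity: int
    copy: int
    position: int
    atom: str


def parse_atom_ref(text):
    """Parse one 'entity:copy:position:atom' selector."""
    fields = text.strip().split(":")
    if len(fields) != 4:
        raise ValueError(f"expected entity:copy:position:atom, got {text!r}")
    entity, copy, position, atom = fields
    return AtomRef(int(entity), int(copy), int(position), atom)


def parse_contact_constraint(text):
    """Parse 'selector,selector|max_distance|min_distance'."""
    parts = text.strip().split("|")
    if len(parts) != 3:
        raise ValueError("expected three '|'-separated fields")
    pair_field, max_text, min_text = parts
    refs = [parse_atom_ref(item) for item in pair_field.split(",")]
    if len(refs) != 2:
        raise ValueError("a contact needs exactly two selectors")
    max_distance, min_distance = float(max_text), float(min_text)
    if not 0 <= min_distance <= max_distance:
        raise ValueError("need 0 <= min_distance <= max_distance")
    return refs[0], refs[1], max_distance, min_distance


# Hypothetical constraint: CA atoms of two residues kept within 8 angstroms.
print(parse_contact_constraint("1:1:45:CA,2:1:10:CA|8.0|0.0"))
```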

Covalent bonds

Define covalent connections between molecules, essential for covalent inhibitors or cross-linked peptides. Format: entity:copy:position:atom,entity:copy:position:atom. Example: 1:1:12:SG,2:1:1:C1 connects the SG atom of residue 12 in entity 1 to the C1 atom of entity 2.
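If you keep constraints in structured form, the bond string can be converted to a dictionary. The key names used below (entity1, copy1, position1, atom1, and so on) are an assumption for illustration and may not match the input schema of any particular Protenix release. The function name is hypothetical as well.

```python
def covalent_bond_to_dict(text):
    """Convert 'entity:copy:position:atom,entity:copy:position:atom' to a dict."""
    ends = [end.strip().split(":") for end in text.strip().split(",")]
    if len(ends) != 2 or any(len(end) != 4 for end in ends):
        raise ValueError("expected two entity:copy:position:atom selectors")
    bond = {}
    for index, (entity, copy, position, atom) in enumerate(ends, start=1):
        # Assumed key naming: a numeric suffix marks each end of the bond.
        bond[f"entity{index}"] = int(entity)
        bond[f"copy{index}"] = int(copy)
        bond[f"position{index}"] = int(position)
        bond[f"atom{index}"] = atom
    return bond


print(covalent_bond_to_dict("1:1:12:SG,2:1:1:C1"))
```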

Modifications

Protein modifications (PTMs)

Specify post-translational modifications using CCD codes. Format: chain:position:CCD_code. Common modifications include:

SEP - phosphoserine
TPO - phosphothreonine
PTR - phosphotyrosine
MLY - dimethyllysine (a methylated lysine)
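A common input mistake is attaching a modification to the wrong residue, for example a phosphoserine code on a lysine. The sketch below checks the codes listed above against the sequence. It assumes 1-based positions and uses a hypothetical helper name, and it lets codes that are not in its small table pass unchecked.

```python
# Parent amino acid expected for each CCD code listed in this guide.
PARENT_RESIDUE = {"SEP": "S", "TPO": "T", "PTR": "Y", "MLY": "K"}


def parse_modification(text, sequences):
    """Parse 'chain:position:CCD_code' and check it against the sequence.

    `sequences` maps chain IDs to one-letter protein sequences.
    Positions are assumed to be 1-based.
    """
    fields = text.strip().split(":")
    if len(fields) != 3:
        raise ValueError("expected chain:position:CCD_code")
    chain, position_text, ccd = fields[0], fields[1], fields[2].upper()
    position = int(position_text)

    sequence = sequences.get(chain)
    if sequence is None:
        raise ValueError(f"unknown chain {chain!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside chain {chain}")

    residue = sequence[position - 1]
    expected = PARENT_RESIDUE.get(ccd)
    if expected is not None and residue != expected:
        raise ValueError(f"{ccd} expects {expected} but chain {chain} has {residue}")
    return chain, position, ccd


sequences = {"A": "MKSTY"}
print(parse_modification("A:3:SEP", sequences))
```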

Nucleic acid modifications

Specify modified nucleotides using CCD codes. Format: chain:position:CCD_code. Examples include 5MC (5-methylcytosine) and PSU (pseudouridine).

Understanding the results

Protenix outputs predicted structures in CIF or PDB format with three confidence metrics.

pLDDT (predicted Local Distance Difference Test) scores each residue from 0-100. Scores above 90 indicate high confidence, 70-90 is confident, 50-70 is low confidence, and below 50 suggests disorder or poor prediction.
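These bands translate directly into a small helper for summarizing a prediction. The function names and band labels are illustrative. Because the ranges in the text share their endpoints, this sketch assumes that a score of exactly 90 counts as "confident", 70 as "confident", and 50 as "low".

```python
from collections import Counter

BANDS = ("high", "confident", "low", "very low")


def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize_plddt(scores):
    """Return the fraction of residues that fall in each band."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(plddt_band(score) for score in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


# Hypothetical per-residue scores.
print(summarize_plddt([95.2, 88.0, 71.5, 64.0, 42.3, 91.0]))
```

A large fraction of residues in the "very low" band often points to a disordered region rather than a failed run.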

pTM (predicted TM-score) measures overall structure quality from 0-1. Values above 0.5 suggest the overall fold is correct.

ipTM (interface pTM) specifically evaluates multi-chain interfaces. For protein-ligand or protein-protein complexes, this metric indicates how reliably the interaction is predicted. Values above 0.7 generally indicate reliable interface predictions.
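The two global thresholds can be applied together when triaging many predictions. This is a minimal sketch with a hypothetical function name. The cutoffs of 0.5 and 0.7 are the rules of thumb given above, not hard limits.

```python
def interpret_global_scores(ptm, iptm=None):
    """Apply the rule-of-thumb cutoffs for pTM (0.5) and ipTM (0.7).

    `iptm` may be None for single-chain predictions, which have no interface.
    """
    for name, value in (("pTM", ptm), ("ipTM", iptm)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    verdict = {"fold_likely_correct": ptm > 0.5}
    if iptm is not None:
        verdict["interface_reliable"] = iptm > 0.7
    return verdict


# Hypothetical scores for a protein-ligand complex.
print(interpret_global_scores(ptm=0.82, iptm=0.64))
```

In this example the overall fold passes its cutoff while the interface does not, so the binding pose deserves a closer look.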

Best practices

When predicting protein-ligand complexes, include both the protein sequence and the ligand SMILES. Use pocket constraints if you know the approximate binding site from experimental data or prior docking studies with tools like AutoDock Vina or DiffDock.

For important predictions, we recommend using 2-3 model seeds to assess structural variability. If the top predictions agree, you can be more confident in the result.
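One way to quantify agreement between seeds is a distance-based RMSD (dRMSD), which compares internal pairwise distances and therefore needs no superposition. The sketch below is an illustration under assumptions: the function names are hypothetical, the coordinates are made up, and in practice you would extract matching atoms (for example C-alpha atoms) from each output structure first.

```python
import math
from itertools import combinations


def distance_rmsd(a, b):
    """RMS difference between the internal pairwise distances of two models."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("models need the same number of atoms (at least two)")
    total, count = 0.0, 0
    for i, j in combinations(range(len(a)), 2):
        difference = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
        total += difference ** 2
        count += 1
    return math.sqrt(total / count)


def worst_seed_disagreement(models):
    """Largest pairwise dRMSD among models keyed by seed."""
    if len(models) < 2:
        raise ValueError("need predictions from at least two seeds")
    return max(
        distance_rmsd(models[s], models[t])
        for s, t in combinations(sorted(models), 2)
    )


# Made-up three-atom models: seed 2 is seed 1 translated, seed 3 is bent.
seed_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
seed_2 = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in seed_1]
seed_3 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

print(distance_rmsd(seed_1, seed_2))
print(worst_seed_disagreement({1: seed_1, 2: seed_2, 3: seed_3}))
```

Note that dRMSD cannot tell a structure from its mirror image, so a superposition-based RMSD is a useful complement when that distinction matters.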

For proteins over 1000 residues, consider starting with the Mini model to verify your input before running the full Base model.

Related tools

Boltz-2 - Alternative AlphaFold 3-like structure prediction
Chai-1 - Multi-entity biomolecular structure prediction
OpenFold 3 - Open-source AlphaFold family implementation
ESMFold - Fast protein-only structure prediction using language models
PDB Viewer - Visualize your predicted structures
DiffDock - Diffusion-based molecular docking

Based on: ByteDance Research. Protenix — Advancing Structure Prediction Through a Comprehensive AlphaFold3 Reproduction. bioRxiv (2025). https://doi.org/10.1101/2025.01.08.631967