
All-atom generative diffusion for designing binders, enzymes, and symmetric protein assemblies
RFdiffusion3 (RFD3) is a generative diffusion model for de novo protein design. Unlike structure prediction tools that fold existing sequences, RFD3 creates entirely new protein backbones through an iterative denoising process. The model can generate proteins from scratch or design new chains that interact with specified targets.
RFD3 extends the original RFdiffusion architecture with all-atom capabilities, enabling design tasks involving small molecules, metals, and other non-protein components. The model learns to reverse a diffusion process that gradually adds noise to protein structures, generating novel backbones that satisfy specified constraints.
After generating backbones, pair RFD3 with LigandMPNN to design amino acid sequences. To validate the designed structures, run structure prediction with RoseTTAFold3 or Boltz-2.
RFD3 operates on a diffusion framework where protein structures are progressively corrupted with Gaussian noise during training. At inference time, the model reverses this process:
The number of diffusion steps controls the quality-speed tradeoff. More steps (100-200) produce higher-quality structures but take longer to run, while fewer steps (10-50) enable rapid prototyping.
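The iterative structure behind this tradeoff can be illustrated with a toy loop. This is a minimal sketch and not RFD3's actual sampler: `toy_denoiser` is a hypothetical stand-in for the trained network, the update rule is a simple deterministic interpolation chosen for readability, and the three-point target is made-up illustration data.

```python
import random


def toy_denoiser(noisy_coords, target_coords):
    """Hypothetical stand-in for the trained network.

    A real model would predict the clean structure from the noisy one;
    this toy version simply returns a known target.
    """
    return target_coords


def reverse_diffusion(target_coords, num_steps, seed=0):
    """Run a toy deterministic denoising loop and count denoiser calls."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise: one (x, y, z) triple per atom.
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in target_coords]
    denoiser_calls = 0
    for step in range(num_steps):
        predicted = toy_denoiser(coords, target_coords)
        denoiser_calls += 1
        # Close a growing fraction of the remaining gap; on the final
        # step the fraction is 1.0, so the loop ends on the prediction.
        fraction = 1.0 / (num_steps - step)
        coords = [
            [c + fraction * (p - c) for c, p in zip(atom, pred_atom)]
            for atom, pred_atom in zip(coords, predicted)
        ]
    return coords, denoiser_calls


# Made-up target: three points spaced 3.8 units apart along the x axis.
target = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
final_coords, calls = reverse_diffusion(target, num_steps=50)
```

In this toy loop each step calls the denoiser exactly once, so the cost grows linearly with `num_steps`. A real sampler would presumably behave similarly in cost, but its update rule and noise schedule are not described here.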
RFD3 can incorporate various constraints during generation:
Structural constraints: Fix specific residue positions from a template structure while designing new regions around them. The contig specification language allows precise control over which regions are fixed versus designed.
Hotspot targeting: For binder design, specify which target residue atoms should be contacted by the designed protein. The model orients the new chain to maximize interactions with these hotspots.
Length constraints: Control the size of designed proteins or protein regions using length ranges. The model samples within the specified range during generation.
Provide a target protein structure for constrained design tasks like binder design. Upload a PDB/CIF file or fetch directly from RCSB using a PDB ID (e.g., 4ZXB). The target structure defines the binding interface for binder design or provides structural motifs for scaffolding tasks.
For unconditional generation (creating proteins from scratch), leave the target structure empty. RFD3 then generates novel folds based solely on the specified length.
Select the type of protein design:
Generate multiple independent designs (1-50) in a single job. Each design samples a different trajectory through the diffusion process, producing structural diversity. More designs increase the chance of finding high-quality candidates but require more computation time.
The diffusion steps setting controls the number of denoising iterations (10-200). Higher values produce more refined structures:
| Steps | Use Case |
|---|---|
| 10-20 | Quick prototyping, initial exploration |
| 50 | Standard quality (default) |
| 100-200 | High-quality final designs |
Specify the length of designed protein regions as a range (e.g., 50-100). The model samples within this range during generation. For binder design, this controls the size of the designed binder chain.
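The documented limits for these settings can be checked before a job is submitted. The sketch below is illustrative only: `DesignJob`, `parse_length_range`, and the field names are hypothetical and do not correspond to a documented RFD3 API. It also assumes that a single number such as `100` is accepted as a fixed length.

```python
from dataclasses import dataclass


def parse_length_range(text):
    """Parse '50-100' (or a single length such as '100') into (low, high)."""
    low_text, sep, high_text = text.strip().partition("-")
    low = int(low_text)
    high = int(high_text) if sep else low
    if low < 1 or high < low:
        raise ValueError(f"invalid length range: {text!r}")
    return low, high


@dataclass(frozen=True)
class DesignJob:
    """Hypothetical container for the job settings described above."""

    num_designs: int       # documented range: 1-50
    length_range: str      # e.g. "50-100"
    num_steps: int = 50    # documented default; allowed range 10-200

    def __post_init__(self):
        if not 1 <= self.num_designs <= 50:
            raise ValueError("num_designs must be between 1 and 50")
        if not 10 <= self.num_steps <= 200:
            raise ValueError("num_steps must be between 10 and 200")
        # Raises ValueError if the range is malformed.
        parse_length_range(self.length_range)


job = DesignJob(num_designs=5, length_range="50-100")
low, high = parse_length_range(job.length_range)
```

Validating up front catches out-of-range values before any computation time is spent on the job.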
The contig specification provides fine-grained control over the design:
Format: new_chain_length,/0,target_chain_residues
Examples:
50-100,/0,A1-95 - Design a 50-100 residue binder for chain A residues 1-95
40-80,/0,B10-150 - Design a binder for chain B residues 10-150
100 - Generate a 100-residue protein unconditionally
The /0 separator indicates a chain break between the designed region and the fixed target.
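To make the contig format concrete, the following sketch parses the forms shown in the examples. It is not RFD3's own parser: `parse_contig` and the `Contig` tuple are hypothetical names, and the code handles only the documented patterns (a length or length range, optionally followed by `/0` and a single target segment).

```python
import re
from typing import NamedTuple, Optional, Tuple


class Contig(NamedTuple):
    """Hypothetical parsed form of a contig string."""

    designed_length: Tuple[int, int]              # (min, max) residues
    target_chain: Optional[str]                   # None if unconditional
    target_residues: Optional[Tuple[int, int]]    # (first, last)


_LENGTH = re.compile(r"^(\d+)(?:-(\d+))?$")
_TARGET = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")


def parse_contig(spec):
    parts = [part.strip() for part in spec.split(",")]

    length = _LENGTH.match(parts[0])
    if not length:
        raise ValueError(f"bad length field: {parts[0]!r}")
    low = int(length.group(1))
    # A single number means a fixed length, so max equals min.
    high = int(length.group(2) or length.group(1))
    if high < low:
        raise ValueError(f"length range is reversed: {parts[0]!r}")

    if len(parts) == 1:
        # Unconditional generation: no target segment.
        return Contig((low, high), None, None)

    if len(parts) != 3 or parts[1] != "/0":
        raise ValueError("expected 'length,/0,target' with a /0 chain break")

    target = _TARGET.match(parts[2])
    if not target:
        raise ValueError(f"bad target field: {parts[2]!r}")
    first, last = int(target.group(2)), int(target.group(3))
    return Contig((low, high), target.group(1), (first, last))


binder = parse_contig("50-100,/0,A1-95")
unconditional = parse_contig("100")
```

Here `binder` describes a 50-100 residue designed chain against chain A residues 1-95, and `unconditional` has no target fields set.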
Target specific atoms on the target protein for binder design. This focuses the designed interface on functionally important residues.
Format: JSON object mapping residue IDs to atom names
{
"A64": "CD2,CZ",
"A88": "CG,CZ",
"A96": "CD1,CZ"
}
Residue ID format: ChainResidueNumber (e.g., A64 = chain A, residue 64)
Common atom names:
N, CA, C, O
CD1, CD2, CE1, CE2, CZ
CB, CG, CD
NZ (Lys), OD1/OD2 (Asp), OE1/OE2 (Glu)
Hotspot selection dramatically improves binder design success rates by ensuring the designed protein contacts critical interface residues.
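A hotspot specification in this format can be checked before submission. The sketch below is an assumption-laden illustration rather than RFD3's own validation: `parse_hotspots` is a hypothetical helper, and it assumes single-letter chain IDs and non-negative residue numbers, as in the example above.

```python
import json
import re

# Residue IDs follow the ChainResidueNumber pattern, e.g. "A64".
_RESIDUE_ID = re.compile(r"^([A-Za-z])(\d+)$")


def parse_hotspots(text):
    """Parse the hotspot JSON into {(chain, residue_number): [atom names]}."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("hotspots must be a JSON object")

    hotspots = {}
    for residue_id, atoms in raw.items():
        match = _RESIDUE_ID.match(residue_id)
        if not match:
            raise ValueError(f"bad residue ID: {residue_id!r}")
        if not isinstance(atoms, str):
            raise ValueError(f"atom names for {residue_id} must be a string")
        # Atom names are comma-separated, e.g. "CD2,CZ".
        names = [name.strip() for name in atoms.split(",") if name.strip()]
        if not names:
            raise ValueError(f"no atom names given for {residue_id}")
        hotspots[(match.group(1), int(match.group(2)))] = names
    return hotspots


example = '{"A64": "CD2,CZ", "A88": "CG,CZ", "A96": "CD1,CZ"}'
hotspots = parse_hotspots(example)
```

Keying the result by `(chain, residue_number)` makes it straightforward to cross-check each hotspot against the residues present in the target structure.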
Controls how the designed chain is positioned relative to the target:
Each design generates a PDB file containing the designed backbone coordinates. Files are named sequentially (e.g., design_0.pdb, design_1.pdb). The structures contain:
RFD3 outputs are backbone-only structures requiring sequence design:
While RFD3 doesn't provide explicit confidence scores like folding tools, design quality can be assessed by:
RFD3 excels at:
Based on: Watson JL, Juergens D, Bennett NR, et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature. DOI: 10.1038/s41586-023-06415-8