
RFdiffusion
AI-powered protein structure design for de novo generation, binder design, motif scaffolding, and symmetric oligomers
RFdiffusion: Protein Structure Design
What is RFdiffusion?
RFdiffusion is a protein design tool developed in the Baker Lab at the University of Washington that uses a diffusion model to generate protein backbone structures. Published in Nature (2023), RFdiffusion represents a major advance in computational protein design, enabling the creation of proteins with specific functions, binding properties, and structural features that don't exist in nature.
Unlike structure prediction tools such as AlphaFold, which predict a structure from a sequence, RFdiffusion works in the design direction: it generates entirely new protein backbones from scratch, or builds scaffolds around specific structural motifs to create functional proteins.
Key capabilities
RFdiffusion supports five design modes, each tailored for specific protein engineering tasks:
1. Binder design
Design proteins that bind to specific target proteins with high affinity. RFdiffusion can create binders for therapeutic targets, biosensors, or protein-protein interaction modulators. This mode is particularly powerful for designing peptide binders, protein therapeutics, and diagnostic tools.
2. Motif scaffolding
Embed functional motifs (enzyme active sites, binding sites) into novel protein scaffolds. This enables engineering proteins with specific catalytic activities or binding properties while maintaining structural stability and solubility.
3. Partial diffusion
Redesign specific regions of existing proteins while preserving the rest of the structure. Useful for protein optimization, stabilization, or introducing new functionalities into known scaffolds.
4. Unconditional protein generation
Generate entirely new protein structures from scratch without constraints. Create novel folds, design symmetric oligomers (cyclic, dihedral, tetrahedral, octahedral, icosahedral), or explore uncharted regions of protein structure space.
5. Custom design
Advanced mode for expert users who want full control over the diffusion process using custom contig specifications. Enables complex multi-domain designs, flexible motif positioning, and sophisticated sequence inpainting.
How does RFdiffusion work?
RFdiffusion applies the same class of diffusion models used by AI image generators to protein structure design. Instead of pixels, it operates on protein backbone residue frames: the position of each residue's Cα atom and the orientation of its backbone.
Diffusion on SE(3) manifold
The model defines a forward diffusion process that gradually adds noise to protein backbone coordinates and orientations, transforming any structure into random noise. The reverse process is learned by a neural network that denoises random structures into valid protein backbones through iterative refinement steps.
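The forward process described above can be sketched for the coordinate part only. This is a minimal illustration, assuming a standard variance-preserving Gaussian schedule with a linear beta ramp; the schedule values and the function names `alpha_bar` and `noise_coords` are assumptions for illustration, not RFdiffusion's actual parameters, and the noising of residue orientations is omitted.

```python
import math
import random

def alpha_bar(t, T, beta_min=1e-4, beta_max=0.02):
    """Fraction of the original signal left after t of T noising steps.

    Assumes a linear beta schedule (hypothetical values, for illustration only).
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / max(T - 1, 1)
        prod *= 1.0 - beta
    return prod

def noise_coords(coords, t, T, rng):
    """Sample x_t ~ N(sqrt(abar) * x_0, (1 - abar) * I) for a list of (x, y, z)."""
    abar = alpha_bar(t, T)
    scale = math.sqrt(abar)
    sigma = math.sqrt(1.0 - abar)
    return [tuple(scale * c + sigma * rng.gauss(0.0, 1.0) for c in xyz)
            for xyz in coords]

# Usage: noise three C-alpha positions halfway through a 200-step schedule.
rng = random.Random(0)
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
noised = noise_coords(backbone, t=100, T=200, rng=rng)
```

As `t` grows, `alpha_bar` shrinks, so less of the original structure survives and more of each coordinate is Gaussian noise. The learned reverse process walks this schedule backwards.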
RoseTTAFold architecture
RFdiffusion builds on RoseTTAFold, combining 1D sequence processing, 2D distance map prediction, and 3D structure refinement in a three-track neural network. The model uses SE(3)-equivariant layers that respect the rotational and translational symmetries of 3D space, ensuring physically realistic outputs.
Self-conditioning
A key innovation is self-conditioning: at each diffusion step, the model conditions on its own prediction from the previous step, progressively refining the structure with greater certainty. This improves sample quality. The model was trained with 200 timesteps, but inference works well with far fewer, and 50 is the default.
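The control flow of self-conditioning can be sketched as a sampling loop in which each call to the denoiser also receives the previous call's estimate of the final structure. This is a toy sketch: `sample_with_self_conditioning` and `toy_denoise` are hypothetical names, the denoiser is a stand-in for the neural network, and the update is a deterministic interpolation rather than RFdiffusion's stochastic reverse step.

```python
def sample_with_self_conditioning(denoise, x_T, num_steps):
    """Run a reverse loop where each step sees the previous x0 prediction.

    denoise(x_t, t, x0_prev) must return an estimate of the clean sample.
    """
    x_t = list(x_T)
    x0_prev = None  # nothing to condition on at the first step
    for t in range(num_steps, 0, -1):
        x0_pred = denoise(x_t, t, x0_prev)
        # Move a fraction 1/t of the way toward the current prediction;
        # at t == 1 the sample lands exactly on the prediction.
        w = 1.0 / t
        x_t = [(1.0 - w) * a + w * b for a, b in zip(x_t, x0_pred)]
        x0_prev = x0_pred  # feed this prediction into the next step
    return x_t

def toy_denoise(x_t, t, x0_prev):
    """Stand-in denoiser: refines the previous prediction toward a fixed target."""
    target = [1.0, 2.0, 3.0]
    base = x_t if x0_prev is None else x0_prev
    return [0.5 * (b + g) for b, g in zip(base, target)]

sample = sample_with_self_conditioning(toy_denoise, [0.0, 0.0, 0.0], num_steps=50)
```

The point of the sketch is the `x0_prev` argument: without it, every step would have to re-derive its estimate from the noisy input alone.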
Guided diffusion with potentials
Optional guiding potentials can bias the diffusion process toward desired properties—compact structures, high contact density, or specific binding interfaces. These potentials act as soft constraints that nudge the generative process without breaking the diffusion framework.
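As an illustration of how a potential can act as a soft constraint, the sketch below takes one gradient step on the squared radius of gyration, which pulls every point slightly toward the centroid. The function names and the plain gradient-descent update are assumptions for illustration; how RFdiffusion weights and schedules its potentials is not described here.

```python
import math

def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(coords)
    mean_sq = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in coords) / len(coords)
    return math.sqrt(mean_sq)

def rog_guidance_step(coords, weight):
    """One gradient-descent step on Rg^2.

    The gradient of Rg^2 with respect to point i is (2 / n) * (x_i - centroid),
    so each point moves toward the centroid and Rg shrinks by (1 - 2 * weight / n).
    """
    n = len(coords)
    c = centroid(coords)
    return [tuple(p[i] - weight * (2.0 / n) * (p[i] - c[i]) for i in range(3))
            for p in coords]

# Usage: a small weight nudges the structure without collapsing it.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
compacted = rog_guidance_step(square, weight=0.5)
```

In a guided sampler, a step like this would be interleaved with the denoising updates, so the potential biases the trajectory rather than overriding it.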
Design modes explained
Binder design mode
Creates proteins that bind to specified target regions on a protein of interest.
Key parameters:
- Target chain: Which chain to design a binder for
- Binding pocket: Crop region around the desired binding site (residue range)
- Binder length: Minimum and maximum size of the designed binder
- Hotspots: Specific residues that should be involved in binding (biases diffusion)
Tips for success:
- Cropping the target around the binding site significantly speeds up computation
- Define hotspots so that binders are not directed at hydrophobic patches that are exposed only as an artifact of cropping
- Start with 10-20 residue binders; longer binders are harder to express and less stable
- The cyclic chains option enables macrocyclic peptide binders with enhanced stability
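For readers who run the open-source RFdiffusion code directly, the binder parameters above correspond roughly to command-line overrides for its `scripts/run_inference.py` script. The sketch below builds those override strings; the helper `binder_overrides` and its arguments are hypothetical, and the mapping from ProteinIQ's form fields to these overrides is an assumption, not something the source documents.

```python
def binder_overrides(input_pdb, output_prefix, target_chain, crop_start, crop_end,
                     binder_min, binder_max, hotspots=(), num_designs=1):
    """Build override strings for a binder design run.

    The contig keeps the cropped target fixed, inserts a chain break ("/0 "),
    then asks for a new chain whose length is sampled from the binder range.
    """
    if crop_start > crop_end or binder_min > binder_max:
        raise ValueError("ranges must be given as low, high")
    contig = f"{target_chain}{crop_start}-{crop_end}/0 {binder_min}-{binder_max}"
    overrides = [
        f"inference.input_pdb={input_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
        f"contigmap.contigs=[{contig}]",
    ]
    if hotspots:
        residues = ",".join(f"{target_chain}{r}" for r in hotspots)
        overrides.append(f"ppi.hotspot_res=[{residues}]")
    return overrides

# Usage: file names here are placeholders.
args = binder_overrides("target.pdb", "out/binder", "A", 1, 150, 70, 100,
                        hotspots=[59, 83, 91], num_designs=10)
```

Each string would be passed as a separate, shell-quoted argument to the inference script.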
Motif scaffolding mode
Builds protein scaffolds around functional motifs (catalytic sites, binding loops).
Key parameters:
- Motif chain: Chain containing the motif to scaffold
- N/C-terminal extensions: How much to extend the motif on each end
- Scaffold range: Which residues of the input to preserve as scaffold
Use cases:
- Transplanting enzyme active sites into novel scaffolds
- Stabilizing functional loops by embedding them in rigid protein frameworks
- Creating de novo enzymes with specified catalytic geometries
Partial diffusion mode
Redesigns specific regions while keeping the rest of the structure fixed.
Key parameters:
- Partial diffusion chain: Which chain to modify
- Diffused residue range: Which residues to redesign (rest are fixed)
- Partial timesteps: Amount of noise to add (lower = more conservative, higher = more creative)
Applications:
- Protein stabilization by redesigning flexible loops
- Interface engineering without disrupting the core fold
- Introducing functional mutations in defined regions
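The interaction between the diffused residue range and the partial timesteps setting can be illustrated with a toy noising function: residues outside the range are left untouched, and the noise level inside the range grows with the number of partial timesteps. The linear noise scale, the `sigma_max` value, and the function name are assumptions made for illustration and do not reproduce RFdiffusion's actual schedule.

```python
import random

def partially_noise(coords, first, last, partial_t, total_t, rng, sigma_max=10.0):
    """Noise residues first..last (1-based, inclusive); keep all others fixed.

    The noise scale grows linearly with partial_t / total_t (a toy schedule).
    """
    if not 0 <= partial_t <= total_t:
        raise ValueError("partial_t must lie between 0 and total_t")
    sigma = sigma_max * partial_t / total_t
    out = []
    for idx, xyz in enumerate(coords, start=1):
        if first <= idx <= last:
            out.append(tuple(c + sigma * rng.gauss(0.0, 1.0) for c in xyz))
        else:
            out.append(tuple(xyz))
    return out

# Usage: same random seed, two noise levels.
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
conservative = partially_noise(chain, 4, 7, partial_t=5, total_t=50, rng=random.Random(1))
creative = partially_noise(chain, 4, 7, partial_t=25, total_t=50, rng=random.Random(1))
```

With few partial timesteps the starting point for denoising stays close to the input, so the redesigned region resembles the original; with more, the sampler has more freedom and the result can diverge further.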
Unconditional generation mode
Generates entirely new protein structures with specified length and symmetry.
Key parameters:
- Protein length: Size of the monomeric unit (10-500 residues recommended)
- Symmetry: None (monomer), cyclic, dihedral, tetrahedral, octahedral, icosahedral
- Order: Number of copies for cyclic/dihedral symmetries
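To make the symmetry and order parameters concrete, the sketch below builds a cyclic (C_n) assembly by rotating one subunit about the z-axis in equal angular steps. It shows only the geometry of cyclic symmetry; the function name is hypothetical, and how RFdiffusion enforces symmetry during denoising is not covered.

```python
import math

def cyclic_copies(subunit, order):
    """Return `order` copies of a subunit, rotated about z by k * 360 / order degrees."""
    if order < 1:
        raise ValueError("order must be at least 1")
    copies = []
    for k in range(order):
        theta = 2.0 * math.pi * k / order
        c, s = math.cos(theta), math.sin(theta)
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in subunit])
    return copies

# Usage: a C3 assembly from a 3-residue subunit placed off the symmetry axis.
subunit = [(10.0, 0.0, 0.0), (10.0, 3.8, 0.0), (10.0, 7.6, 0.0)]
trimer = cyclic_copies(subunit, order=3)
total_residues = sum(len(chain) for chain in trimer)
```

The protein length parameter refers to one subunit, so the full assembly contains length times order residues for cyclic symmetry.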
Achievements:
- Novel protein folds not found in nature
- Symmetric nanocages for drug delivery
- Artificial enzymes with designed active sites
- Structural proteins with enhanced stability
Custom design mode
For advanced users familiar with RFdiffusion's contig syntax.
Contig format examples:
150-150: Generate 150-residue protein
A10-25/30-40: Use chain A residues 10-25, then design 30-40 new residues
B1-100/0 100-100: Full chain B, gap, then 100 new residues
Enables complex designs like multi-domain proteins, flexible motif positioning, and partial sequence inpainting.
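A small parser helps make the contig grammar explicit. This sketch handles only the forms shown in the examples above: segments separated by "/", a chain letter marking residues copied from the input, a bare range marking newly designed residues, and "/0 " marking a chain break. The function names and the tuple layout are hypothetical, and the real syntax may accept forms this parser rejects.

```python
import re

_TOKEN = re.compile(r"([A-Za-z]?)(\d+)(?:-(\d+))?")

def parse_contig(contig):
    """Parse a contig string into chains of (kind, chain_id, low, high) segments."""
    chains = []
    for chain_str in contig.split():
        segments = []
        for token in chain_str.split("/"):
            if token in ("", "0"):  # "/0" marks a chain break, not a segment
                continue
            match = _TOKEN.fullmatch(token)
            if match is None:
                raise ValueError(f"unrecognized contig token: {token!r}")
            chain_id = match.group(1)
            low = int(match.group(2))
            high = int(match.group(3) or match.group(2))
            if low > high:
                raise ValueError(f"range is reversed: {token!r}")
            kind = "fixed" if chain_id else "new"
            segments.append((kind, chain_id, low, high))
        chains.append(segments)
    return chains

def length_bounds(contig):
    """Minimum and maximum total residue count implied by a contig."""
    lo_total = hi_total = 0
    for segments in parse_contig(contig):
        for kind, _, low, high in segments:
            if kind == "fixed":  # copied residues: the count is the span of the range
                lo_total += high - low + 1
                hi_total += high - low + 1
            else:  # new residues: the range gives the allowed length
                lo_total += low
                hi_total += high
    return lo_total, hi_total
```

For the second example, `length_bounds("A10-25/30-40")` counts 16 fixed residues plus 30 to 40 new ones.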
Input requirements
Required inputs
- For binder/scaffolding/partial diffusion: PDB file with chain IDs (upload or PDB ID)
- For unconditional generation: No input required (de novo design)
- For custom mode: Depends on contig specification
PDB preparation
- Ensure chain IDs are properly assigned
- Remove water molecules unless critical for motif function
- Clean structure of missing residues (or note them for diffusion to fill in)
- Use PDB Fixer for automated preparation
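For quick checks before upload, the first two preparation steps can be approximated with a few lines that rely on the fixed-column layout of PDB ATOM and HETATM records (residue name in columns 18-20, chain ID in column 22). This is a minimal sketch with hypothetical names; it does not repair missing residues or atoms, which is what a dedicated tool such as PDB Fixer is for.

```python
WATER_NAMES = {"HOH", "WAT", "H2O"}

def strip_waters(pdb_text):
    """Drop water records and report chain IDs found in the remaining atoms.

    Returns (cleaned_text, chain_ids, atoms_without_chain).
    """
    kept = []
    chain_ids = []
    atoms_without_chain = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            if line[17:20].strip() in WATER_NAMES:
                continue  # skip water molecules
            chain = line[21:22].strip()
            if not chain:
                atoms_without_chain += 1  # chain ID column is blank
            elif chain not in chain_ids:
                chain_ids.append(chain)
        kept.append(line)
    return "\n".join(kept), chain_ids, atoms_without_chain
```

A non-zero `atoms_without_chain` count signals that chain IDs still need to be assigned before the file is used as input.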
Understanding the results
RFdiffusion outputs designed protein backbones as PDB files. Each design is scored based on confidence metrics.
Interpreting scores
- Higher scores indicate higher confidence in the design
- Scores are model-predicted estimates of designability and stability
- Top-ranked designs aren't always the best—examine multiple outputs
Next steps after RFdiffusion
RFdiffusion generates backbones only. To obtain a functional protein, you typically:
1. Sequence design: Use ProteinMPNN to design amino acid sequences for the backbone
2. Structure prediction: Validate with AlphaFold2 that the designed sequence folds into the intended backbone
3. Experimental validation: Express, purify, and characterize the designed protein
ProteinIQ can automate steps 1-2 when the "Backbone only" toggle is disabled (the default).
Best practices
Timesteps
- Default (50 steps): Good balance of quality and speed
- Lower (20-30): Faster but lower quality—acceptable for rapid prototyping
- Higher (100-200): Marginally better quality but 2-4× slower—rarely necessary
Binder design
- Start with 10-20 residue peptide binders before attempting protein binders
- Crop the target protein around the binding site for 5-10× speedup
- Use hotspots to guide binders toward specific epitopes
- Consider cyclic peptides for enhanced stability and binding affinity
Motif scaffolding
- Keep motifs compact (fewer than 20 residues) for better success rates
- The substrate contacts potential helps maintain binding site geometry
- Validate scaffold stability with AlphaFold2 before synthesis
Symmetry design
- Start with low symmetry orders (C2-C3, D2-D3) before attempting complex geometries
- Higher symmetries (icosahedral) require more timesteps (100+)
- The oligomer contacts potential stabilizes multimeric interfaces
Guiding potentials
- Use sparingly—start without potentials, then add if needed
- The monomer ROG (radius of gyration) potential compacts structures (useful for small domains)
- Contact potentials increase stability but may reduce structural diversity
- Not all potentials work with all modes—check tooltips for compatibility
Limitations
- No sequence design: RFdiffusion only generates backbones; use ProteinMPNN for sequences
- Rigid backbone assumption: Doesn't model conformational flexibility during design
- No small molecule support: Can't directly incorporate ligands yet (use V2 for this)
- Computational cost: Memory use grows quadratically with protein length, so larger proteins (>300 residues) are expensive to design
- Experimental success rate: Not all designs fold correctly, so validation with AlphaFold2 is recommended before experimental work
Comparison: RFdiffusion vs AlphaFold
RFdiffusion (Design)
- Creates new proteins: Generates structures that don't exist
- Backward direction: Structure → sequence
- Applications: Therapeutics, enzymes, materials
- Output: Backbone coordinates (+ sequences via ProteinMPNN)
AlphaFold (Prediction)
- Predicts existing proteins: Folds sequences from nature
- Forward direction: Sequence → structure
- Applications: Structural biology, function annotation
- Output: Atomic coordinates with confidence scores
Think of RFdiffusion as the "protein designer" and AlphaFold as the "protein validator."
Cost
Using RFdiffusion through ProteinIQ costs 150 credits per design job, regardless of the number of designs generated.
References
- Watson, J.L., Juergens, D., Bennett, N.R. et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100. https://doi.org/10.1038/s41586-023-06415-8
- Yim, J., Trippe, B.L., De Bortoli, V. et al. (2023). SE(3) diffusion model with application to protein backbone generation. ICML 2023. https://arxiv.org/abs/2302.02277
- Official GitHub: https://github.com/RosettaCommons/RFdiffusion