
BoltzGen (Beta)
Design protein and peptide binders against any biomolecular target using state-of-the-art generative AI. Create novel therapeutics with nanomolar-level binding affinity.
What is BoltzGen?
BoltzGen is a generative AI model for designing protein and peptide binders against any biomolecular target. Released in 2025 by Hannes Stärk at MIT, BoltzGen combines structure prediction and design into a single all-atom diffusion model that generates novel binders from scratch.
Traditional design methods modify existing sequences or use physics-based optimization. BoltzGen instead uses generative diffusion—the same technology behind DALL-E—to sample entirely new binder sequences and structures. This enables design of:
- Peptides (8-30 residues): Linear or cyclic peptides for small binding pockets
- Proteins (50-150 residues): Miniproteins and larger domains for complex interfaces
- Nanobodies (110-130 residues): Antibody-like fragments for therapeutic applications
BoltzGen works against diverse targets including proteins, nucleic acids, small molecules, and intrinsically disordered regions. This flexibility expands what can be targeted therapeutically—including surfaces traditional drug discovery couldn't access.
The model has been validated experimentally: 66% of tested designs bind their targets in wet-lab validation, substantially better than earlier generative models. This makes BoltzGen suitable for both computational screening and guiding experimental synthesis.
How does BoltzGen work?
BoltzGen uses a sophisticated all-atom diffusion model that generates binders through an iterative denoising process. The approach combines several technical innovations that enable universal binder design.
All-atom diffusion model
BoltzGen models every heavy atom in 3D space, not just backbone atoms or residue centers. This atomic-level precision enables design of specific side-chain interactions, disulfide bonds, and other features critical for binding specificity.
The generation process starts with random coordinates and iteratively refines them through learned diffusion steps, guided by the target structure and any constraints you specify. Unlike optimization-based methods, this generative approach naturally produces diverse binding modes without parameter tuning.
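The iterative refinement described above can be illustrated with a toy sampler. This is a minimal sketch, not BoltzGen's sampler: it uses a generic Euler step over a decreasing noise schedule, and the trained network is replaced by a hypothetical `oracle_denoiser` that already knows the clean coordinates. The coordinates and noise levels are made up for illustration.

```python
import random


def oracle_denoiser(noisy, sigma, clean):
    """Stand-in for a trained network: it simply returns the known clean coordinates."""
    return clean


def sample(clean, sigmas, seed=0):
    """Run a deterministic Euler sampler from pure noise down the noise schedule."""
    rng = random.Random(seed)
    # Start from Gaussian noise at the highest noise level.
    x = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in clean]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = oracle_denoiser(x, sigma, clean)
        # Euler step: move along (x - denoised) / sigma by the change in noise level.
        x = [
            [xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(atom, d_atom)]
            for atom, d_atom in zip(x, denoised)
        ]
    return x


# Three hypothetical atom positions (in Å) playing the role of the "true" structure.
target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]  # decreasing noise levels, ending at zero
coords = sample(target, sigmas)
```

With a perfect denoiser the sampler lands on the target once the noise level reaches zero. In a real model the denoiser is a learned network conditioned on the target structure and constraints, so different random starting points can lead to different binders.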
Geometry-based residue representation
BoltzGen encodes amino acid type through geometry rather than discrete labels. Each residue is represented as a fixed set of virtual atoms: the backbone (N, C-α, C, O) plus atoms for the side chain. Amino acid identity emerges from how these atoms position relative to the backbone—a technique that's more amenable to diffusion-based generation than mixing discrete and continuous variables.
Design specification language
BoltzGen supports multiple constraint types that guide generation without retraining. You can combine constraints flexibly:
- Covalent bonds: Cyclic peptides, disulfide bridges, or other crosslinks
- Structure constraints: Fix parts of the backbone via distance constraints
- Binding site targeting: Specify which target residues the binder must contact
- Secondary structure: Force helical, sheet, or coil regions in the binder
- Fixed sequences: Lock specific positions to known residues
- Design masks: Control which positions are designed vs. unchanged
These constraints steer the diffusion process toward your design goals without slowing generation.
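To make the idea of combining constraints concrete, here is a minimal sketch of how such a specification could be held and sanity-checked in code. The `DesignSpec` class and all of its field names are hypothetical and do not reflect BoltzGen's actual specification format. The secondary-structure letters (`H`, `E`, `C`) are likewise an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class DesignSpec:
    """Hypothetical container for the constraint types listed above."""
    binder_length: int
    cyclic: bool = False
    disulfides: list = field(default_factory=list)           # pairs of binder positions
    binding_site: dict = field(default_factory=dict)         # target chain -> residue numbers
    secondary_structure: dict = field(default_factory=dict)  # binder position -> "H", "E" or "C"
    fixed_residues: dict = field(default_factory=dict)       # binder position -> one-letter code

    def designable_positions(self):
        """Positions (1-based) that are not locked to a fixed residue."""
        return [i for i in range(1, self.binder_length + 1) if i not in self.fixed_residues]

    def validate(self):
        """Return a list of human-readable problems; an empty list means no conflicts found."""
        problems = []
        for pos in self.fixed_residues:
            if not 1 <= pos <= self.binder_length:
                problems.append(f"fixed residue {pos} is outside the binder")
        for pair in self.disulfides:
            for pos in pair:
                if not 1 <= pos <= self.binder_length:
                    problems.append(f"disulfide position {pos} is outside the binder")
                elif self.fixed_residues.get(pos, "C") != "C":
                    problems.append(f"disulfide position {pos} is fixed to a non-cysteine")
        for pos, ss in self.secondary_structure.items():
            if ss not in {"H", "E", "C"}:
                problems.append(f"unknown secondary structure '{ss}' at position {pos}")
        return problems


spec = DesignSpec(
    binder_length=12,
    cyclic=True,
    disulfides=[(2, 11)],
    binding_site={"A": [12, 14, 61]},
    fixed_residues={2: "C", 5: "W"},
)
```

Checking a specification for internal conflicts before submitting a job is cheap, whereas a conflicting specification found after generation wastes compute.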
Multi-task architecture
BoltzGen combines structure prediction and design in a single model. The architecture includes components from Boltz-2, with a backbone network (Trunk) and a diffusion module for generation.
The training approach matters: BoltzGen randomly performs three different tasks—structure prediction, binder design, and structure completion—on the same training data. This multi-task learning lets the model leverage structural knowledge from the entire PDB for design, while design data improves prediction accuracy. This synergy isn't possible with task-specific models.
The training data was curated to maintain diversity. The team removed upsampled antibody and TCR data that could bias the model toward common therapeutics, and used random cropping and multi-task processing on all samples. This is intended to help the model generalize to novel protein architectures and binding modes beyond the training distribution.
Inputs & settings
BoltzGen requires a target structure and design parameters to generate binders.
Target structure
BoltzGen works with structures from any source: experimental structures (X-ray, cryo-EM) or computational predictions like AlphaFold. You can upload a PDB/CIF file or search by RCSB PDB ID. Most targets up to ~500 residues work well; larger complexes may need chain selection.
Clean structures perform better. If your structure has missing residues, unusual atoms, or water molecules, use PDB Fixer first.
Binding site specification (optional)
Specify a binding site when you have experimental evidence for a functional region: validated binding pockets from co-crystals, known allosteric sites for modulators, or protein-protein interfaces you want to disrupt. BoltzGen will focus generation on those residues, reducing computational cost and improving success rates.
Leave the binding site empty when exploring the entire target surface or when the target's function is unknown. This discovers novel binding modes but requires more compute.
For multi-chain complexes, use the target chains parameter to specify which chains to include in generation.
Output files
Each design comes with the 3D binder-target structure (CIF format), the designed sequence (FASTA), and metadata including quality metrics and interface analysis. Download designs for further validation via molecular dynamics, refinement, or experimental synthesis.
Protocol
Choose a protocol based on your binding interface size and available budget. The protocol affects binder length, computation cost, and which structure prediction methods apply.
- Peptide-anything (8-30 residues): Fast, minimal cost (1× base). Good for small binding pockets or surface grooves. Can be linear or cyclic.
- Nanobody-anything (110-130 residues): Antibody-like binder with natural stability. Access concave epitopes. Same cost as peptides (1×) despite length. Optimized for therapeutic expression.
- Protein-anything (50-150 residues): Miniproteins for larger interfaces and higher specificity. Best for flat protein-protein interactions. Higher cost (4.9×) due to refolding validation.
- Protein-small_molecule: Design protein binders against small molecules (SMILES or CCD codes). Most expensive (6.8×) due to refolding and affinity prediction. Experimental modality with less validation data than peptides/nanobodies.
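The protocol choices above can be summarized as a small lookup table. This is an illustrative sketch: the lowercase protocol keys and the helper `protocols_for_length` are assumptions made for this example, while the length ranges and cost multipliers are copied from the descriptions above. No length range is stated for the small-molecule protocol, so it is left as `None`.

```python
from typing import NamedTuple, Optional


class Protocol(NamedTuple):
    min_len: Optional[int]   # shortest binder in residues, if stated
    max_len: Optional[int]   # longest binder in residues, if stated
    cost_multiplier: float   # cost relative to the 1x base


PROTOCOLS = {
    "peptide-anything": Protocol(8, 30, 1.0),
    "nanobody-anything": Protocol(110, 130, 1.0),
    "protein-anything": Protocol(50, 150, 4.9),
    "protein-small_molecule": Protocol(None, None, 6.8),
}


def protocols_for_length(length):
    """Names of protocols whose stated length range contains `length`."""
    return [
        name
        for name, p in PROTOCOLS.items()
        if p.min_len is not None and p.min_len <= length <= p.max_len
    ]
```

For example, a 120-residue binder fits both the nanobody and protein ranges, so the choice between them comes down to the interface type and the cost difference.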
Binder size
Uniform sizing (recommended) generates all designs at one exact length, which is faster and simpler. Variable length mode samples across a min-max range for diversity, at the cost of roughly 15% more compute time.
Choose size based on your binding interface: 8-20 for simple peptide pockets, 20-30 for larger peptide targets, 50-100 for miniproteins, 100-150 for larger proteins, and 110-130 for nanobodies. Longer sequences cost more to compute and require more validation. Start small and scale up.
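The difference between the two sizing modes can be sketched as follows. The function `plan_lengths` is hypothetical, and drawing lengths uniformly at random from the range is an assumption; the page does not state how variable-length mode distributes lengths.

```python
import random


def plan_lengths(n_designs, min_len, max_len=None, seed=0):
    """Return one binder length per design.

    With only `min_len` given, every design gets that exact length (uniform sizing).
    With `max_len` given, lengths are drawn from the inclusive range (variable mode).
    """
    if max_len is None or max_len == min_len:
        return [min_len] * n_designs
    if max_len < min_len:
        raise ValueError("max_len must be at least min_len")
    rng = random.Random(seed)
    return [rng.randint(min_len, max_len) for _ in range(n_designs)]


uniform = plan_lengths(5, 12)        # five designs, all 12 residues
variable = plan_lengths(10, 8, 20)   # ten designs, each between 8 and 20 residues
```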
Number of designs
Generate more candidates to increase chances of finding high-quality, diverse binders. For initial exploration, try 10-20 designs (5-15 min). For standard workflows, 20-40 designs (15-30 min). For production screening, 100+ designs (hours, higher cost).
Budget
The number of designs kept in your final result after filtering and ranking. Typically the same as the number of designs. Lower this if BoltzGen produces many similar candidates and you want only the most diverse subset.
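One way to think about a budget smaller than the number of designs is a greedy pass over the ranked list that skips near-duplicates. The sketch below is an assumption about how such a selection could work, not a description of BoltzGen's filtering step. It uses `difflib.SequenceMatcher` from the Python standard library as a rough sequence-similarity measure, and the 0.8 similarity cutoff is arbitrary.

```python
from difflib import SequenceMatcher


def select_within_budget(ranked_sequences, budget, max_similarity=0.8):
    """Walk the ranked list and keep a design only if it is not too similar to any kept one."""
    kept = []
    for seq in ranked_sequences:
        if len(kept) == budget:
            break
        if all(SequenceMatcher(None, seq, k).ratio() <= max_similarity for k in kept):
            kept.append(seq)
    return kept


# Made-up sequences, best-ranked first; the second differs from the first by one residue.
ranked = ["ACDEFGHIKL", "ACDEFGHIKM", "WYWYWYWYWY", "PPGPPGPPGP"]
chosen = select_within_budget(ranked, budget=2)
```

Here the second sequence is skipped as a near-duplicate of the first, so the two budget slots go to the first and third designs.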
Binding site (optional)
Format: chain:residue_numbers (e.g., A:12,14,61 or A:10-20,B:5). Use when you have experimental evidence for a functional site from co-crystals or mutagenesis. BoltzGen will focus on those residues, reducing compute and improving success rates.
Leave empty to explore the entire target surface for novel binding modes.
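If you prepare binding-site strings programmatically, a small parser helps catch formatting mistakes before submission. The sketch below handles the two example forms shown above (`A:12,14,61` and `A:10-20,B:5`). The function name `parse_binding_site` is hypothetical, and the parser assumes non-negative residue numbers without insertion codes.

```python
def parse_binding_site(spec):
    """Parse 'A:12,14,61' or 'A:10-20,B:5' into {chain: [residue numbers]}."""
    site = {}
    chain = None
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            # A token such as 'B:5' switches to a new chain.
            chain, token = (part.strip() for part in token.split(":", 1))
        if not chain:
            raise ValueError("residue numbers must be preceded by a chain ID")
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            if end < start:
                raise ValueError(f"range {token} runs backwards")
            residues = list(range(start, end + 1))  # ranges are inclusive
        else:
            residues = [int(token)]
        site.setdefault(chain, []).extend(residues)
    return site
```

A token without a chain prefix is taken to belong to the most recent chain, which is how `A:12,14,61` yields three residues on chain A.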
Target chains (optional)
For multi-chain complexes, specify which chains to include (e.g., A,B). Useful for removing crystallographic artifacts or focusing on specific subunits. Leave empty to include all chains.
Cyclic peptide
Enable backbone cyclization (N- to C-terminus bond) for improved proteolytic stability and often better binding affinity through conformational constraint. Only applies to peptides. Note that cyclization requires specialized synthesis and may restrict certain binding geometries. Enable for in vivo applications where stability is critical.
Understanding the results
BoltzGen ranks designs by quality. The table shows key metrics for each candidate.
Quality metrics
pTM (predicted TM-score): Ranges 0-1 (higher is better). Measures structural quality of the complex. Scores above 0.8 = excellent quality with high confidence. Scores 0.6-0.8 = good quality suitable for most uses. Below 0.6 = lower confidence, requires validation.
PAE (Predicted Aligned Error): Measured in Ångströms (lower is better). Shows prediction uncertainty for atom pairs. Low values indicate high confidence, intermediate values moderate confidence, and high values lower confidence or flexible regions. Low PAE at the interface indicates confident binding predictions.
BSA (Buried Surface Area): Interface size in Ų (larger is typically better). Typical ranges: peptides 500–2,000 Ų, proteins 1,000–3,000 Ų, nanobodies 1,500–2,500 Ų. Very small interfaces below 300 Ų may be less stable. BSA alone doesn't determine affinity; shape complementarity matters most.
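The pTM bands and typical BSA ranges can be turned into simple helper functions when you post-process many designs. This is a minimal sketch: the function names and the returned labels are hypothetical, the 0.8 cutoff for the top pTM band follows the FAQ guidance on high-confidence designs, and the BSA ranges are the typical values quoted above rather than hard limits.

```python
# Typical buried surface area ranges (in Å^2) quoted above, per binder type.
TYPICAL_BSA = {
    "peptide": (500, 2000),
    "protein": (1000, 3000),
    "nanobody": (1500, 2500),
}


def ptm_band(ptm):
    """Map a pTM score (0-1) to the quality band described above."""
    if not 0.0 <= ptm <= 1.0:
        raise ValueError("pTM must lie between 0 and 1")
    if ptm > 0.8:
        return "excellent"
    if ptm >= 0.6:
        return "good"
    return "low confidence"


def bsa_note(bsa, modality):
    """Compare a buried surface area against the typical range for the binder type."""
    low, high = TYPICAL_BSA[modality]
    if bsa < 300:
        return "very small interface"
    if bsa < low:
        return "below typical range"
    if bsa > high:
        return "above typical range"
    return "within typical range"
```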
Cost
Pricing is based on actual GPU compute time. The cost scales with your protocol choice and total residue count (target + binder). We charge a minimum of 200 credits per job.
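For rough budgeting, the stated ingredients (protocol multiplier, total residue count, 200-credit minimum) can be combined in a back-of-the-envelope estimate. The formula below is an assumption, not the actual pricing rule: it treats cost as linear in residues and designs, and the `credits_per_residue_design` rate is a placeholder you would need to calibrate against real job costs.

```python
# Cost multipliers per protocol, as listed in the Protocol section.
COST_MULTIPLIER = {
    "peptide-anything": 1.0,
    "nanobody-anything": 1.0,
    "protein-anything": 4.9,
    "protein-small_molecule": 6.8,
}
MIN_CREDITS = 200  # minimum charge per job


def estimate_credits(protocol, target_residues, binder_residues, n_designs,
                     credits_per_residue_design):
    """Rough estimate: multiplier x total residues x designs x assumed rate, floored at 200."""
    total_residues = target_residues + binder_residues
    raw = COST_MULTIPLIER[protocol] * total_residues * n_designs * credits_per_residue_design
    return max(MIN_CREDITS, round(raw))
```

Whatever rate you plug in, the 4.9× and 6.8× multipliers mean that switching protocol changes the estimate far more than small changes in binder length.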
Related tools
For structure prediction on your designed binders, use Boltz-2 or Chai-1 for independent validation. For computational docking of designed binders to targets, try DiffDock or Smina. If you want to redesign existing structures or explore alternative binders, see Protein-MPNN for sequence optimization or RFDiffusion for scaffold-based design.
FAQs
How much does BoltzGen cost?
BoltzGen pricing scales with your protocol choice and total residue count. Peptide and nanobody designs cost the least (1× base cost). Proteins cost more (4.9×) due to refolding validation. Small molecule binders cost most (6.8×). We charge a minimum of 200 credits per job. New users receive free credits upon registration.
How long does a design generation take?
Peptide designs typically complete in 5–15 minutes for 10–20 candidates. Nanobodies take similar time (~10–20 min). Proteins take longer (15–60 min) due to refolding and validation steps. Small molecule designs are slowest (30–90 min). Runtime scales linearly with the number of designs requested.
How accurate are BoltzGen designs?
BoltzGen achieves a 66% experimental validation success rate, measured as designs that bind their targets in wet-lab testing. This substantially exceeds earlier generative design tools. Success depends on your target quality and constraints. Use high-confidence designs (pTM > 0.8) for experimental synthesis.
What's the difference between peptides, nanobodies, and proteins?
Peptides (8–30 residues) are smallest and fastest. Good for small binding pockets. Nanobodies (110–130 residues) are antibody-like with natural stability and access to concave epitopes. Proteins (50–150 residues) are for larger interfaces and complex geometries. Choose based on your binding interface size and therapeutic application.
Can BoltzGen handle difficult targets?
Yes. If standard generation underperforms, try: (1) specifying a binding site from experimental data, (2) increasing the number of designs for better sampling, (3) using higher MSA depth or longer sequences for nanobodies, (4) enabling variable-length mode for diversity. Start simple and increase complexity if needed.
What should I do with designed structures after generation?
Validate designs through: (1) structural prediction with Boltz-2 for independent confirmation, (2) molecular dynamics simulation to check stability, (3) docking with DiffDock or Smina for alternative binding modes, (4) sequence optimization with Protein-MPNN for alternative sequences. Then prioritize for experimental synthesis.
Can I combine multiple constraints?
Yes. BoltzGen supports simultaneous constraints: binding site targeting, disulfide bonds, cyclic topology, secondary structure, and fixed sequences. Combine constraints to steer design toward your objectives. More constraints reduce sampling space but improve success on known features.
What's the difference between BoltzGen and RFDiffusion?
RFDiffusion designs protein scaffolds around motifs or sequences you provide. BoltzGen generates binders de novo against target structures without requiring input scaffolds. BoltzGen predicts binding to specific targets; RFDiffusion is more general-purpose structure design. Use BoltzGen when you have a target; use RFDiffusion for scaffold-based or motif-driven design.
Can BoltzGen design binders for small molecules?
Yes, via the protein-small_molecule protocol. Provide your ligand as a SMILES string or CCD code. This is the most expensive modality (6.8× cost) and newer than peptide/nanobody design, so expect less validation data. Works best for common functional groups; unusual chemistry may be less reliable.
How do I know if a design is worth synthesizing?
Use this ranking: (1) high pTM (> 0.8) indicates high structural confidence, (2) low PAE at the interface shows confident binding prediction, (3) BSA within normal range (500–2,500 Ų) suggests productive interactions, (4) sequence quality—avoid biased compositions, unusual amino acids, or suspicious patterns. Synthesize top 3–5 ranked candidates with orthogonal validation.
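The checklist above can be applied to a table of designs with a few lines of code. This is a sketch under assumptions: the dictionary keys (`name`, `ptm`, `interface_pae`, `bsa`, `sequence`) are hypothetical rather than BoltzGen's metadata schema, the example designs are made up, and the 30% cap on any single amino acid is an arbitrary stand-in for "avoid biased compositions".

```python
from collections import Counter


def max_residue_fraction(sequence):
    """Fraction of the sequence taken up by its most common amino acid."""
    return max(Counter(sequence).values()) / len(sequence)


def passes_checks(design, ptm_min=0.8, bsa_range=(500, 2500), max_fraction=0.3):
    """Apply the pTM, BSA and composition checks from the list above."""
    return (
        design["ptm"] > ptm_min
        and bsa_range[0] <= design["bsa"] <= bsa_range[1]
        and max_residue_fraction(design["sequence"]) <= max_fraction
    )


def shortlist(designs, top_n=5):
    """Keep passing designs, ordered by high pTM first and then low interface PAE."""
    passing = [d for d in designs if passes_checks(d)]
    passing.sort(key=lambda d: (-d["ptm"], d["interface_pae"]))
    return passing[:top_n]


# Made-up example designs for illustration only.
designs = [
    {"name": "a", "ptm": 0.91, "interface_pae": 4.0, "bsa": 1200, "sequence": "MKTAYIAKQRQISFVKSHFSRQ"},
    {"name": "b", "ptm": 0.85, "interface_pae": 3.0, "bsa": 900, "sequence": "AAAAAAAAAAGG"},
    {"name": "c", "ptm": 0.75, "interface_pae": 2.0, "bsa": 1100, "sequence": "GSHMLEDPVKRWT"},
    {"name": "d", "ptm": 0.91, "interface_pae": 2.5, "bsa": 1500, "sequence": "GSWEELLKRAEELIKKGDYE"},
]
top = shortlist(designs)
```

Design `b` is dropped for its alanine-heavy composition and `c` for its pTM, leaving `d` ahead of `a` because of its lower interface PAE.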
Can I use existing binder structures as templates?
BoltzGen generates from scratch, not from templates. But you can validate designs by comparing them to known binders structurally or by using Protein-MPNN to extract sequences from successful binders and refine them. For template-based design, consider RFDiffusion instead.
What formats do designed structures come in?
Each design is delivered as a CIF file (3D structure), FASTA sequence, and JSON metadata (metrics, interface analysis). You can download all designs together or individually. Use the CIF files in molecular visualization tools, MD simulations, or analysis pipelines.
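For use in analysis pipelines, the FASTA and JSON outputs can be read with the Python standard library alone. The sketch below is illustrative: the design names, the inline file contents, and the `ptm` key in the metadata are hypothetical, since the exact JSON layout is not described here.

```python
import json


def parse_fasta(text):
    """Parse FASTA text into {record name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header line
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[name] += line
    return records


# Hypothetical file contents, inlined so the example is self-contained.
fasta_text = ">design_001\nGSWEELLKRA\nEELIKKGDYE\n>design_002\nACDEFGHIKL\n"
metadata_text = '{"design_001": {"ptm": 0.88}, "design_002": {"ptm": 0.71}}'

sequences = parse_fasta(fasta_text)
metadata = json.loads(metadata_text)
# Pair each sequence length with its (assumed) pTM entry.
summary = {name: (len(seq), metadata[name]["ptm"]) for name, seq in sequences.items()}
```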