BoltzGen

Design protein and peptide binders against any biomolecular target using state-of-the-art generative AI. Create novel therapeutics with nanomolar-level binding affinity.

What is BoltzGen?

BoltzGen is a generative AI model for designing protein and peptide binders against any biomolecular target. Developed by Hannes Stärk at MIT and released in November 2025, BoltzGen combines structure prediction and design into a single all-atom diffusion model that generates novel binders from scratch.

Traditional binder design methods modify existing sequences or use physics-based optimization. BoltzGen instead uses generative diffusion, the same family of techniques behind image generation models such as DALL-E, to sample entirely new binder sequences and structures. The model supports multiple binder modalities:

Peptides: Linear or cyclic peptides (8–30 residues) for small binding pockets
Proteins: Miniproteins (50–150 residues) for larger interfaces and complex geometries
Nanobodies: Single-domain antibody fragments (~130 residues) with natural stability
Fab antibodies: Larger therapeutic antibody fragments (~450 residues) with proven clinical scaffolds

BoltzGen was validated experimentally across 26 targets in eight academic and industry wet-lab campaigns. On nine novel targets with less than 30% sequence similarity to any known bound structure, the model produced nanomolar-affinity binders for 66% of targets, substantially exceeding earlier generative design tools.

The model and code are released under the MIT license, enabling both academic research and commercial drug development.

How to use BoltzGen online

ProteinIQ provides a web-based interface for running BoltzGen without command-line installation. Users upload a target structure, select a binder modality, adjust design parameters, and receive ranked candidate binders with quality metrics.

Inputs

Input	Description
`Target protein`	The structure to design binders against. Upload a PDB/CIF file or enter an RCSB PDB ID (e.g., `8JJS`) to fetch directly. Structures up to ~500 residues work best.
`Target ligand`	For protein-small molecule mode only. Enter a SMILES string (e.g., `CCO`) or CCD code (e.g., `TSA`), or fetch from PubChem by compound ID.

Settings

Core settings

Setting	Description
`Protocol`	Binder modality. `Peptide` (8–30 residues, 1× cost), `Protein` (50–150 residues, 4.9× cost with refolding), `Nanobody` (~130 residues, 1× cost), `Fab` (~450 residues, 2.5× cost), or `Protein-small molecule` (6.8× cost).
`Fab scaffold`	For Fab mode. Select a specific FDA-approved antibody scaffold (adalimumab, dupilumab, etc.) or use `All` for diverse sampling across 14 therapeutic scaffolds.
`Number of designs`	Candidate binders to generate (10–100). Cost and runtime scale linearly. Start with 10–20 for exploration, 50–100 for production screening.
`Budget`	Number of final designs kept after diversity filtering. Typically equal to the number of designs. Lower this to keep only the most structurally diverse candidates.
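
The protocol multipliers and the linear scaling with design count can be combined into a rough relative-cost estimate. The sketch below is illustrative only: `relative_cost` and `base_cost_per_design` are hypothetical names, and actual credit pricing is not specified here, so the result is in arbitrary units relative to a single peptide design.

```python
# Relative cost multipliers quoted in the Protocol row above.
PROTOCOL_MULTIPLIER = {
    "peptide": 1.0,
    "protein": 4.9,  # includes refolding
    "nanobody": 1.0,
    "fab": 2.5,
    "protein-small molecule": 6.8,
}


def relative_cost(protocol: str, num_designs: int,
                  base_cost_per_design: float = 1.0) -> float:
    """Estimate relative cost; base_cost_per_design is an assumed unit, not a real price."""
    if not 10 <= num_designs <= 100:
        raise ValueError("number of designs must be between 10 and 100")
    try:
        multiplier = PROTOCOL_MULTIPLIER[protocol.lower()]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
    # Cost scales linearly with the number of designs.
    return base_cost_per_design * multiplier * num_designs


print(relative_cost("Peptide", 20))  # 20.0
print(relative_cost("Fab", 20))      # 50.0
```

Under these assumptions, a 20-design Fab run costs 2.5 times as much as a 20-design peptide run.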

Binder configuration

Setting	Description
`Uniform binder size`	When enabled, all designs use the exact specified length (faster). When disabled, length varies within a min-max range for diversity (+15% runtime).
`Binder length`	Target residue count. Peptides: 8–30. Proteins: 50–150. Nanobodies: 110–130. Longer sequences increase compute cost quadratically.
`Binding site`	Optional. Residues where the binder should interact, format `chain:residues` (e.g., `A:12,14,61`). Leave empty for automatic binding site detection.
`Cyclic peptide`	Enable backbone cyclization (N- to C-terminus bond) for improved proteolytic stability. Only applies to peptides. Adds ~5% cost.
`Target chains`	Optional. Comma-separated chain IDs to include from multi-chain structures (e.g., `A,B`). Leave empty to use all chains.
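
The `Binding site` and `Target chains` formats are easy to check before submitting a job. The following is a minimal sketch of such a check; `parse_binding_site` and `parse_target_chains` are hypothetical helper names and are not part of BoltzGen or ProteinIQ.

```python
def parse_binding_site(spec: str) -> dict[str, list[int]]:
    """Parse 'A:12,14,61' into {'A': [12, 14, 61]}.

    An empty string returns {} (automatic binding site detection).
    """
    spec = spec.strip()
    if not spec:
        return {}
    chain, sep, residues = spec.partition(":")
    chain = chain.strip()
    if not sep or not chain or not residues.strip():
        raise ValueError(f"expected 'chain:residues', got {spec!r}")
    # int() tolerates surrounding whitespace and raises ValueError on bad tokens.
    return {chain: [int(token) for token in residues.split(",")]}


def parse_target_chains(spec: str) -> list[str]:
    """Parse 'A,B' into ['A', 'B']; an empty string means all chains."""
    return [chain.strip() for chain in spec.split(",") if chain.strip()]


print(parse_binding_site("A:12,14,61"))  # {'A': [12, 14, 61]}
print(parse_target_chains("A,B"))        # ['A', 'B']
```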

Inverse folding

Setting	Description
`Skip inverse folding`	Skip the sequence redesign step and output backbone-only designs without optimized sequences. Faster but may produce less stable designs.
`Sequences per backbone`	Number of sequence variants per backbone (1–10). More sequences increase diversity but multiply compute time linearly.
`Avoid amino acids`	Single-letter codes to exclude from designed sequences. Common: `C` (prevents unwanted disulfides), `M` (oxidation-sensitive), `W` (synthesis issues).
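
Two quick checks follow from these settings: how many sequences a run will produce, and whether a returned sequence contains any excluded residue. This is a small sketch with hypothetical helper names, intended for post-hoc sanity checking of downloaded results.

```python
def total_sequences(num_designs: int, sequences_per_backbone: int) -> int:
    """Sequence count grows linearly with sequences per backbone."""
    if not 1 <= sequences_per_backbone <= 10:
        raise ValueError("sequences per backbone must be between 1 and 10")
    return num_designs * sequences_per_backbone


def forbidden_residues_found(sequence: str, avoid: str) -> list[str]:
    """Return the excluded single-letter codes that appear in the sequence."""
    return sorted(set(sequence.upper()) & set(avoid.upper()))


print(total_sequences(20, 4))                         # 80
print(forbidden_residues_found("MKTAYIAKQR", "CMW"))  # ['M']
```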

Filtering and ranking

Setting	Description
`Quality vs diversity (alpha)`	Trade-off parameter (0.0–1.0). `0.0` = pure quality ranking. `1.0` = maximum diversity. Default `0.5` balances both.
`Filter biased compositions`	Remove designs with unusual amino acid frequencies (e.g., excessive alanines).
`Refolding RMSD threshold`	Maximum backbone deviation in Ångströms between designed and refolded structure (0.5–5.0). Lower values ensure designs fold as intended. Only applies to protein protocol.
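
One way to picture the quality versus diversity trade-off is a greedy selection that scores each remaining candidate as a weighted sum of its quality and its novelty relative to designs already chosen. The sketch below is an assumption about how such a trade-off can work, not BoltzGen's actual filtering code: `Candidate`, `select_designs`, the 0 to 1 quality scale, and the use of sequence similarity as the diversity measure are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Candidate:
    name: str
    sequence: str
    quality: float  # assumed to be normalized to the range 0..1


def similarity(a: str, b: str) -> float:
    """Sequence similarity in 0..1 (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def select_designs(candidates, budget: int, alpha: float):
    """Greedily pick up to `budget` designs; alpha=0 is pure quality, alpha=1 pure diversity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < budget:
        def score(c):
            # Novelty is 1 minus the similarity to the closest already-selected design.
            closest = max((similarity(c.sequence, s.sequence) for s in selected),
                          default=0.0)
            return (1.0 - alpha) * c.quality + alpha * (1.0 - closest)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


pool = [
    Candidate("a", "AAAA", 0.9),
    Candidate("b", "AAAA", 0.8),
    Candidate("c", "GGGG", 0.5),
]
print([c.name for c in select_designs(pool, budget=2, alpha=0.0)])  # ['a', 'b']
print([c.name for c in select_designs(pool, budget=2, alpha=1.0)])  # ['a', 'c']
```

With `alpha=0.0` the two highest-quality designs win even though they are identical; with `alpha=1.0` the second pick is the design least similar to the first.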

Diffusion sampling

Setting	Description
`Step scale`	Diffusion step size (1.0–3.0, default 1.8). Higher values increase diversity but may reduce quality.
`Noise scale`	Noise level during generation (0.8–1.0, default 0.98). Lower values produce more deterministic outputs.
`Model checkpoint`	`Both` uses diverse and adherence models for maximum coverage. `Diverse` prioritizes novelty. `Adherence` prioritizes structural accuracy.

Structure constraints

Setting	Description
`Secondary structure`	Force specific secondary structure. Format: `chain:start-end:type` per line. Types: `HELIX`, `SHEET`, `LOOP`. Example: `B:1-5:HELIX`.
`Disulfide bonds`	Define cysteine-cysteine bridges. Format: `chain:residue,chain:residue` per line. Example: `B:3,B:12`.
`Staple bonds`	Non-natural crosslinks for peptide stabilization. Format: `chain:residue:atom,chain:residue:atom`. Common for i,i+4 or i,i+7 spacing.
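
The per-line constraint formats above can be validated locally before submission. The following sketch parses the `Secondary structure` and `Disulfide bonds` formats exactly as described in the table; the function and type names are hypothetical and do not come from BoltzGen.

```python
from typing import NamedTuple

ALLOWED_TYPES = {"HELIX", "SHEET", "LOOP"}


class SecondaryStructure(NamedTuple):
    chain: str
    start: int
    end: int
    kind: str


def parse_secondary_structure(text: str) -> list[SecondaryStructure]:
    """Parse lines of the form 'chain:start-end:type', e.g. 'B:1-5:HELIX'."""
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected 'chain:start-end:type', got {line!r}")
        chain, span, kind = parts
        start_text, sep, end_text = span.partition("-")
        if not sep:
            raise ValueError(f"expected a 'start-end' range in {line!r}")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"range start exceeds end in {line!r}")
        kind = kind.upper()
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unknown secondary structure type {kind!r}")
        constraints.append(SecondaryStructure(chain, start, end, kind))
    return constraints


def parse_disulfides(text: str) -> list[tuple[tuple[str, int], tuple[str, int]]]:
    """Parse lines of the form 'chain:residue,chain:residue', e.g. 'B:3,B:12'."""
    bonds = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ends = line.split(",")
        if len(ends) != 2:
            raise ValueError(f"expected two residues in {line!r}")
        pair = []
        for end in ends:
            chain, sep, residue = end.partition(":")
            if not sep:
                raise ValueError(f"expected 'chain:residue' in {line!r}")
            pair.append((chain.strip(), int(residue)))
        bonds.append((pair[0], pair[1]))
    return bonds


print(parse_secondary_structure("B:1-5:HELIX"))
print(parse_disulfides("B:3,B:12"))
```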

Advanced design

Setting	Description
`Fixed sequence regions`	Lock positions to specific sequences. Format: `chain:start-end:sequence` per line. Example: `B:1-5:AAAAA`.
`Binding residues`	Binder positions that MUST contact the target. Format: `chain:residue1,residue2,...`.
`Non-binding residues`	Binder positions that must NOT contact the target. Useful for designing specificity.
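
For `Fixed sequence regions`, a useful pre-submission check is that the sequence length matches the residue range. The sketch below assumes that the range is inclusive and that only the 20 canonical one-letter codes are accepted; both are assumptions, and `parse_fixed_region` is a hypothetical helper.

```python
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fixed_region(line: str) -> tuple[str, int, int, str]:
    """Parse 'chain:start-end:sequence', e.g. 'B:1-5:AAAAA'."""
    # Tuple unpacking raises ValueError if the line has the wrong number of fields.
    chain, span, sequence = line.strip().split(":")
    start_text, end_text = span.split("-")
    start, end = int(start_text), int(end_text)
    sequence = sequence.upper()
    # Assumption: the range is inclusive, so 1-5 covers five residues.
    if end - start + 1 != len(sequence):
        raise ValueError(
            f"range {start}-{end} covers {end - start + 1} residues "
            f"but the sequence has {len(sequence)}"
        )
    unknown = set(sequence) - CANONICAL_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-canonical residue codes: {sorted(unknown)}")
    return chain, start, end, sequence


print(parse_fixed_region("B:1-5:AAAAA"))  # ('B', 1, 5, 'AAAAA')
```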

Results

BoltzGen returns a ranked list of candidate binders with an interactive 3D viewer. Each design includes structure files (CIF format), sequences (FASTA), and quality metrics.

Column	Description
`Rank`	Position in quality-ranked list. Lower is better.
`Quality (pTM)`	Predicted TM-score (0–1). Measures structural quality of the binder-target complex. Higher is better.
`Error (Å)`	Predicted Aligned Error at the interface. Lower indicates more confident binding predictions.
`Interface (Å²)`	Buried surface area. Typical ranges: peptides 500–2,000 Å², proteins 1,000–3,000 Å², nanobodies 1,500–2,500 Å².
`Sequence`	Designed amino acid sequence in single-letter code.

Interpreting quality metrics

pTM ≥ 0.8: Excellent structural quality with high confidence. Prioritize for experimental synthesis.
pTM 0.6–0.8: Good quality suitable for most applications. Visual inspection recommended.
pTM < 0.6: Lower confidence. Consider generating additional designs or adjusting parameters.
PAE < 3 Å at interface: High-confidence binding prediction.
PAE 3–5 Å: Moderate confidence. Check interface geometry in 3D viewer.
PAE > 5 Å: Lower confidence or flexible regions. May indicate a challenging target.
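
The thresholds above translate directly into a triage helper for downloaded results. This is a minimal sketch; the function names and tier labels are illustrative.

```python
def ptm_tier(ptm: float) -> str:
    """Map a pTM score (0..1) to the confidence tiers listed above."""
    if ptm >= 0.8:
        return "excellent"
    if ptm >= 0.6:
        return "good"
    return "low"


def pae_tier(interface_pae: float) -> str:
    """Map interface PAE in Å to the confidence tiers listed above."""
    if interface_pae < 3.0:
        return "high"
    if interface_pae <= 5.0:
        return "moderate"
    return "low"


def prioritize_for_synthesis(ptm: float, interface_pae: float) -> bool:
    """Flag designs that meet the strictest threshold on both metrics."""
    return ptm_tier(ptm) == "excellent" and pae_tier(interface_pae) == "high"


print(ptm_tier(0.85), pae_tier(2.1))       # excellent high
print(prioritize_for_synthesis(0.7, 2.1))  # False
```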

How does BoltzGen work?

BoltzGen generates binders through an all-atom diffusion process guided by learned patterns from the Protein Data Bank. The approach combines several technical innovations that enable universal binder design.

All-atom diffusion model

Unlike coarse-grained methods that model only backbone atoms, BoltzGen represents every heavy atom in 3D space. This atomic-level precision enables design of specific side-chain interactions, disulfide bonds, and other features critical for binding specificity.

The generation process starts from random coordinates and iteratively refines them through learned denoising steps, guided by the target structure and any user-specified constraints. This generative approach naturally produces diverse binding modes without manual parameter tuning.
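
The idea of starting from random coordinates and refining them step by step can be illustrated with a toy loop. This is not BoltzGen's sampler: the "denoiser" here is a stand-in that already knows the answer (a hypothetical `target` point set), whereas the real model predicts the clean structure with a neural network. The update rule and noise schedule are simplified assumptions.

```python
import math
import random


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def toy_denoise(target, steps=50, sigma_max=1.0, seed=0):
    """Refine random coordinates toward `target`; returns (start_rmsd, final_rmsd)."""
    rng = random.Random(seed)
    # Start from random coordinates, as in diffusion sampling.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    start_error = rmsd(coords, target)
    for t in range(steps):
        # Injected noise shrinks linearly and reaches zero on the last step.
        sigma = sigma_max * (1.0 - (t + 1) / steps)
        for point, goal in zip(coords, target):
            for axis in range(3):
                estimate = goal[axis]  # stand-in for the network's denoised prediction
                point[axis] += 0.5 * (estimate - point[axis]) + sigma * rng.gauss(0.0, 1.0)
    return start_error, rmsd(coords, target)


# Four points spaced 3.8 Å apart, roughly a Cα-Cα distance.
target = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
start, final = toy_denoise(target)
print(f"start RMSD {start:.2f} Å, final RMSD {final:.2f} Å")
```

Because noise is re-injected at every step, different seeds follow different trajectories, which is the toy analogue of sampling diverse designs.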

Geometry-based residue representation

BoltzGen encodes amino acid identity through geometry rather than discrete labels. Each residue is represented as a fixed set of virtual atoms: the backbone (N, Cα, C, O) plus atoms for the side chain. Amino acid type emerges from how these atoms are positioned relative to the backbone. Because everything is expressed as continuous coordinates, this representation is more amenable to diffusion-based generation than mixing discrete and continuous variables.

Multi-task training

BoltzGen combines structure prediction and design in a single model through multi-task learning. During training, the model randomly performs three tasks—structure prediction, binder design, and structure completion—on the same data. This enables the model to leverage structural knowledge from the entire PDB for design, while design data improves prediction accuracy. Single-task models cannot achieve this synergy.

The training data was curated for diversity. Upsampling of antibody and TCR data, which could bias the model toward common therapeutics, was removed, and random cropping with multi-task processing was applied to all samples.

Two-checkpoint system

BoltzGen provides two model checkpoints optimized for different objectives:

Diverse checkpoint: Prioritizes exploration of binding modes and structural novelty
Adherence checkpoint: Prioritizes fidelity to specified constraints and structural accuracy

The default `Both` setting uses both checkpoints for maximum coverage of the design space.

Pipeline architecture

BoltzGen runs a seven-step pipeline:

Design: Diffusion model generates candidate backbones
Inverse folding: Sequence design using ProteinMPNN-style methods
Design folding: Refolds designed chain alone (protein protocol only)
Complex folding: Refolds full binder-target complex via Boltz-2
Affinity prediction: Binding strength estimation (small-molecule targets only)
Analysis: Multi-metric evaluation (RMSD, hydrophobic packing, contacts)
Filtering: Diversity-quality selection based on alpha parameter
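
Not every step runs for every protocol. The sketch below lists which of the seven steps apply to a given job, based only on the conditions stated in the list above and the `Skip inverse folding` setting; `pipeline_steps` is a hypothetical helper, and the step names are labels rather than BoltzGen identifiers.

```python
def pipeline_steps(protocol: str, skip_inverse_folding: bool = False) -> list[str]:
    """Return the pipeline steps that apply to a protocol, in order."""
    protocol = protocol.lower()
    steps = ["design"]
    if not skip_inverse_folding:
        steps.append("inverse folding")
    if protocol == "protein":
        # The designed chain is refolded alone only for the protein protocol.
        steps.append("design folding")
    steps.append("complex folding")
    if protocol == "protein-small molecule":
        # Affinity is estimated only for small-molecule targets.
        steps.append("affinity prediction")
    steps += ["analysis", "filtering"]
    return steps


print(pipeline_steps("peptide"))
# ['design', 'inverse folding', 'complex folding', 'analysis', 'filtering']
print(len(pipeline_steps("protein")))  # 6
```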

Design specification language

BoltzGen supports flexible constraints that guide generation without retraining:

Covalent bonds: Cyclic peptides, disulfide bridges, staple crosslinks
Structure constraints: Fix backbone regions via distance constraints
Binding site targeting: Specify target residues the binder must contact
Secondary structure: Force helical, sheet, or coil regions
Fixed sequences: Lock specific positions to known residues
Design masks: Control which positions are designed vs. unchanged

These constraints steer the diffusion process toward specific design goals without slowing generation.

Limitations

Rigid target backbone: The target protein is treated as rigid during design. Induced-fit effects are not modeled.
No explicit solvent: Water molecules and explicit solvation effects are not considered.
Canonical amino acids only: Non-natural amino acids and post-translational modifications are not directly supported.
Computational requirements: Protein-length binders (50+ residues) require substantial GPU time. Cost scales quadratically with binder length.
Small-molecule mode validation: The protein-small molecule protocol has less experimental validation data than peptide and nanobody modes.
No guarantee of expressibility: High computational scores do not ensure successful recombinant expression or solubility.
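
The quadratic scaling noted under computational requirements is worth quantifying when choosing a binder length. This small sketch computes the relative cost of one length against another; it models only the stated quadratic term and ignores any fixed overhead, which is an assumption.

```python
def relative_length_cost(length: int, reference_length: int) -> float:
    """Cost of a binder of `length` residues relative to `reference_length`."""
    if length <= 0 or reference_length <= 0:
        raise ValueError("lengths must be positive")
    return (length / reference_length) ** 2


print(relative_length_cost(150, 50))  # 9.0
print(relative_length_cost(100, 50))  # 4.0
```

Under this model, a 150-residue miniprotein costs about nine times as much as a 50-residue one.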

Applications

Therapeutic binder development: Design peptide drugs, nanobody therapeutics, or antibody candidates against disease targets
Difficult target engagement: Access intrinsically disordered proteins, flat protein surfaces, and other "undruggable" targets
Protein-protein interaction modulators: Disrupt or stabilize specific protein interfaces
Biosensor development: Create recognition elements for diagnostic applications
Research tool generation: Design affinity reagents for pulldowns, co-IPs, and imaging

Related tools

Boltz-2: Structure prediction model used internally for refolding validation
RFdiffusion: Alternative scaffold-based protein design using diffusion
ProteinMPNN: Sequence optimization for designed backbones
DiffDock: Molecular docking using diffusion models
Chai-1: Independent structure validation for designed complexes
PDB Fixer: Structure preparation for targets with missing residues or artifacts
