BoltzGen is a generative AI model for designing protein and peptide binders against any biomolecular target. Developed by Hannes Stärk at MIT and released in November 2025, BoltzGen combines structure prediction and design into a single all-atom diffusion model that generates novel binders from scratch.
Traditional binder design methods modify existing sequences or rely on physics-based optimization. BoltzGen instead uses generative diffusion, the same class of technique behind image generators such as DALL-E 2, to sample entirely new binder sequences and structures. The model supports multiple binder modalities: linear and cyclic peptides, proteins, nanobodies, and Fab antibody fragments, as well as proteins that bind small molecules.
BoltzGen was validated experimentally on 26 targets across eight academic and industry wet-lab campaigns. On nine novel targets with less than 30% sequence similarity to any protein with a known bound structure, it yielded nanomolar-affinity binders for 66% of the targets (six of nine), substantially exceeding earlier generative design tools.
The model and code are released under the MIT license, enabling both academic research and commercial drug development.
ProteinIQ provides a web-based interface for running BoltzGen without command-line installation. Users upload a target structure, select a binder modality, adjust design parameters, and receive ranked candidate binders with quality metrics.
| Input | Description |
|---|---|
| Target protein | The structure to design binders against. Upload a PDB/CIF file or enter an RCSB PDB ID (e.g., 8JJS) to fetch directly. Structures up to ~500 residues work best. |
| Target ligand | For protein-small molecule mode only. Enter a SMILES string (e.g., CCO) or CCD code (e.g., TSA), or fetch from PubChem by compound ID. |
| Setting | Description |
|---|---|
| Protocol | Binder modality. Peptide (8–30 residues, 1× cost), Protein (50–150 residues, 4.9× cost with refolding), Nanobody (~130 residues, 1× cost), Fab (~450 residues, 2.5× cost), or Protein-small molecule (6.8× cost). |
| Fab scaffold | For Fab mode. Select a specific FDA-approved antibody scaffold (adalimumab, dupilumab, etc.) or use All for diverse sampling across 14 therapeutic scaffolds. |
| Number of designs | Candidate binders to generate (10–100). Cost and runtime scale linearly. Start with 10–20 for exploration, 50–100 for production screening. |
| Budget | Number of final designs kept after diversity filtering. Typically equal to the number of designs; set it lower to keep only a smaller, structurally diverse subset. |
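The protocol multipliers and the linear scaling with the number of designs combine into a rough relative-cost estimate. This is a minimal sketch: the unit (one 1× design, called a "peptide-design unit" here) is hypothetical and is not ProteinIQ's actual billing, and `relative_cost` and `PROTOCOL_COST` are illustrative names.

```python
# Relative cost multipliers from the Protocol setting. The unit (one 1x design)
# is a hypothetical "peptide-design unit", not actual billing.
PROTOCOL_COST = {
    "peptide": 1.0,
    "protein": 4.9,
    "nanobody": 1.0,
    "fab": 2.5,
    "protein-small-molecule": 6.8,
}
CYCLIC_SURCHARGE = 0.05  # cyclic peptides add roughly 5%


def relative_cost(protocol: str, num_designs: int, cyclic: bool = False) -> float:
    """Estimate job cost, assuming it is linear in num_designs."""
    if protocol not in PROTOCOL_COST:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if not 10 <= num_designs <= 100:
        raise ValueError("number of designs must be between 10 and 100")
    if cyclic and protocol != "peptide":
        raise ValueError("cyclization only applies to the peptide protocol")
    cost = num_designs * PROTOCOL_COST[protocol]
    if cyclic:
        cost *= 1 + CYCLIC_SURCHARGE
    return round(cost, 2)


print(relative_cost("protein", 20))               # 98.0
print(relative_cost("peptide", 50, cyclic=True))  # 52.5
```

Under these assumptions, 20 protein designs cost about as much as 98 peptide designs, which is why a small exploratory run is worth doing before a production screen.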
| Setting | Description |
|---|---|
| Uniform binder size | When enabled, all designs use the exact specified length (faster). When disabled, length varies within a min-max range for diversity (+15% runtime). |
| Binder length | Number of binder residues. Peptides: 8–30. Proteins: 50–150. Nanobodies: 110–130. Longer binders increase compute cost quadratically. |
| Binding site | Optional. Target residues the binder should contact. Format: chain:residues (e.g., A:12,14,61). Leave empty for automatic binding site detection. |
| Cyclic peptide | Enable backbone cyclization (N- to C-terminus bond) for improved proteolytic stability. Only applies to peptides. Adds ~5% cost. |
| Target chains | Optional. Comma-separated chain IDs to include from multi-chain structures (e.g., A,B). Leave empty to use all chains. |
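A small parser for the binding-site and target-chain formats can catch typos before a job is submitted. It assumes plain integer residue numbers (insertion codes such as `52A` are not handled) and a single chain per binding-site entry; the function names are hypothetical, and the tool's own validation may differ.

```python
from typing import Optional


def parse_binding_site(spec: str) -> Optional[tuple[str, list[int]]]:
    """Parse 'A:12,14,61' into ('A', [12, 14, 61]); empty means auto-detect."""
    spec = spec.strip()
    if not spec:
        return None  # leave binding-site detection to the tool
    chain, sep, residues = spec.partition(":")
    if not sep or not chain.strip():
        raise ValueError(f"expected chain:residues, got {spec!r}")
    positions = [int(r) for r in residues.split(",") if r.strip()]
    if not positions:
        raise ValueError(f"no residue numbers in {spec!r}")
    return chain.strip(), positions


def parse_target_chains(spec: str) -> Optional[list[str]]:
    """Parse 'A,B' into ['A', 'B']; empty means use all chains."""
    chains = [c.strip() for c in spec.split(",") if c.strip()]
    return chains or None


print(parse_binding_site("A:12,14,61"))  # ('A', [12, 14, 61])
print(parse_target_chains("A, B"))       # ['A', 'B']
```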
| Setting | Description |
|---|---|
| Skip inverse folding | Skip sequence redesign step. Output backbone-only designs without optimized sequences. Faster but may produce less stable designs. |
| Sequences per backbone | Number of sequence variants per backbone (1–10). More sequences increase diversity, and compute time grows linearly with this value. |
| Avoid amino acids | Single-letter codes to exclude from designed sequences. Common: C (prevents unwanted disulfides), M (oxidation-sensitive), W (synthesis issues). |
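The sketch below checks a designed sequence against the avoid list and counts how many sequences downstream steps would need to score. It assumes each generated design is one backbone that receives the configured number of sequence variants; the function names are hypothetical.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def avoided_positions(sequence: str, avoid: str) -> list[int]:
    """Return 1-based positions where `sequence` uses an avoided amino acid."""
    banned = set(avoid.upper())
    unknown = banned - STANDARD_AA
    if unknown:
        raise ValueError(f"not standard one-letter codes: {sorted(unknown)}")
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa in banned]


def total_sequences(num_designs: int, seqs_per_backbone: int) -> int:
    """Sequences to score downstream, assuming one backbone per design."""
    if not 1 <= seqs_per_backbone <= 10:
        raise ValueError("sequences per backbone must be between 1 and 10")
    return num_designs * seqs_per_backbone


print(avoided_positions("MKCWA", "CMW"))  # [1, 3, 4]
print(total_sequences(20, 4))             # 80
```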
| Setting | Description |
|---|---|
| Quality vs diversity (alpha) | Trade-off parameter (0.0–1.0). 0.0 = pure quality ranking. 1.0 = maximum diversity. Default 0.5 balances both. |
| Filter biased compositions | Remove designs with unusual amino acid frequencies (e.g., excessive alanines). |
| Refolding RMSD threshold | Maximum backbone RMSD, in Å, between the designed and refolded structures (0.5–5.0). Lower values keep only designs that fold as intended. Only applies to the Protein protocol. |
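One way to picture how the RMSD threshold, budget, and alpha interact is a hard refolding filter followed by a greedy selection in the style of maximal marginal relevance. BoltzGen's actual ranking procedure is not described on this page, so this is a generic stand-in: it assumes quality scores and pairwise structural distances normalized to [0, 1], and the 2.0 Å default threshold is only an example within the documented range.

```python
def select_designs(
    quality: list[float],
    rmsd: list[float],
    distance: list[list[float]],
    budget: int,
    alpha: float = 0.5,
    rmsd_threshold: float = 2.0,  # example value within the 0.5-5.0 Å range
) -> list[int]:
    """Return indices of selected designs, in selection order.

    quality[i] is in [0, 1] (higher is better); distance[i][j] is a
    normalized structural distance in [0, 1] (higher is more different).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Hard filter: drop designs whose refolded backbone drifts too far.
    remaining = [i for i, r in enumerate(rmsd) if r <= rmsd_threshold]
    selected: list[int] = []
    while remaining and len(selected) < budget:
        # Novelty is the distance to the closest design already selected;
        # the first pick has nothing to compare against, so novelty is 1.0.
        scores = {
            i: (1 - alpha) * quality[i]
            + alpha * min((distance[i][j] for j in selected), default=1.0)
            for i in remaining
        }
        best = max(remaining, key=scores.__getitem__)  # ties: lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy inputs: designs 0 and 1 are near-duplicates, design 2 is different.
quality = [0.9, 0.8, 0.7]
distance = [[0.0, 0.05, 0.9], [0.05, 0.0, 0.9], [0.9, 0.9, 0.0]]
print(select_designs(quality, [1.0] * 3, distance, budget=2, alpha=0.0))  # [0, 1]
print(select_designs(quality, [1.0] * 3, distance, budget=2, alpha=0.5))  # [0, 2]
```

With alpha at 0.0 the two near-duplicates win on quality alone; at 0.5 the second pick switches to the structurally different design.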
| Setting | Description |
|---|---|
| Step scale | Diffusion step size (1.0–3.0, default 1.8). Higher values increase diversity but may reduce quality. |
| Noise scale | Noise level during generation (0.8–1.0, default 0.98). Lower values produce more deterministic outputs. |
| Model checkpoint | Both runs the diverse and adherence checkpoints for maximum coverage. Diverse prioritizes novelty. Adherence prioritizes structural accuracy. |
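BoltzGen comes from the Boltz model family, whose structure predictors use an AlphaFold 3 style diffusion sampler. Assuming step scale and noise scale play the same roles as the step-size and noise multipliers in that scheme, the toy below shows where each one enters a sampling step: noise scale multiplies the noise re-injected before denoising, and step scale multiplies the Euler update. It samples a single 1D coordinate with a perfect denoiser for a one-point "dataset"; the schedule, `gamma0`, and `gamma_min` values are illustrative, not BoltzGen's settings.

```python
import math
import random


def sample_1d(
    target: float,
    step_scale: float = 1.0,
    noise_scale: float = 1.0,
    num_steps: int = 20,
    sigma_max: float = 80.0,
    sigma_min: float = 0.01,
    gamma0: float = 0.8,
    gamma_min: float = 1.0,
    seed: int = 0,
) -> float:
    """Toy AlphaFold 3 style sampler for one coordinate (illustrative values)."""
    if num_steps < 2:
        raise ValueError("need at least two noise levels")
    rng = random.Random(seed)
    # Geometric schedule from sigma_max down to sigma_min, then a final 0.
    ratio = (sigma_min / sigma_max) ** (1 / (num_steps - 1))
    sigmas = [sigma_max * ratio**k for k in range(num_steps)] + [0.0]

    x = sigmas[0] * rng.gauss(0.0, 1.0)  # start from pure noise
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        gamma = gamma0 if sigma > gamma_min else 0.0
        t_hat = sigma * (1 + gamma)
        # noise_scale multiplies the noise re-injected at this step.
        eps = rng.gauss(0.0, 1.0)
        x_noisy = x + noise_scale * math.sqrt(t_hat**2 - sigma**2) * eps
        denoised = target  # perfect denoiser for a one-point "dataset"
        direction = (x_noisy - denoised) / t_hat
        # step_scale multiplies the Euler update toward sigma_next.
        x = x_noisy + step_scale * (sigma_next - t_hat) * direction
    return x


print(abs(sample_1d(2.5) - 2.5) < 1e-9)  # True
```

With `step_scale=1.0`, the last step (where no extra noise is injected) lands on the denoiser's prediction up to floating-point rounding; other values overshoot or undershoot it.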
| Setting | Description |
|---|---|
| Secondary structure | Force specific secondary structure. Format: chain:start-end:type per line. Types: HELIX, SHEET, LOOP. Example: B:1-5:HELIX. |
| Disulfide bonds | Define cysteine-cysteine bridges. Format: chain:residue,chain:residue per line. Example: B:3,B:12. |
| Staple bonds | Non-natural crosslinks for peptide stabilization. Format: chain:residue:atom,chain:residue:atom. Common for i,i+4 or i,i+7 spacing. |
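These structural constraint formats are line-oriented and easy to validate before submitting a job. A minimal parser with hypothetical names that checks secondary-structure entries and splits disulfide or staple bonds into endpoints; the atom names in the usage line are placeholders, since the page does not list which atoms staples attach to.

```python
from dataclasses import dataclass
from typing import Optional

SS_TYPES = {"HELIX", "SHEET", "LOOP"}


@dataclass(frozen=True)
class SecondaryStructure:
    chain: str
    start: int
    end: int
    kind: str


def parse_secondary_structure(text: str) -> list[SecondaryStructure]:
    """Parse one 'chain:start-end:TYPE' entry per line, e.g. 'B:1-5:HELIX'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        chain, span, kind = line.split(":")
        start, end = (int(v) for v in span.split("-"))
        kind = kind.upper()
        if kind not in SS_TYPES or start > end:
            raise ValueError(f"invalid secondary-structure line: {line!r}")
        entries.append(SecondaryStructure(chain, start, end, kind))
    return entries


# (chain, residue number, atom name or None when the format omits it)
Endpoint = tuple[str, int, Optional[str]]


def parse_bond(line: str) -> tuple[Endpoint, Endpoint]:
    """Parse 'B:3,B:12' (disulfide) or 'chain:res:atom,chain:res:atom' (staple)."""
    ends = []
    for part in line.strip().split(","):
        fields = part.split(":")
        if len(fields) not in (2, 3):
            raise ValueError(f"invalid bond endpoint: {part!r}")
        atom = fields[2] if len(fields) == 3 else None
        ends.append((fields[0], int(fields[1]), atom))
    if len(ends) != 2:
        raise ValueError("a bond needs exactly two endpoints")
    return ends[0], ends[1]


print(parse_bond("B:3,B:12"))        # (('B', 3, None), ('B', 12, None))
a, b = parse_bond("B:3:CA,B:7:CA")   # placeholder atom names
print(b[1] - a[1])                   # 4, i.e. i,i+4 spacing
```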
| Setting | Description |
|---|---|
| Fixed sequence regions | Lock positions to specific sequences. Format: chain:start-end:sequence per line. Example: B:1-5:AAAAA. |
| Binding residues | Binder positions that MUST contact the target. Format: chain:residue1,residue2,… |
| Non-binding residues | Binder positions that must NOT contact the target. Useful for designing specificity. |
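A similar sketch for the sequence constraints checks that a fixed region's sequence length matches its span and flags positions requested as both binding and non-binding. It assumes non-binding residues use the same `chain:residue1,residue2,...` format as binding residues, which the page does not state explicitly; the function names are hypothetical.

```python
def parse_fixed_region(line: str) -> tuple[str, int, int, str]:
    """Parse 'B:1-5:AAAAA' and check the sequence length matches the span."""
    chain, span, seq = line.strip().split(":")
    start, end = (int(v) for v in span.split("-"))
    if len(seq) != end - start + 1:
        raise ValueError(
            f"{line.strip()!r}: {len(seq)} residues given for a span of {end - start + 1}"
        )
    return chain, start, end, seq.upper()


def parse_residue_list(spec: str) -> tuple[str, set[int]]:
    """Parse 'chain:residue1,residue2,...' into (chain, {positions})."""
    chain, _, residues = spec.strip().partition(":")
    return chain, {int(r) for r in residues.split(",") if r.strip()}


def conflicting_positions(binding: str, non_binding: str) -> set[int]:
    """Positions requested as both binding and non-binding on the same chain."""
    b_chain, b_pos = parse_residue_list(binding)
    n_chain, n_pos = parse_residue_list(non_binding)
    return b_pos & n_pos if b_chain == n_chain else set()


print(parse_fixed_region("B:1-5:AAAAA"))         # ('B', 1, 5, 'AAAAA')
print(conflicting_positions("B:1,2,3", "B:3,4"))  # {3}
```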
BoltzGen returns a ranked list of candidate binders with an interactive 3D viewer. Each design includes structure files (CIF format), sequences (FASTA), and quality metrics.
| Column | Description |
|---|---|
| Rank | Position in quality-ranked list. Lower is better. |
| Quality (pTM) | Predicted TM-score (0–1). Measures structural quality of the binder-target complex. Higher is better. |
| Error (Å) | Predicted Aligned Error at the interface. Lower indicates more confident binding predictions. |
| Interface (Å²) | Buried surface area. Typical ranges: peptides 500–2,000 Å², proteins 1,000–3,000 Å², nanobodies 1,500–2,500 Å². |
| Sequence | Designed amino acid sequence in single-letter code. |
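When triaging results, these columns can be combined into a simple shortlist. The pTM and interface PAE cutoffs below (0.7 and 10 Å) are illustrative assumptions, not recommended values from BoltzGen; the interface ranges are the typical ones quoted in the results table, and `Design` is a hypothetical container for one table row.

```python
from dataclasses import dataclass

# Typical buried-surface ranges (Å²) quoted for each modality.
TYPICAL_INTERFACE = {
    "peptide": (500.0, 2000.0),
    "protein": (1000.0, 3000.0),
    "nanobody": (1500.0, 2500.0),
}


@dataclass
class Design:
    rank: int         # position in the ranked list, lower is better
    ptm: float        # predicted TM-score, 0-1, higher is better
    pae: float        # interface PAE in Å, lower is better
    interface: float  # buried surface area in Å²
    sequence: str


def shortlist(
    designs: list[Design],
    modality: str,
    min_ptm: float = 0.7,   # illustrative cutoff, not a BoltzGen default
    max_pae: float = 10.0,  # illustrative cutoff, not a BoltzGen default
) -> list[Design]:
    """Keep confident designs with a typical interface size, in rank order."""
    low, high = TYPICAL_INTERFACE[modality]
    keep = [
        d
        for d in designs
        if d.ptm >= min_ptm and d.pae <= max_pae and low <= d.interface <= high
    ]
    return sorted(keep, key=lambda d: d.rank)
```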
BoltzGen generates binders through an all-atom diffusion process guided by learned patterns from the Protein Data Bank. The approach combines several technical innovations that enable universal binder design.
Unlike coarse-grained methods that model only backbone atoms, BoltzGen represents every heavy atom in 3D space. This atomic-level precision enables design of specific side-chain interactions, disulfide bonds, and other features critical for binding specificity.
The generation process starts from random coordinates and iteratively refines them through learned denoising steps, guided by the target structure and any user-specified constraints. This generative approach naturally produces diverse binding modes without manual parameter tuning.
BoltzGen encodes amino acid identity through geometry rather than discrete labels. Each residue is represented by a fixed set of atoms: the backbone (N, Cα, C, O) plus a fixed number of virtual side-chain atoms. The amino acid type is determined by where these atoms are placed relative to the backbone, a representation better suited to diffusion-based generation than mixing discrete labels with continuous coordinates.
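A toy illustration of identity-from-geometry, under a deliberately simplified, hypothetical convention: each residue gets 10 side-chain slots (enough for tryptophan, giving 14 atoms with the backbone), and slots a residue does not need are parked on its Cα. Counting the slots that are not parked recovers the side-chain size but not the exact residue; a full encoding needs additional placement rules to tell apart residues of equal size (for example S and C), and this sketch does not reproduce BoltzGen's actual convention.

```python
import math

# Heavy side-chain atoms per amino acid (CB onward); N, CA, C, O are shared.
SIDECHAIN_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "P": 3, "T": 3, "V": 3,
    "D": 4, "I": 4, "L": 4, "M": 4, "N": 4, "E": 5, "K": 5, "Q": 5,
    "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}
NUM_SLOTS = max(SIDECHAIN_ATOMS.values())  # 10 slots, so 14 atoms per residue

Point = tuple[float, float, float]


def decode_size_class(ca: Point, slots: list[Point], tol: float = 0.1) -> list[str]:
    """Hypothetical convention: unused slots sit on CA; the rest are real atoms.

    Returns every amino acid whose side chain has that many heavy atoms.
    """
    if len(slots) != NUM_SLOTS:
        raise ValueError(f"expected {NUM_SLOTS} side-chain slots")
    used = sum(1 for p in slots if math.dist(p, ca) > tol)
    return sorted(aa for aa, n in SIDECHAIN_ATOMS.items() if n == used)


ca = (0.0, 0.0, 0.0)
one_atom = [(1.53, 0.0, 0.0)] + [ca] * (NUM_SLOTS - 1)
print(decode_size_class(ca, one_atom))  # ['A']
```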
BoltzGen combines structure prediction and design in a single model through multi-task learning. During training, each example is randomly assigned one of three tasks (structure prediction, binder design, or structure completion) on the same data. This lets design draw on structural knowledge from the entire PDB, while design data in turn improves prediction accuracy; a model trained on a single task gets no such cross-task benefit.
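The task sampling can be sketched as turning each training complex into one of three masked views. The exact masks BoltzGen uses are not given here; the choices below (what is hidden for each task, the span length for completion, and treating the target as fixed context during design) are illustrative assumptions, and `make_view` and `TrainingView` are hypothetical names.

```python
import random
from dataclasses import dataclass

TASKS = ("structure_prediction", "binder_design", "structure_completion")


@dataclass
class TrainingView:
    task: str
    hide_sequence: list[bool]   # True: residue identity must be generated
    hide_structure: list[bool]  # True: coordinates start from noise


def make_view(is_binder: list[bool], rng: random.Random) -> TrainingView:
    """Turn one training complex into a randomly chosen task (illustrative masks)."""
    task = rng.choice(TASKS)
    n = len(is_binder)
    if task == "structure_prediction":
        # Whole sequence given as input; all coordinates are generated.
        return TrainingView(task, [False] * n, [True] * n)
    if task == "binder_design":
        # Binder sequence and coordinates generated; target provided as context.
        return TrainingView(task, list(is_binder), list(is_binder))
    # structure_completion: sequence known, coordinates of one random span generated.
    start = rng.randrange(n)
    end = min(n, start + rng.randint(1, max(1, n // 4)))
    hidden = [start <= i < end for i in range(n)]
    return TrainingView(task, [False] * n, hidden)


rng = random.Random(0)
is_binder = [False] * 6 + [True] * 4  # 6 target residues, 4 binder residues
views = [make_view(is_binder, rng) for _ in range(200)]
```

Because the same complex can appear under any of the three tasks, the shared network sees both prediction and design objectives on identical structures.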
The training data was curated for diversity. The upsampling of antibody and TCR data, which could bias the model toward common therapeutic formats, was removed, and random cropping with multi-task processing was applied to all samples.
BoltzGen provides two model checkpoints optimized for different objectives:
- Diverse: prioritizes novelty.
- Adherence: prioritizes structural accuracy.
The default Both setting uses both checkpoints for maximum coverage of the design space.
BoltzGen runs a seven-step pipeline:
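BoltzGen supports flexible constraints that guide generation without retraining, including binding sites, binding and non-binding residues, secondary structure, disulfide and staple bonds, cyclization, and fixed sequence regions.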
These constraints steer the diffusion process toward specific design goals without slowing generation.