EvoDiff is a diffusion-based protein sequence generation framework from Microsoft Research that generates novel protein sequences directly in sequence space. Unlike structure-based methods like RFdiffusion that design 3D backbone coordinates first, EvoDiff works entirely with amino acid sequences—no structural intermediate required.
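ProteinIQ currently wraps the EvoDiff-Seq OA_DM_38M model. In this tool you can generate sequences unconditionally, scaffold a motif taken from an input PDB structure, or regenerate (inpaint) selected positions of your own sequence.
The tool does not currently support EvoDiff-MSA generation, family-conditioned MSA modes, or the OmegaFold/TMscore post-processing pipeline used in the native scaffold benchmark scripts.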
Published as a preprint in September 2023 and open-sourced by Microsoft, EvoDiff represents a fundamentally different approach to protein design. While AlphaFold and RFdiffusion revolutionized structure prediction and structure-based design, EvoDiff demonstrates that "sequence is all you need" for generating novel, structurally plausible proteins.
Image diffusion models such as DALL-E 2 and Stable Diffusion work by adding noise to images, then learning to reverse the process. EvoDiff adapts this concept to discrete amino acid sequences through two distinct corruption schemes:
Order-Agnostic Autoregressive Diffusion (OADM): At each forward step, one amino acid is replaced with a special mask token. After $L$ steps (where $L$ is the sequence length), the entire sequence is masked. The reverse process learns to unmask residues in any order, not left-to-right like traditional language models, which lets the model consider global sequence context when generating each position.
Discrete Denoising Diffusion Probabilistic Models (D3PM): The forward process corrupts sequences by sampling mutations according to a transition matrix. Two variants exist: D3PM-Uniform, in which each residue transitions to any amino acid with equal probability, and D3PM-BLOSUM, which derives its transition probabilities from the BLOSUM62 substitution matrix so that evolutionarily common substitutions are favored.
After $T$ steps, the corrupted sequence becomes indistinguishable from random amino acids. The model learns to reverse this corruption, recovering structured sequences from noise.
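The two forward processes can be sketched in a few lines of Python. This is a toy illustration, not EvoDiff's training code: the `#` mask character and the function names `oadm_corrupt` and `d3pm_uniform_step` are hypothetical, and the D3PM step implements only the uniform transition matrix, written as "resample each residue uniformly with probability `beta`".

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
MASK = "#"  # hypothetical mask character, not EvoDiff's actual token


def oadm_corrupt(seq: str, t: int, rng: random.Random) -> str:
    """OADM-style forward process: mask t positions chosen in a random order.

    After t = len(seq) steps every position is masked.
    """
    order = list(range(len(seq)))
    rng.shuffle(order)  # random order, one position masked per step
    masked = set(order[:t])
    return "".join(MASK if i in masked else aa for i, aa in enumerate(seq))


def d3pm_uniform_step(seq: str, beta: float, rng: random.Random) -> str:
    """One D3PM step with a uniform transition matrix.

    With probability beta a residue is resampled uniformly from the 20
    amino acids (it may land on the same residue); otherwise it is kept.
    """
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < beta else aa for aa in seq
    )


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    for t in (0, 5, len(seq)):
        print(t, oadm_corrupt(seq, t, rng))
    noisy = seq
    for _ in range(50):  # repeated steps drive the sequence toward uniform noise
        noisy = d3pm_uniform_step(noisy, beta=0.1, rng=rng)
    print(noisy)
```

In OADM the amount of corruption is set by how many positions are masked; in D3PM it accumulates through repeated mutation, which is why the two schemes need different reverse models.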
EvoDiff uses a dilated convolutional neural network architecture adapted from the CARP protein masked language model. This architecture efficiently captures long-range dependencies in protein sequences while maintaining computational tractability. The model processes sequences as discrete tokens (20 standard amino acids plus special tokens).
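Dilated convolutions reach distant residues by spacing out the positions each kernel reads. For stacked stride-1 1-D convolutions with kernel size k and dilations d_1, ..., d_n, the receptive field is 1 + (k - 1) × (d_1 + ... + d_n) residues. The sketch below evaluates that formula; the kernel size of 5 and the doubling dilation schedule are illustrative assumptions, not EvoDiff's published configuration.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in residues) of stacked stride-1 dilated 1-D convolutions."""
    # Each layer widens the field by (kernel_size - 1) * dilation positions.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical schedule: dilations double from 1 to 128 within one block.
block = [2**i for i in range(8)]  # [1, 2, 4, ..., 128]
print(receptive_field(5, block))      # 1021
print(receptive_field(5, block * 2))  # 2041 (two stacked blocks)
```

Because the field grows with the sum of the dilations rather than with the number of layers, eight layers under these assumptions already span more than a full 512-residue sequence.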
Training used 42 million sequences from UniRef50, which clusters UniProt sequences at 50% identity and represents diverse protein families across all domains of life. Two model sizes are available: 38M and 640M parameters.
The ProteinIQ implementation uses the 38M-parameter OADM model (EvoDiff-Seq); in the EvoDiff benchmarks, OADM models outperformed the D3PM variants for unconditional generation.
The original repository also ships EvoDiff-MSA models, D3PM sequence models, CARP baselines, and LRAR baselines. Those modes are useful, but they are not yet available in the ProteinIQ tool. The current app is intentionally narrower: one EvoDiff-Seq model, one unconditional mode, one scaffold mode, and one custom user-sequence inpainting mode.
Structure-based design methods require high-quality structural templates, so they cannot design intrinsically disordered proteins, linker regions, or proteins lacking structural homologs. EvoDiff's sequence-first approach needs no structural template and can generate these sequences directly.
Generates novel protein sequences from scratch without any template. You specify the desired length (50-512 residues) and number of samples. The model samples from learned sequence distributions, producing diverse sequences with natural amino acid composition and secondary structure propensities.
When to use: Exploring novel sequence space, generating diverse protein libraries, discovering new folds, or creating synthetic proteins for experimental screening.
Example: Generate 10 sequences of 100 residues each to create a diverse starting library for directed evolution experiments.
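In OADM sampling, generation starts from an all-mask sequence and fills one position per step in a random order, each time conditioning on everything decoded so far. The sketch below shows only that control flow. `predict_distribution` is a hypothetical stand-in for the trained network (it returns uniform probabilities), so the output is random rather than protein-like.

```python
import random
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical mask character


def predict_distribution(tokens: List[str], position: int) -> Dict[str, float]:
    """Hypothetical model stand-in. A real OADM network would condition on
    all unmasked tokens; this toy version returns a uniform distribution."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def oadm_generate(
    length: int,
    rng: random.Random,
    model: Callable[[List[str], int], Dict[str, float]] = predict_distribution,
) -> str:
    tokens = [MASK] * length  # start fully masked
    order = list(range(length))
    rng.shuffle(order)  # random, not left-to-right, decoding order
    for pos in order:  # one residue per step: `length` steps in total
        probs = model(tokens, pos)
        residues = list(probs)
        tokens[pos] = rng.choices(residues, weights=[probs[r] for r in residues])[0]
    return "".join(tokens)


rng = random.Random(42)
samples = [oadm_generate(100, rng) for _ in range(10)]  # 10 sequences x 100 residues
print(len(samples), {len(s) for s in samples})  # 10 {100}
```

The loop runs one model call per residue, which is one reason generation time grows with sequence length.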
Builds a complete protein sequence around a structural motif from an input PDB file. You specify which residues comprise the functional motif and how many additional scaffold residues EvoDiff should generate outside that fixed motif. The tool follows the native EvoDiff scaffold semantics:
motif_start_idx and motif_end_idx are 0-indexed and inclusive.
scaffold_min / scaffold_max count only the generated residues outside the motif.
When to use: Transplanting binding sites, epitopes, or catalytic residues into new sequence contexts. Useful when you have a functional motif and want to explore alternative scaffolds that might improve stability, expression, or other properties.
Input requirements: PDB file containing the motif structure. Specify the motif range using EvoDiff-compatible 0-indexed inclusive positions and a scaffold length range measured in added residues, not total sequence length.
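The index conventions are easy to get wrong, so here is a sketch of how a scaffold template could be assembled under them. The function name `build_scaffold_template` and the random placement of the motif inside the final sequence are assumptions for illustration; the native EvoDiff scripts may place the motif differently. The sketch only builds the masked template that the unmasking loop would then fill.

```python
import random

MASK = "#"  # hypothetical mask character


def build_scaffold_template(
    chain_seq: str,
    motif_start_idx: int,
    motif_end_idx: int,
    scaffold_min: int,
    scaffold_max: int,
    rng: random.Random,
) -> tuple[str, int]:
    """Return (template, motif_offset); masked positions are to be generated.

    motif_start_idx / motif_end_idx are 0-indexed and inclusive, so the motif
    has motif_end_idx - motif_start_idx + 1 residues. scaffold_min and
    scaffold_max count only the residues added outside the motif.
    """
    motif = chain_seq[motif_start_idx : motif_end_idx + 1]  # +1: inclusive end
    n_scaffold = rng.randint(scaffold_min, scaffold_max)  # randint is inclusive
    offset = rng.randint(0, n_scaffold)  # assumption: random motif placement
    template = MASK * offset + motif + MASK * (n_scaffold - offset)
    return template, offset


rng = random.Random(7)
chain = "MSTNPKPQRKTKRNTNRRPQDVKFPGG"
template, offset = build_scaffold_template(chain, 5, 14, 100, 150, rng)
print(len(template), offset, template[offset : offset + 10])
```

With a 10-residue motif (indices 5 to 14) and 100-150 added residues, every template is 110-160 residues long, which matches the rule that final length equals motif length plus scaffold length.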
Fills in specified regions of an existing protein sequence while preserving the rest. You provide the full sequence and indicate which positions to regenerate (as comma-separated ranges like "10-25,50-60"). The tool then applies the same order-agnostic diffusion-style unmasking loop used by the native OADM implementation to those user-selected positions.
This is a ProteinIQ adaptation of the native inpainting routine. The --cond-task idr CLI is benchmark-oriented and operates on a built-in IDR dataset rather than arbitrary user-supplied sequences.
When to use: Redesigning disordered regions, optimizing problematic loop sequences, replacing aggregation-prone segments, or introducing variation while maintaining framework regions.
Example: For antibody engineering, mask CDR regions while preserving framework residues to generate diverse binding variants.
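The position string uses 1-indexed residue numbers, while a sampling loop works on 0-indexed arrays. The sketch below converts a string such as "10-25,50-60" into 0-indexed positions and masks them. `parse_positions` and `mask_positions` are hypothetical helpers, and treating both ends of each range as inclusive and accepting single positions like "7" are assumptions about the format.

```python
MASK = "#"  # hypothetical mask character


def parse_positions(spec: str, seq_len: int) -> list[int]:
    """Convert '10-25,50-60' (1-indexed, inclusive) to sorted 0-indexed positions."""
    positions: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)  # assumption: single positions are allowed
        if not 1 <= start <= end <= seq_len:
            raise ValueError(f"range {part!r} is outside 1..{seq_len}")
        positions.update(range(start - 1, end))  # shift to 0-indexed, keep the end
    return sorted(positions)


def mask_positions(seq: str, positions: list[int]) -> str:
    """Replace the selected positions with the mask token; keep the rest."""
    chosen = set(positions)
    return "".join(MASK if i in chosen else aa for i, aa in enumerate(seq))


seq = "A" * 70
pos = parse_positions("10-25,50-60", len(seq))
print(len(pos), pos[0], pos[-1])  # 27 9 59
masked = mask_positions(seq, pos)
```

Only the masked positions are then visited by the order-agnostic unmasking loop; every other residue stays fixed and serves as context.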
Design mode: Selects the generation task. Unconditional Generation creates sequences from scratch. Motif Scaffolding requires a PDB structure. Sequence Inpainting requires an input sequence.
Number of sequences: How many independent sequences to generate (1-50). More sequences provide better coverage of sequence space. Start with 10 for initial exploration, increase to 30-50 for comprehensive sampling.
Sequence length: Target length in residues for unconditional generation (50-512). Longer sequences increase computational time but enable designing larger proteins. The 512-residue limit reflects model training constraints.
Motif start/end index: Defines the functional motif region in the extracted chain sequence using native EvoDiff semantics: 0-indexed and inclusive. The motif sequence is preserved exactly; surrounding residues are generated.
Minimum/maximum additional scaffold residues: Number of generated residues outside the fixed motif. Final sequence length equals motif length plus the sampled scaffold length.
Positions to regenerate: Comma-separated residue ranges to mask and regenerate. Format: "10-25,50-60" regenerates positions 10-25 and 50-60 while preserving all other positions. Unlike the motif indices, these positions are 1-indexed, matching residue numbering in your input sequence.
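The documented parameter ranges can be checked before a job is submitted. This is a hypothetical client-side sketch: the `EvoDiffJob` dataclass and its field names are illustrative, not ProteinIQ's API. The bounds come from the parameter descriptions (1-50 sequences, 50-512 residues); treating 512 residues as a cap on the longest possible scaffold design is an assumption based on the model's length limit.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEN = 512  # model length limit stated for this tool


@dataclass
class EvoDiffJob:
    """Hypothetical job description; the field names are illustrative only."""

    mode: str  # "unconditional", "scaffold", or "inpaint"
    num_sequences: int = 10
    seq_len: int = 100  # unconditional mode only
    motif_start_idx: Optional[int] = None  # 0-indexed, inclusive
    motif_end_idx: Optional[int] = None  # 0-indexed, inclusive
    scaffold_min: Optional[int] = None  # residues added outside the motif
    scaffold_max: Optional[int] = None

    def validate(self) -> None:
        if not 1 <= self.num_sequences <= 50:
            raise ValueError("num_sequences must be between 1 and 50")
        if self.mode == "unconditional":
            if not 50 <= self.seq_len <= MAX_LEN:
                raise ValueError(f"seq_len must be between 50 and {MAX_LEN}")
        elif self.mode == "scaffold":
            start, end = self.motif_start_idx, self.motif_end_idx
            lo, hi = self.scaffold_min, self.scaffold_max
            if None in (start, end, lo, hi):
                raise ValueError("scaffold mode needs motif indices and a scaffold range")
            if not 0 <= start <= end:
                raise ValueError("motif indices must satisfy 0 <= start <= end")
            if not 0 <= lo <= hi:
                raise ValueError("scaffold range must satisfy 0 <= min <= max")
            # Assumption: the longest possible design must fit within MAX_LEN.
            if (end - start + 1) + hi > MAX_LEN:
                raise ValueError("motif length + scaffold_max exceeds the length limit")
        elif self.mode != "inpaint":
            # Inpainting inputs (sequence and position ranges) are not checked here.
            raise ValueError(f"unknown mode {self.mode!r}")


EvoDiffJob(mode="unconditional", num_sequences=10, seq_len=100).validate()
EvoDiffJob(mode="scaffold", motif_start_idx=5, motif_end_idx=14,
           scaffold_min=100, scaffold_max=150).validate()
```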
EvoDiff outputs FASTA-formatted protein sequences. For scaffold and inpainting jobs, ProteinIQ also returns structured metadata describing each generated sequence, so you can trace motif placement or masked ranges without re-parsing the FASTA. Unlike structure prediction tools, EvoDiff provides no confidence scores: the model generates plausible sequences but cannot guarantee that they will fold or function.
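Because the output is FASTA, it can be loaded with a few lines of standard-library Python. A minimal parser, assuming ordinary `>header` lines followed by one or more sequence lines; the headers in the example are made up, and the structured metadata is not modeled because its format is not described here.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence.

    Duplicate headers keep the last record.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq_1
MKTAYIAKQR
QISFVKSHFS
>seq_2
MSTNPKPQRK
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))  # seq_1 20, then seq_2 10
```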
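The current tool intentionally leaves several native capabilities for a later pass: checkpoints other than OA_DM_38M (including the D3PM sequence models and the CARP and LRAR baselines), EvoDiff-MSA and family-conditioned MSA generation, and the OmegaFold/TMscore post-processing used in the native scaffold benchmark scripts.
These omissions do not change the scientific behavior of the modes already available; they are simply not surfaced yet.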
Sequence properties: Check amino acid composition, isoelectric point, and instability index using ProteinIQ analysis tools. Generated sequences should show natural-like properties unless specifically designed otherwise.
Structure prediction: Validate designs with ESMFold, AlphaFold 2, or Chai-1. High pLDDT scores (>70) suggest the sequence encodes a well-defined structure. Low pLDDT may indicate disordered regions or problematic sequences.
Sequence identity: Compare generated sequences to natural proteins using BLAST or MMseqs2. Novel sequences typically show less than 30% identity to known proteins—higher identity suggests the model recovered existing sequences rather than generating novel ones.
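In short, a promising design combines natural-like sequence properties, a confident predicted structure (unless it is meant to be disordered), and low identity to known proteins.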
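These checks can be combined into a simple triage step. This sketch assumes you have already obtained a mean pLDDT from a structure predictor and a best-hit identity from BLAST or MMseqs2 for each design; the `Design` record, the helper names, and the numbers in the example are hypothetical. It computes amino acid composition directly and applies the pLDDT > 70 and < 30% identity thresholds given above.

```python
from collections import Counter
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict[str, float]:
    """Fraction of each standard amino acid in seq (empty dict for empty input)."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS} if n else {}


@dataclass
class Design:
    name: str
    sequence: str
    mean_plddt: float  # from ESMFold, AlphaFold 2, or Chai-1 (computed elsewhere)
    max_identity: float  # best-hit identity to known proteins, 0-1 (computed elsewhere)


def passes_filters(d: Design, min_plddt: float = 70.0, max_identity: float = 0.30) -> bool:
    """Keep designs that look folded and are not close copies of known proteins."""
    return d.mean_plddt > min_plddt and d.max_identity < max_identity


designs = [
    Design("seq_1", "MKTAYIAKQRQISFVKSHFS", mean_plddt=82.5, max_identity=0.21),
    Design("seq_2", "MSTNPKPQRKTKRNTNRRPQ", mean_plddt=55.0, max_identity=0.18),
    Design("seq_3", "MKTAYIAKQRQISFVKSHFA", mean_plddt=88.0, max_identity=0.95),
]
kept = [d.name for d in designs if passes_filters(d)]
print(kept)  # ['seq_1']
print(composition(designs[0].sequence)["K"])
```

For designs that are intended to be disordered, relax or drop the pLDDT filter, since low pLDDT is the expected outcome there.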
Generate entirely novel proteins without evolutionary or structural templates. EvoDiff samples from learned sequence distributions to create proteins with natural-like properties. Combine with structure prediction (ESMFold, AlphaFold 2) to identify well-folded candidates, then use ProteinMPNN for sequence optimization if needed.
Workflow: EvoDiff (sequence generation) → ESMFold (structure prediction) → Filter by pLDDT → ProteinMPNN (sequence refinement) → Experimental validation
Structure-based methods like RFdiffusion cannot design intrinsically disordered proteins or linker regions. EvoDiff's sequence-first approach handles these naturally. Use inpainting mode to redesign disordered loops while preserving structured domains.
Scaffold mode enables transplanting binding sites, catalytic residues, or epitopes into new sequence contexts. This can improve protein properties (stability, expression, solubility) while maintaining function, or explore how different scaffolds affect motif conformation.
Generate diverse protein libraries for experimental screening. Unlike random mutagenesis, EvoDiff produces sequences that respect evolutionary constraints—mutations are more likely to produce folded, functional proteins.
| Method | Input | Output | Best for |
|---|---|---|---|
| EvoDiff | None/sequence/PDB | Sequences | Novel sequences, IDRs, linkers |
| RFdiffusion | None/PDB | Structures | Binders, scaffolds, oligomers |
| ProteinMPNN | PDB structure | Sequences | Inverse folding, redesign |
| ESM-IF1 | PDB structure | Sequences | Fast inverse folding |
| ProGen2 | Context token | Sequences | Broad protein language-model generation |
| ProFam | Family context | Sequences | Protein-family-conditioned generation |
Use EvoDiff when: You need novel sequences without structural constraints, want to design disordered regions, or lack a suitable structural template.
Use RFdiffusion when: You need precise structural control, are designing binders, or require symmetric oligomers.
Use ProteinMPNN when: You have a target structure and need optimized sequences to fold into it.
Maximum sequence length is 512 residues. Longer proteins require splitting into domains or using alternative methods. Generation time scales with sequence length and number of samples.
EvoDiff generates sequences without explicit structural constraints. Generated sequences are statistically plausible but not guaranteed to fold or adopt specific conformations. Always validate with structure prediction before experimental work.
The model reflects biases in UniRef50—underrepresented protein families may generate lower-quality sequences. Designed sequences may show composition biases toward well-represented families.
Unlike structure-based methods, EvoDiff cannot explicitly design around ligands, metals, or cofactors. For binding site design, use RFdiffusion scaffolding or LigandMPNN.
RFdiffusion operates in structure space, generating 3D backbone coordinates whose sequences are then designed with ProteinMPNN. EvoDiff works directly in sequence space: it generates amino acid sequences without any structural intermediate. This makes EvoDiff well suited to designing intrinsically disordered proteins, linkers, and other sequences without defined structures.
EvoDiff cannot design binders directly, because it generates sequences without considering binding interfaces. For binder design, use RFdiffusion in binder mode or BindCraft. You could use EvoDiff to generate diverse scaffolds, then optimize for binding with structure-aware methods.
Start with 10 for initial exploration. For comprehensive sampling or experimental screening, generate 30-50 sequences. More sequences increase diversity but require more computational time and downstream filtering.
EvoDiff itself is open-source software from Microsoft Research. On ProteinIQ, jobs cost 150 credits with usage-based adjustments. Guest and free users can run smaller jobs; premium tiers support larger generation runs.
EvoDiff is released under the MIT license, permitting commercial use. Check the GitHub repository for current licensing terms.
Alamdari, S., Thakur, N., van den Berg, R., Lu, A.X., Fusi, N., Amini, A.P., & Yang, K.K. (2023). Protein generation with evolutionary diffusion: sequence is all you need. bioRxiv. doi:10.1101/2023.09.11.556673