What is DR-BERT?
Intrinsically disordered regions (IDRs) are protein segments that lack a stable three-dimensional structure under physiological conditions. Far from being non-functional, IDRs are disproportionately involved in signaling, transcriptional regulation, and molecular recognition — they enable proteins to interact with multiple partners through conformational flexibility. Predicting which residues are disordered is essential for interpreting function, designing experiments, and understanding diseases linked to aberrant phase separation.
DR-BERT is a compact protein language model developed at the University of Illinois Urbana-Champaign that predicts intrinsically disordered regions directly from amino acid sequence. Trained without any evolutionary or biophysical input, DR-BERT assigns each residue a disorder probability between 0 and 1, providing position-level resolution across the full sequence.
How DR-BERT works
DR-BERT applies the BERT (Bidirectional Encoder Representations from Transformers) framework to proteins. The model architecture is a transformer encoder with six stacked layers, each computing attention over all residues simultaneously. This bidirectional design means the disorder prediction at any given residue reflects the full sequence context — not just local amino acid composition.
Training proceeds in two stages. First, DR-BERT is pretrained on approximately six million unannotated protein sequences using masked language modeling: random residues are masked and the model learns to predict them from context. This stage builds general representations of sequence grammar and co-evolutionary patterns without any labeled data. Second, the pretrained model is fine-tuned on curated disorder annotations, adapting the contextual representations to the disorder prediction task.
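The masking step of the pretraining stage can be illustrated with a minimal sketch. This is not DR-BERT's training code: the 15% masking rate is the conventional BERT default and is assumed here, and the `#` mask symbol and the `mask_sequence` helper are hypothetical names chosen for illustration.

```python
import random

MASK = "#"  # hypothetical placeholder; the real model uses its own mask token


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and record what was hidden.

    Returns the masked sequence and a dict mapping each masked
    0-based position to the original residue the model must predict.
    """
    if not seq:
        return "", {}
    rng = random.Random(seed)
    # Mask at least one residue so every example carries a training signal.
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    targets = {}
    for i in positions:
        targets[i] = masked[i]
        masked[i] = MASK
    return "".join(masked), targets


masked, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ")
print(masked)
print(targets)
```

During pretraining, the model sees only the masked string and is scored on how well it recovers the residues stored in `targets`.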
Its compact size distinguishes DR-BERT from larger protein language models. Because it uses six transformer layers rather than tens, inference is fast enough for practical use on CPUs and modest hardware, without a meaningful loss of accuracy.
Benchmarks on the Critical Assessment of protein Intrinsic Disorder (CAID) dataset show DR-BERT achieves statistically significant improvements over several established methods, and it is competitive across multiple CAID 2 test cases.
How to use DR-BERT online
ProteinIQ runs DR-BERT on cloud infrastructure, returning per-residue disorder scores without local installation or dependency management.
Input
| Input | Description |
|---|---|
| Protein sequence | FASTA or raw amino acid sequence. Accepts 1–5 sequences per job. Maximum 1022 residues per sequence. Sequences can be fetched from RCSB by PDB ID. |
Sequences longer than 1022 residues must be split before submission. The 1022-residue limit reflects the BERT positional encoding window.
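One way to split a long sequence before submission is with overlapping windows, so that residues near a cut point are also scored with context on both sides. The sketch below is an assumption about a reasonable workflow, not a ProteinIQ feature: the `split_sequence` helper and the 100-residue overlap are hypothetical choices.

```python
MAX_LEN = 1022  # per-sequence limit stated above


def split_sequence(seq, max_len=MAX_LEN, overlap=100):
    """Split seq into windows of at most max_len residues.

    Returns a list of (start, segment) tuples, where start is the
    1-based position of the segment's first residue in the full sequence.
    Consecutive windows share `overlap` residues.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    if len(seq) <= max_len:
        return [(1, seq)]
    step = max_len - overlap
    segments = []
    start = 0
    while True:
        segments.append((start + 1, seq[start:start + max_len]))
        if start + max_len >= len(seq):
            break  # this window reached the end of the sequence
        start += step
    return segments


for start, segment in split_sequence("A" * 2500):
    print(start, len(segment))
```

The recorded start positions make it possible to map per-residue scores from each segment back to full-length numbering. How to reconcile scores in the overlapping stretches (for example, averaging them) is left to the user.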
Output
Results are returned as a table with one row per residue.
| Column | Description |
|---|---|
| Residue | Amino acid at that position (single-letter code) |
| Position | Sequence index (1-based) |
| Disorder score | Predicted disorder probability (0–1). Higher values indicate greater likelihood of disorder. |
Interpreting scores
| Score range | Interpretation |
|---|---|
| ≥ 0.5 | Disordered — residue is predicted to lack stable structure |
| < 0.5 | Ordered — residue is predicted to adopt a stable conformation |
| 0.4–0.6 | Boundary zone around the 0.5 cutoff; treat the ordered/disordered call as tentative and interpret with other evidence |
Continuous stretches of high-scoring residues define disordered regions. Isolated high-scoring residues within otherwise ordered sequences may represent flexible loops rather than true IDRs.
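Grouping per-residue scores into regions can be sketched as follows. The 0.5 threshold comes from the table above; the `call_disordered_regions` helper and its `min_length` filter, which drops short isolated runs, are hypothetical and not part of the tool's output.

```python
def call_disordered_regions(scores, threshold=0.5, min_length=1):
    """Return (start, end) tuples, 1-based and inclusive, for each
    continuous run of residues scoring at or above the threshold."""
    regions = []
    run_start = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if run_start is None:
                run_start = pos  # a new run begins here
        elif run_start is not None:
            # The run ended at the previous residue.
            if pos - run_start >= min_length:
                regions.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the C-terminus.
    if run_start is not None and len(scores) - run_start + 1 >= min_length:
        regions.append((run_start, len(scores)))
    return regions


scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.55, 0.65, 0.75]
print(call_disordered_regions(scores))                # [(1, 3), (6, 6), (8, 10)]
print(call_disordered_regions(scores, min_length=3))  # [(1, 3), (8, 10)]
```

Raising `min_length` discards isolated high-scoring residues such as position 6 in the example, which may be flexible loops rather than true IDRs. The right cutoff depends on the question being asked.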
Applications
DR-BERT scores have practical uses across structural and functional biology:
- Guiding crystallography: Disordered termini and loops are common causes of crystallization failure. Trimming high-scoring terminal regions can improve construct design.
- Identifying linear motifs: Many short linear motifs (SLiMs) embedded in IDRs mediate protein–protein interactions. Disorder predictions help locate candidate regions.
- Phase separation research: Low-complexity disordered regions drive liquid–liquid phase separation. DR-BERT scores flag regions worth probing experimentally.
- AlphaFold annotation: Regions with low AlphaFold pLDDT scores often correspond to disordered segments; DR-BERT provides an independent sequence-based confirmation.
- Functional annotation of uncharacterized proteins: When structural data are unavailable, per-residue disorder scores provide an early hypothesis about which regions may be functionally flexible.
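As a rough illustration of the construct-design use case, the sketch below trims predicted-disordered termini by keeping the span between the first and last ordered residue. The `suggest_construct` helper is hypothetical, and the rule is deliberately simple: a single ordered residue in a tail stops the trimming, so real construct boundaries should also take domain annotations and other evidence into account.

```python
def suggest_construct(seq, scores, threshold=0.5):
    """Trim disordered termini from a sequence.

    Returns (start, end, construct) with 1-based inclusive boundaries,
    or None if no residue scores below the threshold.
    Internal disordered loops are left untouched.
    """
    if len(seq) != len(scores):
        raise ValueError("seq and scores must have the same length")
    ordered = [i for i, s in enumerate(scores) if s < threshold]
    if not ordered:
        return None
    first, last = ordered[0], ordered[-1]
    return first + 1, last + 1, seq[first:last + 1]


seq = "MSSGGKLVIAGTEEE"
scores = [0.9, 0.8, 0.7, 0.6, 0.6, 0.2, 0.1, 0.1,
          0.2, 0.3, 0.2, 0.4, 0.7, 0.8, 0.9]
print(suggest_construct(seq, scores))  # (6, 12, 'KLVIAGT')
```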
Limitations
- The maximum sequence length of 1022 residues is fixed by the BERT positional encoding. Longer proteins must be analyzed in segments.
- DR-BERT was trained on DisProt and similar disorder databases, which skew toward certain organism types and protein classes. Predictions on highly unusual sequences may be less reliable.
- Like all sequence-only predictors, DR-BERT cannot capture disorder that is conditionally induced by partners or post-translational modifications.
- Batch size is capped at five sequences per job.