DR-BERT

AI-powered per-residue disorder prediction for protein sequences

What is DR-BERT?

Intrinsically disordered regions (IDRs) are protein segments that lack a stable three-dimensional structure under physiological conditions. Far from being non-functional, IDRs are disproportionately involved in signaling, transcriptional regulation, and molecular recognition — they enable proteins to interact with multiple partners through conformational flexibility. Predicting which residues are disordered is essential for interpreting function, designing experiments, and understanding diseases linked to aberrant phase separation.

DR-BERT is a compact protein language model developed at the University of Illinois Urbana-Champaign that predicts intrinsically disordered regions directly from amino acid sequence. Trained without any evolutionary or biophysical input, DR-BERT assigns each residue a disorder probability between 0 and 1, providing position-level resolution across the full sequence.

How DR-BERT works

DR-BERT applies the BERT (Bidirectional Encoder Representations from Transformers) framework to proteins. The model architecture is a transformer encoder with six stacked layers, each computing attention over all residues simultaneously. This bidirectional design means the disorder prediction at any given residue reflects the full sequence context — not just local amino acid composition.

Training proceeds in two stages. First, DR-BERT is pretrained on approximately six million unannotated protein sequences using masked language modeling: random residues are masked and the model learns to predict them from context. This stage builds general representations of sequence grammar and co-evolutionary patterns without any labeled data. Second, the pretrained model is fine-tuned on curated disorder annotations, adapting the contextual representations to the disorder prediction task.
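
The masking step of the pretraining stage can be sketched as follows. This is a simplified illustration, not DR-BERT's actual training code: the 15% mask rate is the convention from the original BERT recipe and is assumed here, `MASK_TOKEN` and `mask_residues` are hypothetical names, and real BERT-style training also sometimes substitutes random tokens instead of the mask symbol.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical placeholder; the real tokenizer's mask symbol may differ


def mask_residues(sequence, mask_rate=0.15, seed=None):
    """Build one masked-language-modeling example from a protein sequence.

    Returns (tokens, targets): tokens is the residue list with some positions
    replaced by MASK_TOKEN, and targets maps each masked 0-based position to
    the residue that was hidden there.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    # Mask at least one residue, but never more than the sequence contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    targets = {}
    for pos in rng.sample(range(len(tokens)), n_mask):
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets


masked, targets = mask_residues("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
```

During pretraining the model would see `masked` as input and be scored on how well it recovers the residues stored in `targets`.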

The compact size distinguishes DR-BERT from larger protein language models. With six transformer layers rather than dozens, inference is fast enough for practical use on CPUs and modest hardware, with little loss of accuracy.

Benchmarks on the Critical Assessment of protein Intrinsic Disorder prediction (CAID) dataset show that DR-BERT achieves statistically significant improvements over several established methods, and it remains competitive across multiple CAID 2 test cases.

How to use DR-BERT online

ProteinIQ runs DR-BERT on cloud infrastructure, returning per-residue disorder scores without local installation or dependency management.

Input

Input	Description
`Protein sequence`	FASTA or raw amino acid sequence. Accepts 1–5 sequences per job. Maximum 1022 residues per sequence. Sequences can be fetched from RCSB by PDB ID.
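
Before submitting, it can help to check a job against these limits locally. The sketch below parses FASTA or raw text and reports anything that exceeds the stated caps. `parse_sequences` and `check_job` are hypothetical helper names and are not part of ProteinIQ; the service may apply additional checks of its own.

```python
MAX_SEQUENCES = 5    # per-job cap stated in the input table
MAX_RESIDUES = 1022  # per-sequence cap stated in the input table


def parse_sequences(text):
    """Parse FASTA (or one raw sequence) into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if name is not None or chunks:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None or chunks:
        records.append((name, "".join(chunks)))
    # Give unnamed records a placeholder name based on their order.
    return [(n or f"sequence_{i}", s) for i, (n, s) in enumerate(records, start=1)]


def check_job(records):
    """Return a list of problems; an empty list means the job fits the limits."""
    problems = []
    if not 1 <= len(records) <= MAX_SEQUENCES:
        problems.append(f"expected 1-{MAX_SEQUENCES} sequences, got {len(records)}")
    for name, seq in records:
        if not seq:
            problems.append(f"{name}: empty sequence")
        elif len(seq) > MAX_RESIDUES:
            problems.append(f"{name}: {len(seq)} residues exceeds {MAX_RESIDUES}")
    return problems
```

For example, `check_job(parse_sequences(fasta_text))` returns an empty list when the job is within both limits.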

Sequences longer than 1022 residues must be split before submission. The 1022-residue limit reflects the BERT positional encoding window.
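
One way to split a long sequence is into fixed-size windows, optionally overlapping so that residues near a cut still see some neighbouring context. This is a sketch under assumptions: `split_sequence` is a hypothetical helper, the `overlap` parameter is not something the tool prescribes, and how scores from overlapping windows are merged afterwards is left to the user.

```python
def split_sequence(sequence, max_len=1022, overlap=0):
    """Split a sequence into windows of at most max_len residues.

    Returns a list of (start, segment) pairs, where start is the 1-based
    position of the segment's first residue in the original sequence.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    if len(sequence) <= max_len:
        return [(1, sequence)]
    step = max_len - overlap
    segments = []
    start = 0
    while True:
        segments.append((start + 1, sequence[start:start + max_len]))
        if start + max_len >= len(sequence):
            break  # this window already reaches the end of the sequence
        start += step
    return segments
```

Keeping the 1-based start of each segment makes it straightforward to map per-residue scores back onto the original numbering.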

Output

Results are returned as a table with one row per residue.

Column	Description
`Residue`	Amino acid at that position (single-letter code)
`Position`	Sequence index (1-based)
`Disorder score`	Predicted disorder probability (0–1). Higher values indicate greater likelihood of disorder.
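
If the table is exported as text, it can be loaded for downstream analysis with the standard library. The sketch below assumes a tab-separated export whose header row uses exactly the column names above; the actual download format is not documented here, so the delimiter and header names are assumptions, and `read_scores` is a hypothetical helper.

```python
import csv
import io


def read_scores(tsv_text):
    """Read a tab-separated results table into (position, residue, score) tuples.

    Assumes a header row with the columns 'Residue', 'Position', and
    'Disorder score'; adjust the names if the real export differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append((int(row["Position"]), row["Residue"], float(row["Disorder score"])))
    return rows


example = "Residue\tPosition\tDisorder score\nM\t1\t0.91\nK\t2\t0.42\n"
rows = read_scores(example)
```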

Interpreting scores

Score range	Interpretation
≥ 0.5	Disordered: residue is predicted to lack stable structure
< 0.5	Ordered: residue is predicted to adopt a stable conformation
0.4–0.6	Borderline: the call on either side of 0.5 is low-confidence; interpret with other evidence

Continuous stretches of high-scoring residues define disordered regions. Isolated high-scoring residues within otherwise ordered sequences may represent flexible loops rather than true IDRs.
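
The thresholds above translate directly into a small region-calling routine. This is a minimal sketch: `label_residue` and `call_regions` are hypothetical helpers, and the `min_length` filter is an assumption added to drop isolated high-scoring residues, not a cutoff defined by DR-BERT.

```python
def label_residue(score):
    """Map one score to a label using the ranges in the table above."""
    if 0.4 <= score <= 0.6:
        return "borderline"
    return "disordered" if score >= 0.5 else "ordered"


def call_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) spans where score >= threshold.

    Spans shorter than min_length residues are discarded.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = pos  # a new high-scoring stretch begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions
```

Raising `min_length` keeps only longer stretches, which is one way to separate candidate IDRs from short flexible loops.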

Applications

DR-BERT scores have practical uses across structural and functional biology:

Guiding crystallography: Disordered termini and loops are common causes of crystallization failure. Trimming high-scoring regions from a construct can improve its chances of crystallizing.
Identifying linear motifs: Many short linear motifs (SLiMs) embedded in IDRs mediate protein–protein interactions. Disorder predictions help locate candidate regions.
Phase separation research: Low-complexity disordered regions drive liquid–liquid phase separation. DR-BERT scores flag regions worth probing experimentally.
AlphaFold annotation: Regions with low AlphaFold pLDDT scores often correspond to disordered segments; DR-BERT provides an independent sequence-based confirmation.
Functional annotation of uncharacterized proteins: When structural data are unavailable, per-residue disorder scores provide an early hypothesis about which regions may be functionally flexible.
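
For the construct-design use case, a first-pass suggestion is to trim residues from each terminus until the first and last predicted-ordered positions. The sketch below does only that: `ordered_core` is a hypothetical helper, the 0.5 cutoff follows the interpretation table, and internal disordered loops are deliberately left in place. The result is a starting point to review, not a final construct boundary.

```python
def ordered_core(scores, threshold=0.5):
    """Return the 1-based inclusive (start, end) left after trimming disordered termini.

    Residues are removed from each end while their score is >= threshold.
    Returns None if no residue scores below the threshold.
    """
    ordered = [pos for pos, score in enumerate(scores, start=1) if score < threshold]
    if not ordered:
        return None
    return ordered[0], ordered[-1]
```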

Limitations

Maximum sequence length of 1022 residues is fixed by the BERT positional encoding. Longer proteins must be analyzed in segments, and residues near a segment boundary are scored without context from the rest of the sequence.
DR-BERT was trained on DisProt and similar disorder databases, which skew toward certain organism types and protein classes. Predictions on highly unusual sequences may be less reliable.
Like all sequence-only predictors, DR-BERT cannot capture disorder that is conditionally induced by partners or post-translational modifications.
Batch size is capped at five sequences per job.

Related tools

ESMFold: Structure prediction; cross-referencing pLDDT confidence with DR-BERT scores provides orthogonal disorder evidence
ESM-2: Protein language model embeddings for downstream analyses
Chou-Fasman: Classical secondary structure prediction from sequence propensities
DSSP: Secondary structure and solvent accessibility assignment from an existing PDB structure
Instability Index: Sequence-based estimate of in vivo protein stability
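
The cross-referencing of pLDDT with DR-BERT scores can be sketched as a per-residue tally. Everything here is an assumption for illustration: `disorder_agreement` is a hypothetical helper, pLDDT is taken to be on a 0-100 scale, and the pLDDT cutoff of 50 is a rule-of-thumb default that should be tuned rather than treated as a definition of disorder.

```python
def disorder_agreement(disorder_scores, plddt, score_cutoff=0.5, plddt_cutoff=50.0):
    """Count residues flagged by DR-BERT, by low pLDDT, by both, or by neither.

    Both inputs are per-residue lists in the same order and of equal length.
    """
    if len(disorder_scores) != len(plddt):
        raise ValueError("score lists must have the same length")
    counts = {"both": 0, "dr_bert_only": 0, "plddt_only": 0, "neither": 0}
    for score, confidence in zip(disorder_scores, plddt):
        by_score = score >= score_cutoff
        by_plddt = confidence < plddt_cutoff
        if by_score and by_plddt:
            counts["both"] += 1
        elif by_score:
            counts["dr_bert_only"] += 1
        elif by_plddt:
            counts["plddt_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

Residues that land in `dr_bert_only` or `plddt_only` are the ones where the two lines of evidence disagree and are worth a closer look.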
