What is HighFold?
HighFold predicts the three-dimensional structures of cyclic peptides — molecules where the chain loops back on itself through a head-to-tail peptide bond, sometimes reinforced by disulfide bridges. Standard protein structure prediction methods like AlphaFold 2 treat sequences as linear chains and encode residue positions accordingly, which means they systematically misrepresent the topology of cyclic peptides.
HighFold solves this by replacing AlphaFold's linear position encoding with CycPOEM (Cyclic Position Offset Encoding Matrix), a distance matrix that accounts for head-to-tail cyclization and disulfide bond shortcuts. The rest of the AlphaFold2/ColabFold pipeline — MSA generation, Evoformer attention, and structure module — remains unchanged. The result is substantially more accurate cyclic peptide structures without retraining any neural network weights.
The method was developed at Zhejiang University of Technology and Shanghai Highslab Therapeutics, published in Briefings in Bioinformatics (2024).
How CycPOEM works
AlphaFold encodes the relative position between two residues as a simple linear offset: residue and residue are apart. For a cyclic peptide of length , this ignores that residues 1 and are covalently bonded.
CycPOEM constructs a more accurate distance matrix using a modified Floyd-Warshall shortest-path algorithm:
- An adjacency matrix is initialized with distance 1 between consecutive residues
- A head-to-tail edge connects residue 1 to residue (distance 1)
- If disulfide bonds are specified, each pair of bridged cysteine residues also receives distance 1
- The Floyd-Warshall algorithm then computes the shortest path between every pair of residues through this graph
This means two residues on opposite sides of a 20-residue cyclic peptide are encoded as being 10 apart rather than 19 — reflecting the actual molecular topology.
Sign strategies
CycPOEM also encodes directionality. The Upper Negative (UN) strategy, which assigns negative values to the upper triangle of the matrix, produced the best results in benchmarks. This captures the asymmetric nature of peptide bonds (N→C directionality).
Benchmark results
On a test set of 63 NMR-resolved cyclic peptide structures, HighFold achieved a median backbone RMSD of 1.058 Å, compared to 1.737 Å for AfCycDesign and 1.956 Å for standard AlphaFold. For cyclic peptides with disulfide bridges, the improvement was even more pronounced: 1.720 Å average RMSD versus 3.256 Å for AfCycDesign.
How to use HighFold online
ProteinIQ runs HighFold on A100 GPU infrastructure with pre-loaded AlphaFold2 model weights, eliminating the need to configure ColabFold, install HighFold's overlay, or manage GPU environments locally.
Input
| Input | Description |
|---|---|
Cyclic peptide | Amino acid sequence in FASTA or raw format. Single chain only. Standard amino acids (20 canonical residues). |
Cyclization settings
| Setting | Description |
|---|---|
Disulfide bond pairs | Residue pairs forming disulfide bridges, e.g. 1 5, 3 8. Each pair is two space-separated residue numbers; multiple pairs are comma-separated. Leave empty for head-to-tail cyclization only. |
Prediction settings
| Setting | Default | Description |
|---|---|---|
Number of models | 5 | AlphaFold2 models to run (1–5). All five gives the most reliable ranking but increases runtime proportionally. |
Random seeds | 1 | Number of random seeds per model (1–5). More seeds produce greater diversity in predicted conformations. |
Model type | AlphaFold2 | Model variant. AlphaFold2 or AlphaFold2-ptm. Both are monomer models suitable for cyclic peptides. |
MSA settings
| Setting | Default | Description |
|---|---|---|
MSA mode | UniRef+Environmental | Database for multiple sequence alignment generation via MMseqs2. UniRef+Environmental searches UniRef and environmental sequence databases. UniRef only is faster. No MSA runs single-sequence mode (no evolutionary information). |
Use templates | On | Query the PDB for structural templates. Useful when homologous structures exist. |
Advanced settings
| Setting | Default | Description |
|---|---|---|
AMBER relaxation | On | Run OpenMM/AMBER energy minimization on predicted structures. Recommended for cyclic peptides, where strained geometry is common. |
Structures to relax | 1 | Number of top-ranked predictions to relax (0–5). Only applies when AMBER relaxation is enabled. |
Output
Each prediction run produces up to 5 ranked structures (depending on model count and seeds). Results include:
- PDB files: 3D coordinates for each ranked prediction, viewable in the built-in 3D viewer
- pLDDT scores: Per-residue confidence (0–100), averaged across the structure for ranking
- pTM scores: Predicted TM-score (when available), used as the primary ranking metric
- MSA alignment: The multiple sequence alignment used as input (
.a3mformat) - ColabFold plots: Coverage and pLDDT visualization plots (
.png)
Interpreting confidence scores
| pLDDT | Interpretation |
|---|---|
| > 90 | High confidence — backbone and sidechain positions likely accurate |
| 70–90 | Good confidence — backbone probably correct, sidechains less certain |
| 50–70 | Low confidence — treat with caution, may need experimental validation |
| < 50 | Very low confidence — structure unreliable in this region |
For cyclic peptides, pLDDT values tend to be lower than for globular proteins of similar size, particularly in loop regions. A pLDDT of 70+ for a short cyclic peptide generally indicates a successful prediction.
Limitations
- Natural amino acids only: HighFold supports the 20 canonical amino acids. Cyclic peptides containing non-natural amino acids, N-methylated residues, or D-amino acids cannot be modeled. (The successor HighFold3 addresses this limitation but is not yet available here.)
- Single chain: Only monomeric cyclic peptides are supported. Cyclic peptide–protein complexes require HighFold_Multimer, which uses a different model architecture.
- Sequence length: Very short peptides (< 5 residues) may not generate meaningful MSAs, reducing prediction quality. Consider using
No MSAmode for very short sequences. - AMBER relaxation failures: Energy minimization can fail on highly strained or unusual cyclic topologies. When this happens, the unrelaxed structure is returned instead.
Related tools
- AlphaFold 2: General-purpose protein structure prediction — the foundation HighFold builds on
- Boltz-2: Biomolecular structure prediction including protein-ligand complexes
- ESMFold: Fast single-sequence structure prediction (no MSA), useful for quick screening before HighFold
- Chai-1: Multi-modal biomolecular structure prediction
