
Protein-Sol predicts protein solubility directly from amino acid sequence. It reports the same core quantities described by the University of Manchester method: percent-sol, scaled-sol, population-sol, and predicted pI.
The method calculates sequence features such as charge balance, amino acid composition, Kyte-Doolittle hydropathy, fold propensity, disorder propensity, entropy, and beta propensity. These features are compared with the Niwa cell-free expression solubility dataset to estimate solubility.
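As a rough illustration of one of these features, the sketch below computes a Kyte-Doolittle hydropathy profile as the mean over a sliding 21-residue window, the window length Protein-Sol uses for its profiles. The function name `hydropathy_profile` is hypothetical, and Protein-Sol's own normalization and handling of sequence ends may differ; this is not the tool's implementation.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=21):
    """Return the mean hydropathy of every full window along the sequence.

    Hypothetical helper for illustration only. Raises KeyError if the
    sequence contains a character outside the 20 standard residues.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []  # too short to fill a single window
    running = sum(values[:window])
    profile = [running / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile
```

A sequence of length N yields N - 20 window values, so a 35-residue sequence produces 15 values and anything shorter than 21 residues produces none.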
Protein-Sol accepts protein sequences in FASTA format. Each sequence must include a header line beginning with > and at least 21 standard amino acid residues, because the method calculates sliding 21-residue profiles.
Use the 20 standard one-letter amino acid codes when possible:
>my_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQLR
Protein-Sol ignores stop symbols and characters outside the standard amino acid alphabet during its sequence preparation step. For clean, reproducible predictions, submit canonical protein FASTA records.
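To check records before submission, the input rules above can be approximated locally. The following is a minimal sketch, not Protein-Sol's own parser: the names `read_fasta` and `prepare_records` are hypothetical, and upper-casing the input is an assumption about how lower-case residues should be treated.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
MIN_LENGTH = 21  # shortest sequence that fills one 21-residue window


def read_fasta(text):
    """Yield (header, raw_sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prepare_records(text):
    """Split records into accepted (header, cleaned_sequence) and skipped headers.

    Characters outside the standard alphabet (including '*' stop symbols)
    are dropped; records shorter than MIN_LENGTH after cleaning are skipped.
    """
    accepted, skipped = [], []
    for header, raw in read_fasta(text):
        cleaned = "".join(c for c in raw.upper() if c in STANDARD_AA)
        if len(cleaned) >= MIN_LENGTH:
            accepted.append((header, cleaned))
        else:
            skipped.append(header)
    return accepted, skipped
```

Running `prepare_records` on a file before upload shows which records would fall below the 21-residue minimum once non-standard characters are removed.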
Protein-Sol returns a single file list of CSV outputs. The main predictions CSV contains one row per accepted sequence:
percent-sol: predicted solubility percentage.
scaled-sol: prediction scaled over the source reference range.
population-sol: reference population solubility on the same scale.
pI: predicted isoelectric point used by the Protein-Sol feature model.
The file list also includes CSV downloads for feature weights, feature deviations, sliding-window profiles, and whole-sequence/windowed composition data.
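The predictions CSV can be loaded with Python's standard `csv` module. The sketch below assumes the column headers are exactly `id`, `percent-sol`, `scaled-sol`, `population-sol`, and `pI`; the actual headers in the download may differ, so check the file and adjust the keys. The function name `load_predictions` is hypothetical.

```python
import csv
import io


def load_predictions(csv_text):
    """Parse predictions CSV text into a list of dicts with numeric fields.

    Column names are assumptions, not confirmed Protein-Sol output headers.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        scaled = float(row["scaled-sol"])
        population = float(row["population-sol"])
        rows.append({
            "id": row["id"],
            "percent_sol": float(row["percent-sol"]),
            "scaled_sol": scaled,
            "population_sol": population,
            "pI": float(row["pI"]),
            # Both values are on the same scale, so they compare directly.
            "above_population": scaled > population,
        })
    return rows
```

Because scaled-sol and population-sol share a scale, the `above_population` flag gives a quick indication of whether a sequence is predicted to be more soluble than the reference population.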
Protein-Sol returns CSV tables for the parsed results:
predictions.csv: solubility predictions and predicted pI.
feature_weights.csv: feature contribution weights used by the model.
feature_deviations.csv: sequence feature deviations.
profiles.csv: 21-residue window profile values.
composition.csv: whole-sequence and windowed composition features.
composition_summaries.csv: composition summary lines.
Generated run files are retained with the result download bundle:
seq_prediction.txt: prediction rows, feature weights, deviations, and 21-residue profiles.
seq_composition.txt: whole-sequence, 21-residue, and 51-residue composition features.
run.log: execution log.
messages.txt: parser messages when Protein-Sol reports skipped or adjusted input records.

Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics. 2017;33(19):3098-3100. doi:10.1093/bioinformatics/btx345