Protein-Sol

(2017-10)

Predict protein solubility from sequence with feature-level and windowed profile outputs.

What is Protein-Sol?

Protein-Sol predicts protein solubility directly from amino acid sequence. It reports the same core quantities described by the University of Manchester method: percent-sol, scaled-sol, population-sol, and predicted pI.

The method calculates sequence features such as charge balance, amino acid composition, Kyte-Doolittle hydropathy, fold propensity, disorder propensity, entropy, and beta propensity. These features are compared with the Niwa cell-free expression solubility dataset to estimate solubility.
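As a rough illustration of what sequence-derived features look like, the sketch below computes three simple quantities: amino acid composition, a net charge proxy, and mean Kyte-Doolittle hydropathy. It is not Protein-Sol's implementation. The function name `simple_features` and the exact feature definitions (for example, counting K and R as positive and D and E as negative) are assumptions made for this example; only the Kyte-Doolittle scale values are the published ones.

```python
from collections import Counter

# Published Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def simple_features(sequence):
    """Return illustrative whole-sequence features (hypothetical helper).

    These definitions are assumptions for the example, not the feature
    definitions used inside Protein-Sol.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    n = len(seq)
    counts = Counter(seq)
    # Fraction of each of the 20 standard residues.
    composition = {aa: counts[aa] / n for aa in KYTE_DOOLITTLE}
    # Net charge proxy per residue: (K + R) minus (D + E).
    net_charge = (counts["K"] + counts["R"] - counts["D"] - counts["E"]) / n
    # Mean hydropathy over the whole sequence.
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    return {
        "composition": composition,
        "net_charge_per_residue": net_charge,
        "mean_hydropathy": mean_hydropathy,
    }
```

A hydrophilic, charge-balanced sequence gives a negative mean hydropathy and a net charge proxy near zero under these definitions.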

Input Requirements

Protein-Sol accepts protein sequences in FASTA format. Each sequence must include a header line beginning with > and at least 21 standard amino acid residues, because the method calculates sliding 21-residue profiles.
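The 21-residue minimum follows from how a sliding window works: a sequence shorter than the window yields no complete window at all. The sketch below shows a generic windowed average over a per-residue value, using a simple charge sign (+1 for K/R, -1 for D/E) as the example property. The function name `window_profile`, the `CHARGE_SIGN` table, and the choice of property are assumptions for illustration and are not taken from Protein-Sol's code.

```python
# Hypothetical per-residue property: +1 for K/R, -1 for D/E, 0 otherwise.
CHARGE_SIGN = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_profile(sequence, window=21):
    """Average a per-residue value over every full window of the sequence.

    Returns one value per window position, so a sequence of length n
    produces n - window + 1 values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(sequence) < window:
        raise ValueError(
            f"sequence has {len(sequence)} residues; need at least {window}"
        )
    values = [CHARGE_SIGN.get(aa, 0) for aa in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

With the default window, a 35-residue sequence produces 15 window values, and a 20-residue sequence raises an error because no full window fits.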

Use the 20 standard one-letter amino acid codes when possible:

>my_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQLR

Protein-Sol ignores stop symbols and characters outside the standard amino acid alphabet during its sequence preparation step. For clean, reproducible predictions, submit canonical protein FASTA records.
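To check records before submitting, a pre-flight step along the following lines may help. It is a minimal sketch under stated assumptions, not Protein-Sol's own parser: the helper names `read_fasta` and `prepare_records` are hypothetical, lowercase letters are uppercased before filtering (an assumption), and a record is treated as skipped when fewer than 21 standard residues remain after cleaning.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 21  # matches the 21-residue window described above


def read_fasta(text):
    """Split FASTA text into (header, raw_sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines before any header are ignored.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def prepare_records(text):
    """Drop non-standard characters and separate short records.

    Returns (accepted, skipped): accepted is a list of
    (header, cleaned_sequence) pairs, skipped is a list of headers.
    """
    accepted, skipped = [], []
    for header, raw in read_fasta(text):
        cleaned = "".join(ch for ch in raw.upper() if ch in STANDARD_RESIDUES)
        if len(cleaned) >= MIN_LENGTH:
            accepted.append((header, cleaned))
        else:
            skipped.append(header)
    return accepted, skipped
```

Running this locally shows which records would lose characters or fall below the length limit, so the submitted FASTA can be corrected in advance.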

Results

Protein-Sol returns its results as a single list of CSV files. The main predictions CSV contains one row per accepted sequence:

percent-sol: predicted solubility percentage.
scaled-sol: prediction scaled over the source reference range.
population-sol: reference population solubility on the same scale.
pI: predicted isoelectric point used by the Protein-Sol feature model.

The file list also includes CSV downloads for feature weights, feature deviations, sliding-window profiles, and whole-sequence/windowed composition data.

Downloadable Files

Protein-Sol returns CSV tables for the parsed results:

predictions.csv: solubility predictions and predicted pI.
feature_weights.csv: feature contribution weights used by the model.
feature_deviations.csv: sequence feature deviations.
profiles.csv: 21-residue window profile values.
composition.csv: whole-sequence and windowed composition features.
composition_summaries.csv: composition summary lines.
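A downloaded predictions.csv can be post-processed with the standard library. The sketch below lists the sequences whose scaled-sol exceeds the reference population-sol in the same row. The column names `id`, `scaled-sol`, and `population-sol` are assumptions based on the field names described above; check the header row of the actual file and adjust them if they differ. The function name `above_population` is hypothetical.

```python
import csv
import io


def above_population(csv_text, id_column="id"):
    """Return IDs of rows where scaled-sol is greater than population-sol.

    Column names are assumed, not confirmed against a real output file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        scaled = float(row["scaled-sol"])
        population = float(row["population-sol"])
        if scaled > population:
            hits.append(row[id_column])
    return hits
```

Pass the function the text of the file, for example the result of `open("predictions.csv", newline="").read()`.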

The raw files generated by the run are included in the result download bundle:

seq_prediction.txt: prediction rows, feature weights, deviations, and 21-residue profiles.
seq_composition.txt: whole-sequence, 21-residue, and 51-residue composition features.
run.log: execution log.
messages.txt: parser messages when Protein-Sol reports skipped or adjusted input records.

Citation

Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics. 2017;33(19):3098-3100. doi:10.1093/bioinformatics/btx345

Related tools

CANYA

Predict protein aggregation nucleation propensity from amino acid sequences using the Lehner Lab CANYA neural network.

Prot2Prop

Predict multiple protein developability properties from amino-acid sequences using a multitask ProstT5 adapter.

Protein stability

Predict protein stability using validated Biopython methods: Instability Index, Aliphatic Index, GRAVY, flexibility analysis, and charge distribution.

ThermoMPNN

Predict protein thermostability changes (ΔΔG) for point mutations using a graph neural network. Enables computational saturation mutagenesis screening to identify stabilizing mutations.

Aggrescan3D

Faithful static-mode Aggrescan3D tool for per-residue aggregation propensity analysis from a single protein structure.

AllMetal3D

Predict metal and water binding sites in protein structures using 3D convolutional neural networks (AllMetal3D + Water3D).

PROPKA 3

Predict pKa values of ionizable groups in proteins and protein-ligand complexes from 3D structure. PROPKA calculates environment-driven pKa shifts for standard ionizable residues, terminal groups, and supported ligand atom types.

SuperWater

Predict protein hydration sites from a structure using a diffusion model with ESM features and a confidence-filtering head.

AbLang

Restore missing residues in antibody sequences using a language model trained on the Observed Antibody Space (OAS) database. Achieves better restoration than IMGT germlines or ESM-1b while being 7x faster.

IPC 2.0 (isoelectric point calculator)

Isoelectric Point Calculator 2.0 - Predict protein/peptide isoelectric point (pI) using 18+ validated pKa scales, SVR models, and deep learning. Supports proteins, peptides, and comprehensive analysis.