ProteinIQ

IPC 2.0 (isoelectric point calculator)

Predict isoelectric point (pI) of proteins and peptides using validated pKa scales and machine learning

What is IPC 2.0?

IPC 2.0 (Isoelectric Point Calculator 2.0) is a computational tool for predicting the isoelectric point (pI) of proteins and peptides. The isoelectric point is the pH at which a molecule carries no net electrical charge. Developed by Lukasz Kozlowski at the University of Warsaw, IPC 2.0 combines classical Henderson-Hasselbalch calculations with machine learning methods to achieve state-of-the-art prediction accuracy.

The original IPC 1.0 (published in 2016) introduced optimized pKa scales for Henderson-Hasselbalch calculations. IPC 2.0 (published in 2021) extended this with support vector regression (SVR) and deep learning models trained on experimentally determined isoelectric points from 2D gel electrophoresis and high-resolution isoelectric focusing experiments.

Applications

  • Protein purification: Designing isoelectric focusing and ion exchange chromatography protocols
  • 2D gel electrophoresis: Predicting protein migration patterns for proteomics experiments
  • Peptide separation: Optimizing HPLC conditions based on peptide charge state
  • Buffer selection: Choosing appropriate pH conditions to maintain protein solubility
  • Antibody development: Assessing charge heterogeneity and formulation stability

How to use IPC 2.0

IPC 2.0 accepts protein or peptide sequences as input and returns isoelectric point predictions using the selected method. The interface supports batch processing of multiple sequences and direct import from UniProt.

Inputs

Input | Description
Protein/Peptide Sequences | One or more amino acid sequences in FASTA format. Sequences can also be uploaded as .fasta, .fa, .fas, .txt, or .csv files. UniProt IDs can be fetched directly using the batch fetcher.
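
To illustrate the expected input, here is a minimal sketch of how FASTA text splits into identifier and sequence pairs. The function name `parse_fasta` and the example records are hypothetical; this is not the parser the tool itself uses.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            tokens = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            # Sequence lines may wrap; join them and normalize to upper case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input; the first sequence wraps over two lines.
example = """>pep1 hypothetical peptide
ACDEFGHIK
LMNPQR
>pep2
MKTAYIAK
"""
records = parse_fasta(example)
```

Each record's identifier corresponds to the ID column of the results, and the joined sequence is what the predictors operate on.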

Settings

Setting | Description
Predictor | The prediction method to use. IPC1_ALL (default) calculates pI using all 18 Henderson-Hasselbalch pKa scales. IPC2_SVR_protein and IPC2_SVR_peptide use support vector regression optimized for proteins or peptides respectively. IPC2_DL_peptide uses a separable convolutional neural network (best accuracy for peptides). ALL runs every method for comprehensive comparison.

Predictor options

Predictor | Best for | Description
IPC1_ALL | General use | Henderson-Hasselbalch equation with 18 validated pKa scales. Fastest option.
IPC2_SVR_protein | Proteins | Support vector regression trained on 2,324 proteins from SWISS-2DPAGE and PIP-DB databases.
IPC2_SVR_peptide | Peptides | Support vector regression trained on 119,092 peptides from HiRIEF experiments.
IPC2_DL_peptide | Peptides ≤60 aa | SepConv2D deep learning model with four-channel input. Highest accuracy for short peptides.
ALL | Method comparison | Runs all predictors and pKa scales for comprehensive analysis.
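
The "Best for" guidance can be expressed as a simple length-based rule. This is only a sketch: the function name `choose_predictor` is hypothetical, and the 60-residue cutoff between "peptide" and "protein" is an assumption borrowed from the deep learning model's input limit, not a rule the tool enforces.

```python
def choose_predictor(sequence, peptide_max_len=60):
    """Suggest a predictor name based on sequence length alone.

    Assumption: sequences up to `peptide_max_len` residues are treated
    as peptides, anything longer as a protein.
    """
    if len(sequence) <= peptide_max_len:
        return "IPC2_DL_peptide"
    return "IPC2_SVR_protein"
```

When in doubt, the ALL option sidesteps the choice by reporting every method side by side.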

Results

Output is a spreadsheet with isoelectric point predictions for each sequence.

Column | Description
ID | Sequence identifier from the FASTA header.
Length | Number of amino acid residues.
MW | Molecular weight in Daltons.
pI_[scale] | Predicted isoelectric point using the specified pKa scale (e.g., pI_IPC2_protein, pI_Bjellqvist).
pI_SVR_protein | SVR prediction optimized for proteins (when SVR method selected).
pI_SVR_peptide | SVR prediction optimized for peptides (when SVR method selected).
pI_DL_peptide | Deep learning prediction (when DL method selected).
Avg_pI | Average pI across all calculated scales.

Interpreting pI values

Isoelectric point values typically range from 3 to 12, with most proteins falling between 4 and 10:

  • pI < 7: Acidic protein (negatively charged at physiological pH)
  • pI ≈ 7: Neutral protein at physiological pH
  • pI > 7: Basic protein (positively charged at physiological pH)
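
The same reasoning applies at any working pH, which is what matters when choosing a buffer or an ion exchange resin: below its pI a molecule carries a net positive charge, above it a net negative charge. A minimal sketch, where the function name `charge_sign` and the `tolerance` value are assumptions made for illustration:

```python
def charge_sign(pi, ph, tolerance=0.05):
    """Return the expected sign of the net charge at a given pH.

    `tolerance` (an arbitrary choice here) defines how close the pH must
    be to the pI for the molecule to count as effectively neutral.
    """
    if abs(ph - pi) <= tolerance:
        return "neutral"
    return "positive" if ph < pi else "negative"


# A protein with pI 5.2 is negatively charged at pH 7.4,
# while one with pI 9.8 is positively charged.
acidic_example = charge_sign(pi=5.2, ph=7.4)
basic_example = charge_sign(pi=9.8, ph=7.4)
```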

Different pKa scales may produce slightly different predictions (typically within 0.5 pH units). For critical applications, the SVR or deep learning predictors trained on experimental data provide better accuracy than Henderson-Hasselbalch methods alone.

How does IPC 2.0 work?

IPC 2.0 offers three distinct prediction approaches: classical Henderson-Hasselbalch calculations, support vector regression, and deep learning. Each method has different accuracy characteristics depending on sequence type.

Henderson-Hasselbalch method (IPC 1.0)

The classical approach calculates net protein charge as a function of pH using the Henderson-Hasselbalch equation:

$$\text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]}$$

The isoelectric point is found by iterative bisection, identifying the pH where the sum of positive charges (from lysine, arginine, histidine, and the N-terminus) equals the sum of negative charges (from aspartate, glutamate, cysteine, tyrosine, and the C-terminus).
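
The calculation can be sketched in a few lines. The pKa values below are illustrative textbook-style numbers chosen for this example; they are assumptions, not the optimized IPC 2.0 scales. The function names are also hypothetical.

```python
# Illustrative pKa values (assumptions for this sketch, not IPC 2.0's scales).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of an unfolded, unmodified sequence at the given pH."""
    # Basic groups contribute +1 when protonated: 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Acidic groups contribute -1 when deprotonated: 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection.

    Net charge decreases monotonically with pH, so the search interval
    can be halved on the sign of the charge at its midpoint.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid  # negative or zero: the pI lies at lower pH
    return (low + high) / 2
```

With these assumed values, a sequence with no ionizable side chains (for example a single glycine) has a pI halfway between the two terminal pKa values, (8.6 + 3.6) / 2 = 6.1. Swapping in a different pKa scale changes only the constants, which is why the 18 scales can share one algorithm.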

pKa scales

IPC 2.0 includes 18 validated pKa scales from the literature:

Scale | Source
IPC2_protein | Optimized for proteins (IPC 2.0)
IPC2_peptide | Optimized for peptides (IPC 2.0)
IPC_protein | Original IPC scale for proteins
IPC_peptide | Original IPC scale for peptides
Bjellqvist | 2D electrophoresis standard
EMBOSS | EMBOSS software suite
DTASelect | Proteomics analysis
Grimsley | Experimental measurements
Lehninger | Biochemistry textbook values
Solomon | Protein chemistry
Sillero | Calculated values
Rodwell | Biochemistry reference
Thurlkill | NMR measurements
Toseland | Statistical analysis
Nozaki | Experimental values
Dawson | Data compilation
Wikipedia | General reference
Patrickios | Simplified (charged residues only)

The IPC2_protein and IPC2_peptide scales were optimized by training on experimental pI data and selecting pKa values that minimize prediction error.

Support vector regression (SVR)

The SVR models take an ensemble approach: the pI predictions from all 18 Henderson-Hasselbalch scales plus the ProMoST algorithm serve as input features. This 19-dimensional feature vector feeds into a support vector regression model with a radial basis function (RBF) kernel.
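
For readers unfamiliar with kernel regression, the following sketch shows the general form of an RBF-kernel SVR prediction: a weighted sum of kernel similarities between the input and the stored support vectors, plus an intercept. The support vectors, coefficients, `gamma`, and intercept below are toy numbers invented for illustration; they are not parameters of the trained IPC 2.0 models.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF similarity: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svr_predict(features, support_vectors, dual_coefs, intercept, gamma):
    """Standard SVR decision function: intercept + sum_i coef_i * k(sv_i, x)."""
    return intercept + sum(
        coef * rbf_kernel(sv, features, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Toy 19-dimensional input: one slot per Henderson-Hasselbalch scale
# (18) plus one for ProMoST. All values here are made up.
features = [6.1] * 19
support_vectors = [[6.1] * 19, [7.1] * 19]
dual_coefs = [2.0, -1.0]
prediction = svr_predict(features, support_vectors, dual_coefs,
                         intercept=0.5, gamma=0.1)
```

The kernel makes nearby training examples (in the space of per-scale predictions) count more than distant ones, which lets the model correct systematic disagreements between scales.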

Training data comprised:

  • Proteins: 2,324 entries from SWISS-2DPAGE and PIP-DB (average length 387 amino acids)
  • Peptides: 119,092 entries from HiRIEF experiments (average length 14.6 amino acids)

The SVR approach achieves RMSD of 0.848 for proteins and 0.230 for peptides, outperforming any single pKa scale.
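
RMSD here is the root-mean-square deviation between predicted and experimentally measured pI values, in pH units, so lower is better. A minimal sketch of the metric (the function name `rmsd` is chosen for this example):

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of pI values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because errors are squared before averaging, a few large misses raise the RMSD more than many small ones.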

SepConv2D deep learning

The deep learning model uses a separable convolutional neural network architecture with four input channels:

  1. One-hot encoded sequence: 22 × 60 matrix representing amino acid identity
  2. AAindex features: 15 physicochemical property indices plus hydrophobicity
  3. Amino acid counts: Composition vector repeated across sequence length
  4. IPC 1.0 predictions: pI values from all scales plus SVR prediction
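
The first channel can be illustrated with a short sketch. The source specifies only the 22 × 60 shape; the particular 22-symbol alphabet below (20 standard amino acids plus `X` for unknown residues and `-` for padding) and the right-padding scheme are assumptions made for this example.

```python
# Assumed alphabet: 20 standard residues + "X" (unknown) + "-" (padding).
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX-"
MAX_LEN = 60


def one_hot(sequence):
    """Encode a sequence as a 22 x 60 matrix of 0s and 1s (rows = symbols)."""
    # Cut to 60 positions and pad shorter sequences on the right.
    padded = sequence[:MAX_LEN].ljust(MAX_LEN, "-")
    matrix = [[0] * MAX_LEN for _ in ALPHABET]
    for col, aa in enumerate(padded):
        # Symbols outside the alphabet fall back to the "X" row.
        row = ALPHABET.index(aa) if aa in ALPHABET else ALPHABET.index("X")
        matrix[row][col] = 1
    return matrix


encoded = one_hot("ACD")
```

Every column contains exactly one 1, so the matrix records residue identity and position without implying any ordering between amino acids.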

Sequences longer than 60 amino acids undergo smart truncation that preferentially removes non-charged residues (alanine, glycine, leucine, etc.) while preserving ionizable residues that determine pI.
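
One way such a truncation could work is sketched below. The order in which residues are dropped (left to right) and the fallback for sequences made up almost entirely of ionizable residues are assumptions for illustration; the actual IPC 2.0 procedure may differ.

```python
# Residues whose side chains ionize and therefore influence pI.
IONIZABLE = set("DECYHKR")


def smart_truncate(sequence, max_len=60):
    """Shorten a sequence to max_len, dropping non-ionizable residues first."""
    excess = len(sequence) - max_len
    if excess <= 0:
        return sequence
    kept = []
    for aa in sequence:
        if excess > 0 and aa not in IONIZABLE:
            excess -= 1  # drop this non-ionizable residue
        else:
            kept.append(aa)
    # Fallback (assumption): if too few non-ionizable residues were
    # available to drop, cut the remainder at max_len.
    return "".join(kept)[:max_len]
```

For a 70-residue sequence of 50 alanines followed by 20 lysines, this sketch drops ten alanines and keeps all 20 lysines, so the charge-determining composition is preserved.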

The SepConv2D model achieves RMSD of 0.222 for peptides, representing the highest accuracy available for short sequences.

Limitations

  • Sequence length for deep learning: The SepConv2D model was trained on peptides up to 60 amino acids. Longer sequences are truncated, which may reduce accuracy for full-length proteins.
  • Post-translational modifications: Predictions assume unmodified sequences. Phosphorylation, glycosylation, and other modifications alter the actual isoelectric point.
  • Protein folding: All methods treat sequences as unfolded polypeptides. Buried ionizable residues in folded proteins may have shifted pKa values.
  • Non-standard amino acids: Selenocysteine (U) is converted to cysteine. Other non-standard residues may not be handled correctly.

Related tools

  • pI Calculator: Simpler client-side pI calculator using the Bjellqvist scale for quick estimates
  • Protein Parameters: Comprehensive physicochemical property analysis including pI, molecular weight, and extinction coefficient
  • Molecular Weight: Calculate protein molecular weight from sequence
  • Extinction Coefficient: Calculate molar extinction coefficient for concentration determination
  • GRAVY: Grand average of hydropathy for solubility prediction
  • Amino Acid Composition: Analyze residue distribution that determines pI