
QEPPI analysis


What is QEPPI?

QEPPI (Quantitative Estimate Index for Compounds Targeting Protein-Protein Interactions) is a computational method for assessing the drug-likeness of small molecules specifically designed to target protein-protein interactions. Developed by Kosugi and Ohue in 2021, QEPPI addresses limitations of traditional drug-likeness metrics like QED (Quantitative Estimate of Drug-likeness) when applied to PPI-targeting compounds.

The index emerged from the recognition that conventional drug discovery guidelines, including Lipinski's rule of five, are poorly suited for evaluating compounds targeting protein-protein interactions. PPI interfaces typically require larger, more lipophilic molecules than traditional enzyme inhibitors, necessitating specialized assessment criteria.

QEPPI extends the QED methodology by modeling physicochemical properties based on experimentally validated PPI modulators from the iPPI-DB database rather than FDA-approved oral drugs. This approach captures the distinct chemical space occupied by PPI-targeting compounds during early-stage drug discovery.

Physicochemical parameters

QEPPI evaluates seven molecular properties derived from SMILES notation using RDKit calculations:

  • Molecular weight (MW): Total molecular mass in Daltons, typically 400-600 Da for effective PPI inhibitors
  • LogP (ALogP): Wildman-Crippen lipophilicity estimate, with the most favorable values around 4.8, reflecting the hydrophobic character of PPI binding interfaces
  • Hydrogen bond donors (HBD): Number of N-H and O-H groups available for hydrogen bonding
  • Hydrogen bond acceptors (HBA): Count of nitrogen and oxygen atoms capable of accepting hydrogen bonds
  • Topological polar surface area (TPSA): Molecular surface area occupied by polar atoms and their attached hydrogen atoms
  • Rotatable bonds (ROTB): Number of non-ring single bonds allowing free rotation, indicating molecular flexibility
  • Aromatic rings (AROM): Count of aromatic ring systems contributing to π-π stacking interactions

Each property is modeled with an asymmetric double sigmoid function fitted to the property distribution of experimental PPI modulators. The peak of each fitted curve marks the most favorable value of that property for PPI targeting.
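As a rough illustration of the seven descriptors listed above, the following sketch computes them with RDKit. The function name `ppi_descriptors` and the dictionary keys are hypothetical, and the choice of RDKit calls is an assumption: the reference implementation may use different hydrogen bond acceptor or rotatable bond definitions, so values could differ slightly.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors


def ppi_descriptors(smiles: str) -> dict:
    """Compute seven QEPPI-style input properties for one SMILES string.

    Hypothetical helper: the descriptor definitions are RDKit defaults,
    which are assumed (not confirmed) to match the published method.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "MW": Descriptors.MolWt(mol),                     # molecular weight in Da
        "ALOGP": Crippen.MolLogP(mol),                    # Wildman-Crippen LogP
        "HBD": Lipinski.NumHDonors(mol),                  # hydrogen bond donors
        "HBA": Lipinski.NumHAcceptors(mol),               # hydrogen bond acceptors
        "TPSA": rdMolDescriptors.CalcTPSA(mol),           # topological polar surface area
        "ROTB": Lipinski.NumRotatableBonds(mol),          # rotatable bonds
        "AROM": rdMolDescriptors.CalcNumAromaticRings(mol),  # aromatic rings
    }
```

Calling `ppi_descriptors("CC(=O)OC1=CC=CC=C1C(=O)O")` (aspirin) returns a dictionary with one value per property, which can then be passed to the desirability functions described under Methodology.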

Methodology

QEPPI calculation follows a multi-step process analogous to QED but optimized for PPI-targeting compounds:

Dataset modeling: The algorithm uses 1,007 non-redundant compounds from iPPI-DB, selected through Bemis-Murcko scaffold clustering to ensure chemical diversity. This dataset comprises compounds with experimental PPI inhibition or stabilization activity rather than marketed drugs.
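To illustrate what grouping by Bemis-Murcko scaffold means in practice, the sketch below buckets compounds by their scaffold SMILES using RDKit. The helper `group_by_scaffold` is hypothetical, and how the published dataset picked representatives from each scaffold group is not specified here, so this shows only the grouping step.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def group_by_scaffold(smiles_list):
    """Map each Bemis-Murcko scaffold SMILES to the input SMILES that share it."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip entries RDKit cannot parse
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol)
        groups[scaffold].append(smi)
    return dict(groups)
```

For example, toluene (`Cc1ccccc1`) and ethylbenzene (`CCc1ccccc1`) fall into the same group because both reduce to a benzene scaffold, while pyridine forms a group of its own.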

Desirability functions: Each physicochemical property generates a desirability score through asymmetric double sigmoid fitting:

Q(x) = a + \frac{b}{1 + \exp\left(-\frac{x - c + \frac{d}{2}}{e}\right)}\left[1 - \frac{1}{1 + \exp\left(-\frac{x - c - \frac{d}{2}}{f}\right)}\right]

where x is the property value and a, b, c, d, e, and f are parameters fitted separately for each property.
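The desirability function can be sketched in a few lines of Python. This sketch assumes the form used by QED, in which the second bracket is 1 - 1/(1 + exp(...)). The parameter values in `MW_LIKE_PARAMS` are hypothetical placeholders chosen for illustration, not the fitted QEPPI values.

```python
import math


def ads(x, a, b, c, d, e, f):
    """Asymmetric double sigmoid: a rising sigmoid times a falling sigmoid."""
    rising = 1.0 / (1.0 + math.exp(-(x - c + d / 2.0) / e))
    falling = 1.0 - 1.0 / (1.0 + math.exp(-(x - c - d / 2.0) / f))
    return a + b * rising * falling


# Hypothetical parameters for a molecular-weight-like property.
# c sets the center, d the plateau width, e and f the two slopes.
MW_LIKE_PARAMS = dict(a=0.0, b=1.0, c=500.0, d=200.0, e=20.0, f=40.0)

if __name__ == "__main__":
    for mw in (200.0, 500.0, 800.0):
        print(mw, round(ads(mw, **MW_LIKE_PARAMS), 4))
```

Because e and f differ, the curve falls off more slowly on the high side than it rises on the low side, which is what makes the function asymmetric.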

Weighted geometric mean: The final QEPPI score combines individual desirability functions through weighted geometric averaging:

QEPPI_k = \exp\left(\frac{\sum_i w_i \ln(\tilde{Q}_i)}{\sum_i w_i}\right)

where weights w_i are optimized to maximize Shannon entropy across the training dataset.
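The weighted geometric mean itself is straightforward to compute. The sketch below is a minimal version that takes already-computed desirabilities and weights; the function name is hypothetical, and the example inputs are made up rather than taken from the fitted model.

```python
import math


def weighted_geometric_mean(desirabilities, weights):
    """Return exp(sum(w_i * ln(q_i)) / sum(w_i)) for positive desirabilities q_i."""
    if len(desirabilities) != len(weights):
        raise ValueError("desirabilities and weights must have the same length")
    if any(q <= 0 for q in desirabilities):
        raise ValueError("desirabilities must be positive to take the logarithm")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(q) for q, w in zip(desirabilities, weights))
    return math.exp(log_sum / total_weight)
```

A geometric mean penalizes a single poor property more strongly than an arithmetic mean would: with equal weights, desirabilities of 1.0 and 0.25 combine to 0.5 rather than 0.625.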

Comparison with existing rules

QEPPI demonstrates superior performance compared to traditional drug-likeness assessments when evaluated on PPI-targeting compounds:

Versus QED: QED scores PPI compounds lower than conventional drugs (AUC = 0.362, below the 0.5 expected by chance), whereas QEPPI scores PPI modulators higher (AUC = 0.789), capturing their distinct physicochemical requirements.
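To make the AUC figures concrete, the sketch below computes AUC as the probability that a randomly chosen positive compound outscores a randomly chosen negative one. The function name and the toy scores are illustrative assumptions; the exact compound sets compared in the published evaluation are not reproduced here.

```python
def roc_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Under this definition, an AUC below 0.5 means the metric tends to rank the positive class (here, PPI compounds) below the comparison set, while a value well above 0.5 means it tends to rank them above.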

Versus rule-of-four: The rule-of-four (RO4) proposed by Morelli et al. establishes discrete criteria (MW > 400, LogP > 4, HBA > 4, rings > 4) based on 39 PPI inhibitors. QEPPI extends this concept into continuous scoring and achieves a higher F-score (QEPPI 0.501 vs. RO4 0.451).
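The discrete nature of RO4 is easy to see when the four criteria are written as code. The function below is a minimal sketch using the thresholds quoted above; the function name and the example property values are hypothetical.

```python
def passes_ro4(mw, logp, hba, rings):
    """Rule-of-four check: all four properties must strictly exceed their cutoffs."""
    return mw > 400 and logp > 4 and hba > 4 and rings > 4


if __name__ == "__main__":
    print(passes_ro4(mw=520.0, logp=4.8, hba=6, rings=5))  # all criteria met
    print(passes_ro4(mw=520.0, logp=4.8, hba=6, rings=4))  # one ring short
```

A compound that misses a single cutoff by a small margin fails outright, whereas a continuous score would lower its value only slightly. This is the main practical difference between a rule-based filter and a desirability-based index.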

Property differences: The QEPPI desirability function for lipophilicity peaks at a notably higher value (ALogP 4.78) than the corresponding QED function (ALogP 2.70), reflecting the hydrophobic nature of PPI binding interfaces. Similarly, the molecular weight function favors larger molecules (peak ~500 Da) suitable for disrupting extended protein contact surfaces.

Applications in drug discovery

QEPPI serves multiple roles in pharmaceutical research:

Virtual screening: Filters large compound libraries to identify molecules with favorable PPI-targeting properties, reducing experimental screening costs while enriching hit rates for PPI modulators.

Lead optimization: Guides medicinal chemistry efforts during structure-activity relationship studies by quantifying changes in PPI drug-likeness as compounds undergo structural modifications.

Generative modeling: Functions as a reward signal in reinforcement learning approaches for molecular design, enabling automated generation of PPI-focused chemical libraries through methods like REINVENT and conditional VAEs.

Target-specific analysis: Shows differential performance across PPI families, with higher scores for compounds targeting primary epitopes (Bromodomain/Histone, XIAP/Smac) versus secondary epitopes (Bcl2/Bax, p53/MDM2), reflecting varying interface complexities.

Clinical validation

Analysis of PPI modulators in clinical development supports QEPPI's predictive validity:

Marketed drugs: PPI-targeting drugs approved since 1990 show increasing QEPPI scores over time, suggesting evolution toward compounds better aligned with PPI chemical space requirements.

Clinical candidates: Contemporary PPI modulators in clinical trials demonstrate median QEPPI scores (~0.59) higher than historical marketed compounds, indicating improved compound selection through better understanding of PPI drug-likeness.

Target diversity: QEPPI effectively evaluates diverse PPI targets including chemokine receptors (average score 0.641), MDM2/p53 interactions (0.593), and immune checkpoints, demonstrating broad applicability across therapeutic areas.

Limitations

Several constraints affect QEPPI applicability:

Structural alerts: The model excludes structural alert parameters used in QED due to dataset characteristics, requiring separate PAINS and reactive group filtering during practical screening applications.
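One way to add the separate PAINS step is RDKit's built-in filter catalog. The sketch below is an assumption about how such a filter could be wired in alongside QEPPI scoring, not part of the QEPPI method itself; the helper name `pains_alerts` is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build the PAINS catalog once and reuse it for every molecule.
_params = FilterCatalogParams()
_params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
PAINS_CATALOG = FilterCatalog(_params)


def pains_alerts(smiles):
    """Return descriptions of all PAINS patterns matched by the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return [entry.GetDescription() for entry in PAINS_CATALOG.GetMatches(mol)]
```

An empty list means no PAINS pattern matched. Reactive group filters would need their own pattern sets and are not covered by this sketch.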

Macrocyclic compounds: Natural product-derived PPI inhibitors and large macrocycles may receive lower scores due to underrepresentation in the training dataset, particularly compounds exceeding 800 Da molecular weight.

Target specificity: Interface complexity varies significantly across PPI families. Compounds targeting smaller interfaces (LEDGF/IN, transthyretin dimers) tend to receive lower QEPPI scores, suggesting potential for target-specific model refinements.

Three-dimensional properties: Current implementation focuses on 2D molecular descriptors, omitting conformational and stereochemical factors important for PPI binding, representing an area for future development.

Computational implementation

QEPPI calculation utilizes standard cheminformatics tools:

Input processing: SMILES strings undergo RDKit parsing for property calculation using established molecular descriptor algorithms including Wildman-Crippen LogP estimation and Lipinski descriptor computation.

Score calculation: The asymmetric double sigmoid model parameters enable direct property-to-desirability conversion, followed by weighted geometric mean calculation to generate final QEPPI scores ranging from 0 to 1.

Threshold optimization: A default threshold of 0.5196, the value that maximizes the F-score, provides binary classification while preserving the advantages of continuous scoring for ranking and optimization applications.
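The binary classification step and the F-score used to evaluate it can be sketched as follows. The threshold is the value quoted above; whether a score exactly at the threshold counts as positive is an assumption of this sketch, and the function names are hypothetical.

```python
DEFAULT_THRESHOLD = 0.5196  # threshold quoted in the text


def classify(scores, threshold=DEFAULT_THRESHOLD):
    """Label each score True if it is at or above the threshold (assumed inclusive)."""
    return [s >= threshold for s in scores]


def f_score(predicted, actual):
    """Harmonic mean of precision and recall for boolean label lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Sweeping `threshold` over a grid and keeping the value with the highest `f_score` on a labeled set is one simple way a default like this could be derived.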

Impact and future directions

QEPPI marks a shift toward target-class-specific drug-likeness assessment, replacing one-size-fits-all metrics with evaluation frameworks tailored to a target class. The methodology's success has inspired development of additional specialized indices for other challenging target classes.

Future enhancements may incorporate three-dimensional molecular properties, expand training datasets to cover underrepresented compound classes, and develop target-specific variants tailored to individual PPI families' distinct physicochemical requirements.

The approach demonstrates broader applicability to emerging therapeutic areas, including host-pathogen PPIs relevant to antiviral drug development, with preliminary analysis of SARS-CoV-2 S protein inhibitors showing median QEPPI scores of 0.511.

Cost

QEPPI analysis with ProteinIQ costs 1 credit per molecule, providing comprehensive evaluation of all seven physicochemical properties and final drug-likeness scoring. This cost-effective approach enables large-scale virtual screening and systematic assessment of compound libraries for PPI-targeting potential.


Based on: Kosugi, T., & Ohue, M. (2021). Quantitative estimate index for early-stage screening of compounds targeting protein-protein interactions. International Journal of Molecular Sciences, 22(20), 10925. DOI: 10.3390/ijms222010925
