ProteinIQ

Admetica

Admetica is an open-source ADMET prediction toolkit from Datagrok that uses Chemprop-based machine learning models to predict pharmacokinetic properties. It is suited to drug design and lead optimization.

What is Admetica?

Admetica is an open-source ADMET prediction toolkit developed by Datagrok. It uses Chemprop message-passing neural networks to predict 23 pharmacokinetic and toxicity properties from SMILES strings. Unlike rule-based filters that check molecular descriptors against thresholds, Admetica learns structure-property relationships directly from experimental data.

The toolkit addresses a practical problem in drug discovery: evaluating absorption, distribution, metabolism, excretion, and toxicity (ADMET) early enough to avoid late-stage failures. Compounds that look promising in target assays often fail because they cannot reach the target, get metabolized too quickly, or cause toxicity. Predicting these properties computationally allows researchers to prioritize compounds before committing to expensive synthesis and testing.

How does Admetica work?

Admetica builds on Chemprop, a graph neural network framework that represents molecules as directed graphs. Atoms become nodes; bonds become edges. The network passes information along bonds, learning to associate local structural features with global molecular properties.
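The idea of passing information along directed bonds can be sketched in a few lines. This is a toy illustration only, not Chemprop's implementation: the molecule (ethanol, heavy atoms only), the one-number atom features, and the function name `message_passing` are all assumptions made for the example. Real models use learned weight matrices, nonlinearities, and feature vectors.

```python
# Toy message passing over directed bonds (illustrative; not Chemprop's code).
# Ethanol, heavy atoms only: C0 - C1 - O2
atoms = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]

# Each undirected bond becomes two directed edges, one per direction.
directed_edges = [(a, b) for a, b in bonds] + [(b, a) for a, b in bonds]

# Hypothetical scalar atom features; real models use learned vectors.
atom_feature = {"C": 1.0, "O": 2.0}


def message_passing(steps):
    """Return (per-atom state, molecule-level sum) after `steps` updates."""
    # Initialise each directed edge u->v with the feature of its source atom.
    hidden = {(u, v): atom_feature[atoms[u]] for u, v in directed_edges}
    for _ in range(steps):
        updated = {}
        for u, v in directed_edges:
            # Collect messages arriving at u, skipping the reverse edge v->u
            # so information does not bounce straight back.
            incoming = sum(
                hidden[(w, x)] for (w, x) in directed_edges if x == u and w != v
            )
            updated[(u, v)] = atom_feature[atoms[u]] + incoming
        hidden = updated
    # Atom state: own feature plus all edge states pointing at the atom.
    atom_state = {
        a: atom_feature[symbol] + sum(h for (u, v), h in hidden.items() if v == a)
        for a, symbol in atoms.items()
    }
    # Molecule-level readout: sum over atoms.
    return atom_state, sum(atom_state.values())


atom_state, molecule_value = message_passing(steps=1)
```

After each step, every edge state summarizes a slightly larger neighborhood, which is how local structure ends up influencing a single molecule-level prediction.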

Each ADMET property has its own model trained on curated datasets from ChEMBL, AstraZeneca, and academic sources. Training data ranges from 332 compounds (P-gp substrate) to nearly 10,000 compounds (solubility), with model performance validated against held-out test sets.

The regression models output continuous values (permeability coefficients, clearance rates, LD50), while classification models output probabilities for binary outcomes (CYP inhibitor yes/no, hERG liability yes/no).

How to use Admetica online

ProteinIQ hosts Admetica on cloud infrastructure, eliminating the need to install Python environments or configure GPU compute.

Input

| Format | Description |
| --- | --- |
| Plain SMILES | One compound per line |
| Tab-delimited | compound_name\tSMILES format to preserve identifiers |
| SDF file | Structure-data file with multiple molecules |
| CSV file | Comma-separated with SMILES column |
| PubChem fetch | Retrieve structures by compound name or CID |
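For the two text-based formats, a small parser shows how identifiers are kept or lost. This is a sketch, not ProteinIQ's actual parsing code; the function name `parse_smiles_input` and the fallback naming scheme `compound_<line number>` are assumptions for illustration.

```python
def parse_smiles_input(text):
    """Return a list of (name, smiles) tuples from pasted text.

    Lines containing a tab are read as name<TAB>SMILES; other lines are
    read as plain SMILES and given a placeholder name (an assumption).
    """
    records = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            name, smiles = f"compound_{line_number}", line
        records.append((name, smiles))
    return records


example = "aspirin\tCC(=O)Oc1ccccc1C(=O)O\n\nCCO\n"
records = parse_smiles_input(example)
```

With plain SMILES input, the only way to trace a result row back to a compound is by position or by the SMILES string itself, which is why the tab-delimited form is useful for larger batches.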

Output columns

Results appear as a spreadsheet with one row per compound. Columns are organized by ADMET category.

Molecular properties

| Column | Description |
| --- | --- |
| molecular_weight | Molecular weight in Daltons |
| logp | Octanol-water partition coefficient |
| tpsa | Topological polar surface area (Ų) |
| hbd | Hydrogen bond donor count |
| hba | Hydrogen bond acceptor count |
| qed | Quantitative estimate of drug-likeness (0–1) |

Absorption

| Column | Unit/Type | Description |
| --- | --- | --- |
| caco2_permeability | log cm/s | Permeability across intestinal epithelial cells |
| lipophilicity | log ratio | Octanol-water partition (model-predicted) |
| solubility | log mol/L | Aqueous solubility |
| pgp_substrate | probability | Likelihood of P-glycoprotein efflux |

Distribution

| Column | Unit/Type | Description |
| --- | --- | --- |
| ppbr | percentage | Plasma protein binding rate |

Metabolism

Cytochrome P450 (CYP) enzymes metabolize the majority of small-molecule drugs. Inhibiting them can cause drug-drug interactions; being a substrate affects clearance.

| Column | Type | Description |
| --- | --- | --- |
| cyp1a2_inhibitor | probability | CYP1A2 inhibition liability |
| cyp2c9_inhibitor | probability | CYP2C9 inhibition liability |
| cyp2c19_inhibitor | probability | CYP2C19 inhibition liability |
| cyp2d6_inhibitor | probability | CYP2D6 inhibition liability |
| cyp3a4_inhibitor | probability | CYP3A4 inhibition liability |
| cyp2c9_substrate | probability | Metabolized by CYP2C9 |
| cyp2d6_substrate | probability | Metabolized by CYP2D6 |
| cyp3a4_substrate | probability | Metabolized by CYP3A4 |

Excretion

| Column | Unit | Description |
| --- | --- | --- |
| clearance_hepatocyte | log mL/min/g | Clearance rate in hepatocyte assays |
| clearance_microsome | log mL/min/g | Clearance rate in microsome assays |

Toxicity

| Column | Unit/Type | Description |
| --- | --- | --- |
| herg | probability | hERG channel inhibition (cardiac risk) |
| ld50 | log mg/kg | Acute oral toxicity |
| ames | probability | Mutagenicity (Ames test) |
| dili | probability | Drug-induced liver injury risk |
| skin_sensitization | probability | Skin sensitization potential |
| carcinogenicity | probability | Carcinogenic potential |
| clinical_toxicity | probability | General clinical toxicity risk |
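If the results are exported for downstream analysis, the columns listed above can be regrouped by ADMET category. The sketch below assumes a CSV export whose header uses exactly the column names from the tables; the export format and the helper names `group_row` and `read_results` are assumptions, not a documented ProteinIQ or Admetica interface.

```python
import csv
import io

# Column names copied from the tables above, grouped by category.
CATEGORY_COLUMNS = {
    "absorption": ["caco2_permeability", "lipophilicity", "solubility", "pgp_substrate"],
    "distribution": ["ppbr"],
    "metabolism": [
        "cyp1a2_inhibitor", "cyp2c9_inhibitor", "cyp2c19_inhibitor",
        "cyp2d6_inhibitor", "cyp3a4_inhibitor",
        "cyp2c9_substrate", "cyp2d6_substrate", "cyp3a4_substrate",
    ],
    "excretion": ["clearance_hepatocyte", "clearance_microsome"],
    "toxicity": [
        "herg", "ld50", "ames", "dili",
        "skin_sensitization", "carcinogenicity", "clinical_toxicity",
    ],
}


def group_row(row):
    """Split one result row (a dict of strings) into per-category dicts of floats."""
    grouped = {}
    for category, columns in CATEGORY_COLUMNS.items():
        # Columns that are absent or empty in the export are skipped.
        grouped[category] = {
            name: float(row[name])
            for name in columns
            if row.get(name) not in (None, "")
        }
    return grouped


def read_results(csv_text):
    """Parse exported CSV text (assumed format) into grouped rows."""
    return [group_row(row) for row in csv.DictReader(io.StringIO(csv_text))]


sample = "smiles,ppbr,herg,ld50\nCCO,12.5,0.03,2.1\n"
grouped_rows = read_results(sample)
```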

Interpreting results

Absorption predictions

Caco-2 permeability values above −5 log cm/s suggest good intestinal absorption. Values below −6 indicate poor passive permeability, though active transport can compensate.

Solubility above −4 log mol/L is generally favorable for oral drugs. Lower values may require formulation strategies.

P-gp substrate probability above 0.5 suggests the compound may be pumped out of cells, reducing bioavailability and potentially causing drug-drug interactions with P-gp inhibitors.
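The absorption cutoffs above translate directly into a few checks. This is a minimal sketch: the cutoffs come from the text, while the function names and the "intermediate" label for Caco-2 values between −6 and −5 are assumptions, since the text does not name that band. The conversion helper shows what a log mol/L value means in linear units.

```python
def caco2_class(log_papp_cm_s):
    """Classify Caco-2 permeability given in log cm/s."""
    if log_papp_cm_s > -5:
        return "good"
    if log_papp_cm_s < -6:
        return "poor"
    return "intermediate"  # band between the two cutoffs; label is an assumption


def solubility_micromolar(log_s_mol_l):
    """Convert log mol/L to micromolar (10**log gives mol/L; 1 mol/L = 1e6 uM)."""
    return 10 ** log_s_mol_l * 1e6


def absorption_summary(row):
    """Apply the absorption cutoffs from the text to one result row."""
    return {
        "caco2": caco2_class(row["caco2_permeability"]),
        "solubility_favorable": row["solubility"] > -4,
        "solubility_uM": solubility_micromolar(row["solubility"]),
        "pgp_efflux_likely": row["pgp_substrate"] > 0.5,
    }


summary = absorption_summary(
    {"caco2_permeability": -4.7, "solubility": -3.2, "pgp_substrate": 0.62}
)
```

The −4 log mol/L solubility cutoff corresponds to 10⁻⁴ mol/L, or 100 µM, which is often easier to reason about than the log value.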

Metabolism predictions

CYP inhibition probabilities above 0.7 warrant attention for drug-drug interaction potential. CYP3A4 inhibition is particularly significant because this enzyme metabolizes approximately 50% of marketed drugs.

High clearance predictions (hepatocyte or microsome) indicate rapid metabolism, which may limit exposure. Low clearance can cause accumulation.

Toxicity predictions

hERG inhibition probability above 0.5 flags potential cardiac liability. The hERG potassium channel contributes to cardiac repolarization; blocking it can prolong the QT interval and trigger fatal arrhythmias. Most drug development programs include hERG screening as a safety gate.

Ames positivity (probability > 0.5) indicates potential mutagenicity. Mutagenic compounds are typically excluded from development unless the therapeutic benefit justifies the risk (e.g., oncology).
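The metabolism and toxicity cutoffs can be combined into a simple triage pass over a result row. The sketch below uses the thresholds stated in this section (0.7 for CYP inhibition, 0.5 for hERG and Ames); the function name `liability_flags` and the choice to treat missing columns as unflagged are assumptions for illustration.

```python
CYP_INHIBITOR_COLUMNS = [
    "cyp1a2_inhibitor", "cyp2c9_inhibitor", "cyp2c19_inhibitor",
    "cyp2d6_inhibitor", "cyp3a4_inhibitor",
]


def liability_flags(row, cyp_cutoff=0.7, tox_cutoff=0.5):
    """Return human-readable flags for one row of predicted probabilities.

    Missing columns default to 0.0, i.e. they are never flagged (an assumption).
    """
    flags = []
    for column in CYP_INHIBITOR_COLUMNS:
        if row.get(column, 0.0) > cyp_cutoff:
            flags.append(f"{column} > {cyp_cutoff}")
    if row.get("herg", 0.0) > tox_cutoff:
        flags.append("hERG liability")
    if row.get("ames", 0.0) > tox_cutoff:
        flags.append("Ames positive")
    return flags


flags = liability_flags(
    {"cyp3a4_inhibitor": 0.82, "cyp2d6_inhibitor": 0.7, "herg": 0.55, "ames": 0.1}
)
```

The comparisons are strict, so a value exactly at a cutoff is not flagged. Flags of this kind are a prompt for follow-up, not a verdict on the compound.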

Admetica vs ADMET-AI

Both tools use Chemprop neural networks for ADMET prediction. Key differences:

| Aspect | Admetica | ADMET-AI |
| --- | --- | --- |
| Properties | 23 models | 41 models |
| Training data | ChEMBL, AstraZeneca | Therapeutics Data Commons |
| Developer | Datagrok | Stanford/Greenstone |
| Additional properties | Skin sensitization, carcinogenicity, clinical toxicity | BBB penetration, half-life, oral bioavailability |

The tools complement each other. Running both provides broader coverage and a second opinion on shared endpoints.
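One way to use the second opinion is to check whether the two tools make the same binary call on each shared endpoint. The sketch below assumes the predictions from both tools have already been mapped to a common set of endpoint names; the names used here and the function `compare_calls` are hypothetical, and the example probabilities are made-up inputs rather than real predictions.

```python
def compare_calls(admetica_probs, other_probs, threshold=0.5):
    """Compare binary calls on endpoints present in both prediction dicts."""
    report = {}
    shared = sorted(admetica_probs.keys() & other_probs.keys())
    for endpoint in shared:
        a, b = admetica_probs[endpoint], other_probs[endpoint]
        report[endpoint] = {
            # Both above or both below the threshold counts as agreement.
            "agree": (a > threshold) == (b > threshold),
            "mean": (a + b) / 2,
        }
    return report


# Illustrative numbers only; endpoint keys are assumed to be harmonized.
report = compare_calls(
    {"herg": 0.75, "ames": 0.25, "skin_sensitization": 0.1},
    {"herg": 0.625, "ames": 0.75, "bbb": 0.9},
)
```

Endpoints where the tools disagree are reasonable candidates for experimental follow-up, since neither prediction comes with a confidence interval.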

Limitations

  • Chemical space: Models are trained primarily on drug-like small molecules. Performance degrades for peptides, natural products, and other structures distant from the training data.
  • No uncertainty quantification: Predictions lack confidence intervals. A 0.6 probability could reflect genuine uncertainty or training data bias.
  • Endpoint-specific accuracy: Regression models (Caco-2, solubility, clearance) have published accuracy metrics; classification thresholds are less rigorously calibrated.
  • Correlation, not causation: Models identify structural patterns associated with properties but do not explain mechanisms.