Molecular descriptors

Calculate RDKit physicochemical descriptors, drug-likeness scores, rule violations, and structural fingerprints from SMILES.

What are molecular descriptors?

Molecular descriptors turn a chemical structure into numeric features. For small molecules, these features describe size, polarity, lipophilicity, hydrogen bonding, ring systems, synthetic accessibility, and structural patterns that can be used for drug-likeness screening, QSAR modeling, similarity search, and machine-learning workflows.

ProteinIQ calculates descriptors from SMILES with RDKit. The default output includes a compact drug-discovery summary first, then RDKit's full descriptor catalog so the same result can support both quick inspection and downstream modeling.

How to use Molecular Descriptors online

Paste SMILES strings, upload a SMILES file, or fetch compounds from PubChem to calculate RDKit descriptors online. ProteinIQ returns a downloadable table with physicochemical properties, QED, SA Score, and violation counts for the Lipinski, Veber, and Ghose rules. Optional Morgan/ECFP4 and MACCS fingerprints are available for ML-ready feature export.

Inputs

Input	Description
`Molecule`	One SMILES string per line. Named rows can use tab-separated `name\tSMILES` format.
File upload	`.txt`, `.smi`, `.csv`, and `.smiles` files.
PubChem fetch	Compound names or CIDs fetched as SMILES.
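
The input format above can be parsed in a few lines of Python. This is a minimal sketch, not ProteinIQ's own parser: the function name `parse_smiles_lines` and the `Molecule N` label for unnamed rows are assumptions made for illustration.

```python
from typing import List, Tuple


def parse_smiles_lines(text: str) -> List[Tuple[str, str]]:
    """Return (name, smiles) pairs from pasted input.

    Assumptions: a tab separates an optional name from the SMILES, blank
    lines are ignored, and unnamed rows get a generated label.
    """
    rows: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            # Named row: split once so the SMILES itself is left untouched.
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            name, smiles = "", line
        if not name:
            # Hypothetical auto-label based on the row position.
            name = f"Molecule {len(rows) + 1}"
        rows.append((name, smiles))
    return rows


example = "aspirin\tCC(=O)Oc1ccccc1C(=O)O\n\nCCO\n"
print(parse_smiles_lines(example))
# [('aspirin', 'CC(=O)Oc1ccccc1C(=O)O'), ('Molecule 2', 'CCO')]
```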

Settings

Setting	Description
`All descriptors`	Default. Summary screening columns followed by RDKit's full descriptor catalog, including EState, BCUT, VSA, Chi, Kappa, graph, and fragment descriptors.
`Physicochemical`	Compact property table with formula, MW, exact MW, LogP, TPSA, HBA, HBD, rotatable bonds, rings, Fsp3, MR, formal charge, atom count, and bond count.
`Drug-likeness`	QED, SA Score, Lipinski violations, Veber violations, Ghose violations, and the core properties behind those filters.
`Fingerprints`	Morgan/ECFP4 2048-bit and MACCS 166-bit fingerprints as bitstrings with on-bit counts.

Results

The All descriptors table starts with interpretable columns that are commonly used in medicinal chemistry, then appends RDKit descriptor columns. RDKit catalog values are rounded to four decimal places in the table for stable export and readable wide spreadsheets.

Column	Meaning
`Name`	Compound label from the input, or an automatically assigned row name.
`SMILES`	Input SMILES string.
`Formula`	Molecular formula from RDKit.
`MW`	Average molecular weight in g/mol.
`ExactMolWt`	Monoisotopic mass, computed from the most abundant isotope of each element.
`LogP`	Wildman-Crippen octanol-water partition estimate.
`TPSA`	Topological polar surface area.
`HBA`, `HBD`	Hydrogen bond acceptor and donor counts.
`Rotatable Bonds`	Non-ring rotatable bond count.
`Aromatic Rings`, `Rings`	Aromatic ring count and total ring count.
`Fsp3`	Fraction of carbon atoms with sp3 hybridization.
`MR`	Molar refractivity.
`QED`	Quantitative Estimate of Drug-likeness, from 0 to 1.
`SA Score`	Synthetic accessibility score on a scale from 1 to 10. Lower values indicate easier synthesis.
`Lipinski Violations`	Count of Rule of 5 criteria outside the usual oral-drug ranges.
`Veber Violations`	Count of TPSA and rotatable-bond criteria outside Veber limits.
`Ghose Violations`	Count of MW, LogP, atom-count, and molar-refractivity criteria outside Ghose filter ranges.
RDKit descriptor columns	Full RDKit descriptor catalog, such as `MaxAbsEStateIndex`, `BCUT2D_MWHI`, `FractionCSP3`, `MolMR`, and `fr_Al_OH`.

Interpreting results

Descriptor filters are fast triage tools. They identify molecules that look unusual relative to historical oral drugs, not molecules that are guaranteed to fail.

Metric	Practical interpretation
`MW`	Most oral small-molecule drugs are roughly 150-500 Da. Larger molecules can still work, but permeability and formulation become harder.
`LogP`	Negative values indicate hydrophilic molecules. High values indicate lipophilicity, which can improve membrane passage but also reduces aqueous solubility and increases nonspecific binding.
`TPSA`	Values below 140 Å² are commonly used for oral absorption screens. Values below 90 Å² are more compatible with CNS exposure.
`HBA`, `HBD`	High hydrogen-bonding capacity can improve target binding and solubility, but it also makes crossing membranes harder.
`QED`	Higher values indicate a property profile closer to known oral drugs. A value above 0.67 is often considered favorable.
`SA Score`	Scores near 1-3 indicate easier synthesis. Higher scores indicate more complex or less accessible structures.

Rule violations

Rule	Criteria counted
`Lipinski Violations`	MW greater than 500, LogP greater than 5, HBD greater than 5, HBA greater than 10.
`Veber Violations`	Rotatable bonds greater than 10, TPSA greater than 140 Å².
`Ghose Violations`	MW outside 160-480, LogP outside -0.4 to 5.6, atom count outside 20-70, MR outside 40-130.

Zero or one violation is usually compatible with early oral-drug screening. Two or more violations need context. Natural products, macrocycles, targeted degraders, covalent ligands, and non-oral compounds often sit outside these ranges for valid reasons.
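
The counting logic in the table can be written out directly. The sketch below is illustrative rather than ProteinIQ's implementation: the function names are hypothetical, treating the Ghose range endpoints as inclusive is an assumption, and the property values in the example are made up, not computed from a real molecule.

```python
def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count Rule of 5 criteria that are exceeded."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def veber_violations(rotatable_bonds: int, tpsa: float) -> int:
    """Count Veber criteria that are exceeded."""
    return sum([rotatable_bonds > 10, tpsa > 140])


def ghose_violations(mw: float, logp: float, atom_count: int, mr: float) -> int:
    """Count Ghose properties outside their range (endpoints assumed inclusive)."""
    return sum([
        not 160 <= mw <= 480,
        not -0.4 <= logp <= 5.6,
        not 20 <= atom_count <= 70,
        not 40 <= mr <= 130,
    ])


# Hypothetical property values for a large, lipophilic, flexible compound.
props = dict(mw=520.6, logp=5.4, hbd=2, hba=7,
             rotatable_bonds=12, tpsa=95.0, atom_count=72, mr=135.2)

print(lipinski_violations(props["mw"], props["logp"], props["hbd"], props["hba"]))  # 2
print(veber_violations(props["rotatable_bonds"], props["tpsa"]))                    # 1
print(ghose_violations(props["mw"], props["logp"], props["atom_count"], props["mr"]))  # 3
```

In this made-up case the compound exceeds the MW and LogP limits for Lipinski, the rotatable-bond limit for Veber, and the MW, atom-count, and MR ranges for Ghose, so it would need the kind of context described above.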

Fingerprints for ML and similarity search

Fingerprints encode molecular substructures as binary vectors. They are not descriptors in the same sense as MW or LogP; they are sparse structural features for comparing molecules or training models.

Fingerprint	Use case
`Morgan ECFP4 (2048-bit)`	Circular radius-2 fingerprint used for similarity search, clustering, QSAR, virtual screening, and neural-network features.
`MACCS Keys (166-bit)`	Fixed structural-key fingerprint that is compact, interpretable, and useful for simple similarity or legacy QSAR workflows.

The fingerprint preset returns bitstrings rather than thousands of visible spreadsheet columns. Compact bitstring output keeps the browser table readable while preserving exportable ML features.
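
Exported bitstrings can be used without any chemistry library. The following sketch assumes each fingerprint arrives as a string of `0` and `1` characters of equal length; the helper names are hypothetical, and the 8-bit strings are toy values standing in for 2048-bit or 166-bit fingerprints.

```python
from typing import List


def on_bits(bitstring: str) -> int:
    """Number of set bits in a '0'/'1' string."""
    return bitstring.count("1")


def tanimoto(a: str, b: str) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either string."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = on_bits(a) + on_bits(b) - both
    return both / either if either else 0.0


def to_vector(bitstring: str) -> List[int]:
    """Expand a bitstring into a list of 0/1 integers for model input."""
    return [int(ch) for ch in bitstring]


fp_a = "11010010"  # toy fingerprint, not real output
fp_b = "10011010"

print(on_bits(fp_a))         # 4
print(tanimoto(fp_a, fp_b))  # 0.6
print(to_vector(fp_a))       # [1, 1, 0, 1, 0, 0, 1, 0]
```

Expanding to one column per bit only at modeling time keeps the exported table narrow while still giving a model the full feature vector.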

How Molecular Descriptors works

RDKit parses each SMILES string into a molecular graph. Descriptor functions then operate on that graph without generating 3D conformers. The calculation is fast and deterministic for large compound lists, but the output does not capture conformational strain, 3D shape, explicit solvation, metabolism, or protein-ligand interactions.

The All descriptors preset uses RDKit's descriptor catalog through `Descriptors.CalcMolDescriptors`. That catalog includes:

Constitutional descriptors: Atom counts, molecular weight, exact mass, valence electrons, heteroatom counts, and ring counts.
Topological descriptors: Chi indices, Kappa shape indices, `BalabanJ`, `BertzCT`, and related graph features.
Surface-area descriptors: `TPSA`, `LabuteASA`, `PEOE_VSA`, `SMR_VSA`, `SlogP_VSA`, `EState_VSA`, and `VSA_EState` columns.
Electronic descriptors: EState indices, partial-charge summaries, and `BCUT2D` columns.
Fragment descriptors: Functional group counts such as `fr_Al_OH`, `fr_ester`, `fr_amide`, `fr_halogen`, and `fr_benzene`.

Invalid SMILES rows are reported with an error column instead of stopping the whole job.
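
The parse, compute, and report-errors flow described in this section might look roughly like the sketch below. It is an approximation under stated assumptions, not ProteinIQ's code: `rdkit_descriptors` and `run_job` are hypothetical names, the `Error` key is an assumed column name, and RDKit is imported inside the function so the surrounding logic can be exercised without it. `Chem.MolFromSmiles` and `Descriptors.CalcMolDescriptors` are real RDKit calls; the latter requires a reasonably recent RDKit release.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def rdkit_descriptors(smiles: str) -> Dict[str, object]:
    """Compute RDKit's descriptor catalog for one SMILES string."""
    # Lazy import: only needed when this function is actually called.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # CalcMolDescriptors returns a {descriptor name: value} dictionary.
    values = Descriptors.CalcMolDescriptors(mol)
    # Round floats to four decimals, as the results table does.
    return {k: round(v, 4) if isinstance(v, float) else v for k, v in values.items()}


def run_job(
    rows: Iterable[Tuple[str, str]],
    compute: Callable[[str], Dict[str, object]] = rdkit_descriptors,
) -> List[Dict[str, object]]:
    """Process every (name, smiles) row; failures become an Error value."""
    results: List[Dict[str, object]] = []
    for name, smiles in rows:
        record: Dict[str, object] = {"Name": name, "SMILES": smiles}
        try:
            record.update(compute(smiles))
        except Exception as exc:  # one bad row must not stop the job
            record["Error"] = str(exc)
        results.append(record)
    return results
```

With RDKit installed, `run_job([("ethanol", "CCO"), ("broken", "C1CC")])` would return one full descriptor row and one row carrying only `Name`, `SMILES`, and `Error`, because the unclosed ring in `C1CC` fails to parse.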

When to use Molecular Descriptors vs alternatives

Molecular Descriptors is the broad feature-generation tool. It is best when a workflow needs raw properties, RDKit descriptors, or fingerprints for custom filtering, QSAR, clustering, or model input.

Use a focused filter when the goal is a single decision rule rather than a wide feature table.

Tool	Best fit
Molecular Descriptors	Broad RDKit descriptor table, ML feature export, QSAR features, custom screening filters.
Lead-likeness filter	Focused lead-like property checks before optimization or analog design.
Lipinski's Rule of 5	Quick oral drug-likeness check using the classic Rule of 5.
Veber's Rule	Oral bioavailability screen focused on molecular flexibility and TPSA.
QEPPI	Drug-likeness scoring for protein-protein interaction inhibitors, where classical oral-drug filters are often too strict.
ADMET-AI	Predicted absorption, distribution, metabolism, excretion, and toxicity endpoints after descriptor-level triage.

Related tools

Aggrescan3D

Faithful static-mode Aggrescan3D tool for per-residue aggregation propensity analysis from a single protein structure.

PROPKA 3

Predict pKa values of ionizable groups in proteins and protein-ligand complexes from 3D structure. PROPKA calculates environment-driven pKa shifts for standard ionizable residues, terminal groups, and supported ligand atom types.

Protein charge plot

Plot net charge vs pH for protein sequences. Visualize how protein charge changes across pH 0-14 and identify the isoelectric point (pI) where the net charge crosses zero.

FindPept

Match experimental peptide masses against theoretical digest fragments of a protein sequence. Identify peptides from mass spectrometry data by peptide mass fingerprinting.

Hydropathy plot

Generate Kyte-Doolittle hydropathy plots to visualize hydrophobic and hydrophilic regions along protein sequences. Identify transmembrane domains and surface-exposed regions.

Hydrophobicity plot

Generate hydrophobicity plots using 24 different amino acid scales. Visualize hydrophobic and hydrophilic regions for protein analysis, epitope prediction, and membrane protein studies.

Peptide cutter

Predict protease and chemical cleavage sites across a protein sequence for up to 39 enzymes simultaneously. Identify where each enzyme cuts, the cleavage residue, and context window around each site.

Peptide mass calculator

Cleave a protein sequence with a chosen protease and compute the masses of the resulting peptides. Supports multiple enzymes, missed cleavages, chemical modifications, and different ion types for mass spectrometry experiment planning.

Protein parameters

Calculate protein parameters, including molecular weight, theoretical pI, extinction coefficients, aromaticity, secondary structure fractions, atomic composition, estimated half-life, and several indices, including instability, aliphatic index, and GRAVY.

Protein scale profiler

Generate amino acid property profiles using 42 different scales spanning hydrophobicity, secondary structure propensity, flexibility, polarity, surface accessibility, antigenicity, and more.
