
Molecular descriptors
Calculate RDKit physicochemical descriptors, drug-likeness scores, rule violations, and structural fingerprints from SMILES.
What are molecular descriptors?
Molecular descriptors turn a chemical structure into numeric features. For small molecules, these features describe size, polarity, lipophilicity, hydrogen bonding, ring systems, synthetic accessibility, and structural patterns that can be used for drug-likeness screening, QSAR modeling, similarity search, and machine-learning workflows.
ProteinIQ calculates descriptors from SMILES with RDKit. The default output includes a compact drug-discovery summary first, then RDKit's full descriptor catalog so the same result can support both quick inspection and downstream modeling.
How to use Molecular Descriptors online
Paste SMILES strings, upload a SMILES file, or fetch compounds from PubChem to calculate RDKit descriptors online. ProteinIQ returns a downloadable table with physicochemical properties, QED, SA Score, and counts of Lipinski, Veber, and Ghose rule violations, plus optional Morgan/ECFP4 and MACCS fingerprints for ML-ready feature export.
Inputs
| Input | Description |
|---|---|
| Molecule | One SMILES string per line. Named rows can use tab-separated name\tSMILES format. |
| File upload | .txt, .smi, .csv, and .smiles files. |
| PubChem fetch | Compound names or CIDs fetched as SMILES. |
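The line format described above can be parsed with a few lines of Python. This is a minimal sketch, not ProteinIQ's actual parser: the function name `parse_smiles_lines` and the `mol_N` placeholder naming for unnamed rows are assumptions made for illustration.

```python
def parse_smiles_lines(text):
    """Parse 'SMILES' or 'name<TAB>SMILES' lines into (name, smiles) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            # Named row: everything before the first tab is the label
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            # Unnamed row: assign a placeholder label (hypothetical naming scheme)
            name, smiles = f"mol_{len(records) + 1}", line
        records.append((name, smiles))
    return records


rows = parse_smiles_lines("aspirin\tCC(=O)Oc1ccccc1C(=O)O\nCCO\n")
print(rows)
# [('aspirin', 'CC(=O)Oc1ccccc1C(=O)O'), ('mol_2', 'CCO')]
```

Checking a file this way before upload makes it easier to spot rows where the name and SMILES columns were swapped or separated by spaces instead of a tab.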
Settings
| Setting | Description |
|---|---|
| All descriptors | Default. Summary screening columns followed by RDKit's full descriptor catalog, including EState, BCUT, VSA, Chi, Kappa, graph, and fragment descriptors. |
| Physicochemical | Compact property table with formula, MW, exact MW, LogP, TPSA, HBA, HBD, rotatable bonds, rings, Fsp3, MR, formal charge, atom count, and bond count. |
| Drug-likeness | QED, SA Score, Lipinski violations, Veber violations, Ghose violations, and the core properties behind those filters. |
| Fingerprints | Morgan/ECFP4 2048-bit and MACCS 166-bit fingerprints as bitstrings with on-bit counts. |
Results
The All descriptors table starts with interpretable columns that are commonly used in medicinal chemistry, then appends RDKit descriptor columns. RDKit catalog values are rounded to four decimal places in the table for stable export and readable wide spreadsheets.
| Column | Meaning |
|---|---|
| Name | Compound label from the input, or an automatically assigned row name. |
| SMILES | Input SMILES string. |
| Formula | Molecular formula from RDKit. |
| MW | Average molecular weight in g/mol. |
| ExactMolWt | Monoisotopic (exact) mass, computed from the most abundant isotope of each element. |
| LogP | Wildman-Crippen octanol-water partition estimate. |
| TPSA | Topological polar surface area in Å². |
| HBA, HBD | Hydrogen bond acceptor and donor counts. |
| Rotatable Bonds | Non-ring rotatable bond count. |
| Aromatic Rings, Rings | Aromatic ring count and total ring count. |
| Fsp3 | Fraction of carbon atoms with sp3 hybridization. |
| MR | Molar refractivity. |
| QED | Quantitative Estimate of Drug-likeness, from 0 to 1. |
| SA Score | Synthetic accessibility score, from 1 to 10. Lower values indicate easier synthesis. |
| Lipinski Violations | Count of Rule of 5 criteria outside the usual oral-drug ranges. |
| Veber Violations | Count of TPSA and rotatable-bond criteria outside Veber limits. |
| Ghose Violations | Count of MW, LogP, atom-count, and molar-refractivity criteria outside Ghose filter ranges. |
| RDKit descriptor columns | Full RDKit descriptor catalog, such as MaxAbsEStateIndex, BCUT2D_MWHI, FractionCSP3, MolMR, and fr_Al_OH. |
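For readers who want to reproduce the summary columns locally, the sketch below maps most of them to functions in RDKit's `Descriptors`, `rdMolDescriptors`, and `QED` modules. Which RDKit calls ProteinIQ uses for each column is an assumption here, and the helper name `summary_row` is hypothetical. SA Score is left out because it comes from a contributed script rather than the core RDKit package.

```python
def summary_row(smiles):
    """Return a dict of summary-style properties for one SMILES string."""
    # Imported inside the function so the helper can be defined
    # even where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors, QED, rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {
        "SMILES": smiles,
        "Formula": rdMolDescriptors.CalcMolFormula(mol),
        "MW": Descriptors.MolWt(mol),               # average molecular weight
        "ExactMolWt": Descriptors.ExactMolWt(mol),  # monoisotopic mass
        "LogP": Descriptors.MolLogP(mol),           # Wildman-Crippen estimate
        "TPSA": Descriptors.TPSA(mol),
        "HBA": Descriptors.NumHAcceptors(mol),
        "HBD": Descriptors.NumHDonors(mol),
        "Rotatable Bonds": Descriptors.NumRotatableBonds(mol),
        "Aromatic Rings": Descriptors.NumAromaticRings(mol),
        "Rings": Descriptors.RingCount(mol),
        "Fsp3": Descriptors.FractionCSP3(mol),
        "MR": Descriptors.MolMR(mol),
        "QED": QED.qed(mol),
    }
```

Calling `summary_row("CCO")` returns one dictionary for ethanol. Values may differ slightly from the exported table, which rounds RDKit catalog values to four decimal places.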
Interpreting results
Descriptor filters are fast triage tools. They identify molecules that look unusual relative to historical oral drugs, not molecules that are guaranteed to fail.
| Metric | Practical interpretation |
|---|---|
| MW | Most oral small-molecule drugs are roughly 150-500 Da. Larger molecules can still work, but permeability and formulation become harder. |
| LogP | Negative values are hydrophilic. High values indicate lipophilicity, which can improve membrane passage but also reduce solubility and increase nonspecific binding. |
| TPSA | Values below 140 Å² are commonly used for oral absorption screens. Values below 90 Å² are more compatible with CNS exposure. |
| HBA, HBD | High hydrogen-bonding capacity can improve target binding and solubility, but it also increases the cost of crossing membranes. |
| QED | Higher values indicate a property profile closer to known oral drugs. A value above 0.67 is often considered favorable. |
| SA Score | Scores near 1-3 indicate easier synthesis. Higher scores indicate more complex or less accessible structures. |
Rule violations
| Rule | Criteria counted |
|---|---|
| Lipinski Violations | MW greater than 500, LogP greater than 5, HBD greater than 5, HBA greater than 10. |
| Veber Violations | Rotatable bonds greater than 10, TPSA greater than 140 Å². |
| Ghose Violations | MW outside 160-480, LogP outside -0.4 to 5.6, atom count outside 20-70, MR outside 40-130. |
Zero or one violation is usually compatible with early oral-drug screening. Two or more violations need context. Natural products, macrocycles, targeted degraders, covalent ligands, and non-oral compounds often sit outside these ranges for valid reasons.
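The criteria in the table translate directly into threshold checks. The sketch below is one possible reading of them, not the tool's own implementation: the function name `count_violations` is hypothetical, the Ghose ranges are assumed to be inclusive at both ends, and the property values in the example are invented for illustration rather than taken from a real compound.

```python
def count_violations(mw, logp, hbd, hba, rot_bonds, tpsa, atom_count, mr):
    """Count Lipinski, Veber, and Ghose criteria that a molecule falls outside."""
    # In Python, True counts as 1 when summed, so each list is a violation tally.
    lipinski = sum([mw > 500, logp > 5, hbd > 5, hba > 10])
    veber = sum([rot_bonds > 10, tpsa > 140])
    ghose = sum([
        not 160 <= mw <= 480,        # assumed inclusive bounds
        not -0.4 <= logp <= 5.6,
        not 20 <= atom_count <= 70,
        not 40 <= mr <= 130,
    ])
    return {"Lipinski": lipinski, "Veber": veber, "Ghose": ghose}


# Illustrative (made-up) property values for a large, lipophilic molecule
result = count_violations(
    mw=520.0, logp=5.8, hbd=2, hba=8,
    rot_bonds=12, tpsa=95.0, atom_count=72, mr=135.0,
)
print(result)
# {'Lipinski': 2, 'Veber': 1, 'Ghose': 4}
```

Keeping the three counts separate, rather than a single pass/fail flag, preserves the context needed to judge molecules that sit outside the ranges for valid reasons.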
Fingerprints for ML and similarity search
Fingerprints encode molecular substructures as binary vectors. They are not descriptors in the same sense as MW or LogP; they are sparse structural features for comparing molecules or training models.
| Fingerprint | Use case |
|---|---|
| Morgan ECFP4 (2048-bit) | Circular radius-2 fingerprint used for similarity search, clustering, QSAR, virtual screening, and neural-network features. |
| MACCS Keys (166-bit) | Fixed structural-key fingerprint that is compact, interpretable, and useful for simple similarity or legacy QSAR workflows. |
The fingerprint preset returns bitstrings rather than thousands of visible spreadsheet columns. Compact bitstring output keeps the browser table readable while preserving exportable ML features.
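Exported bitstrings can be compared or expanded into feature vectors without any chemistry library. The sketch below assumes each fingerprint is a string of `0` and `1` characters; the exact export format is not specified here, and the helper names are hypothetical. It computes the Tanimoto coefficient, the usual similarity measure for binary fingerprints: shared on-bits divided by on-bits set in either fingerprint.

```python
def on_bits(bits):
    """Number of set bits in a '0'/'1' bitstring."""
    return bits.count("1")


def to_features(bits):
    """Expand a bitstring into a list of 0/1 integers for model input."""
    return [int(ch) for ch in bits]


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: |A AND B| / |A OR B| for two equal-length bitstrings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a, b = int(bits_a, 2), int(bits_b, 2)
    shared = bin(a & b).count("1")
    either = bin(a | b).count("1")
    # Two all-zero fingerprints have no defined ratio; return 0.0 by convention here
    return shared / either if either else 0.0


# Short 8-bit strings stand in for real 2048-bit fingerprints
score = tanimoto("11010000", "10011000")
print(score)
# 0.5
```

Only fingerprints of the same type and length should be compared, so a Morgan bitstring should never be scored against a MACCS bitstring.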
How Molecular Descriptors works
RDKit parses each SMILES string into a molecular graph. Descriptor functions then operate on that graph without generating 3D conformers. Because no conformers are generated, the calculation is fast and deterministic even for large compound lists, but the output does not capture conformational strain, 3D shape, explicit solvation, metabolism, or protein-ligand interactions.
The All descriptors preset uses RDKit's descriptor catalog through Descriptors.CalcMolDescriptors. That catalog includes:
- Constitutional descriptors: Atom counts, molecular weight, exact mass, valence electrons, heteroatom counts, and ring counts.
- Topological descriptors: Chi indices, Kappa shape indices, BalabanJ, BertzCT, and related graph features.
- Surface-area descriptors: TPSA, LabuteASA, PEOE_VSA, SMR_VSA, SlogP_VSA, EState_VSA, and VSA_EState columns.
- Electronic descriptors: EState indices, partial-charge summaries, and BCUT2D columns.
- Fragment descriptors: Functional group counts such as fr_Al_OH, fr_ester, fr_amide, fr_halogen, and fr_benzene.
Invalid SMILES rows are reported with an error column instead of stopping the whole job.
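The workflow described in this section can be approximated locally with RDKit. The sketch below is not ProteinIQ's code: the function name `describe`, the `error` key, and the error message are assumptions. It relies on `Chem.MolFromSmiles` returning `None` for unparseable input and on `Descriptors.CalcMolDescriptors`, which is available in recent RDKit releases and returns a dictionary of descriptor names and values.

```python
def describe(smiles_list):
    """Return one dict per input; invalid SMILES get an 'error' entry instead."""
    # Imported inside the function so the helper can be defined
    # even where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # None when the SMILES cannot be parsed
        if mol is None:
            # Record the failure and keep going instead of aborting the batch
            rows.append({"SMILES": smi, "error": "invalid SMILES"})
            continue
        row = {"SMILES": smi}
        # Full descriptor catalog as {descriptor_name: value}
        for name, value in Descriptors.CalcMolDescriptors(mol).items():
            # Round floats to four decimals, matching the exported table
            row[name] = round(value, 4) if isinstance(value, float) else value
        rows.append(row)
    return rows
```

For example, `describe(["CCO", "not_a_smiles"])` returns a full descriptor row for ethanol followed by an error row for the second entry, so one bad line does not discard the rest of the batch.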
When to use Molecular Descriptors vs alternatives
Molecular Descriptors is the broad feature-generation tool. It is best when a workflow needs raw properties, RDKit descriptors, or fingerprints for custom filtering, QSAR, clustering, or model input.
Use a focused filter when the goal is a single decision rule rather than a wide feature table.
| Tool | Best fit |
|---|---|
| Molecular Descriptors | Broad RDKit descriptor table, ML feature export, QSAR features, custom screening filters. |
| Lipinski's Rule of 5 | Quick oral drug-likeness check using the classic Rule of 5. |
| Veber's Rule | Oral bioavailability screen focused on molecular flexibility and TPSA. |
| QEPPI | Drug-likeness scoring for protein-protein interaction inhibitors, where classical oral-drug filters are often too strict. |
| ADMET-AI | Predicted absorption, distribution, metabolism, excretion, and toxicity endpoints after descriptor-level triage. |