Molecular descriptors

Calculate RDKit physicochemical descriptors, drug-likeness scores, rule violations, and structural fingerprints from SMILES.

Related tools

Aggrescan3D

Faithful static-mode Aggrescan3D wrapper for per-residue aggregation propensity analysis from a single protein structure.

PROPKA 3

Predict pKa values of ionizable groups in proteins and protein-ligand complexes from 3D structure. PROPKA calculates environment-driven pKa shifts for standard ionizable residues, terminal groups, and supported ligand atom types.

Protein charge plot

Plot net charge vs pH for protein sequences. Visualize how protein charge changes across pH 0-14 and identify the isoelectric point (pI) where the net charge crosses zero.

FindPept

Match experimental peptide masses against theoretical digest fragments of a protein sequence. Identify peptides from mass spectrometry data by peptide mass fingerprinting.

Hydropathy plot

Generate Kyte-Doolittle hydropathy plots to visualize hydrophobic and hydrophilic regions along protein sequences. Identify transmembrane domains and surface-exposed regions.

Hydrophobicity plot

Generate hydrophobicity plots using 24 different amino acid scales. Visualize hydrophobic and hydrophilic regions for protein analysis, epitope prediction, and membrane protein studies.

Peptide cutter

Predict protease and chemical cleavage sites across a protein sequence for up to 39 enzymes simultaneously. Identify where each enzyme cuts, the cleavage residue, and the context window around each site.

Peptide mass calculator

Cleave a protein sequence with a chosen protease and compute the masses of the resulting peptides. Supports multiple enzymes, missed cleavages, chemical modifications, and different ion types for mass spectrometry experiment planning.

Protein parameters

Calculate protein parameters, including molecular weight, theoretical pI, extinction coefficients, aromaticity, secondary structure fractions, atomic composition, estimated half-life, and several indices: the instability index, the aliphatic index, and GRAVY.

Protein scale profiler

Generate amino acid property profiles using 42 different scales spanning hydrophobicity, secondary structure propensity, flexibility, polarity, surface accessibility, antigenicity, and more.

What are molecular descriptors?

Molecular descriptors turn a chemical structure into numeric features. For small molecules, these features describe size, polarity, lipophilicity, hydrogen bonding, ring systems, synthetic accessibility, and structural patterns that can be used for drug-likeness screening, QSAR modeling, similarity search, and machine-learning workflows.

ProteinIQ calculates descriptors from SMILES with RDKit. The default output includes a compact drug-discovery summary first, then RDKit's full descriptor catalog so the same result can support both quick inspection and downstream modeling.

How to use Molecular Descriptors online

Paste SMILES strings, upload a SMILES file, or fetch compounds from PubChem to calculate RDKit descriptors online. ProteinIQ returns a downloadable table with physicochemical properties, QED, SA Score, Lipinski, Veber, and Ghose rule violations, plus optional Morgan/ECFP4 and MACCS fingerprints for ML-ready feature export.

Inputs

Input | Description
Molecule | One SMILES string per line. Named rows can use tab-separated name\tSMILES format.
File upload | .txt, .smi, .csv, and .smiles files.
PubChem fetch | Compound names or CIDs, fetched from PubChem as SMILES.
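
The line format above can be sketched as a small parser. This is a minimal illustration that assumes one record per line and a literal tab between name and SMILES. The function name `parse_smiles_input` and the `mol_N` naming for unnamed rows are hypothetical and are not ProteinIQ's actual scheme.

```python
def parse_smiles_input(text: str) -> list[tuple[str, str]]:
    """Split pasted input into (name, SMILES) pairs.

    A line containing a tab is read as name<TAB>SMILES; otherwise the whole
    line is the SMILES. Unnamed rows get a hypothetical "mol_N" label.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            name, smiles = "", line
        if not name:
            name = f"mol_{len(records) + 1}"  # hypothetical auto-naming
        records.append((name, smiles))
    return records


text = "aspirin\tCC(=O)Oc1ccccc1C(=O)O\nCCO\n\nc1ccccc1"
print(parse_smiles_input(text))
# [('aspirin', 'CC(=O)Oc1ccccc1C(=O)O'), ('mol_2', 'CCO'), ('mol_3', 'c1ccccc1')]
```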

Settings

Setting | Description
All descriptors | Default. Summary screening columns followed by RDKit's full descriptor catalog, including EState, BCUT, VSA, Chi, Kappa, graph, and fragment descriptors.
Physicochemical | Compact property table with formula, MW, exact MW, LogP, TPSA, HBA, HBD, rotatable bonds, rings, Fsp3, MR, formal charge, atom count, and bond count.
Drug-likeness | QED, SA Score, Lipinski violations, Veber violations, Ghose violations, and the core properties behind those filters.
Fingerprints | Morgan/ECFP4 2048-bit and MACCS 166-bit fingerprints as bitstrings with on-bit counts.

Results

The All descriptors table starts with interpretable columns that are commonly used in medicinal chemistry, then appends RDKit descriptor columns. RDKit catalog values are rounded to four decimal places in the table for stable export and readable wide spreadsheets.

Column | Meaning
Name | Compound label from the input, or an automatically assigned row name.
SMILES | Input SMILES string.
Formula | Molecular formula from RDKit.
MW | Average molecular weight in g/mol.
ExactMolWt | Monoisotopic molecular weight, calculated from the most abundant isotope of each element.
LogP | Wildman-Crippen estimate of the octanol-water partition coefficient.
TPSA | Topological polar surface area, in Å².
HBA, HBD | Hydrogen bond acceptor and donor counts.
Rotatable Bonds | Non-ring rotatable bond count.
Aromatic Rings, Rings | Aromatic ring count and total ring count.
Fsp3 | Fraction of carbon atoms that are sp3-hybridized.
MR | Molar refractivity.
QED | Quantitative Estimate of Drug-likeness, from 0 to 1.
SA Score | Synthetic accessibility score, from 1 (easy to make) to 10 (very difficult). Lower values indicate easier synthesis.
Lipinski Violations | Count of Rule of 5 criteria outside the usual oral-drug ranges.
Veber Violations | Count of TPSA and rotatable-bond criteria outside Veber limits.
Ghose Violations | Count of MW, LogP, atom-count, and molar-refractivity criteria outside Ghose filter ranges.
RDKit descriptor columns | Full RDKit descriptor catalog, such as MaxAbsEStateIndex, BCUT2D_MWHI, FractionCSP3, MolMR, and fr_Al_OH.
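
The layout described for the All descriptors table can be sketched as follows. It puts the summary columns first, appends the catalog, and rounds catalog values to four decimals. `build_row` is a hypothetical helper. The catalog dict stands in for the dict that RDKit's `Descriptors.CalcMolDescriptors(mol)` returns, and the numbers in it are made up.

```python
def build_row(summary: dict, catalog: dict, ndigits: int = 4) -> dict:
    """Merge screening columns with a descriptor catalog for one molecule."""
    row = dict(summary)  # dicts preserve insertion order, so summary columns lead
    for key, value in catalog.items():
        if key in row:
            continue  # keep the summary version of any duplicated column
        if isinstance(value, float):
            value = round(value, ndigits)  # e.g. 16.123456 -> 16.1235
        row[key] = value
    return row


summary = {"Name": "ethanol", "SMILES": "CCO"}
# Made-up catalog values; with RDKit this dict would come from
# Descriptors.CalcMolDescriptors(mol).
catalog = {"BCUT2D_MWHI": 16.123456, "fr_Al_OH": 1}
print(build_row(summary, catalog))
# {'Name': 'ethanol', 'SMILES': 'CCO', 'BCUT2D_MWHI': 16.1235, 'fr_Al_OH': 1}
```

Integer columns such as fragment counts pass through unchanged. Only float values are rounded.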

Interpreting results

Descriptor filters are fast triage tools. They identify molecules that look unusual relative to historical oral drugs, not molecules that are guaranteed to fail.

Metric | Practical interpretation
MW | Most oral small-molecule drugs are roughly 150-500 Da. Larger molecules can still work, but permeability and formulation become harder.
LogP | Negative values indicate hydrophilic compounds. High values indicate lipophilicity, which can improve membrane passage but also tends to lower aqueous solubility and increase nonspecific binding.
TPSA | Values below 140 Å² are commonly used for oral absorption screens. Values below 90 Å² are more compatible with CNS exposure.
HBA, HBD | High hydrogen-bonding capacity can improve target binding and solubility, but it also makes crossing membranes harder.
QED | Higher values indicate a property profile closer to known oral drugs. A value above 0.67 is often considered favorable.
SA Score | Scores near 1-3 indicate easier synthesis. Higher scores indicate more complex or less accessible structures.
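
The rough cutoffs in this table can be turned into simple triage flags. This sketch hard-codes the ranges quoted above: 150-500 Da, 140 and 90 Å², QED 0.67, and SA Score 3. The exact side each boundary falls on and the function name `triage_flags` are assumptions for illustration. They are not ProteinIQ's logic.

```python
def triage_flags(mw: float, tpsa: float, qed: float, sa_score: float) -> list[str]:
    """Return notes for values outside the rough ranges in the table (hypothetical helper)."""
    flags = []
    if not 150 <= mw <= 500:
        flags.append("MW outside ~150-500 Da")
    if tpsa >= 140:
        flags.append("TPSA >= 140 Å²: fails common oral-absorption screen")
    elif tpsa >= 90:
        flags.append("TPSA >= 90 Å²: less compatible with CNS exposure")
    if qed <= 0.67:
        flags.append("QED <= 0.67: less drug-like profile")
    if sa_score > 3:
        flags.append("SA Score > 3: harder synthesis likely")
    return flags


# Illustrative values, not taken from a real compound.
print(triage_flags(mw=320.0, tpsa=110.0, qed=0.72, sa_score=2.4))
# ['TPSA >= 90 Å²: less compatible with CNS exposure']
```

The flags are meant as prompts for a closer look, in keeping with the triage framing above. They are not pass/fail decisions.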

Rule violations

Rule | Criteria counted
Lipinski Violations | MW greater than 500, LogP greater than 5, HBD greater than 5, HBA greater than 10.
Veber Violations | Rotatable bonds greater than 10, TPSA greater than 140 Å².
Ghose Violations | MW outside 160-480, LogP outside -0.4 to 5.6, atom count outside 20-70, MR outside 40-130.

Zero or one violation is usually compatible with early oral-drug screening. Two or more violations need context. Natural products, macrocycles, targeted degraders, covalent ligands, and non-oral compounds often sit outside these ranges for valid reasons.
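
The violation counts in the rule table can be sketched as plain comparisons on precomputed properties. The function names are hypothetical. The property values would come from RDKit or from the exported table, and whether the Ghose atom count includes hydrogens is left as an assumption. A value exactly on a limit counts as passing, which matches the "greater than" and "outside" wording.

```python
def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Rule of 5: count criteria that exceed the classic limits."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def veber_violations(rotatable_bonds: int, tpsa: float) -> int:
    """Veber: more than 10 rotatable bonds or TPSA above 140 Å²."""
    return sum([rotatable_bonds > 10, tpsa > 140])


def ghose_violations(mw: float, logp: float, atom_count: int, mr: float) -> int:
    """Ghose: count properties outside the inclusive ranges.

    Whether atom_count includes hydrogens is up to the caller (assumption).
    """
    return sum([
        not 160 <= mw <= 480,
        not -0.4 <= logp <= 5.6,
        not 20 <= atom_count <= 70,
        not 40 <= mr <= 130,
    ])


# Made-up property values for a large, lipophilic compound.
print(lipinski_violations(mw=620.0, logp=6.1, hbd=2, hba=9))          # 2
print(veber_violations(rotatable_bonds=12, tpsa=95.0))               # 1
print(ghose_violations(mw=620.0, logp=6.1, atom_count=80, mr=170.0))  # 4
```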

Fingerprints encode molecular substructures as binary vectors. They are not descriptors in the same sense as MW or LogP; they are sparse structural features for comparing molecules or training models.

Fingerprint | Use case
Morgan ECFP4 (2048-bit) | Circular radius-2 fingerprint used for similarity search, clustering, QSAR, virtual screening, and neural-network features.
MACCS Keys (166-bit) | Fixed structural-key fingerprint that is compact, interpretable, and useful for simple similarity or legacy QSAR workflows.

The fingerprint preset returns bitstrings rather than thousands of visible spreadsheet columns. Compact bitstring output keeps the browser table readable while preserving exportable ML features.
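
The fingerprint preset exports bitstrings, so on-bit counts and Tanimoto similarity can be computed directly from the exported text. The sketch below is minimal and assumes plain '0'/'1' strings of equal length. The helper names are hypothetical. Inside RDKit itself, `DataStructs.TanimotoSimilarity` performs the same comparison on bit-vector objects.

```python
def on_bit_count(bits: str) -> int:
    """Number of set bits in a '0'/'1' bitstring."""
    return bits.count("1")


def tanimoto(a: str, b: str) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length bitstrings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    x, y = int(a, 2), int(b, 2)
    shared = bin(x & y).count("1")  # bits set in both fingerprints
    union = bin(x | y).count("1")   # bits set in either fingerprint
    return shared / union if union else 0.0  # two all-zero prints: 0.0 by convention here


fp1 = "110010"
fp2 = "100011"
print(on_bit_count(fp1), on_bit_count(fp2))  # 3 3
print(tanimoto(fp1, fp2))  # 0.5
```

Real Morgan bitstrings are 2048 characters long, but the arithmetic is the same.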

How Molecular Descriptors works

RDKit parses each SMILES string into a molecular graph. Descriptor functions then operate on that graph without generating 3D conformers. The calculation is therefore fast and deterministic, even for large compound lists, but the output does not capture conformational strain, 3D shape, explicit solvation, metabolism, or protein-ligand interactions.
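
Graph descriptors need no coordinates. To illustrate this, the sketch below computes a ring count from nothing but atom indices and bonds. It uses the cyclomatic number: bonds minus atoms plus connected components. This quantity equals the number of rings in the smallest set of smallest rings. The sketch stands in for RDKit's own ring perception and is not its code. The hand-written heavy-atom graphs are assumptions.

```python
def ring_count(n_atoms: int, bonds: list[tuple[int, int]]) -> int:
    """Cyclomatic number of a molecular graph: bonds - atoms + components."""
    parent = list(range(n_atoms))  # union-find forest over atom indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)  # merge the two atoms' components
    components = len({find(i) for i in range(n_atoms)})
    return len(bonds) - n_atoms + components


# Heavy-atom graphs written by hand (hydrogens and bond orders omitted).
benzene = [(i, (i + 1) % 6) for i in range(6)]
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
print(ring_count(6, benzene), ring_count(10, naphthalene))  # 1 2
```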

The All descriptors preset uses RDKit's descriptor catalog through Descriptors.CalcMolDescriptors. That catalog includes:

  • Constitutional descriptors: Atom counts, molecular weight, exact mass, valence electrons, heteroatom counts, and ring counts.
  • Topological descriptors: Chi indices, Kappa shape indices, BalabanJ, BertzCT, and related graph features.
  • Surface-area descriptors: TPSA, LabuteASA, PEOE_VSA, SMR_VSA, SlogP_VSA, EState_VSA, and VSA_EState columns.
  • Electronic descriptors: EState indices, partial-charge summaries, and BCUT2D columns.
  • Fragment descriptors: Functional group counts such as fr_Al_OH, fr_ester, fr_amide, fr_halogen, and fr_benzene.
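
The Chi indices in the topological group are built on the molecular connectivity index of Randić and of Kier and Hall. To show how such a descriptor comes straight from the bond graph, the sketch computes the simple first-order index. In this index, each bond contributes 1/sqrt(d_a·d_b), where d is an atom's heavy-atom degree. It is a hand-written illustration, not RDKit's Chi1 implementation, and it is not guaranteed to reproduce RDKit's values.

```python
import math


def chi1(n_atoms: int, bonds: list[tuple[int, int]]) -> float:
    """First-order connectivity (Randić) index over heavy-atom bonds."""
    degree = [0] * n_atoms
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    # Each bond contributes 1 / sqrt(deg(a) * deg(b)); branching lowers the total.
    return sum(1 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


n_butane = [(0, 1), (1, 2), (2, 3)]    # C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]   # central carbon with three methyls
print(round(chi1(4, n_butane), 4), round(chi1(4, isobutane), 4))  # 1.9142 1.7321
```

The two C4 isomers have the same atom count but different values. This difference is the kind of shape information that topological descriptors add beyond simple counts.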

Invalid SMILES rows are reported with an error column instead of stopping the whole job.
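
Per-row error reporting can be sketched as a loop that records a failure and moves on instead of raising. With RDKit, `parse` would be `Chem.MolFromSmiles`, which returns None when a SMILES string cannot be parsed. The `Error` column name, the messages, and the `toy_parse`/`toy_describe` stand-ins are hypothetical. They exist only so the sketch runs without RDKit.

```python
def describe_rows(rows, parse, describe):
    """Compute descriptors per row, recording failures instead of stopping.

    parse(smiles) returns a molecule object or None; describe(mol) returns a dict.
    """
    results = []
    for name, smiles in rows:
        row = {"Name": name, "SMILES": smiles, "Error": ""}
        mol = parse(smiles)
        if mol is None:
            row["Error"] = "could not parse SMILES"
        else:
            try:
                row.update(describe(mol))
            except Exception as exc:  # one failing row should not stop the job
                row["Error"] = f"descriptor error: {exc}"
        results.append(row)
    return results


# Hypothetical stand-ins: a "parser" that only checks parenthesis balance,
# and a "descriptor" that reports string length.
def toy_parse(smiles):
    return smiles if smiles and smiles.count("(") == smiles.count(")") else None


def toy_describe(mol):
    return {"Length": len(mol)}


rows = [("ethanol", "CCO"), ("broken", "CC(O")]
for r in describe_rows(rows, toy_parse, toy_describe):
    print(r)
# {'Name': 'ethanol', 'SMILES': 'CCO', 'Error': '', 'Length': 3}
# {'Name': 'broken', 'SMILES': 'CC(O', 'Error': 'could not parse SMILES'}
```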

When to use Molecular Descriptors vs alternatives

Molecular Descriptors is the broad feature-generation tool. It is best when a workflow needs raw properties, RDKit descriptors, or fingerprints for custom filtering, QSAR, clustering, or model input.

Use a focused filter when the goal is a single decision rule rather than a wide feature table.

Tool | Best fit
Molecular Descriptors | Broad RDKit descriptor table, ML feature export, QSAR features, custom screening filters.
Lipinski's Rule of 5 | Quick oral drug-likeness check using the classic Rule of 5.
Veber's Rule | Oral bioavailability screen focused on molecular flexibility and TPSA.
QEPPI | Drug-likeness scoring for protein-protein interaction inhibitors, where classical oral-drug filters are often too strict.
ADMET-AI | Predicted absorption, distribution, metabolism, excretion, and toxicity endpoints after descriptor-level triage.