Glossary

What does that mean? Glad you asked. Below are definitions for the terms you'll come across while using ProteinIQ.

File formats & data

FASTA
A text-based format for representing nucleotide or amino acid sequences. Each record begins with a '>' header line followed by one or more lines of sequence data.
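The record structure is easiest to see in a small reader. This is a minimal Python sketch, not ProteinIQ's own parser: `parse_fasta` is a hypothetical helper, and it assumes headers are unique (a repeated header overwrites the earlier record) and that sequence lines may wrap.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary.

    Wrapped sequence lines are joined; blank lines are skipped.
    A repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


text = """>seq1 example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 short peptide
ACDEFGHIK
"""
print(parse_fasta(text.splitlines()))
```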

FASTQ
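An extension of FASTA format that stores a quality score for each base call, commonly used for next-generation sequencing data. Each record has four lines: an '@' header, the sequence, a '+' separator, and an encoded quality string of the same length as the sequence.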
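Each quality character encodes a Phred score Q, where the probability that the base call is wrong is 10^(-Q/10). The sketch below assumes the Phred+33 offset used by the Sanger and Illumina 1.8+ conventions (older Illumina files used +64). The function names are illustrative only.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


record = ["@read1", "GATTACA", "+", "II?5+#!"]  # hypothetical four-line record
scores = phred_scores(record[3])
print(scores)  # [40, 40, 30, 20, 10, 2, 0]
```

A score of 30, for example, corresponds to a 1 in 1,000 chance that the base is miscalled.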

PDB (Protein Data Bank)
A standard file format for representing three-dimensional macromolecular structures, including atomic coordinates and experimental metadata. See PDB fixer for structure preparation.
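Legacy PDB coordinate records use fixed character columns rather than delimiters. This Python sketch reads one ATOM or HETATM line using the column positions from the PDB format specification (x in columns 31-38, y in 39-46, z in 47-54). `Atom` and `parse_atom_line` are hypothetical names, and the record in the usage line is made up.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column positions.

    Python slices are 0-based and end-exclusive, so specification
    columns 31-38 (x coordinate) become line[30:38].
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


line = "ATOM      1  N   MET A   1      38.198  19.582  28.493  1.00 30.00           N"
print(parse_atom_line(line))
```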

SDF (Structure Data File)
A chemical file format for storing molecular structures with associated data, commonly used in cheminformatics and drug discovery.

SMILES (Simplified Molecular Input Line Entry System)
A text notation for representing chemical structures as a linear string, enabling computational analysis of molecular properties.

GenBank
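NCBI's public database of annotated nucleotide sequences, and the flat-file format used to distribute its records. Each entry pairs a sequence with annotations such as the source organism, gene and coding-region features, and literature references.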

CIF (Crystallographic Information File)
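A text format for crystallographic and structural data. Its macromolecular variant, PDBx/mmCIF, is the standard archive format of the Protein Data Bank and is more extensible than the legacy PDB format, whose fixed-width columns cannot represent very large structures.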

MOL2
A chemical file format that includes atom types, bond types, and partial charges, used in molecular modeling and docking studies.


Bioinformatics

Protein
A biological macromolecule composed of amino acid residues linked by peptide bonds. Proteins perform diverse cellular functions including catalysis, signaling, and structural support.

Amino acid
The building blocks of proteins. There are 20 standard amino acids, each with distinct chemical properties that determine protein structure and function.

Sequence
The linear order of amino acids in a protein or nucleotides in DNA/RNA. The sequence determines the molecule's structure and biological properties.

Secondary structure
Regular local structural patterns in proteins, primarily alpha helices and beta sheets, formed through backbone hydrogen bonding.

Alpha helix
A right-handed helical protein secondary structure stabilized by backbone hydrogen bonds between residues i and i+4. The most common secondary structure element.

Beta sheet
Extended protein secondary structure formed by lateral hydrogen bonding between strands. Can be parallel, antiparallel, or mixed.

Turn
Short protein segments (3-5 residues) where the chain reverses direction, often connecting secondary structure elements.

Random coil
Irregular, flexible protein regions lacking regular secondary structure. Often serve as linkers or participate in dynamic functions.

Molecular weight
The total mass of a protein molecule expressed in daltons (Da), calculated as the sum of its amino acid residue masses plus one water molecule for the free N- and C-termini. Essential for SDS-PAGE interpretation and mass spectrometry.
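As a worked example of the calculation, the sketch below sums average residue masses and adds one water. The mass table is the average-mass set tabulated by ExPASy, assumed here; mass spectrometry work usually calls for monoisotopic masses instead, and post-translational modifications are ignored. `molecular_weight` is a hypothetical helper.

```python
# Average residue masses in Da (free amino acid minus one water), as tabulated
# by ExPASy; assumed values for this sketch.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N-terminal H and C-terminal OH


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GG"), 2))  # 132.12 (diglycine)
```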

Isoelectric point (pI)
The pH at which a protein carries no net electrical charge. Below its pI, a protein carries a net positive charge; above it, a net negative charge. This makes pI critical for choosing purification conditions, for example in ion-exchange chromatography.
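One common way to estimate pI is to compute the net charge with the Henderson-Hasselbalch equation and search for the pH where it crosses zero. This is a minimal sketch under stated assumptions: the pKa set below is one EMBOSS-style set chosen for illustration, and other tools use different sets, so their pI values can differ by a few tenths of a unit.

```python
from typing import Dict

# Assumed pKa values (an EMBOSS-style set) for ionizable groups.
PKA_BASIC: Dict[str, float] = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC: Dict[str, float] = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Estimate net charge at a given pH with the Henderson-Hasselbalch equation."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still net positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("MKTAYIAKQRQISFVKSHFS"), 2))
```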

For detailed calculations, see our Protein Parameters guide.

Extinction coefficient
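A measure of how strongly a protein absorbs light at 280 nm, determined mainly by its tryptophan and tyrosine content, with a small contribution from disulfide bonds (cystines). Used with the Beer-Lambert law to calculate protein concentration spectrophotometrically.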
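The estimate and the concentration step can be sketched in a few lines. The per-residue values (Trp 5500, Tyr 1490, cystine 125, in M⁻¹ cm⁻¹) are the ones used by ExPASy ProtParam and are an assumption here; the function names and the example sequence are hypothetical.

```python
# Assumed molar absorptivities at 280 nm, in M^-1 cm^-1.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient_280(sequence: str, n_disulfides: int = 0) -> int:
    """Estimate epsilon at 280 nm from Trp, Tyr, and disulfide (cystine) counts."""
    seq = sequence.upper()
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + n_disulfides * EPS_CYSTINE


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("no Trp, Tyr, or cystine: A280 cannot be used for this protein")
    return a280 / (epsilon * path_cm)


eps = extinction_coefficient_280("MKWVTAYGS")  # one Trp and one Tyr
print(eps, molar_concentration(0.5, eps))
```

Proteins with no tryptophan or tyrosine absorb too weakly at 280 nm for this method, which is why the sketch raises an error rather than dividing by zero.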

GRAVY (Grand Average of Hydropathicity)
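A score representing overall protein hydrophobicity, calculated as the sum of the hydropathy values of all residues (typically on the Kyte-Doolittle scale) divided by the sequence length. Positive values indicate hydrophobic character; negative values indicate hydrophilic character.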
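A minimal GRAVY calculation, assuming the Kyte-Doolittle scale (the scale choice is an assumption of this sketch; `gravy` is a hypothetical helper):

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of per-residue hydropathy values divided by sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("MKTAYIAKQR"))  # negative: a hydrophilic stretch
```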

Instability index
A predictor of protein stability based on dipeptide composition. Proteins with values below 40 are predicted to be stable; values above 40 suggest the protein may be unstable.
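For reference, the index is usually stated as follows (this is the form described by Guruprasad and colleagues and used by ExPASy ProtParam). Here L is the sequence length, x_i is the residue at position i, and DIWV is a 400-entry table of dipeptide instability weight values, which is too long to reproduce here:

```latex
\mathrm{II} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x_i, x_{i+1})
```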

Aliphatic index
A measure of thermal stability based on the relative volume of aliphatic amino acids (Ala, Val, Ile, Leu). Higher values correlate with thermostability.
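The index is usually computed with Ikai's formula, X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X values are mole percentages and a = 2.9 and b = 3.9 are the side-chain volumes of Val and of Ile/Leu relative to Ala. A minimal sketch, with `aliphatic_index` as a hypothetical name:

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X values are mole percentages of each residue in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    a, b = 2.9, 3.9  # relative side-chain volumes of Val and Ile/Leu
    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))


print(aliphatic_index("MKTAYIAKQRQISFVKSHFS"))
```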

Hydrophobicity
The tendency of nonpolar groups, such as the side chains of nonpolar amino acids, to avoid contact with water and cluster together. This hydrophobic effect is a primary driving force in protein folding.

Disulfide bond
A covalent bond between two cysteine residues formed through oxidation of their sulfhydryl groups. Provides structural stability.

Half-life
The time required for a protein's concentration to decrease by 50%, reflecting cellular degradation rates and biological stability.


Small molecule properties

Small molecule
A low molecular weight organic compound (typically <900 Da) that can modulate biological processes. Most drugs are small molecules.

LogP (Partition coefficient)
The logarithm of the octanol-water partition coefficient, measuring lipophilicity. Values between 1 and 4 are typically considered favorable for oral drugs.
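As a worked example of the definition, LogP is the base-10 logarithm of the equilibrium concentration ratio between the octanol and water phases. The sketch assumes the compound is measured in its neutral form (for ionizable compounds the pH-dependent distribution coefficient, LogD, differs); the helper names and concentrations are hypothetical.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """LogP = log10(concentration in octanol / concentration in water)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def in_oral_window(logp: float, low: float = 1.0, high: float = 4.0) -> bool:
    """Check LogP against the typical 1 to 4 range for oral drugs."""
    return low <= logp <= high


value = log_p(250.0, 1.0)  # hypothetical shake-flask concentrations
print(round(value, 2), in_oral_window(value))
```

A compound 1,000 times more concentrated in octanol than in water therefore has a LogP of 3.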

For comprehensive property analysis, explore our Molecular Descriptors tool.

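TPSA (Topological Polar Surface Area)
The surface area contributed by polar atoms (mainly N and O) and their attached hydrogens, estimated from summed fragment contributions rather than from a 3D structure. Values below about 140 Ų are associated with good oral absorption, and values below 60 Ų with potential blood-brain barrier penetration.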

Hydrogen bond donors (HBD)
The count of N-H and O-H groups capable of donating hydrogen atoms in hydrogen bonding. Lipinski's Rule suggests ≤5 for oral bioavailability.

Hydrogen bond acceptors (HBA)
The count of nitrogen and oxygen atoms capable of accepting hydrogen bonds. Lipinski's Rule recommends ≤10 for drug-like compounds.

Rotatable bonds
Non-ring single bonds between two non-terminal heavy atoms, which allow free rotation and indicate molecular flexibility. Fewer rotatable bonds generally improve oral bioavailability; Veber's criteria suggest 10 or fewer.

Aromatic rings
Planar cyclic structures with delocalized pi electrons. Aromatic character influences binding affinity, metabolic stability, and physicochemical properties.

QED (Quantitative Estimate of Drug-likeness)
A composite score (0-1) that quantifies how closely a molecule resembles known drugs based on eight molecular properties.

Molecular descriptors
Numerical representations of chemical structure encoding molecular properties. Include physical properties (molecular weight), topological features (TPSA), and structural characteristics (ring counts).


Drug discovery & pharmacology

Drug discovery
The process of identifying and developing new therapeutic compounds, from initial screening through clinical trials to regulatory approval.

For more on drug discovery methods, check out our Lipinski's Rule of Five guide.

ADMET
Absorption, Distribution, Metabolism, Excretion, and Toxicity: the key pharmacokinetic and safety properties that largely determine whether a drug candidate succeeds.

Learn more in our ADMET-AI documentation.

Bioavailability
The fraction of an administered drug that reaches systemic circulation unchanged. Oral bioavailability is a critical factor for drug development.
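Absolute bioavailability (F) is usually estimated by comparing the dose-normalized area under the plasma concentration-time curve (AUC) after oral dosing with that after intravenous dosing, which reaches circulation completely. This sketch uses the linear trapezoidal rule and truncates the curve at the last sample instead of extrapolating to infinity; the function names are hypothetical.

```python
from typing import Sequence


def auc_trapezoid(times: Sequence[float], conc: Sequence[float]) -> float:
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need matching time and concentration series")
    return sum(
        (times[i + 1] - times[i]) * (conc[i] + conc[i + 1]) / 2
        for i in range(len(times) - 1)
    )


def absolute_bioavailability(auc_oral: float, dose_oral: float,
                             auc_iv: float, dose_iv: float) -> float:
    """F = (AUC_oral / dose_oral) / (AUC_iv / dose_iv); IV dosing is the 100% reference."""
    return (auc_oral / dose_oral) / (auc_iv / dose_iv)


# Hypothetical data: equal 100 mg doses, oral AUC half of the IV AUC.
print(absolute_bioavailability(auc_oral=40.0, dose_oral=100.0, auc_iv=80.0, dose_iv=100.0))
```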

Oral bioavailability
The percentage of an orally administered drug that reaches systemic circulation. Poor oral bioavailability may require higher doses or alternative routes of administration.

Lipinski's Rule of Five
Guidelines for predicting oral bioavailability: molecular weight ≤500 Da, LogP ≤5, ≤5 hydrogen bond donors, ≤10 hydrogen bond acceptors. Compounds that violate more than one of these criteria are more likely to have poor absorption or permeability.
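The rule reduces to four threshold checks and a violation count. In this sketch the descriptor values are taken as inputs (in practice they would come from a cheminformatics toolkit), and `DrugLikeDescriptors` and both functions are hypothetical names; the example compound is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrugLikeDescriptors:
    """Precomputed descriptors for one compound."""
    mol_weight: float  # Da
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: DrugLikeDescriptors) -> List[str]:
    """List which Rule of Five thresholds the compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("LogP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def likely_orally_absorbed(d: DrugLikeDescriptors) -> bool:
    """More than one violation flags a likely absorption risk."""
    return len(lipinski_violations(d)) <= 1


candidate = DrugLikeDescriptors(mol_weight=612.7, log_p=5.6, h_bond_donors=3, h_bond_acceptors=8)
print(lipinski_violations(candidate), likely_orally_absorbed(candidate))
```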

Drug-likeness
A qualitative assessment of how similar a compound is to known drugs in terms of physicochemical properties and structural features.

Blood-brain barrier (BBB)
A highly selective membrane barrier that restricts passage of substances into the central nervous system. TPSA <60 Ų suggests potential BBB penetration.

Permeability
The rate at which a compound crosses biological membranes. High permeability is essential for absorption and tissue distribution.

P-glycoprotein (P-gp)
An efflux transporter that pumps drugs out of cells, affecting absorption, distribution, and drug resistance. P-gp substrates may have reduced bioavailability.

Cytochrome P450 (CYP)
A family of heme-containing enzymes, found mainly in the liver, that metabolize many drugs. Inhibition or induction of CYP enzymes is a primary source of drug-drug interactions.

Caco-2 assay
An in vitro model that uses a monolayer of Caco-2 cells, a human colorectal adenocarcinoma line that differentiates into enterocyte-like cells, to predict oral drug absorption and intestinal permeability.

Plasma protein binding
The extent to which a drug binds to proteins in blood plasma. High binding can limit free drug concentration and therapeutic effect.

Metabolic stability
The resistance of a compound to metabolic degradation. Commonly measured as the compound's half-life or intrinsic clearance in liver microsomes or hepatocytes.
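In a typical microsomal stability assay, the percentage of parent compound remaining is measured at several time points and assumed to decay by first-order kinetics. The sketch below fits ln(% remaining) against time by least squares, converts the rate constant k into a half-life (ln 2 / k), and scales it to intrinsic clearance per mg of microsomal protein. The helper names and the incubation conditions in the usage line are hypothetical.

```python
import math
from typing import Sequence


def depletion_half_life(times_min: Sequence[float], percent_remaining: Sequence[float]) -> float:
    """Fit ln(% remaining) = c - k*t by least squares and return ln(2) / k in minutes."""
    n = len(times_min)
    if n != len(percent_remaining) or n < 2:
        raise ValueError("need at least two matching time points")
    ys = [math.log(p) for p in percent_remaining]
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    k = -sxy / sxx  # first-order depletion rate constant, 1/min
    if k <= 0:
        raise ValueError("no measurable depletion")
    return math.log(2) / k


def intrinsic_clearance(half_life_min: float, incubation_ul: float, protein_mg: float) -> float:
    """Intrinsic clearance in uL/min/mg protein: (ln 2 / t1/2) * (volume / protein)."""
    return (math.log(2) / half_life_min) * (incubation_ul / protein_mg)


t_half = depletion_half_life([0, 10, 20, 30], [100.0, 62.0, 39.0, 24.0])
print(round(t_half, 1), round(intrinsic_clearance(t_half, 500.0, 0.25), 1))
```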

Clearance
The volume of plasma from which a drug is completely removed per unit time. Together with the volume of distribution, it sets the elimination half-life and therefore dosing frequency and the extent of accumulation.
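Two standard relationships show how clearance (CL) drives dosing. Both assume a one-compartment model with first-order elimination, and the dosing-rate formula assumes intravenous dosing (for oral dosing it would be divided by the bioavailability F). The helper names are illustrative.

```python
import math


def elimination_half_life(vd_l: float, clearance_l_per_h: float) -> float:
    """One-compartment, first-order elimination: t1/2 = ln(2) * Vd / CL, in hours."""
    return math.log(2) * vd_l / clearance_l_per_h


def maintenance_dose_rate(clearance_l_per_h: float, target_css_mg_per_l: float) -> float:
    """At steady state, dosing rate (mg/h) equals CL * target steady-state concentration."""
    return clearance_l_per_h * target_css_mg_per_l


print(round(elimination_half_life(70.0, 7.0), 2))  # 6.93
print(maintenance_dose_rate(5.0, 2.0))  # 10.0
```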

PPI (Protein-Protein Interaction)
Physical contacts between protein molecules. PPIs are emerging drug targets, though compounds targeting them often violate traditional drug-likeness rules.

QEPPI
Quantitative Estimate of Protein-Protein Interaction targeting drug-likeness. A QED-style score adapted to PPI modulators, which tend to fall outside conventional drug-likeness ranges.


Toxicity & safety

hERG inhibition
Blockade of the hERG potassium channel, which can prolong the QT interval of the heartbeat and trigger cardiac arrhythmias. A key indicator of cardiotoxicity risk.

AMES mutagenicity
Based on the Ames test, a bacterial reverse-mutation assay (typically using Salmonella typhimurium strains) that detects whether a compound causes DNA mutations. A positive result is an indicator of carcinogenic potential.

Hepatotoxicity
Drug-induced liver injury, a leading cause of drug development failure and post-market withdrawal.

PAINS (Pan-Assay Interference Compounds)
Chemical structures that frequently produce false positives in screening assays through non-specific reactivity or fluorescence interference.

BRENK filters
Structural alerts identifying potentially toxic, reactive, or metabolically unstable chemical groups that may cause drug development issues.

Structural alerts
Molecular substructures associated with specific toxicity mechanisms or undesirable properties, used for filtering compound libraries.

LD₅₀ (Lethal Dose 50)
The dose at which 50% of test subjects die, providing a measure of acute toxicity. Lower values indicate higher toxicity.


Platform concepts

Client-side processing
Computations performed directly in your web browser using JavaScript, ensuring complete data privacy as files never leave your device. Ideal for format conversions and simple calculations.

Server-side processing
Computations performed on ProteinIQ servers using specialized software and machine learning models. Required for complex analyses like ADMET prediction and secondary structure prediction.

Job
A submitted analysis task running on ProteinIQ servers. Jobs are tracked in your dashboard and can be reviewed after completion.

Credits
The currency system for server-side computations. Different tools consume different credit amounts based on computational complexity.