PDB to SDF converter

What is PDB to SDF Conversion?

A PDB to SDF converter transforms Protein Data Bank (PDB) files into Structure-Data File (SDF) format, bridging structural biology and cheminformatics. The conversion lets researchers take structures from macromolecular databases and prepare them for chemical analysis, drug discovery workflows, and molecular modeling.

PDB format structure

PDB files use a rigid, fixed-width column format that dates back to the 1970s, when 80-column punch cards set the line length. Each line is one record, identified by the record type in its first six columns, with atomic coordinates stored in precisely defined character positions.

Here's what a typical PDB file looks like:

HEADER    HYDROLASE/HYDROLASE INHIBITOR           20-MAY-97   1HTM
ATOM      1  N   ALA A   1      20.154  16.967  27.462  1.00 11.99           N
ATOM      2  CA  ALA A   1      19.030  16.097  26.849  1.00 11.85           C
ATOM      3  C   ALA A   1      17.666  16.817  26.697  1.00 11.89           C
ATOM      4  O   ALA A   1      17.657  18.049  26.530  1.00 12.15           O
ATOM      5  CB  ALA A   1      19.464  15.573  25.481  1.00 11.77           C
HETATM 1234  C1  ATP A 501      15.678  14.234  23.456  1.00 20.15           C
HETATM 1235  N1  ATP A 501      16.789  13.567  24.123  1.00 19.87           N
END

The format stores each atom on its own ATOM or HETATM line with coordinates at fixed positions: X in columns 31-38, Y in columns 39-46, and Z in columns 47-54. The other fields are fixed too: atom names (columns 13-16), residue names (columns 18-20), chain identifiers (column 22), and element symbols (columns 77-78).
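Because every field sits at a known column, a record can be read with plain string slicing. The following is a minimal sketch rather than a full parser: the function name parse_pdb_atom_line and the dictionary keys are illustrative choices, and it assumes a well-formed ATOM or HETATM line.

```python
def parse_pdb_atom_line(line: str) -> dict:
    """Slice one ATOM/HETATM record by its fixed column positions."""
    line = line.ljust(80)  # pad short lines so every slice stays in range
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "element": line[76:78].strip(),    # columns 77-78
    }


example = "ATOM      2  CA  ALA A   1      19.030  16.097  26.849  1.00 11.85           C"
atom = parse_pdb_atom_line(example)
print(atom["atom_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

Python slices are zero-based and exclude the end index, so columns 31-38 become line[30:38]. Splitting on whitespace instead would be fragile, since adjacent PDB fields can run together with no space between them.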

SDF format structure

SDF files take a different approach: each record is a block that puts chemical connectivity first. A molecule begins with a three-line header, followed by a connection table (a counts line, an atom block, and a bond block, closed by M  END), then optional property data items, and ends with a $$$$ delimiter line.

Here's the alanine residue from the PDB example above represented in SDF format:

Alanine residue from 1HTM
  Converted from PDB format

  5  4  0  0  0  0  0  0  0  0999 V2000
   20.1540   16.9670   27.4620 N   0  0  0  0  0  0  0  0  0  0  0  0
   19.0300   16.0970   26.8490 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.6660   16.8170   26.6970 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.6570   18.0490   26.5300 O   0  0  0  0  0  0  0  0  0  0  0  0
   19.4640   15.5730   25.4810 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  2  5  1  0  0  0  0
  3  4  2  0  0  0  0
M  END
> <CHAIN_ID>
A

> <RESIDUE_NAME>
ALA

> <RESIDUE_NUMBER>
1

$$$$

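The SDF format defines molecular connectivity explicitly in the bond block: each line after the atom block lists two atom indices and a bond order (for example, "3  4  2" is a double bond between atoms 3 and 4). PDB, by contrast, relies on standard residue templates, CONECT records for heteroatoms, and proximity calculations to infer bonds.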
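To make the record layout concrete, here is a hedged sketch that writes a V2000 record from a list of atoms and bonds. The function name format_sdf_record and its argument shapes are assumptions made for illustration, not any converter's actual API, and the sketch ignores charges, stereo flags, and the 999-atom limit of the V2000 counts line.

```python
def format_sdf_record(title, atoms, bonds, properties):
    """Build one V2000 SDF record.

    atoms: list of (element, x, y, z); bonds: list of (i, j, order), 1-based.
    """
    lines = [title, "  Converted from PDB format", ""]  # three-line header
    # Counts line: atom count, bond count, eight fields left at zero, "999 V2000".
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for element, x, y, z in atoms:
        # Atom block: three 10.4 coordinate fields, the symbol, twelve zero fields.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3s} 0" + "  0" * 11)
    for i, j, order in bonds:
        # Bond block: first atom, second atom, bond order, four zero fields.
        lines.append(f"{i:3d}{j:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    for key, value in properties.items():
        lines += [f"> <{key}>", str(value), ""]  # data item, value, blank line
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


alanine_atoms = [
    ("N", 20.154, 16.967, 27.462),
    ("C", 19.030, 16.097, 26.849),
    ("C", 17.666, 16.817, 26.697),
    ("O", 17.657, 18.049, 26.530),
    ("C", 19.464, 15.573, 25.481),
]
alanine_bonds = [(1, 2, 1), (2, 3, 1), (2, 5, 1), (3, 4, 2)]
record = format_sdf_record(
    "Alanine residue from 1HTM",
    alanine_atoms,
    alanine_bonds,
    {"CHAIN_ID": "A", "RESIDUE_NAME": "ALA", "RESIDUE_NUMBER": 1},
)
print(record)
```

Carrying the chain and residue identifiers as data items is one way to keep part of the PDB context attached to each record.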

Format comparison in practice

The fundamental difference becomes apparent when examining how each format handles the same chemical information. PDB treats molecules as collections of atoms with implicit relationships, while SDF treats them as explicit chemical graphs with defined connectivity.
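Turning implicit relationships into an explicit graph means the converter has to recover bonds from coordinates when no template is available. A naive distance heuristic can be sketched as follows. The radii are approximate covalent radii and the 0.45 angstrom tolerance is an assumed, tunable cutoff; neither is taken from a specific tool. The function infer_bonds is hypothetical.

```python
import math

# Approximate single-bond covalent radii in angstroms (assumed values for this sketch).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.45  # slack added to the summed radii; an assumed cutoff


def infer_bonds(atoms):
    """atoms: list of (element, x, y, z). Returns 1-based (i, j) index pairs."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            elem_i, *pos_i = atoms[i]
            elem_j, *pos_j = atoms[j]
            cutoff = COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j] + TOLERANCE
            # Two atoms count as bonded when they sit closer than the cutoff.
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i + 1, j + 1))
    return bonds


alanine = [
    ("N", 20.154, 16.967, 27.462),
    ("C", 19.030, 16.097, 26.849),
    ("C", 17.666, 16.817, 26.697),
    ("O", 17.657, 18.049, 26.530),
    ("C", 19.464, 15.573, 25.481),
]
print(infer_bonds(alanine))
```

On the alanine coordinates this recovers the same four atom pairs as the bond block in the SDF example, but distance alone says nothing about bond order: the C=O double bond still has to come from a residue template or a separate perception step. The pairwise loop is quadratic, so it suits single residues or ligands rather than whole proteins.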

Data Organization: PDB organizes information hierarchically by biological relevance (chain → residue → atom), whereas SDF organizes by chemical connectivity (atoms → bonds → properties). This means a protein chain in PDB can become multiple separate molecular entries in SDF, for example one record per residue or ligand, depending on how the converter splits the structure.
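The PDB hierarchy can be rebuilt from the fixed columns by grouping atoms under chain and residue keys. The sketch below uses the ATOM and HETATM lines from the example above. Treating each residue group as a candidate SDF record is an assumed splitting policy, and the function name group_by_chain_and_residue is illustrative.

```python
from collections import defaultdict

PDB_LINES = [
    "ATOM      1  N   ALA A   1      20.154  16.967  27.462  1.00 11.99           N",
    "ATOM      2  CA  ALA A   1      19.030  16.097  26.849  1.00 11.85           C",
    "ATOM      3  C   ALA A   1      17.666  16.817  26.697  1.00 11.89           C",
    "ATOM      4  O   ALA A   1      17.657  18.049  26.530  1.00 12.15           O",
    "ATOM      5  CB  ALA A   1      19.464  15.573  25.481  1.00 11.77           C",
    "HETATM 1234  C1  ATP A 501      15.678  14.234  23.456  1.00 20.15           C",
    "HETATM 1235  N1  ATP A 501      16.789  13.567  24.123  1.00 19.87           N",
    "END",
]


def group_by_chain_and_residue(lines):
    """Return {chain_id: {(res_name, res_seq): [atom_name, ...]}}."""
    hierarchy = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, END, and other non-coordinate records
        chain_id = line[21]                                     # column 22
        residue_key = (line[17:20].strip(), int(line[22:26]))   # columns 18-20, 23-26
        hierarchy[chain_id][residue_key].append(line[12:16].strip())
    return hierarchy


hierarchy = group_by_chain_and_residue(PDB_LINES)
for chain_id, residues in hierarchy.items():
    for (res_name, res_seq), atom_names in residues.items():
        print(chain_id, res_name, res_seq, atom_names)
```

Under this policy chain A yields two groups, ALA 1 and ATP 501, each of which could be written out as its own record.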

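Coordinate Precision: PDB stores coordinates as fixed-precision text with three decimal places. The V2000 SDF layout shown above also uses fixed-width fields, with four decimal places; only the newer V3000 layout is free-format. The extra digit after conversion is padding, not added accuracy, which matters for downstream computational chemistry calculations that assume high-precision coordinates.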
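A short check illustrates the point, assuming the V2000 layout used in the SDF example above, where each coordinate occupies a 10-character field with four decimals. The variable names are illustrative.

```python
x_text = "  20.154"           # columns 31-38 of the first ATOM record above
x = float(x_text)

pdb_field = f"{x:8.3f}"       # PDB: 8 characters, 3 decimals
mol_field = f"{x:10.4f}"      # V2000 molfile: 10 characters, 4 decimals

print(repr(pdb_field))
print(repr(mol_field))

# The wider field only appends a zero; the numeric value is unchanged.
print(float(pdb_field) == float(mol_field))
```

Any precision beyond three decimals has to come from the original experimental or computed data, not from the conversion.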

Chemical Context: PDB preserves biological context like secondary structure and experimental conditions, while SDF focuses on chemical properties and molecular descriptors. Converting from PDB to SDF necessarily loses some biological context while gaining chemical specificity.