
PDB to FASTA converter
Convert PDB to FASTA online. Upload a PDB structure file or fetch from RCSB to convert it to FASTA sequence format.
What is the PDB to FASTA converter?
The PDB to FASTA converter extracts amino acid sequences from protein structure files. PDB files contain 3D atomic coordinates, but many bioinformatics tools require only the sequence in FASTA format. This tool bridges that gap by reading the structural data and outputting clean, properly formatted sequences.
The converter handles multi-chain complexes, NMR ensembles with multiple models, and structures with missing or modified residues. You can fetch structures directly from the RCSB Protein Data Bank using a 4-character PDB ID, or upload your own files.
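Fetching by ID amounts to downloading the PDB-format file from RCSB. The sketch below assumes the `https://files.rcsb.org/download/<ID>.pdb` URL pattern and uses only the Python standard library. `rcsb_pdb_url` and `fetch_pdb` are hypothetical helper names, not this tool's API. Some very large entries are distributed only as mmCIF, so this request can fail for them.

```python
import re
import urllib.request

# Classic PDB IDs: four characters, a digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def rcsb_pdb_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a 4-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the PDB-format file as text; urllib raises HTTPError on e.g. 404."""
    with urllib.request.urlopen(rcsb_pdb_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Validating the ID before making a network request gives a clear error for typos instead of a generic HTTP failure.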
For visualizing PDB structures before conversion, use the PDB Viewer. If your structure has issues like missing atoms or non-standard residues, the PDB Fixer can clean it up first.
How to convert PDB to FASTA?
PDB files store residue information in ATOM records (standard residues) and HETATM records (ligands, water, and many modified residues). Each record describes a single atom and includes the three-letter residue name (like ALA for alanine, GLY for glycine), the chain identifier, and the residue number.
The conversion starts by reading these records in file order and collapsing the atoms of each residue into a single entry per chain, so every residue is counted once. It then maps the three-letter codes to one-letter amino acid codes using the IUPAC convention.
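A minimal sketch of the extraction step, assuming well-formed legacy PDB text with fixed-width columns. It skips water (HOH) but otherwise keeps HETATM residues, so ligands are not filtered out here. `residues_by_chain` is a hypothetical name, not the converter's internals.

```python
def residues_by_chain(pdb_text: str) -> dict[str, list[str]]:
    """Collect three-letter residue names per chain, in file order.

    PDB coordinate records are fixed-width (1-based columns): residue name
    in 18-20, chain ID in 22, residue number in 23-26, insertion code in 27.
    Every atom of a residue repeats the same (chain, number, insertion code)
    key, so the key is used to record each residue only once.
    """
    chains: dict[str, list[str]] = {}
    seen: set = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        line = line.ljust(80)  # tolerate lines whose trailing spaces were stripped
        res_name = line[17:20].strip()
        if res_name == "HOH":
            continue  # water is never part of the polymer sequence
        chain_id, res_seq, i_code = line[21], line[22:26], line[26]
        key = (chain_id, res_seq, i_code)
        if key in seen:
            continue  # another atom of a residue already recorded
        seen.add(key)
        chains.setdefault(chain_id, []).append(res_name)
    return chains
```

Because the key ignores model numbers, repeated models in an NMR file collapse onto the residues of the first one in this sketch.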
The conversion follows the standard amino acid abbreviations:
| Three-letter | One-letter | Amino acid |
|---|---|---|
| ALA | A | Alanine |
| CYS | C | Cysteine |
| ASP | D | Aspartic acid |
| GLU | E | Glutamic acid |
| PHE | F | Phenylalanine |
| GLY | G | Glycine |
| HIS | H | Histidine |
| ILE | I | Isoleucine |
| LYS | K | Lysine |
| LEU | L | Leucine |
| MET | M | Methionine |
| ASN | N | Asparagine |
| PRO | P | Proline |
| GLN | Q | Glutamine |
| ARG | R | Arginine |
| SER | S | Serine |
| THR | T | Threonine |
| VAL | V | Valine |
| TRP | W | Tryptophan |
| TYR | Y | Tyrosine |
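The table translates directly into a lookup dictionary. This sketch assumes that any name outside the 20 standard codes becomes X, the IUPAC symbol for an unknown amino acid; `THREE_TO_ONE` and `to_one_letter` are illustrative names.

```python
# Standard three-letter -> one-letter codes, matching the table above.
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def to_one_letter(residues: list[str], unknown: str = "X") -> str:
    """Translate three-letter residue names into a one-letter sequence.

    Names not in the table (ligands, unhandled modifications) become `unknown`.
    """
    return "".join(THREE_TO_ONE.get(name.strip().upper(), unknown) for name in residues)
```

For example, `to_one_letter(["MET", "LYS", "THR"])` gives `"MKT"`.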
Inputs and settings
Chain selection
Chain selection: Choose which chains to extract from multi-chain structures. Use All chains for complete extraction, First chain only for simple monomers, or Specific chains to target particular chain IDs.
Chain IDs: When using Specific chains, enter comma-separated chain identifiers (e.g., A,B,C). Chain IDs in PDB files are single characters, typically letters.
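A sketch of how the three selection modes might be applied to a `{chain_id: sequence}` mapping. The mode strings `"all"`, `"first"` and `"specific"` are hypothetical stand-ins for the UI options, and chain IDs are treated as case-sensitive.

```python
def parse_chain_ids(text: str) -> list[str]:
    """Parse a comma-separated list such as "A,B,C" into ["A", "B", "C"].

    Whitespace and empty entries are ignored; duplicates are dropped in order.
    Raises ValueError for entries longer than one character, since legacy
    PDB chain IDs occupy a single column.
    """
    ids: list[str] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas like "A,B,"
        if len(part) != 1:
            raise ValueError(f"chain ID must be a single character: {part!r}")
        if part not in ids:
            ids.append(part)
    return ids


def select_chains(chains: dict[str, str], mode: str, ids_text: str = "") -> dict[str, str]:
    """Apply a chain-selection mode (hypothetical mode names) to {chain_id: sequence}."""
    if mode == "all":
        return dict(chains)
    if mode == "first":
        for chain_id, seq in chains.items():
            return {chain_id: seq}  # dicts keep insertion (file) order
        return {}
    if mode == "specific":
        return {c: chains[c] for c in parse_chain_ids(ids_text) if c in chains}
    raise ValueError(f"unknown selection mode: {mode!r}")
```

Requested chains that are absent from the structure are silently skipped here; a stricter design could report them instead.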
Chain filtering
Enable chain filtering to refine your output when working with large complexes or structures with many small peptide fragments.
Minimum chain length: Exclude short chains that may be artifacts, tags, or crystallization additives. We recommend setting this to 20-30 residues when extracting only the protein of interest.
Maximum chain length: Exclude unusually long chains if needed. This is rarely used but helpful when isolating small binding peptides from complexes.
Merge identical chains: Combine chains with identical sequences into a single FASTA entry. Useful for symmetric oligomers where you only need one representative sequence.
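The length filters and merging can be sketched as one pass over `{chain_id: sequence}`. How a merged entry is labeled in the FASTA header is not specified above, so this hypothetical `filter_and_merge` simply returns the list of chain IDs that share each sequence.

```python
from typing import Optional


def filter_and_merge(
    chains: dict[str, str],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    merge_identical: bool = False,
) -> list[tuple[list[str], str]]:
    """Filter {chain_id: sequence} by length, optionally merging identical sequences.

    Returns (chain_ids, sequence) pairs in first-seen order. Without merging,
    every surviving chain gets its own entry.
    """
    kept = [
        (chain_id, seq)
        for chain_id, seq in chains.items()
        if (min_len is None or len(seq) >= min_len)
        and (max_len is None or len(seq) <= max_len)
    ]
    if not merge_identical:
        return [([chain_id], seq) for chain_id, seq in kept]
    groups: dict[str, list[str]] = {}
    for chain_id, seq in kept:
        groups.setdefault(seq, []).append(chain_id)  # identical sequence -> same group
    return [(ids, seq) for seq, ids in groups.items()]
```

Filtering before merging means a short chain is dropped even if it happens to match another short chain.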
Model handling
Model selection: NMR structures typically contain 10-20 conformational models representing structural uncertainty. Choose First model only for a single representative structure, or All models if you need to analyze conformational variability.
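In PDB files, each model in an ensemble is wrapped in MODEL and ENDMDL records. A minimal sketch of splitting on those records and keeping either the first model or all of them; `split_models` and `select_models` are hypothetical names.

```python
def split_models(pdb_text: str) -> list[list[str]]:
    """Split coordinate records into models using MODEL/ENDMDL records.

    Files without MODEL records (typical of X-ray structures) come back as a
    single model. Only ATOM/HETATM lines are kept.
    """
    models: list[list[str]] = []
    current: list[str] = []
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            current = []  # start a fresh model
        elif line.startswith("ENDMDL"):
            models.append(current)
            current = []
        elif line.startswith(("ATOM  ", "HETATM")):
            current.append(line)
    if current:  # coordinates that were not wrapped in MODEL/ENDMDL
        models.append(current)
    return models


def select_models(models: list[list[str]], first_only: bool = True) -> list[list[str]]:
    """Keep only the first model, or all of them."""
    return models[:1] if first_only else models
```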
Missing and modified residues
Missing residues: Crystal structures often have disordered regions without coordinates. Choose Skip gaps to omit these positions entirely, or Insert X characters to preserve sequence numbering with placeholder X residues.
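One way to detect missing stretches is to look for jumps in residue numbering. This sketch assumes consecutive numbering in the deposited sequence; insertion codes or numbering offsets can fool it. `sequence_with_gaps` is a hypothetical name.

```python
from typing import Optional


def sequence_with_gaps(residues: list[tuple[int, str]], fill_gaps: bool) -> str:
    """Build a chain sequence from (residue_number, one_letter) pairs in file order.

    A jump in residue numbering larger than 1 is treated as a stretch of
    residues without coordinates. With fill_gaps=True each missing number
    becomes an 'X'; otherwise the gap is skipped.
    """
    out: list[str] = []
    prev: Optional[int] = None
    for num, aa in residues:
        if fill_gaps and prev is not None and num > prev + 1:
            out.append("X" * (num - prev - 1))  # one X per missing residue number
        out.append(aa)
        prev = num
    return "".join(out)
```

For residues numbered 1, 2, 5, 6 the filled result has two X characters in place of residues 3 and 4.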
Include modified residues: When enabled, common post-translational modifications and crystallographic substitutions are converted to their parent amino acids. For example, selenomethionine (MSE) becomes methionine (M), and phosphoserine (SEP) becomes serine (S).
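The converter's full list of recognized modifications isn't given here, so this sketch uses a small illustrative subset: MSE and SEP from the text, plus TPO and PTR, the PDB component codes for phosphothreonine and phosphotyrosine. `MODIFIED_TO_PARENT` and `normalize_residue` are hypothetical names.

```python
# Illustrative subset of modified residues -> parent amino acid (three-letter).
# MSE and SEP come from the text; TPO and PTR are added as further examples.
MODIFIED_TO_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # O-phosphotyrosine
}


def normalize_residue(name: str, include_modified: bool) -> str:
    """Return the residue name to look up in the standard 20-residue table.

    With include_modified=True, known modified residues are replaced by their
    parent; otherwise the name is returned unchanged, so a later lookup will
    typically treat it as unknown.
    """
    name = name.strip().upper()
    if include_modified:
        return MODIFIED_TO_PARENT.get(name, name)
    return name
```

Mapping to the parent's three-letter code keeps a single one-letter table as the final step.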
Output formatting
Header format: Controls the FASTA header line. PDB_Chain produces headers like >1HTM_A, while Title_Chain uses the structure title from the PDB file.
Line wrapping: Many FASTA files wrap sequences at a fixed width, commonly 60 to 80 characters per line. Use No wrapping for single-line sequences that are easier to copy and paste into other tools.
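A sketch of writing one record with a PDB_Chain-style header and optional wrapping. `format_fasta` is a hypothetical name; the Title_Chain format is not reproduced because its exact layout isn't specified here.

```python
from typing import Optional


def format_fasta(pdb_id: str, chain_id: str, sequence: str, width: Optional[int] = 60) -> str:
    """Format one chain as a FASTA record with a PDB_Chain-style header.

    width=None (or 0) writes the sequence on a single line; otherwise the
    sequence is wrapped at `width` characters.
    """
    header = f">{pdb_id.upper()}_{chain_id}"
    if not width:
        return f"{header}\n{sequence}\n"
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"
```

Concatenating the records for each chain yields a multi-sequence FASTA file.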
Understanding the results
The output is standard FASTA format with one or more sequences. Each sequence begins with a header line starting with >, followed by the amino acid sequence.
```
>1HTM_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
>1HTM_B
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
```
The extracted sequences can be used directly with sequence analysis tools like Amino Acid Composition, Protein Parameters, or structure prediction tools like ESMFold and Boltz-2.
Common use cases
Extracting sequences from experimental structures is often the first step in computational workflows. You might need the sequence to:
- Search for homologs using BLAST or similar tools
- Predict properties like isoelectric point or molecular weight
- Use as input for structure prediction to compare with the experimental structure
- Design primers for cloning or mutagenesis
Related tools
- PDB Viewer - Visualize 3D structures before conversion
- PDB Fixer - Repair structures with missing atoms or residues
- PDB to CIF - Convert to mmCIF format
- PDB to SDF - Extract ligands to SDF format
- PDB to MOL2 - Convert to MOL2 format
- Protein Parameters - Analyze extracted sequences