
PDB to FASTA converter
Convert PDB to FASTA online. Upload a PDB structure file or fetch from RCSB to convert it to FASTA sequence format.
PDB to FASTA Converter extracts amino acid sequences from protein structure files. PDB files contain 3D atomic coordinates, but many bioinformatics tools require only the sequence in FASTA format. This tool bridges that gap by reading the structural data and outputting clean, properly formatted sequences.
The converter handles multi-chain complexes, NMR ensembles with multiple models, and structures with missing or modified residues. You can fetch structures directly from the RCSB Protein Data Bank using a 4-character PDB ID, or upload your own files.
For visualizing PDB structures before conversion, use the PDB Viewer. If your structure has issues like missing atoms or non-standard residues, the PDB Fixer can clean it up first.
PDB files store amino acid information in ATOM and HETATM records. Each record contains a three-letter residue code (like ALA for alanine, GLY for glycine) along with the chain identifier and residue number.
The conversion starts by reading these records in file order and extracting the residue code of each residue, chain by chain. It then maps the three-letter codes to one-letter amino acid codes using the IUPAC convention.
The conversion follows the standard amino acid abbreviations:
| Three-letter | One-letter | Amino acid |
|---|---|---|
| ALA | A | Alanine |
| CYS | C | Cysteine |
| ASP | D | Aspartic acid |
| GLU | E | Glutamic acid |
| PHE | F | Phenylalanine |
| GLY | G | Glycine |
| HIS | H | Histidine |
| ILE | I | Isoleucine |
| LYS | K | Lysine |
| LEU | L | Leucine |
| MET | M | Methionine |
| ASN | N | Asparagine |
| PRO | P | Proline |
| GLN | Q | Glutamine |
| ARG | R | Arginine |
| SER | S | Serine |
| THR | T | Threonine |
| VAL | V | Valine |
| TRP | W | Tryptophan |
| TYR | Y | Tyrosine |
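The two steps described above (reading records, then mapping codes) can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual implementation: the function names `chain_sequences` and `atom_line` are hypothetical, only ATOM records are read, and writing `X` for an unrecognized residue code is an assumption. The column positions follow the fixed-width PDB format (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26).

```python
# Standard three-letter to one-letter codes (same content as the table above).
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def chain_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    letters_by_chain = {}
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue
        res_name = line[17:20].strip()
        chain_id = line[21]
        # Residue number plus insertion code identifies one residue;
        # every atom of that residue repeats the same key.
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue
        seen.add(residue_key)
        # Assumption: unknown residue codes are written as X.
        letter = THREE_TO_ONE.get(res_name, "X")
        letters_by_chain.setdefault(chain_id, []).append(letter)
    return {chain: "".join(seq) for chain, seq in letters_by_chain.items()}


def atom_line(serial, atom, res_name, chain_id, res_seq):
    """Build a minimal fixed-column ATOM record for the demo below."""
    return (f"ATOM  {serial:5d}  {atom:<3} {res_name:>3} {chain_id}{res_seq:4d}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


demo = "\n".join([
    atom_line(1, "N", "ALA", "A", 1),
    atom_line(2, "CA", "ALA", "A", 1),   # second atom of the same residue
    atom_line(3, "N", "GLY", "A", 2),
    atom_line(4, "N", "LYS", "B", 1),
])
print(chain_sequences(demo))  # {'A': 'AG', 'B': 'K'}
```

Tracking each residue by chain, number, and insertion code keeps a residue from being counted once per atom.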
Chain selection: Choose which chains to extract from multi-chain structures. Use All chains for complete extraction, First chain only for simple monomers, or Specific chains to target particular chain IDs.
Chain IDs: When using Specific chains, enter comma-separated chain identifiers (e.g., A,B,C). Chain IDs in PDB files are single characters, typically letters.
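As a rough illustration of the three chain selection modes, the sketch below filters a dictionary of per-chain sequences. The function name `select_chains` and the mode strings are hypothetical and chosen only for this example; the input is assumed to preserve the order in which chains appear in the file.

```python
def select_chains(sequences, mode="all", chain_ids=""):
    """Pick chains from {chain_id: sequence} according to a selection mode."""
    if mode == "all":
        return dict(sequences)
    if mode == "first":
        first = next(iter(sequences), None)
        return {} if first is None else {first: sequences[first]}
    if mode == "specific":
        # Parse a comma-separated list such as "A,B,C", ignoring stray spaces.
        wanted = [c.strip() for c in chain_ids.split(",") if c.strip()]
        return {c: sequences[c] for c in wanted if c in sequences}
    raise ValueError(f"unknown mode: {mode!r}")


chains = {"A": "MKTAYIAK", "B": "MKTAYIAK", "C": "GSH"}
print(select_chains(chains, "first"))             # {'A': 'MKTAYIAK'}
print(select_chains(chains, "specific", "A, C"))  # {'A': 'MKTAYIAK', 'C': 'GSH'}
```

In this sketch, requested chain IDs that are not present in the structure are silently ignored; a real tool might prefer to warn about them.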
Enable chain filtering to refine your output when working with large complexes or structures with many small peptide fragments.
Minimum chain length: Exclude short chains that may be artifacts, tags, or crystallization additives. We recommend setting this to 20-30 residues when extracting only the protein of interest.
Maximum chain length: Exclude unusually long chains if needed. This is rarely used but helpful when isolating small binding peptides from complexes.
Merge identical chains: Combine chains with identical sequences into a single FASTA entry. Useful for symmetric oligomers where you only need one representative sequence.
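The length filters and the merge option can be pictured as two small passes over the extracted sequences. The sketch below is illustrative only: `filter_by_length` and `merge_identical` are hypothetical names, and returning the merged chain IDs as a list alongside each unique sequence is an assumption about how a merged entry might be represented.

```python
def filter_by_length(sequences, min_len=0, max_len=None):
    """Keep chains whose length lies within [min_len, max_len]."""
    return {
        chain: seq
        for chain, seq in sequences.items()
        if len(seq) >= min_len and (max_len is None or len(seq) <= max_len)
    }


def merge_identical(sequences):
    """Group chains with identical sequences.

    Returns a list of (chain_ids, sequence) pairs in first-seen order.
    """
    groups = {}
    for chain, seq in sequences.items():
        groups.setdefault(seq, []).append(chain)
    return [(chain_ids, seq) for seq, chain_ids in groups.items()]


chains = {"A": "MKTAYIAKQR", "B": "MKTAYIAKQR", "C": "GSH"}
kept = filter_by_length(chains, min_len=5)
print(kept)                   # {'A': 'MKTAYIAKQR', 'B': 'MKTAYIAKQR'}
print(merge_identical(kept))  # [(['A', 'B'], 'MKTAYIAKQR')]
```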
Model selection: NMR structures typically contain 10-20 conformational models representing structural uncertainty. Choose First model only for a single representative structure, or All models if you need to analyze conformational variability.
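In PDB files, each model of an ensemble is delimited by MODEL and ENDMDL records. A minimal sketch of splitting a file into models, assuming a hypothetical helper named `split_models`, could look like this. Files without MODEL records are treated as a single model.

```python
def split_models(pdb_text):
    """Return a list of models, each a list of ATOM/HETATM lines."""
    models = []
    current = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:  # no MODEL/ENDMDL records, or an unterminated final model
        models.append(current)
    return models


ensemble = "\n".join([
    "MODEL        1",
    "ATOM  (coordinates of model 1)",
    "ENDMDL",
    "MODEL        2",
    "ATOM  (coordinates of model 2)",
    "ENDMDL",
])
models = split_models(ensemble)
print(len(models))  # 2
first_model_only = models[:1]
```

Because all models of an NMR ensemble share the same sequence, taking the first model is usually enough when only the FASTA sequence is needed.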
Missing residues: Crystal structures often have disordered regions without coordinates. Choose Skip gaps to omit these positions entirely, or Insert X characters to preserve sequence numbering with placeholder X residues.
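The difference between the two gap options can be shown with a short sketch. It assumes, for illustration only, that gaps are detected as jumps in residue numbering within a chain; the function name `build_sequence` is hypothetical. Numbering jumps do not always mean missing residues (some structures use non-sequential numbering), so a real converter may rely on other information, such as the REMARK 465 records that list missing residues.

```python
def build_sequence(residues, fill_gaps=False):
    """Join (residue_number, one_letter_code) pairs into a sequence.

    With fill_gaps=True, each skipped residue number becomes an X.
    """
    parts = []
    previous = None
    for number, letter in residues:
        if fill_gaps and previous is not None and number > previous + 1:
            parts.append("X" * (number - previous - 1))
        parts.append(letter)
        previous = number
    return "".join(parts)


observed = [(1, "M"), (2, "K"), (5, "T"), (6, "A")]  # residues 3-4 lack coordinates
print(build_sequence(observed))                  # MKTA
print(build_sequence(observed, fill_gaps=True))  # MKXXTA
```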
Include modified residues: When enabled, common post-translational modifications and crystallographic substitutions are converted to their parent amino acids. For example, selenomethionine (MSE) becomes methionine (M), and phosphoserine (SEP) becomes serine (S).
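One way to picture this option is a lookup from modified residue codes to their parent residues, applied before the one-letter mapping. The sketch below is an assumption-laden illustration: `parent_residue` is a hypothetical name, the table includes only MSE and SEP from the text plus two other well-known phosphorylated residues (TPO and PTR), and dropping modified residues when the option is disabled is a guess at the behavior rather than a documented fact.

```python
# Modified residue code -> parent amino acid (three-letter codes).
MODIFIED_TO_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
}


def parent_residue(res_name, include_modified=True):
    """Return the parent three-letter code, or None to skip the residue."""
    if res_name in MODIFIED_TO_PARENT:
        # Assumption: with the option off, modified residues are skipped.
        return MODIFIED_TO_PARENT[res_name] if include_modified else None
    return res_name


print(parent_residue("MSE"))                          # MET
print(parent_residue("MSE", include_modified=False))  # None
print(parent_residue("GLY"))                          # GLY
```

Modified residues such as MSE are typically stored in HETATM rather than ATOM records, which is why a converter needs to read both record types to recover them.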
Header format: Controls the FASTA header line. PDB_Chain produces headers like >1HTM_A, while Title_Chain uses the structure title from the PDB file.
Line wrapping: Standard FASTA files wrap sequences at 60 or 80 characters per line. Use No wrapping for single-line sequences that are easier to copy-paste into other tools.
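The PDB_Chain header style and the wrapping options can be sketched as follows. The function name `format_fasta` and its `width` parameter are hypothetical; passing `width=None` stands in for the No wrapping option.

```python
def format_fasta(pdb_id, sequences, width=60):
    """Format {chain_id: sequence} as FASTA text with PDB_Chain headers."""
    lines = []
    for chain_id, seq in sequences.items():
        lines.append(f">{pdb_id}_{chain_id}")
        if width:
            # Split the sequence into fixed-width chunks.
            lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
        else:
            lines.append(seq)
    return "\n".join(lines) + "\n"


print(format_fasta("1HTM", {"A": "MKTAYIAKQR"}, width=4), end="")
# >1HTM_A
# MKTA
# YIAK
# QR
```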
The output is standard FASTA format with one or more sequences. Each sequence begins with a header line starting with >, followed by the amino acid sequence.
```
>1HTM_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
>1HTM_B
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
```

The extracted sequences can be used directly with sequence analysis tools like Amino Acid Composition, Protein Parameters, or structure prediction tools like ESMFold and Boltz-2.
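If you want to pass the output to your own scripts, FASTA is simple to read back. The following is a minimal sketch with a hypothetical `parse_fasta` function; it joins wrapped sequence lines and assumes header lines are unique.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA text, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


fasta_text = ">1HTM_A\nMKTAYIAK\nQRQISF\n>1HTM_B\nMKTAYIAK\n"
print(parse_fasta(fasta_text))
# {'1HTM_A': 'MKTAYIAKQRQISF', '1HTM_B': 'MKTAYIAK'}
```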
Extracting sequences from experimental structures is often the first step in computational workflows. You might need the sequence to: