What is the PDB to FASTA converter?
The PDB to FASTA converter extracts protein and nucleic acid sequences from structure files. PDB files contain 3D atomic coordinates, but many bioinformatics tools require only the sequence in FASTA format. This tool reads the structural data and outputs clean, properly formatted sequences.
The converter handles multi-chain complexes, DNA and RNA chains, and structures with missing or modified residues. You can fetch structures directly from the RCSB Protein Data Bank using a 4-character PDB ID, or upload your own files.
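
As an illustration of the fetch step, the sketch below validates an ID and downloads the matching file. The address `https://files.rcsb.org/download/<ID>.pdb` is the public RCSB file-download pattern; the function names and the ID check (one digit followed by three alphanumeric characters) are assumptions made for this example and not the tool's actual code.

```python
import re
import urllib.request

# Public RCSB download pattern for legacy PDB-format files.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Validate a 4-character PDB ID and build its download URL.

    The ID pattern used here is an assumption for this sketch.
    """
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the PDB-format text for one entry (needs network access)."""
    url = pdb_download_url(pdb_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_pdb("1htm")` would return the file contents as a string, ready to be parsed for sequences.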
For visualizing PDB structures before conversion, use the PDB Viewer. If your structure has issues like missing atoms or non-standard residues, the PDB Fixer can clean it up first.
How to convert PDB to FASTA?
PDB files store residue information in ATOM and HETATM records, one record per atom. Each record carries the three-letter code of the residue the atom belongs to (like ALA for alanine, GLY for glycine) along with the chain identifier and residue number.
The conversion reads these records in sequence order, extracts the residue codes for each chain, and maps the three-letter codes to one-letter amino acid codes using the IUPAC convention.
By default the tool reports the observed sequence, meaning only residues that have 3D coordinates. When a structure has disordered loops or unresolved termini, those residues are absent from the coordinates. To get the complete sequence that was present in the sample, switch the sequence source to the deposited SEQRES records.
The conversion follows the standard amino acid abbreviations:
| Three-letter | One-letter | Amino acid |
|---|---|---|
| ALA | A | Alanine |
| CYS | C | Cysteine |
| ASP | D | Aspartic acid |
| GLU | E | Glutamic acid |
| PHE | F | Phenylalanine |
| GLY | G | Glycine |
| HIS | H | Histidine |
| ILE | I | Isoleucine |
| LYS | K | Lysine |
| LEU | L | Leucine |
| MET | M | Methionine |
| ASN | N | Asparagine |
| PRO | P | Proline |
| GLN | Q | Glutamine |
| ARG | R | Arginine |
| SER | S | Serine |
| THR | T | Threonine |
| VAL | V | Valine |
| TRP | W | Tryptophan |
| TYR | Y | Tyrosine |
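
The mapping above can be applied to coordinate records with a few lines of code. The following is a minimal sketch, not the tool's implementation: `observed_sequences` is a hypothetical helper that reads only ATOM records of the first model, uses the fixed column positions of the PDB format (residue name in columns 18 to 20, chain ID in column 22, residue number plus insertion code in columns 23 to 27), and writes X for any residue name missing from the table.

```python
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def observed_sequences(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: one-letter sequence} from ATOM records."""
    chains: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()
        chain_id = line[21]
        # Residue number plus insertion code identifies one residue.
        res_key = (chain_id, line[22:27])
        if res_key in seen:
            continue  # another atom of a residue already counted
        seen.add(res_key)
        chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {cid: "".join(seq) for cid, seq in chains.items()}
```

Each residue is counted once even though it spans many atom records, and chains appear in the order they are first encountered in the file.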
Inputs and settings
Sequence
| Setting | What it does |
|---|---|
| Sequence source | Observed (from coordinates) returns only residues with atomic coordinates, the sequence you can see in the structure. Full deposited (SEQRES) returns the complete SEQRES sequence, including residues too disordered to resolve. The two differ at flexible loops and chain termini. Files without SEQRES records fall back to the observed sequence and add a warning. |
| Molecule type | Extracts protein chains, nucleic acid chains (DNA/RNA), or both. The default extracts every polymer chain. |
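
To make the SEQRES option and its fallback concrete, here is a small sketch under stated assumptions. SEQRES lines hold the chain ID in column 12 and up to 13 residue names from column 20 onward, as defined by the PDB format. The helper names `seqres_sequences` and `sequences_with_fallback`, the warning text, and the restriction to the 20 standard amino acids are choices made for this example only.

```python
_THREE = ("ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU "
          "MET ASN PRO GLN ARG SER THR VAL TRP TYR").split()
THREE_TO_ONE = dict(zip(_THREE, "ACDEFGHIKLMNPQRSTVWY"))


def seqres_sequences(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: full deposited sequence} from SEQRES records."""
    chains: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]
        residues = line[19:].split()  # up to 13 residue names per line
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(name, "X") for name in residues
        )
    return {cid: "".join(seq) for cid, seq in chains.items()}


def sequences_with_fallback(pdb_text: str, observed: dict[str, str]):
    """Prefer SEQRES; fall back to the observed sequences with a warning."""
    full = seqres_sequences(pdb_text)
    if full:
        return full, None
    return observed, "No SEQRES records found; using the observed sequence."
```

Nucleic acid residue names (for example DA or U) would need their own entries in the mapping; here they would come out as X.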
Chain selection
| Setting | What it does |
|---|---|
| Chain selection | All chains for complete extraction, First chain only for simple monomers, or Specific chains to target particular chain IDs. |
| Chain IDs | Comma-separated identifiers used with Specific chains, for example A,B,C. Chain IDs in PDB files are single characters, typically letters. |
Chain filtering
Chain filtering is off by default, so every chain is kept. Turn it on to refine your output when working with large complexes or structures with many small peptide fragments.
| Setting | What it does |
|---|---|
| Minimum chain length | Excludes short chains such as tags or crystallization additives. A threshold of 20 to 30 residues usually removes these fragments while keeping the protein of interest. |
| Maximum chain length | Excludes unusually long chains, useful when isolating small binding peptides from complexes. |
| Merge identical chains | Combines chains with identical sequences into a single FASTA entry, for symmetric oligomers where one representative sequence is enough. |
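
The filtering and merging settings amount to simple operations on a chain-to-sequence mapping. The sketch below shows one way they could work; the function names, the inclusive length bounds, and the comma-joined label for merged chains are assumptions for illustration, since the tool's own naming of merged entries is not specified here.

```python
def filter_chains(chains, min_len=None, max_len=None):
    """Keep chains whose length lies within the optional inclusive bounds."""
    return {
        cid: seq
        for cid, seq in chains.items()
        if (min_len is None or len(seq) >= min_len)
        and (max_len is None or len(seq) <= max_len)
    }


def merge_identical(chains):
    """Collapse chains with identical sequences into one entry.

    The merged key joins the chain IDs with commas (an assumed convention).
    """
    groups = {}
    for cid, seq in chains.items():
        groups.setdefault(seq, []).append(cid)
    return {",".join(ids): seq for seq, ids in groups.items()}


# Example: a symmetric dimer with a short bound peptide.
chains = {"A": "MKTAYIAKQRQISFVKSHFSRQ", "B": "MKTAYIAKQRQISFVKSHFSRQ", "P": "GSHM"}
proteins = merge_identical(filter_chains(chains, min_len=20))
peptides = filter_chains(chains, max_len=10)
```

In this example `proteins` holds a single entry for chains A and B, and `peptides` holds only chain P.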
Missing and modified residues
| Setting | What it does |
|---|---|
| Missing residues | Skip gaps omits unresolved positions. Insert X characters adds one placeholder X per missing position wherever residue numbering jumps, so the output stays aligned with the original numbering. This setting applies to the observed source only, since SEQRES already holds the full sequence. Because gaps are inferred from breaks in numbering, residues missing from the chain ends are not detected. |
| Include modified residues | On by default. Converts modified residues to their parent amino acid, such as selenomethionine (MSE) to M and phosphoserine (SEP) to S. Turning it off drops these residues and leaves gaps. |
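
A short sketch can show how these two settings interact. It assumes each chain has already been reduced to an ordered list of `(residue_number, residue_name)` pairs; `build_sequence` and its parameters are hypothetical names, and the modified-residue table lists only the two examples mentioned above, so a real table would be much longer.

```python
_THREE = ("ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU "
          "MET ASN PRO GLN ARG SER THR VAL TRP TYR").split()
THREE_TO_ONE = dict(zip(_THREE, "ACDEFGHIKLMNPQRSTVWY"))

# Modified residue -> parent amino acid (illustrative, incomplete).
MODIFIED_TO_PARENT = {"MSE": "MET", "SEP": "SER"}


def build_sequence(residues, insert_x=False, include_modified=True):
    """Build a one-letter sequence from (number, name) pairs of one chain."""
    out = []
    prev_num = None
    for num, name in residues:
        if name in MODIFIED_TO_PARENT:
            if not include_modified:
                continue  # dropped: shows up as a numbering gap
            name = MODIFIED_TO_PARENT[name]
        if insert_x and prev_num is not None and num - prev_num > 1:
            # One X per residue number skipped since the last kept residue.
            out.append("X" * (num - prev_num - 1))
        out.append(THREE_TO_ONE.get(name, "X"))
        prev_num = num
    return "".join(out)


residues = [(1, "MET"), (2, "ALA"), (5, "GLY"), (6, "MSE")]
skipped = build_sequence(residues)                 # gaps omitted
padded = build_sequence(residues, insert_x=True)   # gaps filled with X
```

Because gaps are detected only between two kept residues, residues missing before the first or after the last observed position leave no trace in either mode.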
Output formatting
| Setting | What it does |
|---|---|
| Header format | PDB_Chain produces headers like >1HTM_A. Title_Chain uses the structure title from the PDB file followed by the chain identifier. Chain ID only uses the chain identifier alone. |
| Line wrapping | Wraps sequences at 60, 80, or 100 characters per line. No wrapping produces single-line sequences that are easier to copy-paste into other tools. |
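
The formatting options can be sketched as a single function that writes headers and wraps sequence lines. This is an illustrative example only: `format_fasta`, its parameter names, and the option strings `"pdb_chain"`, `"title_chain"`, and `"chain"` are invented here, and the way a title is joined to the chain ID is an assumption.

```python
def wrap(seq, width):
    """Split a sequence into lines of `width` characters (None or 0 = no wrap)."""
    if not width:
        return [seq]
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def format_fasta(chains, pdb_id="", title="", header_format="pdb_chain", width=80):
    """Render {chain_id: sequence} as FASTA text."""
    lines = []
    for chain_id, seq in chains.items():
        if header_format == "pdb_chain":
            header = f"{pdb_id}_{chain_id}"
        elif header_format == "title_chain":
            header = f"{title}_{chain_id}"
        elif header_format == "chain":
            header = chain_id
        else:
            raise ValueError(f"unknown header format: {header_format!r}")
        lines.append(">" + header)
        lines.extend(wrap(seq, width))
    return "\n".join(lines) + "\n"


example = format_fasta({"A": "ACDEFG", "B": "KLM"}, pdb_id="1HTM", width=4)
```

Passing `width=None` keeps each sequence on a single line, which matches the no-wrapping option.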
Understanding the results
The output is standard FASTA format with one or more sequences. Each sequence begins with a header line starting with >, followed by the amino acid or nucleotide sequence on one or more lines.
>1HTM_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
>1HTM_B
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH

The extracted sequences can be used directly with sequence analysis tools like Amino Acid Composition, Protein Parameters, or structure prediction tools like ESMFold and Boltz-2.
Common use cases
Extracting sequences from experimental structures is often the first step in computational workflows. You might need the sequence to:
- Search for homologs using BLAST or similar tools
- Predict properties like isoelectric point or molecular weight
- Use as input for structure prediction to compare with the experimental structure
- Design primers for cloning or mutagenesis
Related tools

CSV to FASTA
Convert CSV and TSV files containing sequence data to FASTA format with flexible column mapping and automatic delimiter detection

One-to-Three Converter
Convert single-letter amino acid codes to three-letter codes

PDB to CIF Converter
Convert Protein Data Bank files to Crystallographic Information File format

PDB to MOL2 Converter
Convert Protein Data Bank files to MOL2 molecular format

Three-to-one converter
Convert three-letter amino acid codes to single-letter codes

TXT to FASTA converter
Convert TXT or plain text sequences into FASTA format files for DNA, RNA, and protein workflows with cleanup, validation, and downloads

DNA to Protein Converter
Translate DNA sequences to protein sequences using genetic code

Protein to DNA converter
Reverse translate protein sequences to possible DNA sequences

GenBank Feature Extractor
Extract sequence features (CDS, mRNA, gene, etc.) from GenBank files in FASTA format with support for spliced features

FASTA to FASTQ Converter
Convert FASTA sequence files to FASTQ format with mock quality scores
