PDB to FASTA converter

Convert PDB to FASTA online. Upload a PDB structure file or fetch from RCSB to convert it to FASTA sequence format.

What is PDB to FASTA converter?

The PDB to FASTA converter extracts amino acid sequences from protein structure files. PDB files contain 3D atomic coordinates, but many bioinformatics tools require only the sequence in FASTA format. This tool bridges that gap by reading the structural data and writing out one FASTA record per chain.

The converter handles multi-chain complexes, NMR ensembles with multiple models, and structures with missing or modified residues. You can fetch structures directly from the RCSB Protein Data Bank using a 4-character PDB ID, or upload your own files.
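For readers who want to script the fetch step themselves, here is a minimal sketch in Python. It assumes the public RCSB download URL pattern `https://files.rcsb.org/download/<ID>.pdb`; the function names `rcsb_url` and `fetch_pdb` are hypothetical and are not part of this tool.

```python
import re
import urllib.request

# Assumed download pattern for PDB-format files hosted by RCSB.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_url(pdb_id: str) -> str:
    """Validate a 4-character PDB ID and return its download URL."""
    pdb_id = pdb_id.strip().upper()
    # Classic PDB IDs are a digit followed by three alphanumeric characters.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the PDB-format text for an entry (needs network access)."""
    with urllib.request.urlopen(rcsb_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_pdb("1htm")` would request the entry used in the examples on this page. Very large entries may not be available in the legacy PDB format, so a real script should handle HTTP errors.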

For visualizing PDB structures before conversion, use the PDB Viewer. If your structure has issues like missing atoms or non-standard residues, the PDB Fixer can clean it up first.

How to convert PDB to FASTA?

PDB files store amino acid information in ATOM and HETATM records. Each record contains a three-letter residue code (like ALA for alanine, GLY for glycine) along with the chain identifier and residue number.

The conversion starts by reading these records in file order and extracting one residue code per residue for each chain. It then maps the three-letter codes to one-letter amino acid codes using the IUPAC convention.

The conversion follows the standard amino acid abbreviations:

Three-letter	One-letter	Amino acid
ALA	A	Alanine
CYS	C	Cysteine
ASP	D	Aspartic acid
GLU	E	Glutamic acid
PHE	F	Phenylalanine
GLY	G	Glycine
HIS	H	Histidine
ILE	I	Isoleucine
LYS	K	Lysine
LEU	L	Leucine
MET	M	Methionine
ASN	N	Asparagine
PRO	P	Proline
GLN	Q	Glutamine
ARG	R	Arginine
SER	S	Serine
THR	T	Threonine
VAL	V	Valine
TRP	W	Tryptophan
TYR	Y	Tyrosine
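The steps above can be sketched in a few lines of Python. This is an illustration, not the converter's actual implementation: it assumes the fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number and insertion code in columns 23-27), reads only ATOM records, stops at the first ENDMDL, and writes X for any code missing from the table. The name `pdb_to_sequences` is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def pdb_to_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    residues = {}   # chain ID -> list of one-letter codes, in file order
    seen = set()    # (chain, residue number + insertion code) already counted
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break   # assumption: only the first model is used
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        res_name = line[17:20].strip()
        chain_id = line[21]
        key = (chain_id, line[22:27])
        if key in seen:
            continue  # every atom of a residue repeats the same residue fields
        seen.add(key)
        residues.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain: "".join(codes) for chain, codes in residues.items()}
```

Tracking the residue number together with the insertion code keeps each residue counted once, even though it spans many ATOM lines.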

Inputs and settings

Chain selection

Chain selection: Choose which chains to extract from multi-chain structures. Use All chains for complete extraction, First chain only for simple monomers, or Specific chains to target particular chain IDs.

Chain IDs: When using Specific chains, enter comma-separated chain identifiers (e.g., A,B,C). Chain IDs in PDB files are single characters, typically letters.
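As a rough illustration of the three selection modes, the sketch below works on a dictionary that maps chain IDs to sequences. The function name `select_chains` and the mode strings `"all"`, `"first"`, and `"specific"` are made up for this example and do not reflect the tool's internals.

```python
def select_chains(sequences, mode="all", chain_ids=""):
    """Pick chains from {chain_id: sequence} according to a selection mode."""
    if mode == "all":
        return dict(sequences)
    if mode == "first":
        # Dictionaries keep insertion order, so this is the first chain read.
        first = next(iter(sequences), None)
        return {} if first is None else {first: sequences[first]}
    if mode == "specific":
        # Accept input such as "A,B,C" or "A, C"; unknown IDs are ignored.
        wanted = [c.strip() for c in chain_ids.split(",") if c.strip()]
        return {c: sequences[c] for c in wanted if c in sequences}
    raise ValueError(f"unknown selection mode: {mode!r}")
```

Silently ignoring chain IDs that are not in the structure is a design choice of this sketch; raising an error instead would catch typos earlier.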

Chain filtering

Enable chain filtering to refine your output when working with large complexes or structures with many small peptide fragments.

Minimum chain length: Exclude short chains that may be artifacts, tags, or crystallization additives. We recommend setting this to 20-30 residues when extracting only the protein of interest.

Maximum chain length: Exclude unusually long chains if needed. This is rarely used but helpful when isolating small binding peptides from complexes.

Merge identical chains: Combine chains with identical sequences into a single FASTA entry. Useful for symmetric oligomers where you only need one representative sequence.
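The length filters and the merge option can be pictured with the following Python sketch. Both function names are hypothetical, and labelling a merged entry with a comma-joined list of chain IDs is an assumption made here; the tool may label merged entries differently.

```python
def filter_by_length(sequences, min_len=0, max_len=None):
    """Keep chains whose length lies within [min_len, max_len]."""
    return {
        chain: seq
        for chain, seq in sequences.items()
        if len(seq) >= min_len and (max_len is None or len(seq) <= max_len)
    }


def merge_identical(sequences):
    """Collapse chains that share a sequence into one entry, e.g. 'A,B'."""
    groups = {}  # sequence -> chain IDs carrying it, in first-seen order
    for chain, seq in sequences.items():
        groups.setdefault(seq, []).append(chain)
    return {",".join(chains): seq for seq, chains in groups.items()}
```

For a symmetric homodimer with a two-residue tag fragment on chain C, `merge_identical(filter_by_length(seqs, min_len=20))` would leave a single entry for chains A and B.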

Model handling

Model selection: NMR structures typically contain 10-20 conformational models representing structural uncertainty. Choose First model only for a single representative structure, or All models if you need to analyze conformational variability.
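In PDB files, models are delimited by MODEL and ENDMDL records. A minimal Python sketch of splitting a file into models is shown below; `split_models` is a hypothetical helper, and treating a file without MODEL records as one implicit model is an assumption of this sketch.

```python
def split_models(pdb_text):
    """Return a list of models, each a list of ATOM/HETATM lines."""
    models, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:
        # Coordinates not closed by ENDMDL: treat them as a single model.
        models.append(current)
    return models
```

"First model only" then corresponds to using `split_models(text)[0]`, while "All models" iterates over the whole list.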

Missing and modified residues

Missing residues: Crystal structures often have disordered regions without coordinates. Choose Skip gaps to omit these positions entirely, or Insert X characters to preserve sequence numbering with placeholder X residues.
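One simple way to picture the Insert X characters option is to compare consecutive residue numbers within a chain and pad any jump with placeholders. The Python sketch below does this; `fill_gaps` is a hypothetical name, and inferring gaps from residue numbering alone is an assumption that breaks down for files with insertion codes or irregular numbering.

```python
def fill_gaps(residues, placeholder="X"):
    """Build a sequence from (residue_number, one_letter) pairs of one chain,
    inserting a placeholder for each residue number that is skipped."""
    parts = []
    previous = None
    for number, letter in residues:
        if previous is not None and number > previous + 1:
            parts.append(placeholder * (number - previous - 1))
        parts.append(letter)
        previous = number
    return "".join(parts)


def skip_gaps(residues):
    """Build a sequence from the observed residues only."""
    return "".join(letter for _, letter in residues)
```

With residues 3 and 4 unresolved, `fill_gaps([(1, "M"), (2, "K"), (5, "T"), (6, "A")])` gives `MKXXTA`, while `skip_gaps` on the same input gives `MKTA`.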

Include modified residues: When enabled, common post-translational modifications and crystallographic substitutions are converted to their parent amino acids. For example, selenomethionine (MSE) becomes methionine (M), and phosphoserine (SEP) becomes serine (S).
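The parent-residue lookup can be sketched as a second table consulted before the standard one. In the Python example below, MSE and SEP come from the text above; TPO (phosphothreonine) and PTR (phosphotyrosine) are added here as further illustrations. The table is deliberately tiny, and returning `None` for unrecognized codes is an assumption of this sketch rather than documented behavior.

```python
# Small subset of the standard table, enough for this illustration.
STANDARD = {"MET": "M", "SER": "S", "THR": "T", "TYR": "Y"}

# Modified residue -> parent amino acid (illustrative, not exhaustive).
MODIFIED_TO_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
}


def residue_letter(res_name, include_modified=True):
    """Return the one-letter code, or None if the residue is not recognized."""
    res_name = res_name.strip().upper()
    if include_modified:
        res_name = MODIFIED_TO_PARENT.get(res_name, res_name)
    return STANDARD.get(res_name)
```

Because modified residues are usually stored as HETATM records, a parser that reads only ATOM lines would miss them entirely, so this lookup only helps when HETATM records are read too.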

Output formatting

Header format: Controls the FASTA header line. PDB_Chain produces headers like >1HTM_A, while Title_Chain uses the structure title from the PDB file.

Line wrapping: Standard FASTA files wrap sequences at 60 or 80 characters per line. Use No wrapping for single-line sequences that are easier to copy-paste into other tools.
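Header construction and line wrapping are simple string operations. The Python sketch below builds PDB_Chain style headers and wraps at a fixed width; `format_fasta` is a hypothetical helper, and using `width=0` to mean "no wrapping" is a convention chosen for this example.

```python
def format_fasta(sequences, pdb_id, width=60):
    """Format {chain_id: sequence} as FASTA text with PDB_Chain headers."""
    lines = []
    for chain, seq in sequences.items():
        lines.append(f">{pdb_id}_{chain}")
        if width:
            # Slice the sequence into fixed-width pieces.
            lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
        else:
            lines.append(seq)  # single-line sequence
    return "".join(line + "\n" for line in lines)
```

For example, `format_fasta({"A": "MKTAYIAKQR"}, "1HTM", width=4)` yields the header `>1HTM_A` followed by the lines `MKTA`, `YIAK`, and `QR`.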

Understanding the results

The output is standard FASTA format with one or more sequences. Each sequence begins with a header line starting with >, followed by the amino acid sequence.

>1HTM_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
>1HTM_B
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
HSLAKWKRQQIAAALEHHHHHH
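To use the output in a script, the wrapped lines under each header need to be joined back into one sequence. A minimal Python reader is sketched below; `read_fasta` is a hypothetical helper, and established libraries such as Biopython provide more robust parsers.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]      # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

Applied to output like the example above, this returns one entry per chain, keyed by headers such as `1HTM_A` and `1HTM_B`.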

The extracted sequences can be used directly with sequence analysis tools like Amino Acid Composition, Protein Parameters, or structure prediction tools like ESMFold and Boltz-2.

Common use cases

Extracting sequences from experimental structures is often the first step in computational workflows. You might need the sequence to:

Search for homologs using BLAST or similar tools
Predict properties like isoelectric point or molecular weight
Use as input for structure prediction to compare with the experimental structure
Design primers for cloning or mutagenesis

Related tools

PDB Viewer - Visualize 3D structures before conversion
PDB Fixer - Repair structures with missing atoms or residues
PDB to CIF - Convert to mmCIF format
PDB to SDF - Extract ligands to SDF format
PDB to MOL2 - Convert to MOL2 format
Protein Parameters - Analyze extracted sequences
