Protein to DNA converter

Reverse translate protein sequences into DNA with codon optimization and GC-content controls.

How to convert protein to DNA?

The easiest way to convert protein to DNA online is to use this ProteinIQ protein to DNA converter. Paste a protein sequence in FASTA format or upload a file, choose your expression host and options, then run the tool. ProteinIQ reverse translates each amino acid to a codon, optimizes codon usage for your target organism, and returns a .fasta file with the DNA sequence along with a summary of GC content and Codon Adaptation Index.

Reverse translation is not deterministic. Because the genetic code is degenerate, most amino acids map to several codons, so one protein can be encoded by a huge number of DNA sequences. This reverse translation tool chooses the codons for you using the strategy you select, so the output is one valid coding sequence rather than the single "correct" answer.

For a single protein, the result is one FASTA record. The example below uses the human expression host with the most frequent codon for each amino acid:

>insulin_A_fragment
GGCATCGTGGAGCAGTGCTGCTGA

The input GIVEQCC becomes the DNA above, and a stop codon (TGA) is appended at the end because stop codons are added by default.
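The example above can be reproduced with a minimal sketch. The function name `reverse_translate` and the `TOP_CODON_HUMAN` dictionary are hypothetical, and the dictionary only covers the six residues in the example, with codons copied from the output shown. It is not the converter's actual implementation.

```python
# Partial codon map copied from the example output above (human host,
# most frequent codon). A real table would need all 20 amino acids.
TOP_CODON_HUMAN = {
    "G": "GGC", "I": "ATC", "V": "GTG",
    "E": "GAG", "Q": "CAG", "C": "TGC",
}


def reverse_translate(protein, codon_map, stop="TGA"):
    """Map each residue to one codon, then append `stop` (pass "" to omit it)."""
    protein = protein.upper().rstrip("*")  # drop a terminal stop marker
    missing = sorted(set(protein) - set(codon_map))
    if missing:
        raise ValueError("no codon defined for: " + ", ".join(missing))
    return "".join(codon_map[aa] for aa in protein) + stop


print(reverse_translate("GIVEQCC", TOP_CODON_HUMAN))
# GGCATCGTGGAGCAGTGCTGCTGA
```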

For several proteins, paste them as a multi-record FASTA file and each one is translated separately:

>protein1
ATGAAGCTGCTGATCCTGACCTGCCTGGTG...
>protein2
GTGGCCCCCTTCCCCGAGGTGTTCGGC...

To go from amino acid to a degenerate DNA sequence for primer design, switch the output mode to degenerate. Each residue collapses to IUPAC ambiguity codes that cover every possible codon:

>peptide
GGNATHGTNGARCARTGYTGY
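One way to build such a consensus is to collapse the codons of each amino acid position by position into IUPAC codes. The sketch below assumes the standard genetic code (NCBI table 1); the helper names are hypothetical and the converter may use a different method.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Sorted set of bases -> IUPAC ambiguity code
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

# Group all codons by the amino acid they encode
codons_for = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    codons_for.setdefault(aa, []).append("".join(bases))


def degenerate_codon(aa):
    """Collapse every codon of one amino acid into a single IUPAC triplet."""
    if aa not in codons_for:
        raise ValueError("not a standard residue: " + aa)
    columns = zip(*codons_for[aa])  # bases seen at positions 1, 2, 3
    return "".join(IUPAC["".join(sorted(set(col)))] for col in columns)


def degenerate_sequence(protein):
    return "".join(degenerate_codon(aa) for aa in protein.upper())


print(degenerate_sequence("GIVEQCC"))
# GGNATHGTNGARCARTGYTGY
```

A position-by-position collapse covers every valid codon, but for the six-codon amino acids it also covers a few codons that encode other residues. Leucine becomes YTN, for example, which also matches the phenylalanine codons TTT and TTC.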

Input

| Input | Description |
| --- | --- |
| Input | One or more protein sequences in FASTA format, or a single raw sequence. Accepts pasted text or file uploads. Extensions: .txt, .fasta, .fa, .fas, .seq. Maximum file size: 50 MB. |

Valid input uses the 20 standard one-letter amino acid codes. The converter also accepts ambiguity and extended codes (B, Z, J, X, U, O) and a terminal stop marker (*). Internal stop markers (*) are removed before reverse translation and reported in the summary.

Settings

| Setting | Description |
| --- | --- |
| Output mode | Single DNA sequence (default) picks one concrete codon per residue for synthesis, cloning, and expression. Degenerate consensus collapses each amino acid to IUPAC ambiguity codes for designing degenerate PCR primers. Host optimization, restriction-site avoidance, repeat avoidance, and CAI do not apply in degenerate mode. |
| Genetic code table | Which codon-to-amino-acid mapping to use. The converter ships all 18 common NCBI translation tables, including Standard (1), Vertebrate Mitochondrial (2), Yeast Mitochondrial (3), Bacterial/Plastid (11), and many invertebrate, ciliate, and other organellar codes. Use Standard for most nuclear genes. |
| Expression host | Optimizes codon choice for a target organism. Options: None, Human, CHO, Mouse, E. coli, Yeast, Pichia pastoris, Sf9/insect, Arabidopsis, or a Custom table. Each organism prefers different synonymous codons, so matching your host improves translation efficiency and yield. |
| Custom codon usage table | Shown when the host is Custom. Paste a codon usage table for any organism. Accepts Kazusa database output or simple codon value pairs. Values may be fractions, per-thousand counts, or raw counts and are normalised per amino acid automatically. At least 20 recognisable codons are required. |
| Codon selection | How a codon is chosen for each amino acid. Match host frequency (default) samples codons in proportion to natural usage. Most frequent codon maximizes the Codon Adaptation Index. GC-rich and AT-rich bias toward high or low GC content. Ignored when the host is None or the output is degenerate. |
| Reading frame offset | Adds 0, 1, or 2 N nucleotides before each sequence to shift the reading frame for downstream cloning. |
| Add stop codon | Appends a stop codon (TAA, TAG, or TGA) to the end of each sequence. Default: enabled. |
| Ambiguous amino acids | How to handle B, Z, J, X, U, and O. Use default codon maps each to a representative residue, Convert to NNN outputs NNN, and Skip residue omits it. Every substitution is reported as a warning. |
| Ensure start codon | Guarantees an ATG start. If the protein does not begin with Methionine, an ATG is prepended and noted in the summary. Default: disabled. |
| Avoid restriction sites | Removes common restriction enzyme sites (EcoRI, HindIII, BamHI, XhoI, SalI, NotI, PstI, and more) on both strands by swapping synonymous codons. Default: disabled. |
| Avoid long repeats | Breaks homopolymer runs (such as AAAAAA) and short tandem repeats that often cause gene synthesis to fail. Runs in the same pass as restriction-site avoidance. Default: disabled. |
| Output type | DNA outputs thymine (T); RNA outputs uracil (U). Default: DNA. |
| Line length | Characters per line in the FASTA output: 60, 80 (default), 100, 120, or no wrapping for a single line. Single-line output is convenient for pasting into synthesis ordering systems. |
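The per-amino-acid normalisation described for custom codon usage tables can be sketched as follows. This assumes the standard genetic code and a hypothetical `normalise_usage` helper, and the input counts are made up for illustration. Because only the ratio between synonymous codons matters, fractions, per-thousand values, and raw counts all normalise to the same result.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def normalise_usage(raw):
    """Rescale codon values so the synonymous codons of each amino acid sum to 1."""
    # Accept RNA-style codons (UUU) as well as DNA-style codons (TTT)
    cleaned = {codon.upper().replace("U", "T"): float(v) for codon, v in raw.items()}
    totals = {}
    for codon, value in cleaned.items():
        aa = CODON_TO_AA[codon]  # raises KeyError for anything that is not a codon
        totals[aa] = totals.get(aa, 0.0) + value
    return {
        codon: value / totals[CODON_TO_AA[codon]]
        for codon, value in cleaned.items()
        if totals[CODON_TO_AA[codon]] > 0
    }


# Made-up raw counts for phenylalanine (UUU, UUC) and lysine (AAA, AAG)
usage = normalise_usage({"UUU": 30, "UUC": 70, "AAA": 1, "AAG": 3})
print(usage)
```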

Results

| Output | Description |
| --- | --- |
| DNA or RNA FASTA | The reverse-translated sequence with > headers and wrapped lines. Copy it to the clipboard or download the .fasta file. |
| Conversion summary | A .txt report with the sequence count, output mode, genetic code, codon selection, expression host, total length, GC content, Codon Adaptation Index, and any restriction sites or repeats that could not be removed. |
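The GC content in the summary is the percentage of G and C bases in the sequence. A minimal sketch, using a hypothetical `gc_content` helper, is shown below. Whether the converter counts the appended stop codon or ambiguity codes is an assumption here: this version counts only A, C, G, T, and U.

```python
def gc_content(seq):
    """Return the percentage of G and C among the unambiguous bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100 * sum(b in "GC" for b in bases) / len(bases)


# The insulin A fragment example from earlier on this page, stop codon included
print(gc_content("GGCATCGTGGAGCAGTGCTGCTGA"))
# 62.5
```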

What is reverse translation?

Reverse translation, also called back translation, is the process of converting a protein into a DNA sequence that could encode it. Forward translation (DNA to protein) is deterministic because each codon specifies exactly one amino acid, but going from protein to DNA means choosing among many possible codons.

Even for a protein of just 100 amino acids, the number of different DNA sequences that encode exactly the same protein is astronomically large, far more than billions. Reverse translation is essential for gene synthesis, codon optimization for heterologous expression, designing primers for molecular cloning, and protein engineering when you need to work backward from a known protein sequence.
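The size of this search space is the product of the number of synonymous codons at each position. The sketch below assumes the standard genetic code and uses a hypothetical `count_encodings` helper.

```python
from collections import Counter
from math import prod

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_AA = Counter(AMINO)  # e.g. L -> 6, M -> 1, * -> 3


def count_encodings(protein):
    """Number of distinct coding sequences for a protein of standard residues."""
    protein = protein.upper()
    unknown = sorted(set(protein) - set(CODONS_PER_AA))
    if unknown:
        raise ValueError("not a standard residue: " + ", ".join(unknown))
    return prod(CODONS_PER_AA[aa] for aa in protein)


print(count_encodings("GIVEQCC"))  # 4 * 3 * 4 * 2 * 2 * 2 * 2
# 768
```

Even this seven-residue peptide has 768 possible coding sequences. With an average of about three codons per residue, a 100-residue protein has on the order of 3^100, roughly 5 × 10^47, possible encodings.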

The degenerate genetic code

The genetic code uses 64 three-nucleotide codons to specify 20 standard amino acids plus start and stop signals, so most amino acids are encoded by more than one codon.

Only methionine (M) and tryptophan (W) have single codons (ATG and TGG), while leucine, serine, and arginine each have six codon options. The variation usually falls in the third "wobble" position.

| Amino acid | Number of codons | Codons |
| --- | --- | --- |
| Methionine (M) | 1 | ATG |
| Tryptophan (W) | 1 | TGG |
| Phenylalanine (F) | 2 | TTT, TTC |
| Leucine (L) | 6 | TTA, TTG, CTT, CTC, CTA, CTG |
| Serine (S) | 6 | TCT, TCC, TCA, TCG, AGT, AGC |
| Arginine (R) | 6 | CGT, CGC, CGA, CGG, AGA, AGG |

This degeneracy buffers against mutations. A point mutation in the third position often produces a synonymous change that keeps the original amino acid, so protein structure is preserved.
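How often a third-position change is silent can be checked by enumerating every such point mutation in the standard genetic code. The sketch below uses a hypothetical `third_position_synonymous` helper and starts only from sense codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_synonymous():
    """Count third-position point mutations of sense codons that keep the amino acid."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for base in BASES:
            if base == codon[2]:
                continue  # not a mutation
            total += 1
            if CODE[codon[:2] + base] == aa:
                synonymous += 1
    return synonymous, total


syn, total = third_position_synonymous()
print(syn, total, round(syn / total, 3))
# 126 183 0.689
```

Roughly two thirds of all third-position substitutions leave the encoded amino acid unchanged.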

Codon usage bias

Different organisms prefer different synonymous codons even though they encode the same amino acid. This preference, called codon usage bias, reflects the abundance of different tRNA molecules in each organism.

Using rare codons can slow translation or lower protein yield, while matching codons to the target organism can increase expression. The table below shows a few examples.

| Amino acid | Human preference | E. coli preference | S. cerevisiae preference |
| --- | --- | --- | --- |
| Leucine | CTG (41%) | CTG (47%) | TTG (29%) |
| Serine | AGC (24%) | AGC (25%) | TCT (26%) |
| Arginine | CGG (21%) | CGT (36%) | AGA (48%) |
| Glycine | GGC (34%) | GGC (37%) | GGT (47%) |

Codon choice also affects mRNA secondary structure, translation kinetics, and co-translational folding. Some proteins rely on strategically placed rare codons to pause the ribosome for correct folding, which is why matching the host distribution is often better than always using the single best codon.

How to optimize codons for expression

Codon optimization tunes a coding sequence so it expresses well in a chosen host. The workflow with this converter is:

  • Pick your real expression host, or paste a custom codon usage table if your organism is not listed.
  • Choose a codon selection strategy. Match host frequency is the recommended default because it avoids the repetitive sequences that come from always using the top codon. Most frequent codon gives the highest Codon Adaptation Index.
  • Turn on Avoid restriction sites if you are cloning into a vector, and Avoid long repeats if you plan to order the gene from a synthesis vendor.
  • Check the GC content and Codon Adaptation Index in the summary file, then adjust the strategy if needed.
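The difference between the two main codon selection strategies can be sketched for a single amino acid. The `pick_codon` function and the strategy names are hypothetical, and in the leucine table below only the CTG value (0.41) comes from the codon usage table earlier on this page. The other fractions are placeholders chosen to sum to 1, not measured data.

```python
import random

# Illustrative leucine usage: only CTG = 0.41 is taken from this page,
# the remaining values are placeholders.
LEU_USAGE = {"CTG": 0.41, "CTC": 0.20, "CTT": 0.13,
             "TTG": 0.12, "TTA": 0.07, "CTA": 0.07}


def pick_codon(usage, strategy, rng):
    """Choose one codon from a {codon: fraction} table for a single amino acid."""
    codons = list(usage)
    if strategy == "most_frequent":
        return max(codons, key=usage.get)
    if strategy == "match_host":
        # Sample in proportion to usage, so rarer codons still appear sometimes
        return rng.choices(codons, weights=[usage[c] for c in codons], k=1)[0]
    raise ValueError("unknown strategy: " + strategy)


rng = random.Random(0)  # fixed seed so the run is repeatable
top = [pick_codon(LEU_USAGE, "most_frequent", rng) for _ in range(10)]
mixed = [pick_codon(LEU_USAGE, "match_host", rng) for _ in range(10)]
print("".join(top))    # the same codon repeated ten times
print("".join(mixed))  # a mix weighted toward the common codons
```

Always taking the top codon turns a leucine-rich stretch into a run of identical triplets, which is the kind of repetitive sequence the frequency-matching strategy avoids.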

The Codon Adaptation Index (CAI) measures how closely a sequence matches the host's preferred codons on a scale from 0 to 1. Highly expressed native genes usually sit between 0.7 and 1.0.
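In its standard definition (Sharp and Li), CAI is the geometric mean of each codon's relative adaptiveness, where relative adaptiveness is the codon's usage divided by the usage of the most frequent synonymous codon. The sketch below follows that definition. Whether the converter handles details such as single-codon amino acids the same way is an assumption. In the usage table, only GGC = 0.34 and CTG = 0.41 come from this page, and the other fractions are placeholders.

```python
from math import exp, log

# Illustrative usage fractions per amino acid (mostly placeholder values)
USAGE = {
    "G": {"GGC": 0.34, "GGA": 0.25, "GGG": 0.25, "GGT": 0.16},
    "L": {"CTG": 0.41, "CTC": 0.20, "CTT": 0.13,
          "TTG": 0.12, "TTA": 0.07, "CTA": 0.07},
}


def relative_adaptiveness(usage_by_aa):
    """w(codon) = usage of the codon / usage of the best synonymous codon."""
    w = {}
    for freqs in usage_by_aa.values():
        best = max(freqs.values())
        for codon, f in freqs.items():
            w[codon] = f / best
    return w


def cai(dna, w):
    """Geometric mean of w over the codons of `dna` that appear in `w`."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    scores = [w[c] for c in codons if c in w]  # skips stops and unknown codons
    if not scores:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(log(s) for s in scores) / len(scores))


W = relative_adaptiveness(USAGE)
print(cai("GGCCTG", W))            # only top codons, so the score is 1.0
print(round(cai("GGTCTG", W), 3))  # one less-preferred glycine codon lowers it
```

A codon with zero usage would make the logarithm undefined, so real implementations usually substitute a small pseudo-value for it.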

Which translation tool should I use?

Use this protein to DNA converter to reverse translate a protein, amino acid, or peptide sequence into a nucleotide sequence. Use a different tool when you are working in the other direction or with a different input.

| Goal | Best tool | Use when |
| --- | --- | --- |
| Protein to DNA (reverse translation) | Protein to DNA | You have an amino acid sequence and need a coding DNA sequence. |
| DNA to protein (forward translation) | DNA to protein | You have a nucleotide sequence and need the protein it encodes. |
| DNA to RNA (transcription) | DNA to RNA | You need the RNA transcript of a DNA sequence. |
| Reverse complement of a sequence | Reverse complement | You need the complementary strand of a DNA sequence. |
| Clean raw text into FASTA | TXT to FASTA | Your sequence has numbering, spaces, or messy formatting. |

FAQ

Can you go from protein to DNA?

Yes, but not uniquely. Because most amino acids are encoded by several codons, a single protein can be encoded by an astronomical number of DNA sequences, typically $10^{30}$ to $10^{60}$ for a 100-residue protein. A reverse translation tool selects one valid coding sequence, and codon optimization picks codons that match your target organism so the gene is more likely to express well.

How do I convert a protein sequence to a DNA sequence?

Paste the protein sequence in FASTA format or upload a file, choose an expression host and codon selection strategy, then run the converter. The tool reverse translates each amino acid to a codon and returns a DNA .fasta file plus a summary with GC content and Codon Adaptation Index.

How do I convert an amino acid sequence to a nucleotide sequence?

It is the same process as converting protein to DNA. Each amino acid is mapped to a codon, so an amino acid or peptide sequence becomes a nucleotide sequence. Use the degenerate output mode if you want an amino acid to nucleotide consensus for primer design rather than a single sequence.

Is reverse translation accurate?

For input made of standard residues, the reverse-translated DNA encodes exactly the protein you entered, so it is 100% accurate at the amino acid level. Any changes the tool makes, such as substituting ambiguous residues, removing internal stop markers, or prepending an ATG, are reported in the summary. Expression levels vary: codon-optimized sequences often express several times better than random codon choice, but results depend on the protein, host, promoter, and mRNA stability. The generated DNA will almost never match the natural gene, so use a database like NCBI or UniProt if you need the real genomic sequence.

How do I design degenerate PCR primers from a protein?

Switch the output mode to degenerate consensus. The tool returns IUPAC ambiguity codes that cover every codon capable of encoding your protein. Take a short, low-degeneracy stretch (regions rich in Met and Trp are ideal because they are unambiguous) and order it as a degenerate primer to amplify the gene from genomic or cDNA template.
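Finding a low-degeneracy stretch can be automated by multiplying the number of bases each IUPAC code stands for and scanning fixed-length windows. The helpers `degeneracy` and `best_window` are hypothetical, and the sketch ignores other primer design criteria such as melting temperature and the composition of the 3' end.

```python
from math import prod

# Number of bases each IUPAC code can stand for
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def degeneracy(seq):
    """Number of distinct concrete sequences a degenerate sequence represents."""
    return prod(IUPAC_SIZE[b] for b in seq.upper())


def best_window(seq, length):
    """Return (start, window, degeneracy) for the least degenerate window."""
    if length > len(seq):
        raise ValueError("window is longer than the sequence")
    fold, start = min(
        (degeneracy(seq[i:i + length]), i) for i in range(len(seq) - length + 1)
    )
    return start, seq[start:start + length], fold


consensus = "GGNATHGTNGARCARTGYTGY"  # degenerate example from earlier on this page
print(degeneracy(consensus))
# 768
print(best_window(consensus, 12))
# (9, 'GARCARTGYTGY', 16)
```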

Can I reverse translate protein to RNA?

Yes. Set the output type to RNA and the converter outputs the sequence with uracil (U) instead of thymine (T), which is useful for mRNA and in vitro transcription workflows.

What genetic codes are supported?

All 18 common NCBI translation tables, including the standard code, vertebrate and invertebrate mitochondrial codes, yeast mitochondrial, bacterial and plastid, ciliate nuclear, and several other organellar codes. Use the standard code for nuclear genes and pick an organellar code only when working with mitochondrial or plastid sequences.