
Protein to DNA converter
Reverse translate protein sequences to possible DNA sequences. Upload a FASTA file or paste your protein sequences below.
Reverse translation is the process of converting a protein sequence into a DNA sequence that could encode it. Unlike forward translation (DNA to protein), where each codon maps deterministically to a single amino acid, reverse translation requires choosing among many possible DNA sequences because the genetic code is degenerate.
This degeneracy means most amino acids can be encoded by more than one codon. For a protein of just 100 amino acids, the number of different DNA sequences that encode exactly the same protein far exceeds billions: at an average of three codons per residue it would be 3^100, or about 5 × 10^47.
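The count can be made concrete by multiplying the number of synonymous codons available for each residue. The following is a minimal sketch, assuming the standard genetic code; the name `count_encodings` and the example peptides are illustrative and not part of the converter.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (the values sum to the 61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def count_encodings(protein: str) -> int:
    """Return how many distinct DNA sequences encode `protein` (stop codon not included)."""
    return prod(CODON_COUNTS[aa] for aa in protein.upper())

print(count_encodings("MW"))    # 1: both residues have a single codon
print(count_encodings("MLSR"))  # 216 = 1 * 6 * 6 * 6
```

Because the result is a product, it grows exponentially with protein length, which is why even short proteins have an enormous number of valid encodings.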
Reverse translation is essential for gene synthesis, codon optimization for heterologous expression, designing primers for molecular cloning, and protein engineering applications where you need to work backward from a known protein sequence.
The genetic code uses 64 three-nucleotide codons to specify only 20 standard amino acids plus start and stop signals. This creates a redundancy, where multiple codons encode a single amino acid.
Only methionine (M) and tryptophan (W) have single codons (ATG and TGG), while leucine, serine, and arginine each have six different codon options. The variation typically occurs in the third "wobble" position of the codon.
| Amino acid | Number of codons | Example codons |
|---|---|---|
| Methionine (M) | 1 | ATG |
| Tryptophan (W) | 1 | TGG |
| Phenylalanine (F) | 2 | TTT, TTC |
| Leucine (L) | 6 | TTA, TTG, CTT, CTC, CTA, CTG |
| Serine (S) | 6 | TCT, TCC, TCA, TCG, AGT, AGC |
| Arginine (R) | 6 | CGT, CGC, CGA, CGG, AGA, AGG |
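The rows above can be derived by inverting the standard codon table. This sketch builds the table from the compact 64-letter layout used for NCBI translation table 1 (bases in T, C, A, G order); the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Forward table: codon -> amino acid.
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse table: amino acid -> list of synonymous codons.
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)

for aa in "MWFLSR":
    print(aa, len(AA_TO_CODONS[aa]), ", ".join(AA_TO_CODONS[aa]))
```

The reverse table is the core data structure for reverse translation: every residue is replaced by one entry from its codon list.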
This degeneracy provides evolutionary buffering against mutations. A point mutation in the third position is often synonymous: it leaves the encoded amino acid, and therefore the protein sequence, unchanged.
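The buffering effect can be quantified by mutating each codon position in turn and checking whether the amino acid changes. This is a small sketch over the standard genetic code; `synonymous_fraction` is a hypothetical helper written for this illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1, or 2) of a sense codon
    that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in CODON_TO_AA.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TO_AA[mutant] == aa
    return same / total

for pos in range(3):
    print(f"position {pos + 1}: {synonymous_fraction(pos):.0%} synonymous")
```

Under these assumptions the script reports about 4% for the first position, 0% for the second, and 69% for the third, which is the sense in which the third position absorbs most silent changes.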
Different organisms prefer different synonymous codons, even though they encode the same amino acid. This preference, called codon usage bias, reflects the abundance of different tRNA molecules in each organism's cells.
Using rare codons can slow translation or reduce protein yield. Conversely, optimizing codons to match the target organism's preferences can dramatically increase expression levels.
| Amino acid | Human preference | E. coli preference | S. cerevisiae preference |
|---|---|---|---|
| Leucine | CTG (40%) | CTG (51%) | TTG (28%) |
| Serine | AGC (24%) | TCT (39%) | TCT (23%) |
| Arginine | CGC (28%) | CGT (40%) | AGA (48%) |
| Glycine | GGC (35%) | GGT (37%) | GGT (48%) |
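As an illustration of host-specific codon choice, the sketch below reverse translates a peptide by always taking the preferred codon from the table above, written in the DNA alphabet (T in place of U). It covers only the four amino acids listed there; the `PREFERRED` dictionary and `reverse_translate` function are hypothetical names, and a real optimizer would use full frequency tables rather than a single top codon.

```python
# Preferred codon per host, copied from the table above (DNA alphabet).
PREFERRED = {
    "human":        {"L": "CTG", "S": "AGC", "R": "CGC", "G": "GGC"},
    "e_coli":       {"L": "CTG", "S": "TCT", "R": "CGT", "G": "GGT"},
    "s_cerevisiae": {"L": "TTG", "S": "TCT", "R": "AGA", "G": "GGT"},
}

def reverse_translate(protein: str, host: str) -> str:
    """Replace each residue with the host's preferred codon."""
    table = PREFERRED[host]
    missing = sorted(set(protein) - set(table))
    if missing:
        raise ValueError(f"no preferred codon listed for: {', '.join(missing)}")
    return "".join(table[aa] for aa in protein)

print(reverse_translate("LSRG", "e_coli"))        # CTGTCTCGTGGT
print(reverse_translate("LSRG", "s_cerevisiae"))  # TTGTCTAGAGGT
```

The same four-residue peptide yields a different DNA sequence for each host, even though all of them translate back to the same protein.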
Codon choice also affects mRNA secondary structure, translation kinetics, and co-translational protein folding. Some proteins require strategic placement of rare codons to induce ribosomal pausing for proper folding.
The genetic code setting determines which codon-to-amino-acid mapping to use. Some organisms and organelles (notably mitochondria) use slightly different genetic codes in which the same codon can encode a different amino acid.
For most applications involving nuclear genes, use Standard. Only select mitochondrial codes when working with organellar sequences.
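To show what a different genetic code changes in practice, this sketch expresses the vertebrate mitochondrial code (NCBI translation table 2) as four overrides on the standard code and compares the codons available for tryptophan and methionine. The helper `codons_for` is illustrative; the converter's own tables may be organized differently.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial code: AGA and AGG become stops, ATA encodes Met, TGA encodes Trp.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

def codons_for(aa: str, table: dict) -> list:
    """Return the sorted codons that encode `aa` under the given table."""
    return sorted(codon for codon, residue in table.items() if residue == aa)

for name, table in (("standard", STANDARD), ("vertebrate mito", VERTEBRATE_MITO)):
    print(name, "W:", codons_for("W", table), "M:", codons_for("M", table))
```

A sequence reverse translated with the mitochondrial table may use TGA for tryptophan, which would be read as a stop codon in a nuclear expression context.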
The codon optimization setting tailors codon selection to your target expression system. Each organism has different tRNA abundances, so matching codons to your host improves translation efficiency.
We recommend selecting your actual expression host. The codon frequency tables are derived from empirical genome-wide codon usage data.
Avoid restriction sites: When enabled, the converter actively removes common restriction enzyme recognition sites (EcoRI, HindIII, BamHI, XhoI, SalI, NotI, PstI, and others) by swapping to alternative synonymous codons. This is essential when cloning into vectors that use these enzymes.
The algorithm iteratively scans the generated sequence and substitutes synonymous codons until all targeted restriction sites are eliminated, without changing the encoded protein.
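One way such a scan-and-substitute loop could look is sketched below. It is not the converter's actual implementation: the greedy strategy, the function names, and the choice to accept the first synonymous swap that lowers the total site count are all assumptions made for illustration. The recognition sequences are the standard ones for the enzymes named above.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SITES = {
    "EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC", "XhoI": "CTCGAG",
    "SalI": "GTCGAC", "NotI": "GCGGCCGC", "PstI": "CTGCAG",
}

def find_sites(dna: str) -> list:
    """Return (start, site) for every occurrence, including overlapping ones."""
    hits = []
    for site in SITES.values():
        start = dna.find(site)
        while start != -1:
            hits.append((start, site))
            start = dna.find(site, start + 1)
    return hits

def synonyms(codon: str) -> list:
    """Other codons that encode the same amino acid as `codon`."""
    return [c for c, aa in CODON_TO_AA.items() if aa == CODON_TO_AA[codon] and c != codon]

def remove_sites(dna: str) -> str:
    """Greedily swap synonymous codons until no listed restriction site remains."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3 (in-frame coding DNA)")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Each accepted swap lowers the hit count, so the loop terminates.
    while hits := find_sites("".join(codons)):
        start, site = hits[0]
        fixed = False
        # Try every codon that overlaps the site.
        for idx in range(start // 3, (start + len(site) - 1) // 3 + 1):
            original = codons[idx]
            for alt in synonyms(original):
                codons[idx] = alt
                if len(find_sites("".join(codons))) < len(hits):
                    fixed = True
                    break
            if fixed:
                break
            codons[idx] = original  # no swap helped here; restore and move on
        if not fixed:
            raise ValueError(f"could not remove {site} at position {start}")
    return "".join(codons)

print(remove_sites("ATGGAATTCTAA"))  # ATGGAGTTCTAA (EcoRI site removed, still M-E-F-stop)
```

A production version would also need to weigh codon usage when picking the replacement, since the first synonymous codon that removes a site may be a rare one in the expression host.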
Output type: Choose between DNA (with thymine) or RNA (with uracil) output format.
Line length: Control FASTA sequence wrapping. Options include 60, 80 (standard), 100, 120 characters, or No wrapping for single-line output. Single-line format is useful for direct input into synthesis ordering systems.
Reading frame offset: Add N nucleotides before the sequence to shift the reading frame for downstream cloning applications.
Stop codon control: Append a stop codon (TAA, TAG, or TGA) to the end of the sequence.
Start codon verification: Ensure sequences begin with ATG when the protein starts with methionine.
Ambiguous amino acid handling: Configure how to treat ambiguous residues:
Alternatively, convert ambiguous residues to NNN or skip them entirely.
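Several of the output options above amount to simple string operations applied after reverse translation. The sketch below shows one possible ordering of them. The function name `format_fasta`, its parameters, and the use of the literal character N as frame-shift padding are assumptions for illustration, not the converter's documented behavior.

```python
def format_fasta(header, dna, *, rna=False, stop_codon=None,
                 frame_offset=0, line_length=80):
    """Apply stop codon, frame offset, RNA conversion, and line wrapping to a sequence."""
    if stop_codon is not None:
        if stop_codon not in ("TAA", "TAG", "TGA"):
            raise ValueError(f"not a standard stop codon: {stop_codon}")
        dna += stop_codon
    # Assumption: the reading frame is shifted by prepending literal N characters.
    dna = "N" * frame_offset + dna
    if rna:
        dna = dna.replace("T", "U")  # RNA output uses uracil in place of thymine
    if line_length:  # None or 0 means no wrapping (single-line output)
        lines = [dna[i:i + line_length] for i in range(0, len(dna), line_length)]
    else:
        lines = [dna]
    return "\n".join([f">{header}", *lines])

print(format_fasta("demo", "ATGGAGTTC", stop_codon="TAA", rna=True, line_length=6))
# >demo
# AUGGAG
# UUCUAA
```

Passing `line_length=None` produces the single-line form that synthesis ordering systems often expect.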
Yes, but not deterministically. Because multiple codons encode most amino acids (genetic code degeneracy), a single protein sequence can be encoded by astronomically many different DNA sequences. For a 100-amino-acid protein, the exact count depends on composition: it is the product of the number of codons available for each residue.
Reverse translation tools like this one select one valid DNA sequence from these possibilities. When you enable codon optimization, the tool picks codons that match your target organism's preferences, improving the chances of successful expression.
The reverse-translated DNA always encodes exactly the same protein sequence, because every codon maps to exactly one amino acid. However, "accuracy" in a practical sense depends on your goal:
For the amino acid sequence: 100% accurate. The DNA will translate back to your original protein.
For expression levels: Variable. Codon-optimized sequences typically express 2-10× better than random codon selection, but results depend on the specific protein, expression system, and other factors (promoter strength, mRNA stability, etc.).
For matching native sequences: The generated DNA will almost never match the natural gene sequence. If you need the actual genomic sequence, use databases like NCBI or UniProt instead of reverse translation.
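The first point, that the DNA translates back to the original protein, can be checked with a round trip. This is a minimal sketch assuming the standard genetic code and a naive strategy that always picks the first listed codon; `reverse_translate_first` and `translate` are hypothetical helpers, not the converter's API.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))

def reverse_translate_first(protein: str) -> str:
    """Pick the first codon found for each residue (no optimization)."""
    first = {}
    for codon, aa in CODON_TO_AA.items():
        first.setdefault(aa, codon)
    return "".join(first[aa] for aa in protein)

protein = "MKTLLW"
dna = reverse_translate_first(protein)
print(dna)                          # ATGAAAACTTTATTATGG
assert translate(dna) == protein    # the round trip recovers the input exactly
```

Any codon selection strategy passes this check as long as it only draws from the synonymous codons of each residue; the strategies differ in expression behavior, not in the encoded protein.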
For the forward direction, use DNA to Protein to translate coding sequences. To analyze your protein before synthesis, Protein Parameters calculates molecular weight, pI, extinction coefficient, and other properties.
When working with sequence formats, Three to One and One to Three convert between amino acid code formats.