ProteinIQ

ORF Finder

Identify all potential protein-coding regions (Open Reading Frames) in your DNA sequences across all six reading frames.

What is ORF Finder?

An open reading frame (ORF) is a stretch of DNA that begins with a start codon and runs to the first in-frame stop codon, representing a potential protein-coding region. ORF Finder scans DNA sequences to identify all such regions across all six reading frames: three on the forward strand and three on the reverse complement.

Because DNA is double-stranded and codons are read in triplets, any sequence can be read in six different ways. A start codon (typically ATG) signals where translation might begin, while stop codons (TAA, TAG, or TGA in the standard genetic code) mark termination. ORF Finder systematically locates these boundaries and reports the predicted protein translations.
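
A minimal Python sketch of the six-frame idea, assuming unambiguous A/C/G/T input: the reverse complement is built with `str.translate`, and each strand is split into codons starting at offsets 0, 1 and 2. The frame labels (`+1` to `-3`) and the function names are illustrative conventions, not necessarily how ProteinIQ numbers reverse-strand frames.

```python
# Assumes unambiguous DNA; characters other than A, C, G, T are not complemented.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a sequence into codons for all six reading frames.

    Keys '+1'..'+3' are forward-strand offsets 0..2; '-1'..'-3' are the same
    offsets on the reverse complement. A trailing partial codon is dropped.
    """
    seq = seq.upper()
    frames: dict[str, list[str]] = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGAAATAGC").items():
    print(name, " ".join(codons))
```

For this 10-nt example, frame +1 reads ATG AAA TAG, while the reverse complement (GCTATTTCAT) supplies the three minus-strand frames.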

Finding ORFs is often the first step in gene annotation for newly sequenced DNA, though the presence of an ORF doesn't guarantee it encodes a functional protein—additional evidence like expression data or sequence conservation is usually needed to confirm coding potential.

How to use ORF Finder online

ProteinIQ's ORF Finder runs entirely in the browser, processing sequences instantly without uploading data to a server.

Input

Input | Description
DNA Sequence | One or more DNA sequences in FASTA format or raw nucleotide text. Supports files up to 50 MB.

Settings

Search parameters

Setting | Description
Minimum ORF length | Filter out short ORFs. Options: 75 nt (25 aa), 150 nt (50 aa), 225 nt (75 aa), 300 nt (100 aa). Default is 75 nt.
Genetic code | NCBI translation table for codon interpretation. 25 codes available, from standard eukaryotic to various mitochondrial and bacterial variants.
Start codon mode | Which codons initiate translation: ATG only (canonical), ATG + Alternative (includes TTG, CTG, GTG), or Any sense codon (for finding all potential reading frames).
Strand | Search Both strands, Forward only (+), or Reverse only (-).
Ignore nested ORFs | When enabled, suppresses smaller ORFs that fall entirely within a larger ORF on the same reading frame.
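
The search parameters above can be approximated with a short scan. This is a minimal Python sketch, not ProteinIQ's implementation, and it rests on several assumptions: only the standard code's stop codons are used; ORF length runs from the first base of the start codon through the stop codon; start codons with no downstream in-frame stop are dropped; and reverse-strand hits are reported on forward-strand coordinates with start > stop. `find_orfs` and its parameters are hypothetical stand-ins: `min_nt` for Minimum ORF length, `starts` for Start codon mode, `strands` for Strand, and `ignore_nested` for Ignore nested ORFs.

```python
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code (NCBI table 1) only


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_frame(s: str, offset: int, starts: set[str], min_nt: int,
               ignore_nested: bool) -> Iterator[tuple[int, int]]:
    """Yield 0-based, half-open (begin, end) spans; the stop codon is included."""
    pending: list[int] = []  # start-codon positions still waiting for a stop
    for i in range(offset, len(s) - 2, 3):
        codon = s[i:i + 3]
        if codon in STOP_CODONS:
            for begin in pending:
                if i + 3 - begin >= min_nt:
                    yield begin, i + 3
            pending = []
        elif codon in starts and not (ignore_nested and pending):
            # With ignore_nested, only the most upstream start before a stop
            # is kept, so shorter ORFs sharing that stop are never reported.
            pending.append(i)
    # Starts with no downstream in-frame stop are dropped here.


def find_orfs(seq: str, min_nt: int = 75, starts: Iterable[str] = ("ATG",),
              strands: str = "+-", ignore_nested: bool = True):
    """Return (strand, frame, start, stop, length_nt) rows with 1-based positions."""
    seq = seq.upper()
    n = len(seq)
    start_set = set(starts)
    rows = []
    for strand in strands:
        s = seq if strand == "+" else reverse_complement(seq)
        for offset in range(3):
            for begin, end in scan_frame(s, offset, start_set, min_nt, ignore_nested):
                if strand == "+":
                    start, stop = begin + 1, end
                else:  # map back onto the forward sequence, so start > stop
                    start, stop = n - begin, n - end + 1
                rows.append((strand, offset + 1, start, stop, end - begin))
    return rows


print(find_orfs("CCATGAAATTTTAGCC", min_nt=9))
# [('+', 3, 3, 14, 12)]
print(find_orfs("ATGATGAAATAA", min_nt=3, ignore_nested=False))
# [('+', 1, 1, 12, 12), ('+', 1, 4, 12, 9)]
```

Passing `starts=("ATG", "TTG", "CTG", "GTG")` mimics the ATG + Alternative mode. Nested suppression falls out of the scan: within one frame, an ORF beginning at an internal start codon always ends at the same stop as the enclosing ORF, so keeping only the most upstream start per stop removes exactly those nested ORFs.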

Output

Results are displayed in an interactive table with the following columns:

Column | Description
ORF ID | Identifier combining sequence name, strand, frame, and ORF number
Strand | + (forward) or - (reverse complement)
Frame | Reading frame (1, 2, or 3)
Start | Nucleotide position where the ORF begins (1-based)
Stop | Nucleotide position where the ORF ends
Length (nt) | ORF length in nucleotides
Length (aa) | Predicted protein length in amino acids
Protein | Translated amino acid sequence

Results can be exported as CSV, JSON, or FASTA (protein sequences).
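
For downstream work, the protein FASTA export can be read back with a few lines of standard-library Python. This is a generic FASTA parser sketch, not a ProteinIQ tool; it assumes each header begins with the ORF ID as its first word, and the example IDs below are hypothetical, since the exact ID formatting in the export may differ. Whether exported proteins end in `*` is also not stated, so the length calculation strips a trailing `*` either way.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}.

    The id is the first whitespace-separated word after '>'; sequence lines
    are concatenated. Lines before the first header are ignored, and a
    repeated id overwrites the earlier record.
    """
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical export content; real ORF IDs may be formatted differently.
exported = """>seq1_orf1 example
MKTAYIAK
QRQISFVK*
>seq1_orf2
MSL*
"""
for orf_id, protein in read_fasta(exported).items():
    print(orf_id, len(protein.rstrip("*")))
```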

Genetic codes

Different organisms use different codon-to-amino-acid mappings. The standard code applies to most nuclear genes in eukaryotes and many prokaryotes, but mitochondria, plastids, and certain protists have reassigned codons.

Common genetic codes:

Code | Name | Key differences
1 | Standard | TAA, TAG, TGA are stop codons
2 | Vertebrate Mitochondrial | TGA→Trp, ATA→Met, AGA/AGG→Stop
11 | Bacterial/Plastid | Same amino-acid assignments as standard, with more alternative start codons
4 | Mold/Protozoan Mitochondrial | TGA→Trp
6 | Ciliate Nuclear | TAA/TAG→Gln

Select the appropriate code for the organism being analyzed—using the wrong table will produce incorrect translations.
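
To see how much the table choice matters, here is a minimal Python sketch that builds NCBI table 1 from its 64-letter amino-acid string (codons ordered TTT, TTC, TTA, TTG, TCT, ... with bases in T, C, A, G order) and derives table 2 by overriding the four codons whose amino-acid assignments differ. It models amino-acid assignments only, not alternative start codons, and the names `STANDARD`, `VERT_MITO` and `translate` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (Standard), one letter per codon in TCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Table 2 (Vertebrate Mitochondrial) expressed as overrides of table 1.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate codon by codon; '*' marks stops, 'X' any unrecognised codon."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


seq = "ATGTGAATAAGA"
print(translate(seq, STANDARD))   # M*IR
print(translate(seq, VERT_MITO))  # MWM*
```

The same four codons give a premature stop under table 1 but read through to Trp under table 2, while AGA switches the other way, which is why a mismatched table can both truncate and extend predicted ORFs.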

Interpreting results

Not every ORF encodes a real protein. Consider these factors:

  • Length: Longer ORFs are more likely to be genuine. In random sequence with equal base frequencies, 3 of the 64 codons are stops, so a stop codon appears roughly every 21 codons on average and ORFs under 100 codons may arise by chance.
  • Context: True genes often have regulatory elements upstream (promoters, ribosome binding sites) not detected by ORF scanning alone.
  • Conservation: ORFs shared across related species are more likely functional.
  • Codon bias: Coding regions often show non-random codon usage characteristic of the organism.
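
The length argument can be checked with a little arithmetic. Assuming uniformly random codons (equal base frequencies, no codon bias), each codon is a stop with probability 3/64, so the number of codons up to the next stop is geometric with mean 64/3 ≈ 21.3, and the chance that n consecutive codons contain no stop is (61/64)^n. A short Python sketch of both quantities:

```python
# Assumes uniformly random codons: 3 stop codons out of 64.
STOP_FRACTION = 3 / 64


def expected_codons_to_stop() -> float:
    """Mean number of codons read up to and including the first stop (geometric)."""
    return 1 / STOP_FRACTION


def p_no_stop(n_codons: int) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return (1 - STOP_FRACTION) ** n_codons


print(f"mean spacing: {expected_codons_to_stop():.1f} codons")
for n in (25, 50, 75, 100):
    print(n, f"{p_no_stop(n):.4f}")
```

For 100 codons this comes to roughly 0.8%. That is small for any single position, but a long sequence offers many starting positions across six frames, so some chance ORFs of this length are still expected, and a skewed base composition shifts the stop-codon frequency away from 3/64.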

For prokaryotic sequences, ORFs closely correspond to coding sequences (CDS). Eukaryotic genes with introns require splice-aware prediction methods—ORF Finder operates on continuous sequences and won't identify genes interrupted by introns.