ProteinIQ

FastTree

Build phylogenetic trees from aligned protein or nucleotide sequences using approximate maximum-likelihood methods.

What is FastTree?

FastTree builds approximately-maximum-likelihood phylogenetic trees from multiple sequence alignments. It can handle alignments with up to a million sequences while remaining computationally efficient—100 to 1,000 times faster than traditional maximum-likelihood methods like PhyML or RAxML.

Phylogenetic trees visualize evolutionary relationships between sequences. Branch lengths represent evolutionary distance (substitutions per site), while the tree topology shows which sequences share common ancestors. FastTree is particularly useful for large-scale studies where traditional ML methods would be prohibitively slow.

Important: FastTree requires pre-aligned sequences (a Multiple Sequence Alignment). All sequences must be the same length with homologous positions aligned in the same column. If your sequences aren't aligned, you'll need to align them first using tools like Clustal Omega or MUSCLE before using FastTree.
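Before submitting, it can help to confirm that every sequence has the same length. The following is a minimal sketch of such a check, not part of FastTree itself; the helper names `read_fasta` and `is_aligned` are hypothetical, and equal length is only a necessary condition (it cannot tell you whether the columns are truly homologous).

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True if there is at least one record and all sequences share one length."""
    lengths = {len(seq) for _, seq in records}
    return len(lengths) == 1


example = """>seq1
MKT-AYIAK
>seq2
MKTQAYI-K
>seq3
MKTQAY
"""
records = read_fasta(example)
print(is_aligned(records))      # False: seq3 is shorter than the others
print(is_aligned(records[:2]))  # True: seq1 and seq2 both have 9 columns
```

If the check fails, realign the full set rather than padding the short sequences by hand.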

How does FastTree work?

FastTree uses a three-phase approach to build phylogenetic trees efficiently.

Phase 1: Initial tree construction

FastTree first builds a rough starting tree using a heuristic variant of neighbor-joining. Instead of storing a full distance matrix (which would require O(N^2) memory for N sequences), FastTree stores sequence profiles at internal nodes. This optimization allows it to handle much larger alignments than traditional methods.
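To illustrate what a profile is, here is a simplified sketch, assuming a nucleotide alphabet and an unweighted average of the two children (FastTree's actual profiles use weights and handle gaps more carefully). The function names are hypothetical. The point is that a profile has one small frequency vector per alignment column, so its size does not grow with the number of sequences beneath the node.

```python
def leaf_profile(seq, alphabet="ACGT"):
    """One frequency vector per column; gaps or unknown characters give all zeros."""
    return [[1.0 if ch == a else 0.0 for a in alphabet] for ch in seq.upper()]


def join_profiles(p, q):
    """Profile of a parent node as the plain average of its two children (a simplification)."""
    return [[(x + y) / 2 for x, y in zip(col_p, col_q)] for col_p, col_q in zip(p, q)]


a = leaf_profile("ACGT")
b = leaf_profile("ACGA")
parent = join_profiles(a, b)

print(parent[0])    # [1.0, 0.0, 0.0, 0.0]  both children have A in column 1
print(parent[3])    # [0.5, 0.0, 0.0, 0.5]  children disagree (T vs A) in column 4
print(len(parent))  # 4 columns, the same size as each child profile
```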

Three heuristics speed up this phase: remembering the best join for each node, hill-climbing search for better joins, and the "top-hits" heuristic to avoid computing all pairwise distances.

Phase 2: Minimum evolution optimization

The initial tree is improved using nearest-neighbor interchanges (NNI) and subtree-pruning-regrafting (SPR) moves under the minimum evolution criterion. FastTree performs 4 × log2(N) rounds of NNIs and 2 rounds of SPRs by default.
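To get a feel for how slowly the number of NNI rounds grows with alignment size, the formula can be evaluated directly. This is a small sketch; rounding up to a whole number of rounds is our assumption, and the function name is hypothetical.

```python
import math


def default_nni_rounds(n_seqs):
    """Evaluate 4 * log2(N), rounded up to a whole number of rounds (rounding is an assumption)."""
    if n_seqs < 2:
        raise ValueError("need at least two sequences")
    return math.ceil(4 * math.log2(n_seqs))


print(default_nni_rounds(1024))       # 40
print(default_nni_rounds(1_000_000))  # 80
```

Going from about a thousand sequences to a million only doubles the number of rounds.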

SPR moves are computationally expensive (O(N^2) possibilities), so FastTree treats them as chains of NNIs and only extends promising candidates.

Phase 3: Maximum likelihood refinement

Finally, FastTree performs maximum-likelihood NNI moves to optimize the tree topology. The CAT approximation assigns each alignment site to one of 20 rate categories, accounting for varying evolutionary rates across positions without the computational cost of full gamma-distributed rates.

Evolutionary models

FastTree supports different substitution models depending on your sequence type.

Protein models

  • JTT (Jones-Taylor-Thornton): Default model, suitable for most protein alignments
  • WAG (Whelan & Goldman): Alternative empirical model with different amino acid exchangeabilities
  • LG (Le-Gascuel): More recent model, often performs well on diverse protein families

Nucleotide models

  • Jukes-Cantor: Simple model assuming equal substitution rates between all nucleotides
  • GTR (Generalized Time-Reversible): More complex model with different rates for each substitution type; we recommend enabling this for nucleotide sequences
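As a reminder of what the equal-rates assumption implies, the classic Jukes-Cantor correction converts the observed fraction of differing sites p into an estimated number of substitutions per site, d = -3/4 × ln(1 - 4p/3). The sketch below shows that textbook formula only; it is not a description of FastTree's internal likelihood code, and the function names are hypothetical.

```python
import math


def p_distance(a, b, gap="-"):
    """Fraction of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance; undefined (returned as infinity) once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTTT")
print(p)                                   # 0.2
print(round(jukes_cantor_distance(p), 4))  # 0.2326
```

The corrected distance is larger than the raw difference because some sites will have changed more than once.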

Input requirements

FastTree accepts sequences in FASTA format. All sequences must be aligned (same length) before submission.

Sequence names cannot contain the characters : , ( ) as these have special meaning in Newick tree format.
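If your FASTA headers come from a database, some names may contain these characters. Here is a minimal sketch for finding and replacing them; the helper names and the underscore replacement are our own choices, not something FastTree prescribes.

```python
FORBIDDEN = set(":,()")  # characters with special meaning in Newick


def find_bad_names(names):
    """Return the names that contain at least one forbidden character."""
    return [name for name in names if FORBIDDEN & set(name)]


def sanitize_name(name, replacement="_"):
    """Replace every forbidden character with the replacement string."""
    return "".join(replacement if ch in FORBIDDEN else ch for ch in name)


names = ["P12345_HUMAN", "kinase (fragment)", "chr1:1000-2000"]
print(find_bad_names(names))               # ['kinase (fragment)', 'chr1:1000-2000']
print(sanitize_name("kinase (fragment)"))  # kinase _fragment_
```

After renaming, check that the names are still unique, since two different names can collapse to the same sanitized form.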

Sequence type

FastTree auto-detects whether your alignment contains protein or nucleotide sequences. You can override this if needed—select Nucleotide for DNA/RNA sequences or Protein for amino acid sequences.
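The exact detection rule is not described here, so the following is only a hypothetical heuristic that shows the general idea: if nearly all non-gap characters are nucleotide letters, treat the alignment as nucleotide. The 0.9 threshold and the function name are assumptions.

```python
def guess_sequence_type(sequences, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the fraction of A/C/G/T/U/N characters."""
    nucleotide_chars = set("ACGTUN")
    total = 0
    nucleotide_count = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch in "-.":  # skip gap characters
                continue
            total += 1
            if ch in nucleotide_chars:
                nucleotide_count += 1
    if total == 0:
        raise ValueError("alignment contains no residues")
    return "nucleotide" if nucleotide_count / total >= threshold else "protein"


print(guess_sequence_type(["ACGT-ACGT", "ACGTTACGA"]))  # nucleotide
print(guess_sequence_type(["MKTAYIAK", "MKTQAYIK"]))    # protein
```

A heuristic like this can misfire on short or low-complexity protein alignments, which is one reason a manual override is useful.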

Settings

Protein model

Choose the amino acid substitution matrix. JTT works well for most cases. Try WAG or LG if you're working with specific protein families where these models have been shown to perform better.

Use GTR model

For nucleotide sequences, the GTR model accounts for different substitution rates (e.g., transitions vs transversions). We recommend enabling this for DNA/RNA alignments.

Gamma optimization

Rescales branch lengths using the Gamma20 likelihood, which more accurately models rate variation across sites. This adds approximately 5% to computation time but provides more accurate branch length estimates.

Fast mode

Speeds up neighbor-joining approximately 4-fold. We recommend enabling this for alignments with more than 50,000 sequences.

Bootstrap resamples

By default, FastTree reports SH-like local support values. Setting bootstrap resamples > 0 performs traditional bootstrap analysis instead, which resamples alignment columns and rebuilds trees to assess branch support.
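If you later run FastTree locally, most of the settings above have command-line counterparts. The sketch below builds an argument list using FastTree's `-nt`, `-gtr`, `-wag`, `-lg`, `-gamma`, and `-fastest` flags. The mapping from this page's settings to those flags is our assumption, the function name is hypothetical, and the bootstrap setting is deliberately left out because we are not certain how it maps.

```python
def build_fasttree_command(alignment_path, sequence_type="protein", protein_model="JTT",
                           use_gtr=False, gamma=False, fast_mode=False,
                           executable="FastTree"):
    """Build an argument list for a local FastTree run (setting-to-flag mapping is assumed)."""
    cmd = [executable]
    if sequence_type == "nucleotide":
        cmd.append("-nt")
        if use_gtr:
            cmd.append("-gtr")
    elif sequence_type == "protein":
        # JTT is the default protein model, so it needs no flag.
        model_flags = {"JTT": None, "WAG": "-wag", "LG": "-lg"}
        if protein_model not in model_flags:
            raise ValueError(f"unknown protein model: {protein_model}")
        if model_flags[protein_model]:
            cmd.append(model_flags[protein_model])
    else:
        raise ValueError(f"unknown sequence type: {sequence_type}")
    if gamma:
        cmd.append("-gamma")
    if fast_mode:
        cmd.append("-fastest")
    cmd.append(alignment_path)
    return cmd


print(build_fasttree_command("aln.fasta", "nucleotide", use_gtr=True, gamma=True))
# ['FastTree', '-nt', '-gtr', '-gamma', 'aln.fasta']
print(build_fasttree_command("aln.fasta", protein_model="LG"))
# ['FastTree', '-lg', 'aln.fasta']
```

FastTree writes the tree to standard output, so when running the command you would normally redirect the output to a file.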

Understanding the results

FastTree outputs a phylogenetic tree in Newick format, which our interactive viewer renders automatically.

Branch lengths

Branch lengths represent evolutionary distance in substitutions per site. Longer branches indicate more sequence divergence. Closely related sequences (like proteins from the same species) will have short branches between them.

Support values

Numbers at internal nodes indicate statistical support for that split in the tree. FastTree reports SH-like local supports ranging from 0 to 1 by default. Values above 0.9 indicate well-supported branches; values below 0.7 suggest uncertainty in that part of the tree.
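If you download the Newick file, you can pull out the support values and apply these thresholds yourself. This is a minimal sketch that assumes supports appear as plain numbers right after a closing parenthesis and that names are not quoted; the "intermediate" label for values from 0.7 to 0.9 is our own wording, and the function names are hypothetical.

```python
import re


def support_values(newick):
    """Extract the numeric labels that directly follow a closing parenthesis."""
    return [float(m) for m in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def classify_support(value):
    """Apply the thresholds described above to a single support value."""
    if value > 0.9:
        return "well supported"
    if value < 0.7:
        return "uncertain"
    return "intermediate"


tree = "((A:0.1,B:0.2)0.95:0.05,(C:0.3,D:0.4)0.62:0.07,E:0.5);"
for value in support_values(tree):
    print(value, classify_support(value))
# 0.95 well supported
# 0.62 uncertain
```

For anything beyond a quick check, a dedicated Newick parser is a safer choice than a regular expression.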

Tree topology

The branching pattern shows inferred evolutionary relationships. Sequences that cluster together share a more recent common ancestor than sequences on distant branches.

Limitations

FastTree prioritizes speed over maximum accuracy. For datasets where topological accuracy is critical, consider using slower but more thorough methods like IQ-TREE or RAxML for final analysis.

The CAT approximation, while fast, is less accurate than full discrete gamma models for estimating rate variation. If you need precise branch lengths, enable Gamma optimization.

After building your tree, you might want to analyze the individual sequences. For structure-based analysis of proteins in your tree:

  • ESMFold — Predict structures for sequences of interest
  • Boltz-2 — Generate structure predictions with ligand support

Based on: Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490. doi:10.1371/journal.pone.0009490