ProteinIQ

IQ-TREE

Build maximum likelihood phylogenetic trees with automatic model selection and ultrafast bootstrap.

What is IQ-TREE?

IQ-TREE is phylogenetic tree inference software that uses maximum likelihood methods to reconstruct evolutionary relationships from aligned sequences. It combines fast tree search algorithms with automated model selection to produce publication-quality phylogenetic trees for both protein and nucleotide sequences.

The software addresses two major bottlenecks in phylogenetic analysis: selecting the best substitution model and assessing branch support. ModelFinder automatically identifies the optimal evolutionary model 10-100 times faster than traditional tools like jModelTest. The ultrafast bootstrap (UFBoot) provides branch support values 10-40 times faster than standard bootstrap methods while maintaining statistical rigor.

IQ-TREE works with aligned sequences. If your sequences are unaligned, use MAFFT or Clustal Omega first to create a multiple sequence alignment. For faster approximate trees, consider FastTree.

How does IQ-TREE work?

Maximum likelihood estimation

Maximum likelihood phylogenetics finds the tree topology and branch lengths that maximize the probability of observing your sequence data under a given evolutionary model. For each tree, IQ-TREE calculates the likelihood $L$ that the observed alignment arose from that tree:

$$\ln L = \sum_{i=1}^{n} \ln P(x_i \mid T, \theta)$$

where $x_i$ is the pattern at site $i$, $n$ is the number of sites, $T$ is the tree topology, and $\theta$ represents model parameters like substitution rates and branch lengths. IQ-TREE uses a stochastic perturbation algorithm that efficiently explores tree space by making strategic rearrangements guided by likelihood improvements.
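The sum over per-site log-likelihoods can be illustrated with the smallest possible case: two sequences joined by a single branch, scored under the Jukes-Cantor (JC69) model. This is only a sketch; the toy sequences and function names are made up for illustration, and IQ-TREE itself evaluates full trees with far richer models.

```python
import math


def jc69_site_prob(same: bool, t: float) -> float:
    """Joint probability of one aligned site for two sequences under JC69.

    t is the branch length in expected substitutions per site.
    """
    decay = math.exp(-4.0 * t / 3.0)
    transition = 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
    return 0.25 * transition  # 0.25 is the equilibrium frequency of the first base


def log_likelihood(seq_a: str, seq_b: str, t: float) -> float:
    """ln L as the sum of per-site log probabilities for a two-taxon tree."""
    return sum(math.log(jc69_site_prob(a == b, t)) for a, b in zip(seq_a, seq_b))


# Hypothetical toy alignment: one difference in ten sites.
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGAAC"

for t in (0.01, 0.1, 0.5):
    print(f"t={t:<5} lnL={log_likelihood(seq_a, seq_b, t):.3f}")
```

Of the three branch lengths tried, the one closest to the observed divergence gives the highest log-likelihood, which is the quantity a maximum likelihood search optimizes.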

ModelFinder

ModelFinder tests a comprehensive set of substitution models and selects the best one using information criteria. For each model $M$, it calculates:

$$\text{BIC} = -2 \ln L + k \ln n$$

where $k$ is the number of free parameters and $n$ is the alignment length. BIC penalizes complex models more heavily than AIC, reducing the risk of overfitting. ModelFinder supports 546 protein models and 286 DNA models, including rate heterogeneity options like gamma-distributed rates and FreeRate categories.
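As a worked illustration of the criterion, the sketch below scores two candidate models with AIC and BIC. The log-likelihoods, parameter counts, and alignment length are invented numbers, chosen so the two criteria disagree.

```python
import math


def aic(ln_l: float, k: int) -> float:
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * ln_l + 2.0 * k


def bic(ln_l: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * ln_l + k * math.log(n)


# Hypothetical fits: model name -> (log-likelihood, number of free parameters).
candidates = {"simple": (-5200.0, 10), "complex": (-5185.0, 20)}
n_sites = 500  # hypothetical alignment length

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))

print("AIC prefers:", best_by_aic)
print("BIC prefers:", best_by_bic)
```

With these numbers the extra ten parameters buy enough likelihood to win under AIC but not under BIC, because ln(500) is roughly 6.2 per parameter compared with AIC's fixed penalty of 2.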

The automatic selection saves you from manually testing dozens of models. ModelFinder identifies not just the substitution matrix (like JTT or GTR) but also the optimal rate heterogeneity model and invariant site proportion.

Ultrafast bootstrap

Traditional bootstrap resampling repeats a full tree search for every resampled alignment, which is computationally expensive. UFBoot avoids most of that cost: instead of running a separate search per replicate, it reuses the trees encountered during the maximum likelihood search and estimates bootstrap support by resampling their per-site log-likelihoods (the RELL approach).

We recommend 1000 bootstrap replicates for publication-quality results. Values ≥95% indicate strong support, 70-95% moderate support, and <70% weak support. UFBoot provides approximately unbiased estimates comparable to standard bootstrap but finishes in a fraction of the time.
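The resampling idea behind this shortcut can be sketched in a few lines. The code below shows only plain RELL resampling over a fixed set of candidate trees; it is not IQ-TREE's implementation, and the per-site log-likelihood values and tree names are hypothetical.

```python
import random
from collections import Counter


def rell_support(site_lnl: dict[str, list[float]],
                 replicates: int = 1000,
                 seed: int = 0) -> dict[str, float]:
    """Fraction of replicates in which each tree has the highest resampled lnL.

    site_lnl maps a tree name to its per-site log-likelihoods; all lists must
    have the same length. Sites are resampled with replacement, so no tree
    search or likelihood recalculation is needed per replicate.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins: Counter = Counter()
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in sample) for name in names}
        wins[max(totals, key=totals.get)] += 1
    return {name: wins[name] / replicates for name in names}


# Hypothetical per-site log-likelihoods for two candidate trees.
toy_site_lnl = {
    "tree_a": [-2.1, -3.0, -1.2, -4.5, -2.2, -3.1],
    "tree_b": [-2.3, -2.8, -1.5, -4.4, -2.6, -3.0],
}
print(rell_support(toy_site_lnl, replicates=200))
```

Each replicate costs only a weighted sum, which is why this style of resampling is so much cheaper than rebuilding a tree per replicate.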

Branch support alternatives

The SH-aLRT (Shimodaira-Hasegawa approximate likelihood ratio test) offers an alternative to bootstrap. For each internal branch, it tests whether the best arrangement around that branch has a significantly higher likelihood than the closest alternative arrangements. SH-aLRT runs even faster than UFBoot and uses a different statistical framework, so using both can provide complementary confidence measures.

Input requirements

Your input must be a pre-aligned multiple sequence alignment with at least 3 sequences. IQ-TREE will fail if you provide unaligned sequences because maximum likelihood requires site-by-site comparisons across all sequences.

Supported formats include FASTA, PHYLIP, NEXUS, and CLUSTAL. FASTA is the most common choice for its simplicity.
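A quick pre-flight check for the two requirements above (at least 3 sequences, all the same length) might look like the following. It is a minimal sketch for FASTA input only; the function names are illustrative and not part of IQ-TREE.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_alignment(records: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if len(records) < 3:
        problems.append(f"need at least 3 sequences, found {len(records)}")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        problems.append(f"sequence lengths differ {sorted(lengths)}; input looks unaligned")
    return problems


example = ">seq1\nAC-TG\n>seq2\nACGTG\n>seq3\nA-GTG\n"
print(check_alignment(read_fasta(example)))
```

Equal lengths do not prove the sequences are well aligned, only that they could be an alignment; unequal lengths prove they are not.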

Substitution model

  • Auto (ModelFinder): Automatically selects the best model using BIC. This is the recommended option for most analyses as it tests all available models and chooses the optimal one.
  • ModelFinder Plus: Extended model selection that includes FreeRate models and other advanced options. Slower but more thorough than standard ModelFinder.
  • JTT, WAG, LG: Protein-specific substitution matrices. JTT works well for general protein data, WAG for nuclear proteins, and LG for diverse protein families.
  • GTR: General time reversible model for nucleotides. The most parameter-rich time-reversible DNA model, with six substitution rates and four base frequencies.
  • HKY, K2P: Simpler nucleotide models. HKY distinguishes transitions from transversions with unequal base frequencies. K2P assumes equal base frequencies.

Bootstrap options

  • Ultrafast bootstrap replicates: Number of UFBoot replicates to perform. We recommend 1000 for publication work, though 100-500 can give preliminary results faster. Set to 0 to skip bootstrap analysis entirely.
  • SH-aLRT replicates: Number of SH-aLRT tests. Common values are 1000 or 0 (disabled). When both UFBoot and SH-aLRT are enabled, you'll get two support values per branch, which some prefer for conservative interpretation.
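For readers running IQ-TREE locally, the options above correspond roughly to command-line flags. The sketch below only assembles an argument list and does not run anything. The flags -s, -m, -B and --alrt are from the IQ-TREE 2 command line; how this web form maps its options onto them, and the choice of MFP as the default model value, are assumptions.

```python
def build_iqtree_command(alignment: str,
                         model: str = "MFP",
                         ufboot: int = 1000,
                         alrt: int = 0,
                         executable: str = "iqtree2") -> list[str]:
    """Assemble an IQ-TREE 2 argument list (illustrative helper, not an official API).

    model  -> value for -m (for example MFP, LG, GTR)
    ufboot -> ultrafast bootstrap replicates for -B; 0 skips the flag
    alrt   -> SH-aLRT replicates for --alrt; 0 skips the flag
    """
    cmd = [executable, "-s", alignment, "-m", model]
    if ufboot > 0:
        cmd += ["-B", str(ufboot)]
    if alrt > 0:
        cmd += ["--alrt", str(alrt)]
    return cmd


print(" ".join(build_iqtree_command("alignment.fasta", ufboot=1000, alrt=1000)))
```

As far as the IQ-TREE 2 documentation states, -B expects 1000 or more replicates on the command line, so smaller values entered in a web form may be handled differently by the hosted tool.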

Understanding the results

Tree visualization

IQ-TREE outputs a phylogenetic tree in Newick format, which encodes branch relationships, branch lengths, and support values. Branch lengths represent evolutionary distance: longer branches indicate more substitutions per site.

Support values appear at internal nodes. With UFBoot enabled, these represent the percentage of bootstrap replicates supporting that clade. Values ≥95% are considered strong evidence, 70-95% moderate, and <70% weak. Branches with weak support indicate uncertainty in the tree topology at those positions.
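To apply these thresholds to a tree file, support values can be pulled out of the Newick string and binned. This is a rough sketch using a regular expression on a hypothetical tree; it assumes a single numeric label per internal node and would need adjusting for combined labels such as "98.5/100" written when both SH-aLRT and UFBoot are enabled.

```python
import re

# An internal node label is the number that directly follows a closing parenthesis.
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick: str) -> list[float]:
    """Extract numeric internal-node labels from a Newick string."""
    return [float(value) for value in SUPPORT_PATTERN.findall(newick)]


def classify(support: float) -> str:
    """Bin a support percentage using the thresholds described above."""
    if support >= 95:
        return "strong"
    if support >= 70:
        return "moderate"
    return "weak"


# Hypothetical tree with two supported clades.
tree = "((A:0.1,B:0.2)98:0.05,(C:0.3,D:0.1)64:0.02,E:0.4);"
for value in support_values(tree):
    print(value, classify(value))
```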

Model information

The results include the selected substitution model and its parameters. For example, GTR+F+I+G4 indicates:

  • GTR: General time reversible substitution matrix
  • +F: Empirical base/amino acid frequencies
  • +I: Proportion of invariant sites
  • +G4: Gamma-distributed rate heterogeneity with 4 categories
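The same decomposition can be done programmatically when post-processing many results. The helper below is an illustrative sketch that recognizes only the components listed above and collects anything else (for example FreeRate terms) under "other".

```python
def parse_model(model: str) -> dict[str, object]:
    """Split a model string such as 'GTR+F+I+G4' into its components."""
    matrix, *modifiers = model.split("+")
    other: list[str] = []
    info: dict[str, object] = {
        "matrix": matrix,
        "empirical_frequencies": False,
        "invariant_sites": False,
        "gamma": False,
        "gamma_categories": None,  # None when the string gives no category count
        "other": other,
    }
    for mod in modifiers:
        if mod == "F":
            info["empirical_frequencies"] = True
        elif mod == "I":
            info["invariant_sites"] = True
        elif mod.startswith("G") and (mod[1:] == "" or mod[1:].isdigit()):
            info["gamma"] = True
            info["gamma_categories"] = int(mod[1:]) if mod[1:] else None
        else:
            other.append(mod)
    return info


print(parse_model("GTR+F+I+G4"))
```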

Statistical measures

  • Log-likelihood (lnL): The natural logarithm of the tree likelihood. More negative values indicate worse fit. You cannot compare lnL across different alignments, only across trees for the same data.
  • AIC/BIC: Akaike and Bayesian information criteria. Lower values indicate better model fit after penalizing for model complexity. BIC penalizes complexity more heavily than AIC.
  • Tree length: Sum of all branch lengths, representing total evolutionary change. Longer trees suggest more divergence among sequences.
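Tree length, as defined above, can be recomputed from a Newick string by summing every branch length. The sketch below uses a regular expression on a hypothetical tree and assumes taxon names contain no colons.

```python
import re

# A branch length is the number that follows a colon.
BRANCH_LENGTH = re.compile(r":(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def tree_length(newick: str) -> float:
    """Sum all branch lengths found in a Newick string."""
    return sum(float(value) for value in BRANCH_LENGTH.findall(newick))


# Hypothetical tree; 95 is a support label, not a branch length.
tree = "((A:0.1,B:0.2)95:0.05,C:0.3);"
print(f"tree length = {tree_length(tree):.2f}")
```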

When to use IQ-TREE

IQ-TREE excels when you need rigorous model selection and publication-quality results. The automatic ModelFinder testing ensures your tree is based on the most appropriate evolutionary model for your data.

For large alignments where speed matters more than model sophistication, FastTree provides approximate maximum likelihood trees much faster. FastTree uses simpler models but can handle datasets with tens of thousands of sequences.

For phylogenomic analyses with hundreds of genes, IQ-TREE's partition models can assign different evolutionary models to different alignment regions. This is important when combining genes with different evolutionary rates.
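A partitioned analysis needs a file that tells IQ-TREE which alignment columns belong to which gene. The sketch below writes a NEXUS sets block of the kind shown in the IQ-TREE documentation; the gene names, coordinates, and models are hypothetical, and the exact layout should be checked against that documentation before use.

```python
def partition_nexus(partitions: list[tuple[str, int, int, str]]) -> str:
    """Build a NEXUS partition block.

    Each partition is (name, first_column, last_column, model), with
    1-based inclusive alignment columns.
    """
    lines = ["#nexus", "begin sets;"]
    for name, start, end, _model in partitions:
        lines.append(f"    charset {name} = {start}-{end};")
    scheme = ", ".join(f"{model}:{name}" for name, _start, _end, model in partitions)
    lines.append(f"    charpartition scheme = {scheme};")
    lines.append("end;")
    return "\n".join(lines) + "\n"


# Hypothetical two-gene concatenated protein alignment.
print(partition_nexus([("gene1", 1, 300, "LG+G4"), ("gene2", 301, 720, "WAG+I")]))
```

On the command line, a file like this is passed to IQ-TREE 2 with the -p option alongside the concatenated alignment.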

Best practices

Start with ModelFinder (Auto) unless you have strong prior knowledge about which model fits your data. Manual model selection rarely outperforms ModelFinder's systematic approach.

Run at least 1000 bootstrap replicates for publication. Preliminary analyses can use 100-200 to save time, but final trees should use ≥1000 for reliable support estimates.

Check for outlier sequences before running IQ-TREE. Sequences with excessive gaps or unusual composition can distort tree topology. If bootstrap support is uniformly low, your alignment may be too divergent or contain paralogs rather than orthologs.
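One simple screen for gap-heavy sequences is to compute the fraction of gap characters per sequence and flag anything above a cutoff. The 50% threshold below is an arbitrary illustration, not a recommendation from IQ-TREE, and the function names are made up.

```python
def gap_fraction(seq: str, gap_chars: str = "-?") -> float:
    """Fraction of positions in an aligned sequence that are gaps or missing data."""
    if not seq:
        return 0.0
    return sum(seq.count(char) for char in gap_chars) / len(seq)


def flag_gappy(records: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Names of sequences whose gap fraction exceeds the threshold."""
    return [name for name, seq in records.items() if gap_fraction(seq) > threshold]


# Hypothetical aligned sequences.
alignment = {"seq1": "ACGTACGT", "seq2": "AC------", "seq3": "ACG-ACGT"}
print(flag_gappy(alignment))
```

Flagged sequences are candidates for inspection rather than automatic removal, since a short but genuine sequence can still be informative.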

The alignment quality matters more than the phylogenetic method. Poorly aligned regions introduce noise that no tree-building algorithm can overcome. Consider using MAFFT with the L-INS-i algorithm for difficult-to-align sequences.

Related tools

  • FastTree - Faster approximate maximum likelihood for large datasets
  • MAFFT - Create multiple sequence alignments from unaligned sequences
  • Clustal Omega - Alternative MSA tool for protein sequences

References

  • Minh, B.Q., Schmidt, H.A., Chernomor, O. et al. (2020) IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37(5):1530-1534. https://academic.oup.com/mbe/article/37/5/1530/5721363
  • Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., von Haeseler, A., Jermiin, L.S. (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14:587-589. https://pmc.ncbi.nlm.nih.gov/articles/PMC5453245/
  • Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q., Vinh, L.S. (2018) UFBoot2: Improving the ultrafast bootstrap approximation. Molecular Biology and Evolution 35(2):518-522. https://iqtree.github.io/