DeepImmuno is a deep learning model that predicts whether a peptide will trigger a T cell immune response when presented by a specific MHC class I molecule. Developed at Cincinnati Children's Hospital Medical Center, it uses a convolutional neural network (DeepImmuno-CNN) trained on over 8,900 experimentally validated peptide-MHC immunogenicity assays from the Immune Epitope Database (IEDB).
Predicting immunogenicity is harder than predicting MHC binding. A peptide may bind an MHC molecule tightly yet fail to activate T cells. DeepImmuno addresses this downstream question directly: given a peptide-MHC pair, how likely is it to be immunogenic?
On independent benchmarks, DeepImmuno-CNN identified 83% of immunogenic tumor neoantigens (compared to 63% for the IEDB immunogenicity predictor and 34% for DeepHLApan) and showed similarly strong performance on SARS-CoV-2 and dengue virus epitope datasets.
Rather than one-hot encoding, DeepImmuno represents each amino acid using 566 physicochemical features from the AAindex1 database, compressed via principal component analysis (PCA). This captures biologically meaningful properties like hydrophobicity, charge, and size while keeping the feature space manageable. HLA allele sequences are encoded the same way using paratope information from the IMGT database, creating a combined peptide-MHC feature matrix.
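The encoding step can be sketched with NumPy as follows. This is a minimal illustration, not DeepImmuno's own code: the feature matrix is random placeholder data standing in for the AAindex1 values, the 20-letter alphabet and the choice of 12 components are assumptions, and `pca_compress` and `encode_peptide` are hypothetical helper names.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet for this sketch


def pca_compress(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project each row of `features` onto its top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def encode_peptide(peptide: str, table: np.ndarray) -> np.ndarray:
    """Look up the compressed feature row for every residue in the peptide."""
    rows = [AMINO_ACIDS.index(aa) for aa in peptide]
    return table[rows]


# Placeholder standing in for the 566 AAindex1 features (NOT real values).
rng = np.random.default_rng(0)
placeholder_aaindex = rng.normal(size=(len(AMINO_ACIDS), 566))

# 12 components is an illustrative choice, not a value stated above.
table = pca_compress(placeholder_aaindex, n_components=12)
encoded = encode_peptide("GILGFVFTL", table)  # one row per residue
```

Each peptide becomes a small matrix with one row per residue and one column per retained component, which is the kind of input a convolutional layer can scan along the sequence axis.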
Each peptide-MHC pair passes through two convolutional layers followed by two fully connected dense layers. The convolutional layers extract local sequence motifs, while the dense layers learn the non-linear relationships between these motifs and immunogenicity.
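To make the layer sequence concrete, here is a rough NumPy forward pass with two convolutional layers followed by two dense layers. It is only a sketch under stated assumptions: the input shape, kernel widths, filter counts, hidden size, and sigmoid output are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    x: (length, channels); kernels: (width, channels, filters); bias: (filters,)
    """
    width = kernels.shape[0]
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, kernels.shape[2]))
    for i in range(out_len):
        window = x[i:i + width]  # (width, channels)
        out[i] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


def forward(x, p):
    """Two conv layers, then two dense layers ending in a sigmoid score."""
    h = conv1d_relu(x, p["k1"], p["b1"])
    h = conv1d_relu(h, p["k2"], p["b2"])
    h = np.maximum(h.ravel() @ p["w1"] + p["c1"], 0.0)  # dense + ReLU
    logit = h @ p["w2"] + p["c2"]                       # dense, one output
    return float(1.0 / (1.0 + np.exp(-logit[0])))


rng = np.random.default_rng(0)


def init(*shape):
    return rng.normal(scale=0.1, size=shape)


# Hypothetical sizes: 30 positions x 12 features in, width-3 kernels.
params = {
    "k1": init(3, 12, 8), "b1": np.zeros(8),        # length 30 -> 28
    "k2": init(3, 8, 16), "b2": np.zeros(16),       # length 28 -> 26
    "w1": init(26 * 16, 32), "c1": np.zeros(32),
    "w2": init(32, 1), "c2": np.zeros(1),
}
x = rng.normal(size=(30, 12))  # placeholder peptide-MHC feature matrix
score = forward(x, params)
```

With untrained weights the score is meaningless; the point is only to show how local windows are turned into motif activations before the dense layers combine them.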
Instead of treating immunogenicity as a binary yes/no outcome, the training labels are continuous scores derived from a beta-binomial model. A peptide that tests positive consistently across 40 subjects receives a higher immunogenicity score than one with equally consistent results in only 6 subjects, because more tests provide stronger evidence. This probabilistic labeling captures experimental uncertainty and evidence quality.
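One simple way to see why the number of tested subjects matters is the posterior mean of a beta-binomial model. The sketch below assumes a uniform Beta(1, 1) prior and uses the posterior mean as the label; both are illustrative assumptions, and the exact prior and scoring rule used for DeepImmuno's training labels may differ. `posterior_mean` is a hypothetical helper name.

```python
def posterior_mean(positive: int, tested: int,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean response rate under a Beta(alpha, beta) prior."""
    if not 0 <= positive <= tested:
        raise ValueError("positive must be between 0 and tested")
    # Beta prior + binomial counts -> Beta(alpha + k, beta + n - k) posterior,
    # whose mean is (alpha + k) / (alpha + beta + n).
    return (alpha + positive) / (alpha + beta + tested)


many_subjects = posterior_mean(positive=40, tested=40)  # 41 / 42
few_subjects = posterior_mean(positive=6, tested=6)     # 7 / 8
```

Both peptides responded in every subject, yet the one backed by 40 tests ends up closer to 1 because the prior is outweighed by more evidence.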
Occlusion sensitivity analysis of the trained model shows that peptide positions P4, P5, and P6 contribute most to predictions. These are the central residues that contact the T cell receptor, consistent with structural immunology. Anchor positions P2 and P9, which bind the MHC groove, are less informative for immunogenicity since MHC binding is assumed.
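The mechanics of occlusion sensitivity can be sketched as follows: blank out one position at a time and record how much the score drops. The scoring function here is a toy linear stand-in with arbitrary weights, not the trained model, and zeroing a row is an assumed way of occluding a position.

```python
import numpy as np


def occlusion_importance(encoded, score_fn):
    """Score drop caused by zeroing each position of an encoded peptide."""
    baseline = score_fn(encoded)
    drops = np.empty(encoded.shape[0])
    for pos in range(encoded.shape[0]):
        occluded = encoded.copy()
        occluded[pos] = 0.0  # hide this residue's features
        drops[pos] = baseline - score_fn(occluded)
    return drops


# Toy scorer with arbitrary per-position weights (NOT learned values).
toy_weights = np.linspace(0.1, 0.9, 9)


def toy_score(encoded):
    return float(toy_weights @ np.abs(encoded).sum(axis=1))


encoded = np.ones((9, 12))  # placeholder 9-mer with 12 features per residue
importance = occlusion_importance(encoded, toy_score)
```

Applied to a real model, the positions with the largest drops are the ones the prediction depends on most.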
ProteinIQ provides cloud-hosted access to DeepImmuno-CNN, requiring no Python environment or GPU setup.
| Input | Description |
|---|---|
| Peptide Sequences | One or more peptide sequences, 9 or 10 amino acids each. Accepts raw sequences (one per line), FASTA format, or file upload (.txt, .fasta, .csv, .tsv). |
DeepImmuno-CNN only supports 9-mer and 10-mer peptides. Longer sequences must be pre-processed into overlapping windows before submission.
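A longer protein sequence can be split into overlapping 9-mer and 10-mer windows with a few lines of Python. This is a minimal sketch; `sliding_windows` is a hypothetical helper, and reporting 1-based start positions is a choice made here for readability.

```python
def sliding_windows(sequence: str, lengths=(9, 10)):
    """Return (start, peptide) pairs for every overlapping window.

    Start positions are 1-based. Sequences shorter than a given window
    length simply yield no windows of that length.
    """
    sequence = sequence.strip().upper()
    windows = []
    for k in lengths:
        for start in range(len(sequence) - k + 1):
            windows.append((start + 1, sequence[start:start + k]))
    return windows


# A 12-residue placeholder sequence yields four 9-mers and three 10-mers.
windows = sliding_windows("ACDEFGHIKLMN")
peptides = [peptide for _, peptide in windows]
```

The resulting peptides can be pasted one per line into the peptide input.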
| Setting | Description |
|---|---|
| HLA assignment mode | Single HLA for all peptides (default) applies one allele to every sequence. One HLA per peptide allows a different allele for each sequence. |
| HLA allele | The MHC-I allele to use in single mode. Default: HLA-A*0201, the most common allele in many populations. |
| Per-sequence HLA alleles | One allele per line, matching the order of input peptides. Required when using per-sequence mode. |
| Immunogenicity threshold | Score cutoff for the immunogenic/non-immunogenic label (0.0–1.0, default 0.5). |
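Before submitting, it can help to check locally that peptides and alleles line up. The sketch below mirrors the two assignment modes for raw one-per-line input; `pair_inputs` is a hypothetical helper, not part of the tool, and it does not handle FASTA headers.

```python
VALID_LENGTHS = (9, 10)


def pair_inputs(peptide_text, allele_text=None, default_allele="HLA-A*0201"):
    """Pair each peptide with an allele, in single or per-sequence mode."""
    peptides = [ln.strip().upper() for ln in peptide_text.splitlines() if ln.strip()]
    for peptide in peptides:
        if len(peptide) not in VALID_LENGTHS:
            raise ValueError(f"{peptide!r} is not a 9-mer or 10-mer")

    if allele_text is None:
        # Single mode: one allele applied to every peptide.
        return [(peptide, default_allele) for peptide in peptides]

    # Per-sequence mode: one allele per line, in the same order as the peptides.
    alleles = [ln.strip() for ln in allele_text.splitlines() if ln.strip()]
    if len(alleles) != len(peptides):
        raise ValueError(
            f"{len(peptides)} peptides but {len(alleles)} alleles were given"
        )
    return list(zip(peptides, alleles))


single = pair_inputs("GILGFVFTL\nNLVPMVATV")
per_sequence = pair_inputs("GILGFVFTL\nNLVPMVATV", "HLA-A*0201\nHLA-B*0702")
```

A mismatch between the number of peptides and alleles raises an error early instead of silently shifting the pairing.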
DeepImmuno supports 20 MHC class I alleles covering the most common HLA-A, HLA-B, and HLA-C types:
| HLA-A | HLA-B | HLA-C |
|---|---|---|
| A*01:01 | B*07:02 | C*07:02 |
| A*02:01 | B*08:01 | |
| A*02:02 | B*15:01 | |
| A*02:03 | B*27:05 | |
| A*02:06 | B*35:01 | |
| A*03:01 | B*40:01 | |
| A*11:01 | B*44:02 | |
| A*24:02 | B*44:03 | |
| A*26:01 | B*51:01 | |
| | B*57:01 | |
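Allele names appear in more than one notation (for example `HLA-A*0201` and `A*02:01`), so a small normalizer makes it easier to check a list against the supported set. This is a sketch: `normalize_allele` and `is_supported` are hypothetical helpers, and which notations the tool itself accepts is not specified here.

```python
import re

SUPPORTED_ALLELES = {
    "A*01:01", "A*02:01", "A*02:02", "A*02:03", "A*02:06",
    "A*03:01", "A*11:01", "A*24:02", "A*26:01",
    "B*07:02", "B*08:01", "B*15:01", "B*27:05", "B*35:01",
    "B*40:01", "B*44:02", "B*44:03", "B*51:01", "B*57:01",
    "C*07:02",
}

_ALLELE_PATTERN = re.compile(r"(?:HLA-)?([ABC])\*?(\d{2}):?(\d{2})")


def normalize_allele(name: str) -> str:
    """Convert forms such as 'HLA-A*0201' or 'a*02:01' to 'A*02:01'."""
    match = _ALLELE_PATTERN.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    gene, group, protein = match.groups()
    return f"{gene}*{group}:{protein}"


def is_supported(name: str) -> bool:
    return normalize_allele(name) in SUPPORTED_ALLELES
```

For example, `is_supported("HLA-A*0201")` is true, while a well-formed but unlisted allele such as `HLA-A*68:01` returns false.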
Results are returned as a spreadsheet with one row per peptide-MHC pair.
| Column | Description |
|---|---|
| Sequence ID | Identifier from FASTA header or auto-generated index. |
| Peptide | The input peptide sequence. |
| HLA Allele | The MHC-I allele used for this prediction. |
| Immunogenicity Score | Continuous score from 0 to 1. Higher values indicate greater predicted immunogenicity. |
| Label | Immunogenic or Non-immunogenic based on the threshold setting. |
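Downloaded results can be ranked with the standard library. The sketch below assumes the spreadsheet is exported as CSV with the column names listed above as its header row; the two rows are invented placeholders, not real predictions, and `load_results` is a hypothetical helper.

```python
import csv
import io


def load_results(handle):
    """Read result rows and sort them by score, highest first."""
    rows = []
    for row in csv.DictReader(handle):
        row["Immunogenicity Score"] = float(row["Immunogenicity Score"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["Immunogenicity Score"], reverse=True)


# Placeholder content standing in for a downloaded file (values are made up).
example_csv = io.StringIO(
    "Sequence ID,Peptide,HLA Allele,Immunogenicity Score,Label\n"
    "seq_1,AAAAAAAAA,HLA-A*0201,0.25,Non-immunogenic\n"
    "seq_2,CCCCCCCCC,HLA-A*0201,0.75,Immunogenic\n"
)
ranked = load_results(example_csv)
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.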
The immunogenicity score reflects the model's confidence that a peptide-MHC complex will elicit a T cell response. It is not a probability in the strict statistical sense but correlates with experimental immunogenicity rates.
| Score range | Interpretation |
|---|---|
| 0.8–1.0 | Strong immunogenic signal. High-priority candidates for experimental validation. |
| 0.5–0.8 | Moderate signal. Worth investigating, especially if supported by other evidence (binding affinity, expression level). |
| 0.2–0.5 | Weak signal. Unlikely to be immunogenic but cannot be ruled out. |
| 0.0–0.2 | Minimal immunogenic potential. |
The default threshold of 0.5 balances sensitivity and specificity. For vaccine design, where missing a true epitope is costly, lowering the threshold to 0.3–0.4 may be appropriate. For neoantigen prioritization with limited validation capacity, raising it to 0.6–0.7 focuses on the most confident predictions.
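The effect of moving the threshold can be reproduced offline from the scores alone. In this sketch, `label` and `band` are hypothetical helpers; treating a score equal to the cutoff as immunogenic, and treating each band's lower bound as inclusive, are assumptions, since the boundary behavior is not stated.

```python
def label(score: float, threshold: float = 0.5) -> str:
    """Binary label for a score at a given cutoff (cutoff itself counts as positive)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    return "Immunogenic" if score >= threshold else "Non-immunogenic"


def band(score: float) -> str:
    """Map a score to the interpretation bands (lower bounds inclusive)."""
    if score >= 0.8:
        return "strong"
    if score >= 0.5:
        return "moderate"
    if score >= 0.2:
        return "weak"
    return "minimal"


# The same score flips label depending on the use case.
vaccine_call = label(0.45, threshold=0.4)      # sensitivity-oriented cutoff
default_call = label(0.45)                     # default cutoff of 0.5
neoantigen_call = label(0.65, threshold=0.7)   # specificity-oriented cutoff
```

A score of 0.45 falls in the weak band and is rejected at the default cutoff, but is kept when the threshold is lowered to 0.4.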