SMRTnet is a deep learning framework that predicts small molecule-RNA interactions (SRIs) from RNA sequence and secondary structure alone. Developed by researchers at Tsinghua University and published in Nature Biotechnology in 2025, the method does not require RNA 3D structures, making it practical for drug discovery against disease-associated RNAs that lack defined tertiary structures.
Traditional approaches to predicting small molecule-RNA interactions require prior knowledge of RNA tertiary structures. However, most disease-related RNAs have unknown 3D conformations and few have characterized binding sites. SMRTnet addresses this limitation by working with secondary structure alone, vastly expanding the scope of targetable RNAs.
ProteinIQ provides a web interface for running SMRTnet without local installation or GPU setup. You provide RNA sequences with their secondary structures, supply small molecules as SMILES strings, and receive ranked interaction predictions.
| Input | Description |
|---|---|
| RNA Target(s) | Tab-separated format with three columns: name, sequence, and dot-bracket structure. Minimum 31 nucleotides. Sequence and structure lengths must match. |
| Small Molecule(s) | Tab-separated format with two columns: name and RDKit-compatible SMILES string. |
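
The RNA input rules in the table above can be checked locally before submitting a job. The following is a minimal sketch, not part of SMRTnet or ProteinIQ: the function name `validate_rna_line` is hypothetical, and the balanced-parentheses check is an extra assumption that goes beyond the documented constraints.

```python
ALLOWED_BASES = set("AGCU")
MIN_LENGTH = 31  # minimum RNA length stated in the input table


def validate_rna_line(line: str) -> tuple[str, str, str]:
    """Parse one tab-separated RNA entry and check the documented constraints.

    Hypothetical helper for illustration; returns (name, sequence, structure).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated columns, got {len(fields)}")
    name, sequence, structure = (field.strip() for field in fields)

    if len(sequence) < MIN_LENGTH:
        raise ValueError(f"{name}: sequence shorter than {MIN_LENGTH} nucleotides")
    if set(sequence) - ALLOWED_BASES:
        raise ValueError(f"{name}: sequence may only contain A, G, C, U")
    if len(sequence) != len(structure):
        raise ValueError(f"{name}: sequence and structure lengths differ")

    # Assumption: a usable dot-bracket string has balanced parentheses.
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"{name}: unmatched ')' in structure")
        elif symbol != ".":
            raise ValueError(f"{name}: unexpected symbol {symbol!r} in structure")
    if depth != 0:
        raise ValueError(f"{name}: unmatched '(' in structure")

    return name, sequence, structure
```

Calling `validate_rna_line` on each input line and collecting the raised `ValueError` messages gives a quick report of entries that would likely be rejected.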
Each RNA entry requires a name, nucleotide sequence (A, G, C, U), and secondary structure in dot-bracket notation where parentheses denote base pairs and dots denote unpaired nucleotides:

    MYC_IRES	GUGGGGGCUUCGCCUCUGGCCCAGCCCUCAC	(((((((((..(((...)))..)))))))))
    TAU_UTR	GCUAGCUAGCUAGCUAGCUAGCUAGCUAGCU	...(((((.......)))))............

Each molecule entry requires a name and SMILES string:

    3902-71-4	CC1=CC(=O)OC2=C1C=C3C=C(OC3=C2C)C
    caffeine	CN1C=NC2=C1C(=O)N(C(=O)N2C)C
    aspirin	CC(=O)OC1=CC=CC=C1C(=O)O

SMRTnet returns a ranked table of all small molecule-RNA pairs sorted by predicted binding probability.
| Column | Description |
|---|---|
| Rank | Position in the sorted results, where 1 is the strongest predicted interaction. |
| Molecule | Name of the small molecule from the input. |
| RNA Target | Name of the RNA target from the input. |
| Binding Probability | Ensemble prediction score between 0 and 1. Higher values indicate stronger predicted binding. |
| Interaction | Classification as "Yes" (probability > 0.5) or "No" (probability ≤ 0.5). |

| Score | Interpretation |
|---|---|
| > 0.8 | Strong predicted interaction; high priority for experimental validation |
| > 0.5 and ≤ 0.8 | Moderate predicted interaction; consider for follow-up |
| ≤ 0.5 | Weak or no predicted interaction |
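
As a rough illustration of how the Rank, Interaction, and interpretation columns follow from the binding probability, the sketch below sorts a set of predictions and labels each one. The function names and the example scores are hypothetical. A score of exactly 0.5 is treated as weak so that it agrees with the "No" classification, which is an assumption about the boundary.

```python
def interpret(probability: float) -> str:
    """Map a binding probability to the interpretation tiers in the table."""
    if probability > 0.8:
        return "strong"
    if probability > 0.5:  # assumption: exactly 0.5 falls in the weak tier
        return "moderate"
    return "weak"


def rank_predictions(predictions):
    """Build a ranked result table.

    `predictions` is a list of (molecule, rna_target, probability) tuples.
    Returns a list of dicts sorted from highest to lowest probability.
    """
    ordered = sorted(predictions, key=lambda item: item[2], reverse=True)
    rows = []
    for rank, (molecule, rna_target, probability) in enumerate(ordered, start=1):
        rows.append(
            {
                "rank": rank,
                "molecule": molecule,
                "rna_target": rna_target,
                "binding_probability": probability,
                "interaction": "Yes" if probability > 0.5 else "No",
                "interpretation": interpret(probability),
            }
        )
    return rows


# Made-up scores, for illustration only.
example = [
    ("caffeine", "MYC_IRES", 0.42),
    ("aspirin", "MYC_IRES", 0.91),
    ("caffeine", "TAU_UTR", 0.50),
]
for row in rank_predictions(example):
    print(row["rank"], row["molecule"], row["rna_target"],
          row["binding_probability"], row["interaction"], row["interpretation"])
```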
SMRTnet employs a multimodal architecture that integrates two language models with convolutional and graph neural networks to learn joint representations of RNA targets and small molecule ligands.
The RNA encoder combines two complementary representations:
- RNASwan-seq: An RNA language model with 30 transformer encoder blocks using Rotary Positional Embeddings. Each block contains 640 hidden units and 20 attention heads. The model learns sequence-level features through masked language modeling.
- CNN block: A two-layer convolutional network with residual connections that extracts structural features from the dot-bracket secondary structure notation, capturing base-pairing patterns and loop regions.
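
To make the structural input more concrete, the sketch below shows one plausible way to turn a dot-bracket string into numeric features and to recover the base pairs it encodes. This is an assumption for illustration only: the actual featurization used by the SMRTnet CNN block is not described here, and the names `base_pairs` and `one_hot_structure` are hypothetical.

```python
def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) index pairs for matching parentheses."""
    stack = []  # positions of '(' that are still waiting for a partner
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' in structure")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def one_hot_structure(structure: str) -> list[list[int]]:
    """Encode each position as a 3-channel one-hot vector over '(', '.', ')'."""
    symbols = "(.)"
    return [[1 if symbol == s else 0 for s in symbols] for symbol in structure]


print(base_pairs("((..))"))        # [(0, 5), (1, 4)]
print(one_hot_structure("(.)"))    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

A per-position matrix like the one from `one_hot_structure` is the kind of input a 1D convolution can slide over, while the pair list makes stems and loops explicit.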
The small molecule encoder also uses dual representations:
- MoLFormer: A pre-trained chemical language model that generates molecular embeddings from SMILES strings, capturing learned chemical semantics.
- Graph Attention Network: A three-layer GAT that represents molecules as graphs (atoms as nodes, bonds as edges), learning structural features through attention-weighted message passing.
An attention-based fusion module integrates the RNA and small molecule representations. Cross-attention mechanisms allow the model to learn which molecular features interact with which RNA structural elements.
Five models from 5-fold cross-validation contribute to each prediction. The final binding probability is the median score across all five models, reducing variance from individual model biases.
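
The median aggregation can be sketched in a few lines. The function name and the fold scores below are hypothetical and only illustrate the arithmetic; they are not taken from SMRTnet output.

```python
from statistics import mean, median


def ensemble_probability(fold_scores: list[float]) -> float:
    """Combine the scores of the five cross-validation models by their median."""
    if len(fold_scores) != 5:
        raise ValueError("expected one score from each of the five models")
    return median(fold_scores)


# Made-up scores: one model disagrees strongly with the other four.
scores = [0.20, 0.90, 0.60, 0.70, 0.65]
print(ensemble_probability(scores))  # 0.65
print(round(mean(scores), 2))        # 0.61, pulled down by the 0.20 outlier
```

With an odd number of models the median is always one of the actual model outputs, so a single outlying fold cannot shift the final score the way it shifts the mean.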
SMRTnet was validated through large-scale experimental testing using microscale thermophoresis (MST).
The model outperforms existing methods across multiple benchmark datasets including R-BIND, R-SIM, SMMRNA, and NALDB.
Secondary structure quality: Prediction accuracy depends on the input structure. Experimentally determined structures (e.g., from SHAPE-MaP or enzymatic probing) yield more reliable results than computationally predicted folds.
Training data scope: The model was trained on known small molecule-RNA complexes from the PDB. Novel chemical scaffolds or RNA motifs underrepresented in training data may have reduced prediction accuracy.
Binding site localization: While SMRTnet can identify potential binding regions through attention analysis, the ProteinIQ implementation focuses on binding probability prediction. Detailed binding site mapping requires the original SMRTnet interpretability workflow.
Experimental validation required: As noted by the authors, predictions should be reviewed by domain experts before proceeding to wet-lab validation.