SMRTnet

Predict small molecule-RNA interactions using RNA secondary structure with deep learning

What is SMRTnet?

SMRTnet is a deep learning framework for predicting small molecule-RNA interactions (SRIs) using only RNA secondary structure. Developed by researchers at Tsinghua University and published in Nature Biotechnology in 2025, the method eliminates the need for RNA 3D structures, making it practical for drug discovery against disease-associated RNAs that lack defined tertiary structures.

Traditional approaches to predicting small molecule-RNA interactions require prior knowledge of the RNA's tertiary structure. However, most disease-related RNAs have unknown 3D conformations, and few have characterized binding sites. SMRTnet addresses this limitation by working from secondary structure alone, which extends prediction to RNAs that have no solved 3D structure.

Applications

mRNA targeting: Screening compounds against oncogene mRNAs (e.g., MYC, KRAS) that encode undruggable proteins
microRNA modulation: Identifying small molecules that bind oncogenic miRNAs
Viral RNA targeting: Finding compounds that interact with viral RNA elements (e.g., SARS-CoV-2, HIV)
Repeat expansion disorders: Screening for molecules targeting pathogenic RNA repeats in diseases like Huntington's and myotonic dystrophy

How to use SMRTnet online

ProteinIQ provides a web interface for running SMRTnet without local installation or GPU setup. You provide RNA sequences with their secondary structures, supply small molecules as SMILES strings, and receive ranked interaction predictions.

Inputs

Input	Description
`RNA Target(s)`	One entry per line, tab-separated, with three columns: name, sequence, and dot-bracket structure. Each sequence must be at least 31 nucleotides long, and the sequence and structure must have the same length.
`Small Molecule(s)`	One entry per line, tab-separated, with two columns: name and RDKit-compatible SMILES string.

RNA input format

Each RNA entry requires a name, nucleotide sequence (A, G, C, U), and secondary structure in dot-bracket notation where parentheses denote base pairs and dots denote unpaired nucleotides:

MYC_IRES	GUGGGGGCUUCGCCUCUGGCCCAGCCCUCAC	(((((((((..(((...)))..)))))))))
TAU_UTR	GCUAGCUAGCUAGCUAGCUAGCUAGCUAGCU	...(((((.......)))))............
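The rules above (three tab-separated columns, at least 31 nucleotides, matching lengths, A/G/C/U only, balanced dot-bracket notation) can be checked locally before submitting. The following is a minimal sketch, not part of SMRTnet or ProteinIQ; the function name `parse_rna_line` and the exact error messages are hypothetical, and the server may apply additional checks.

```python
MIN_LENGTH = 31                # minimum RNA length stated in the Inputs table
RNA_ALPHABET = set("AGCU")     # nucleotides accepted in the sequence column


def parse_rna_line(line: str) -> tuple[str, str, str]:
    """Split one tab-separated RNA entry and check it against the stated rules."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated columns, got {len(fields)}")
    name, sequence, structure = (field.strip() for field in fields)

    if len(sequence) < MIN_LENGTH:
        raise ValueError(f"{name}: {len(sequence)} nt is below the minimum of {MIN_LENGTH}")
    if len(sequence) != len(structure):
        raise ValueError(f"{name}: sequence and structure lengths differ")
    unexpected = set(sequence) - RNA_ALPHABET
    if unexpected:
        raise ValueError(f"{name}: unexpected characters {sorted(unexpected)}")

    # Every ')' must close an earlier '(' and no '(' may stay open.
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"{name}: ')' without a matching '('")
        elif symbol != ".":
            raise ValueError(f"{name}: unexpected structure symbol {symbol!r}")
    if depth != 0:
        raise ValueError(f"{name}: {depth} unclosed '('")
    return name, sequence, structure
```

Calling `parse_rna_line` on each line of the RNA input either returns the `(name, sequence, structure)` triple or raises a `ValueError` naming the first problem found.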

Small molecule input format

Each molecule entry requires a name and SMILES string:

3902-71-4	CC1=CC(=O)OC2=C1C=C3C=C(OC3=C2C)C
caffeine	CN1C=NC2=C1C(=O)N(C(=O)N2C)C
aspirin	CC(=O)OC1=CC=CC=C1C(=O)O
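A molecule list in this format can be parsed and pre-checked locally. The sketch below is illustrative only: `parse_molecule_lines` and `check_with_rdkit` are hypothetical helper names, and the RDKit check runs only if RDKit happens to be installed. It relies on `Chem.MolFromSmiles` returning `None` for a SMILES string that RDKit cannot parse.

```python
from typing import List, Optional, Tuple


def parse_molecule_lines(text: str) -> List[Tuple[str, str]]:
    """Parse 'name<TAB>SMILES' lines, skipping blank lines."""
    molecules = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        name, smiles = fields[0].strip(), fields[1].strip()
        if not name or not smiles:
            raise ValueError(f"line {line_number}: empty name or SMILES")
        molecules.append((name, smiles))
    return molecules


def check_with_rdkit(smiles: str) -> Optional[bool]:
    """Return True/False when RDKit is installed, or None when it is not."""
    try:
        from rdkit import Chem  # optional dependency
    except ImportError:
        return None
    return Chem.MolFromSmiles(smiles) is not None


if __name__ == "__main__":
    example = "caffeine\tCN1C=NC2=C1C(=O)N(C(=O)N2C)C\naspirin\tCC(=O)OC1=CC=CC=C1C(=O)O\n"
    for name, smiles in parse_molecule_lines(example):
        print(name, smiles, check_with_rdkit(smiles))
```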

Results

SMRTnet returns a ranked table of all small molecule-RNA pairs sorted by predicted binding probability.

Column	Description
`Rank`	Position in the sorted results, where 1 is the strongest predicted interaction.
`Molecule`	Name of the small molecule from the input.
`RNA Target`	Name of the RNA target from the input.
`Binding Probability`	Ensemble prediction score between 0 and 1. Higher values indicate stronger predicted binding.
`Interaction`	Classification as "Yes" (probability > 0.5) or "No" (probability ≤ 0.5).

Interpreting binding probability

Score	Interpretation
> 0.8	Strong predicted interaction; high priority for experimental validation
> 0.5 to 0.8	Moderate predicted interaction; consider for follow-up
≤ 0.5	Weak or no predicted interaction (classified as "No")
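For post-processing exported results, the thresholds can be applied in a few lines. This is a sketch under stated assumptions: the function names and the tier labels are hypothetical, and a score of exactly 0.5 is treated as "weak or none" so that it agrees with the `Interaction` column, which reports "No" at 0.5.

```python
def classify(probability: float) -> tuple[str, str]:
    """Map a binding probability to (Interaction, tier) using the documented cutoffs."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    interaction = "Yes" if probability > 0.5 else "No"
    if probability > 0.8:
        tier = "strong"
    elif probability > 0.5:
        tier = "moderate"
    else:
        tier = "weak or none"   # assumption: exactly 0.5 falls here
    return interaction, tier


def rank_predictions(rows):
    """Sort (molecule, rna, probability) rows by probability, highest first.

    Returns tuples of (rank, molecule, rna, probability, interaction, tier).
    """
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    return [
        (rank, molecule, rna, probability, *classify(probability))
        for rank, (molecule, rna, probability) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not SMRTnet outputs.
    demo = [("mol_a", "rna_1", 0.30), ("mol_b", "rna_1", 0.95), ("mol_c", "rna_1", 0.62)]
    for row in rank_predictions(demo):
        print(row)
```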

How does SMRTnet work?

SMRTnet employs a multimodal architecture that integrates two language models with convolutional and graph neural networks to learn joint representations of RNA targets and small molecule ligands.

RNA encoder

The RNA encoder combines two complementary representations:

RNASwan-seq: An RNA language model with 30 transformer encoder blocks using Rotary Positional Embeddings. Each block contains 640 hidden units and 20 attention heads. The model learns sequence-level features through masked language modeling.
CNN block: A two-layer convolutional network with residual connections that extracts structural features from the dot-bracket secondary structure notation, capturing base-pairing patterns and loop regions.
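To make concrete what "base-pairing patterns and loop regions" means in dot-bracket notation, the sketch below converts a structure string into a table of pairing partners and lists its hairpin loops. It is a generic illustration of the information the notation carries, not SMRTnet's featurization or its CNN; the function names are hypothetical.

```python
def pair_table(structure: str) -> list[int]:
    """Return, for each position, the index of its pairing partner, or -1 if unpaired."""
    partners = [-1] * len(structure)
    open_positions = []                      # stack of unmatched '(' indices
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            partner = open_positions.pop()
            partners[index], partners[partner] = partner, index
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return partners


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) for each base pair whose enclosed positions are all unpaired."""
    partners = pair_table(structure)
    loops = []
    for i, j in enumerate(partners):
        if j > i and all(partners[k] == -1 for k in range(i + 1, j)):
            loops.append((i, j))
    return loops


if __name__ == "__main__":
    print(pair_table("((...))"))     # [6, 5, -1, -1, -1, 1, 0]
    print(hairpin_loops("((...))"))  # [(1, 5)]
```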

Small molecule encoder

The small molecule encoder also uses dual representations:

MoLFormer: A pre-trained chemical language model that generates molecular embeddings from SMILES strings, capturing learned chemical semantics.
Graph Attention Network: A three-layer GAT that represents molecules as graphs (atoms as nodes, bonds as edges), learning structural features through attention-weighted message passing.

Multimodal fusion

An attention-based fusion module integrates the RNA and small molecule representations. Cross-attention mechanisms allow the model to learn which molecular features interact with which RNA structural elements.
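As a rough illustration of how cross-attention relates one set of features to another, the sketch below implements plain scaled dot-product attention in which RNA position vectors act as queries and molecule feature vectors act as keys and values. This is the generic textbook mechanism with the learned projection matrices omitted; it is an assumption-laden toy, not the published SMRTnet fusion module, and all vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from each query over all keys.

    queries: one vector per RNA position (hypothetical features)
    keys, values: one vector each per molecule feature (hypothetical features)
    Returns (outputs, weights); weights[i][j] is how strongly query i attends to key j.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(w * value[d] for w, value in zip(row, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights


if __name__ == "__main__":
    rna = [[1.0, 0.0], [0.0, 1.0]]                 # two toy RNA positions
    mol = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # three toy molecule features
    _, attention = cross_attention(rna, mol, mol)
    for row in attention:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1, so it can be read as a distribution over molecule features for one RNA position.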

Ensemble prediction

Five models from 5-fold cross-validation contribute to each prediction. The final binding probability is the median score across all five models, reducing variance from individual model biases.
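The median step is simple to reproduce. The sketch below uses made-up fold scores (not real SMRTnet outputs) to show why a median is less sensitive to a single disagreeing model than a mean; `ensemble_probability` is a hypothetical name.

```python
from statistics import mean, median


def ensemble_probability(fold_scores):
    """Combine the five cross-validation model scores into one probability."""
    if len(fold_scores) != 5:
        raise ValueError("expected one score from each of the five models")
    return median(fold_scores)


if __name__ == "__main__":
    # Hypothetical scores: four models roughly agree, one is an outlier.
    scores = [0.9, 0.2, 0.6, 0.7, 0.65]
    print("median:", ensemble_probability(scores))  # 0.65
    print("mean:  ", mean(scores))                  # pulled down by the 0.2 outlier
```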

Validation

SMRTnet was validated through large-scale experimental testing using microscale thermophoresis (MST):

40 confirmed hits across 10 disease-associated RNA targets with binding affinities ranging from nanomolar to micromolar
MYC IRES validation: Predicted binding scores correlated with experimental validation rates. One predicted compound downregulated MYC expression, inhibited proliferation, and induced apoptosis in three cancer cell lines.

The model outperforms existing methods across multiple benchmark datasets including R-BIND, R-SIM, SMMRNA, and NALDB.

Limitations

Secondary structure quality: Prediction accuracy depends on the input structure. Experimentally determined structures (e.g., from SHAPE-MaP or enzymatic probing) yield more reliable results than computationally predicted folds.
Training data scope: The model was trained on known small molecule-RNA complexes from the PDB. Novel chemical scaffolds or RNA motifs underrepresented in training data may have reduced prediction accuracy.
Binding site localization: While SMRTnet can identify potential binding regions through attention analysis, the ProteinIQ implementation focuses on binding probability prediction. Detailed binding site mapping requires the original SMRTnet interpretability workflow.
Experimental validation required: As noted by the authors, predictions should be reviewed by domain experts before proceeding to wet-lab validation.

Related tools

ADMET-AI: Predict absorption, distribution, metabolism, excretion, and toxicity properties for small molecules identified by SMRTnet
Lipinski's Rule of 5: Assess oral bioavailability of candidate compounds
DiffDock: Molecular docking for protein-ligand interactions (complements RNA-focused SMRTnet)
Molecular Descriptors: Calculate physicochemical properties of small molecule hits
