ProteinIQ
SMRTnet

Predict small molecule-RNA interactions using RNA secondary structure with deep learning

What is SMRTnet?

SMRTnet is a deep learning framework for predicting small molecule-RNA interactions (SRIs) using only RNA secondary structure. Developed by researchers at Tsinghua University and published in Nature Biotechnology in 2025, the method eliminates the need for RNA 3D structures, making it practical for drug discovery against disease-associated RNAs that lack defined tertiary structures.

Traditional approaches to predicting small molecule-RNA interactions require prior knowledge of RNA tertiary structures. However, most disease-related RNAs have unknown 3D conformations and few have characterized binding sites. SMRTnet addresses this limitation by working with secondary structure alone, which extends prediction to RNA targets that have no solved 3D structure.

Applications

  • mRNA targeting: Screening compounds against oncogene mRNAs (e.g., MYC, KRAS) that encode undruggable proteins
  • microRNA modulation: Identifying small molecules that bind oncogenic miRNAs
  • Viral RNA targeting: Finding compounds that interact with viral RNA elements (e.g., SARS-CoV-2, HIV)
  • Repeat expansion disorders: Screening for molecules targeting pathogenic RNA repeats in diseases such as Huntington's disease and myotonic dystrophy

How to use SMRTnet online

ProteinIQ provides a web interface for running SMRTnet without local installation or GPU setup. You provide RNA sequences with their secondary structures, supply small molecules as SMILES strings, and receive ranked interaction predictions.

Inputs

Input | Description
RNA Target(s) | Tab-separated format with three columns: name, sequence, and dot-bracket structure. Minimum 31 nucleotides. Sequence and structure lengths must match.
Small Molecule(s) | Tab-separated format with two columns: name and RDKit-compatible SMILES string.

RNA input format

Each RNA entry requires a name, nucleotide sequence (A, G, C, U), and secondary structure in dot-bracket notation where parentheses denote base pairs and dots denote unpaired nucleotides:

MYC_IRES	GUGGGGGCUUCGCCUCUGGCCCAGCCCUCAC	(((((((((..(((...)))..)))))))))
TAU_UTR	GCUAGCUAGCUAGCUAGCUAGCUAGCUAGCU	...(((((.......)))))...........
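
The input rules above (three tab-separated columns, A/G/C/U alphabet, at least 31 nucleotides, matching lengths, balanced parentheses) can be checked before submission. The following is a minimal sketch, not ProteinIQ's own validation code; the function names `parse_rna_line`, `base_pairs`, and `validate_rna` are hypothetical, and the hairpin at the end is a made-up 31-nucleotide example.

```python
VALID_BASES = set("AGCU")
MIN_LENGTH = 31  # minimum RNA length stated in the Inputs table


def parse_rna_line(line):
    """Split one tab-separated RNA entry into (name, sequence, structure)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated columns, got {len(fields)}")
    name, sequence, structure = (field.strip() for field in fields)
    return name, sequence.upper(), structure


def base_pairs(structure):
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def validate_rna(name, sequence, structure):
    """Apply the input rules described above; return the list of base pairs."""
    if set(sequence) - VALID_BASES:
        raise ValueError(f"{name}: sequence contains characters other than A, G, C, U")
    if len(sequence) < MIN_LENGTH:
        raise ValueError(f"{name}: sequence shorter than {MIN_LENGTH} nucleotides")
    if len(sequence) != len(structure):
        raise ValueError(f"{name}: sequence and structure lengths differ")
    return base_pairs(structure)


# Hypothetical hairpin: 12-bp stem closed by a 7-nt loop (12 + 7 + 12 = 31 nt).
line = "\t".join(["toy_hairpin",
                  "G" * 12 + "AAAAUUU" + "C" * 12,
                  "(" * 12 + "." * 7 + ")" * 12])
pairs = validate_rna(*parse_rna_line(line))
print(len(pairs), "base pairs; outermost pair:", pairs[0])
```

The stack-based pass in `base_pairs` also shows how dot-bracket notation encodes pairing: each `)` pairs with the most recent unmatched `(`.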

Small molecule input format

Each molecule entry requires a name and SMILES string:

3902-71-4	CC1=CC(=O)OC2=C1C=C3C=C(OC3=C2C)C
caffeine	CN1C=NC2=C1C(=O)N(C(=O)N2C)C
aspirin	CC(=O)OC1=CC=CC=C1C(=O)O
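
Since the molecule column must be RDKit-compatible, it can help to parse the entries locally first. The sketch below is an illustration rather than the tool's own parser: `parse_molecules` is a hypothetical helper, and the RDKit check (`Chem.MolFromSmiles`, which returns `None` for SMILES it cannot parse) runs only if RDKit happens to be installed.

```python
try:
    from rdkit import Chem  # optional; used only if RDKit is installed
except ImportError:
    Chem = None


def parse_molecules(text):
    """Parse 'name<TAB>SMILES' lines into a list of (name, smiles) tuples."""
    molecules = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected name and SMILES")
        name, smiles = fields[0].strip(), fields[1].strip()
        # MolFromSmiles returns None when the SMILES string cannot be parsed.
        if Chem is not None and Chem.MolFromSmiles(smiles) is None:
            raise ValueError(f"line {line_number}: RDKit could not parse {smiles!r}")
        molecules.append((name, smiles))
    return molecules


example = (
    "3902-71-4\tCC1=CC(=O)OC2=C1C=C3C=C(OC3=C2C)C\n"
    "caffeine\tCN1C=NC2=C1C(=O)N(C(=O)N2C)C\n"
    "aspirin\tCC(=O)OC1=CC=CC=C1C(=O)O\n"
)
molecules = parse_molecules(example)
for name, smiles in molecules:
    print(name, smiles)
```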

Results

SMRTnet returns a ranked table of all small molecule-RNA pairs sorted by predicted binding probability.

Column | Description
Rank | Position in the sorted results, where 1 is the strongest predicted interaction.
Molecule | Name of the small molecule from the input.
RNA Target | Name of the RNA target from the input.
Binding Probability | Ensemble prediction score between 0 and 1. Higher values indicate stronger predicted binding.
Interaction | Classification as "Yes" (probability > 0.5) or "No" (probability ≤ 0.5).

Interpreting binding probability

Score | Interpretation
> 0.8 | Strong predicted interaction; high priority for experimental validation
0.5–0.8 | Moderate predicted interaction; consider for follow-up
< 0.5 | Weak or no predicted interaction
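
For post-processing exported results, the ranking and the two sets of thresholds above can be reproduced in a few lines. This is a sketch under stated assumptions: `interpret` and `rank_predictions` are hypothetical names, the scores are invented for illustration, and a score of exactly 0.5 is treated as "No" for the Interaction column but as part of the 0.5–0.8 band, following the tables literally.

```python
def interpret(probability):
    """Map a binding probability to (interaction label, priority tier)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("binding probability must lie in [0, 1]")
    # Interaction column: "Yes" only when strictly above 0.5.
    interaction = "Yes" if probability > 0.5 else "No"
    if probability > 0.8:
        tier = "strong"
    elif probability >= 0.5:  # assumption: the 0.5-0.8 band includes 0.5 itself
        tier = "moderate"
    else:
        tier = "weak"
    return interaction, tier


def rank_predictions(scores):
    """Sort (molecule, rna, probability) tuples, strongest prediction first."""
    ordered = sorted(scores, key=lambda row: row[2], reverse=True)
    table = []
    for rank, (molecule, rna, probability) in enumerate(ordered, start=1):
        interaction, tier = interpret(probability)
        table.append((rank, molecule, rna, probability, interaction, tier))
    return table


# Hypothetical scores for illustration only; not real SMRTnet output.
scores = [
    ("caffeine", "MYC_IRES", 0.12),
    ("aspirin", "MYC_IRES", 0.50),
    ("3902-71-4", "MYC_IRES", 0.91),
]
for row in rank_predictions(scores):
    print(row)
```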

How does SMRTnet work?

SMRTnet employs a multimodal architecture that integrates two language models with convolutional and graph neural networks to learn joint representations of RNA targets and small molecule ligands.

RNA encoder

The RNA encoder combines two complementary representations:

  • RNASwan-seq: An RNA language model with 30 transformer encoder blocks using Rotary Positional Embeddings. Each block contains 640 hidden units and 20 attention heads. The model learns sequence-level features through masked language modeling.

  • CNN block: A two-layer convolutional network with residual connections that extracts structural features from the dot-bracket secondary structure notation, capturing base-pairing patterns and loop regions.
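
To make the CNN block more concrete, the sketch below one-hot encodes a dot-bracket string and passes it through two convolutions joined by a skip connection. It is a toy illustration only: the three-channel encoding, kernel size of 3, channel counts, and random untrained weights are all assumptions, and the published SMRTnet layer sizes are not reproduced here.

```python
import random

SYMBOLS = "(.)"  # assumed channel order: opening pair, unpaired, closing pair


def one_hot(structure):
    """Encode a dot-bracket string as a list of 3-element one-hot vectors."""
    return [[1.0 if symbol == s else 0.0 for s in SYMBOLS] for symbol in structure]


def conv1d(x, weights):
    """Zero-padded 1D convolution; weights are indexed [k][c_in][c_out]."""
    kernel, half = len(weights), len(weights) // 2
    channels_out = len(weights[0][0])
    out = []
    for pos in range(len(x)):
        row = [0.0] * channels_out
        for k in range(kernel):
            src = pos + k - half
            if 0 <= src < len(x):  # positions outside the sequence count as zero
                for c_in, value in enumerate(x[src]):
                    for c_out in range(channels_out):
                        row[c_out] += value * weights[k][c_in][c_out]
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, w1, w2):
    """Two convolutions with a skip connection: x + conv(relu(conv(x)))."""
    y = conv1d(relu(conv1d(x, w1)), w2)
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def random_weights(kernel, c_in, c_out, rng):
    return [[[rng.uniform(-0.5, 0.5) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(kernel)]


rng = random.Random(0)
x = one_hot("(" * 4 + "." * 3 + ")" * 4)  # toy hairpin, 11 positions
features = residual_block(x, random_weights(3, 3, 3, rng), random_weights(3, 3, 3, rng))
print(len(features), len(features[0]))  # positions, channels
```

Because of the skip connection, the block returns its input unchanged when both weight tensors are zero, which is one reason residual layers are easy to train.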

Small molecule encoder

The small molecule encoder also uses dual representations:

  • MoLFormer: A pre-trained chemical language model that generates molecular embeddings from SMILES strings, capturing learned chemical semantics.

  • Graph Attention Network: A three-layer GAT that represents molecules as graphs (atoms as nodes, bonds as edges), learning structural features through attention-weighted message passing.
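
Attention-weighted message passing can be illustrated with one single-head graph attention step on a toy molecule. This follows the standard GAT formulation (a shared linear transform, LeakyReLU-scored neighbor pairs, softmax over each atom's neighborhood) and is an assumption about the general technique, not SMRTnet's actual layer; the features and weights are arbitrary, untrained values, and `gat_layer` is a hypothetical name.

```python
import math


def leaky_relu(value, slope=0.2):
    return value if value > 0 else slope * value


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gat_layer(features, edges, weight, attention):
    """One single-head graph attention step over an undirected molecular graph.

    features:  one feature vector per atom (node)
    edges:     (i, j) bonds; self-loops are added so each atom attends to itself
    weight:    shared linear transform W (out_dim x in_dim)
    attention: vector a of length 2 * out_dim that scores [W h_i || W h_j]
    """
    n = len(features)
    neighbors = {i: {i} for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    projected = [matvec(weight, h) for h in features]
    updated, all_alphas = [], []
    for i in range(n):
        nbrs = sorted(neighbors[i])
        logits = [
            leaky_relu(sum(a * v for a, v in zip(attention, projected[i] + projected[j])))
            for j in nbrs
        ]
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]  # stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]
        out_dim = len(projected[i])
        updated.append([sum(alpha * projected[j][d] for alpha, j in zip(alphas, nbrs))
                        for d in range(out_dim)])
        all_alphas.append(dict(zip(nbrs, alphas)))
    return updated, all_alphas


# Toy graph for ethanol (SMILES "CCO"), heavy atoms only: C0-C1 and C1-O2.
# Features are a hypothetical one-hot over element type [C, O].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
weight = [[0.5, -0.3], [0.1, 0.8]]   # arbitrary untrained values
attention = [0.2, -0.1, 0.4, 0.3]    # arbitrary untrained values
updated, alphas = gat_layer(features, edges, weight, attention)
for atom, weights_for_atom in enumerate(alphas):
    print(atom, weights_for_atom)
```

A three-layer GAT would apply a step like this three times, so that each atom's representation can draw on atoms up to three bonds away.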

Multimodal fusion

An attention-based fusion module integrates the RNA and small molecule representations. Cross-attention mechanisms allow the model to learn which molecular features interact with which RNA structural elements.
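
As a rough illustration of cross-attention, the sketch below lets each RNA position attend over the atoms of a molecule using scaled dot-product attention. The embeddings are invented 2-dimensional vectors, the learned query/key/value projections of a real model are omitted, and the choice of RNA positions as queries is an assumption; the actual SMRTnet fusion module may differ.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns the attended vectors and the attention weight matrix.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, values))
                        for d in range(len(values[0]))])
    return outputs, weights


# Hypothetical embeddings: three RNA positions and two molecule atoms.
rna_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
molecule_embeddings = [[2.0, 0.0], [0.0, 2.0]]

# RNA positions act as queries; molecule atoms supply keys and values.
attended, weights = cross_attention(rna_embeddings, molecule_embeddings, molecule_embeddings)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and indicates how strongly a given RNA position draws on each atom, which is the kind of signal an attention analysis would inspect.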

Ensemble prediction

Five models from 5-fold cross-validation contribute to each prediction. The final binding probability is the median score across all five models, which keeps a single outlying model from skewing the result.
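
The median step can be sketched as follows. The function name `ensemble_probability` and the per-fold scores are hypothetical and chosen to include one outlier.

```python
from statistics import median


def ensemble_probability(fold_scores):
    """Combine the five per-fold scores into one probability via the median."""
    if len(fold_scores) != 5:
        raise ValueError("expected one score from each of the five fold models")
    return median(fold_scores)


# Hypothetical per-fold scores for a single molecule-RNA pair.
fold_scores = [0.62, 0.71, 0.15, 0.68, 0.74]
print(ensemble_probability(fold_scores))  # prints 0.68
```

With these numbers the mean would be about 0.58, pulled down by the single 0.15 score, while the median stays at 0.68.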

Validation

SMRTnet was validated through large-scale experimental testing using microscale thermophoresis (MST):

  • 40 confirmed hits across 10 disease-associated RNA targets with binding affinities ranging from nanomolar to micromolar
  • MYC IRES validation: Predicted binding scores correlated with experimental validation rates. One predicted compound downregulated MYC expression, inhibited proliferation, and induced apoptosis in three cancer cell lines.

The model outperforms existing methods across multiple benchmark datasets including R-BIND, R-SIM, SMMRNA, and NALDB.

Limitations

  • Secondary structure quality: Prediction accuracy depends on the input structure. Experimentally determined structures (e.g., from SHAPE-MaP or enzymatic probing) yield more reliable results than computationally predicted folds.

  • Training data scope: The model was trained on known small molecule-RNA complexes from the PDB. Novel chemical scaffolds or RNA motifs underrepresented in training data may have reduced prediction accuracy.

  • Binding site localization: While SMRTnet can identify potential binding regions through attention analysis, the ProteinIQ implementation focuses on binding probability prediction. Detailed binding site mapping requires the original SMRTnet interpretability workflow.

  • Experimental validation required: As noted by the authors, predictions should be reviewed by domain experts before proceeding to wet-lab validation.

Related tools

  • ADMET-AI: Predict absorption, distribution, metabolism, excretion, and toxicity properties for small molecules identified by SMRTnet
  • Lipinski's Rule of 5: Assess oral bioavailability of candidate compounds
  • DiffDock: Molecular docking for protein-ligand interactions (complements RNA-focused SMRTnet)
  • Molecular Descriptors: Calculate physicochemical properties of small molecule hits