What are DNA i-motifs?

A web server for i-motif prediction from genomic features

DNA i-motifs are non-canonical four-stranded structures formed by cytosine-rich sequences under mildly acidic conditions. They consist of intercalated C·C⁺ base pairs and may regulate transcription, replication and chromatin organization.

iMotif Predictor provides pre-trained machine learning models to score genomic windows for their i-motif potential. The models integrate DNA sequence with experimental measurements such as microarray signals, ATAC-seq and histone modifications.

All models were trained on genome-wide data from HEK293T cells (all chromosomes except chromosome 1), using a class-weighted loss to address the strong imbalance between i-motif and non–i-motif windows.

Quadruplex structures: i-motif and G-quadruplex
Figure adapted from: Diana Zanin et al. (2023). Genome-wide i-motif and G-quadruplex prediction tool enables the identification of non-canonical DNA structures in regulatory regions. NAR Genomics and Bioinformatics, 5(1).
Image used under the Creative Commons Attribution License (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

Web-based i-motif prediction

Upload feature tables and run cell-type specific models

Select one of the available models, upload an Excel file with the required columns, and download prediction scores for each genomic window.

Provide one row per genomic window. The file must contain a Sequence column and all feature columns required by the selected model.

Quick demo (sequence only)

Paste a single DNA sequence and get an i-motif prediction using the Sequence Only model. No Excel file needed.

The demo expects a 124-nt DNA sequence. You can edit the sequence above, but its length (after removing spaces/newlines) must remain 124.
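
The length rule above can be checked before pasting a sequence into the demo. The following is a minimal sketch, not the server's own validation code: the function name clean_demo_sequence is hypothetical, and the restriction to the letters A, C, G and T is an assumption (the server may also accept other symbols such as N).

```python
def clean_demo_sequence(raw: str, expected_length: int = 124) -> str:
    """Remove whitespace, upper-case the sequence and check its length.

    Assumption: only A, C, G and T are valid; the real server may differ.
    """
    # str.split() with no arguments splits on any run of spaces, tabs or newlines.
    seq = "".join(raw.split()).upper()
    if len(seq) != expected_length:
        raise ValueError(
            f"expected {expected_length} nt after removing whitespace, got {len(seq)}"
        )
    invalid = sorted(set(seq) - set("ACGT"))
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {invalid}")
    return seq
```

A sequence pasted with line breaks, such as 31 lines of "ac gt", passes the check because whitespace is removed before the length is counted.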

Pre-trained models

Available feature sets and training data

The web server exposes several pre-trained models that differ by the feature sets used during training. All models were trained on HEK293T genome windows (all chromosomes except chr1), using a class-weighted loss to handle the strong class imbalance between positive and negative examples.

  • Sequence Only – CNN using the 124-bp DNA sequence.
  • Sequence + Microarray – adds microarray probe intensities downstream of the window.
  • Sequence + Core Epigenetic Profile – includes key histone marks and ATAC-seq signals.
  • Sequence + Extended Epigenetic Profile – extended panel of epigenetic marks.
  • Sequence + Microarray + Epigenetics ± ΔG – combined models integrating sequence, expression and structural information.

Chromosome 1 was kept completely unseen during training and used as an independent test set for model evaluation, following the protocol described in the manuscript.
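
The chromosome hold-out and the class weighting described above can be illustrated as follows. This is a sketch under stated assumptions, not the training code from the manuscript: the dictionary keys "chrom" and "label" are hypothetical, and inverse-frequency weighting is one common way to build a class-weighted loss, which may differ from the scheme actually used.

```python
from collections import Counter


def split_by_chromosome(windows, test_chrom="chr1"):
    """Keep one chromosome entirely out of the training set.

    Each window is a dict with a hypothetical "chrom" key.
    """
    train = [w for w in windows if w["chrom"] != test_chrom]
    test = [w for w in windows if w["chrom"] == test_chrom]
    return train, test


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this is a common heuristic, not necessarily the
    weighting used for the published models.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With three negative windows and one positive window, the positive class receives a weight of 2.0 and the negative class a weight of about 0.67, so errors on the rare i-motif windows contribute more to the loss.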

Help

How to run predictions and interpret the outputs

Input requirements

  • Upload an .xlsx file with one row per genomic window.
  • Include a Sequence column (124-bp DNA sequence).
  • Make sure all required feature columns for the selected model appear in the file (see the “Model overview” panel on the right of the Predict tab).
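
Before uploading, the requirements above can be checked locally. The sketch below works on a header list and a list of row dictionaries, which could be obtained from the .xlsx file with a spreadsheet library of your choice. The function names and the example feature names are hypothetical; the real required columns are those listed in the "Model overview" panel.

```python
def missing_columns(header, required_features):
    """Return the required column names that are absent from the header."""
    required = ["Sequence", *required_features]
    present = set(header)
    return [name for name in required if name not in present]


def bad_sequence_rows(rows, expected_length=124):
    """Return the indices of rows whose Sequence is not expected_length long."""
    bad = []
    for index, row in enumerate(rows):
        seq = str(row.get("Sequence", "")).strip()
        if len(seq) != expected_length:
            bad.append(index)
    return bad


# Hypothetical feature names, used only for illustration.
header = ["Sequence", "ATAC_signal"]
print(missing_columns(header, ["ATAC_signal", "H3K4me3"]))
```

An empty result from both functions suggests the table has the expected shape; it does not guarantee that the server will accept the file.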

Output file

  • The downloaded Excel file contains all original columns.
  • An additional column, score, stores the model prediction for each window (a higher score indicates higher i-motif potential).
  • The column GC_percent reports the GC content of each sequence.
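
For reference, GC content is conventionally the share of G and C bases in a sequence. The sketch below shows that calculation; it assumes the percentage is taken over the full sequence length, and the server's exact rounding and handling of ambiguous bases are not specified here.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)
```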

Result page summary

  • Global statistics of the prediction scores (mean, min, max).
  • Correlation heatmap between numeric features and the score.
  • Regex-based C-tract analysis summarizing canonical i-motif signatures.
  • A preview table of the first rows, with an interactive score cutoff slider.
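
A regex-based C-tract analysis of the kind listed above might look like the following sketch. The patterns are assumptions: a run of at least three cytosines as a C-tract, and four such tracts separated by loops of 1 to 12 nucleotides as a canonical signature. These thresholds are commonly used for i-motif motif searches, but the server's actual expressions are not documented here.

```python
import re

# Assumed definitions; the server's own patterns may use other thresholds.
C_TRACT = re.compile(r"C{3,}")
CANONICAL_IMOTIF = re.compile(r"(?:C{3,}[ACGT]{1,12}){3}C{3,}")


def c_tract_summary(seq: str) -> dict:
    """Count C-tracts and test for a canonical four-tract signature."""
    seq = seq.upper()
    tracts = [match.group() for match in C_TRACT.finditer(seq)]
    return {
        "n_tracts": len(tracts),
        "longest_tract": max((len(t) for t in tracts), default=0),
        "has_canonical_motif": CANONICAL_IMOTIF.search(seq) is not None,
    }
```

For example, the C-rich telomeric repeat CCCTAACCCTAACCCTAACCC contains four C-tracts of length three and matches the assumed canonical pattern.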