DREME Tutorial

DREME is a tool for discovering short regular-expression motifs that are enriched in a given dataset. It is limited to DNA and RNA because the much larger space of amino-acid combinations makes DREME's approach infeasible for proteins. DREME can also compare two datasets to find motifs that are enriched in one relative to the other.

How DREME works

Note: in the following, the sequence set in which you are looking for motifs is called the positive sequences, and the control sequence set is called the negative sequences.

Sequence set

DREME works best with many short (~100bp) sequences. If you have only a few long sequences, it may be beneficial to split them into many shorter (~100bp) sequences. With ChIP-seq data, we recommend using 100bp regions around the peaks.
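The splitting and peak-window advice above can be sketched in a few lines of Python. This is only an illustration, not part of DREME: the function names, the default `min_len` cutoff for trailing fragments, and the assumption that a peak is given as a 0-based summit position are all hypothetical choices made for this example.

```python
def split_into_windows(seq, size=100, min_len=50):
    """Cut seq into consecutive, non-overlapping windows of `size` bases.

    A trailing fragment shorter than `min_len` is dropped so that the
    resulting sequences stay roughly the same length (min_len is an
    assumed cutoff, not a DREME parameter).
    """
    windows = [seq[i:i + size] for i in range(0, len(seq), size)]
    return [w for w in windows if len(w) >= min_len]


def peak_window(seq, summit, size=100):
    """Return a region of up to `size` bases around a 0-based peak summit.

    The window is clipped at the sequence boundaries, so it can be
    shorter than `size` when the summit lies close to the end.
    """
    start = max(0, summit - size // 2)
    end = min(len(seq), start + size)
    return seq[start:end]


if __name__ == "__main__":
    long_seq = "ACGT" * 100                       # a 400bp toy sequence
    print(len(split_into_windows(long_seq)))      # number of 100bp pieces
    print(len(peak_window(long_seq, summit=200))) # length of the peak region
```

The resulting pieces would then be written to a FASTA file and given to DREME as the positive sequences.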

Comparative sequence set

DREME always uses a control sequence set, but you do not have to supply one: DREME can create it by dinucleotide shuffling of the positive sequences. If you wish to use your own control set, there are a few guidelines you should follow.
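To make "dinucleotide shuffling" concrete: such a shuffle rearranges a sequence while keeping the count of every adjacent base pair unchanged, whereas an ordinary shuffle keeps only the single-base counts. The short Python check below illustrates that property on hand-picked toy sequences. It does not perform the shuffle itself and is not DREME's implementation; the helper name and the example sequences are made up for this illustration.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count every overlapping pair of adjacent bases in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Hand-picked toy sequences (hypothetical, for illustration only).
original = "AACAGA"
dinuc_shuffled = "AGACAA"  # same bases AND same adjacent-pair counts
mono_shuffled = "AAACGA"   # same bases, but different adjacent pairs

# All three contain the same single bases.
assert Counter(original) == Counter(dinuc_shuffled) == Counter(mono_shuffled)

print(dinucleotide_counts(original) == dinucleotide_counts(dinuc_shuffled))
print(dinucleotide_counts(original) == dinucleotide_counts(mono_shuffled))
```

The first comparison holds and the second does not, which is why a dinucleotide-shuffled control keeps more of the local composition of the positive sequences than a plain shuffle would.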

The control sequences should be roughly the same length as the sequences you are searching for motifs. This is because the null model assumes that, for an uninteresting motif, the probability of finding a match in a sequence is roughly the same in either sequence set. Longer control sequences provide more positions at which the motif could match, making a match more likely. This skews the p-value calculations and may exclude a motif you would be interested in.
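The effect of sequence length can be illustrated with a deliberately simplified model. The sketch below assumes that each possible start position matches the motif independently with the same probability, and that only one strand is scanned. These are assumptions made for this example and are not a description of how DREME computes its p-values; the function name and parameters are hypothetical.

```python
def p_at_least_one_match(seq_len, motif_width, p_site):
    """Chance that a sequence contains at least one match to a motif.

    Simplifying assumptions (for illustration only): every start
    position matches independently with probability p_site, and a
    single strand is scanned.
    """
    positions = max(0, seq_len - motif_width + 1)
    return 1.0 - (1.0 - p_site) ** positions


if __name__ == "__main__":
    # A fully specified 8-base motif with uniform base frequencies.
    p_site = 0.25 ** 8
    for length in (100, 500, 1000):
        print(length, p_at_least_one_match(length, 8, p_site))
```

Under this toy model, the chance of a spurious match grows with the number of available positions, so a control set of much longer sequences would contain chance matches more often than the positive set, which is the skew described above.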
