Usage (Python 2.7):

dreme [options] -p <primary sequence file> [-n <control sequence file>]

Usage (Python 3.x):

dreme-py3 [options] -p <primary sequence file> [-n <control sequence file>]

Description

DREME discovers short, ungapped motifs (recurring, fixed-length patterns) that are relatively enriched in your sequences compared with shuffled sequences or your control sequences. See the DREME manual or tutorial for more information.


DREME (Discriminative Regular Expression Motif Elicitation) quickly finds relatively short motifs (up to 8 positions). The input to DREME is one or two sets of sequences. The control sequences should be approximately the same length as the primary sequences. If you do not provide a control set, the program shuffles the primary set to provide one. The program uses Fisher's Exact Test to determine the significance of each motif found in the primary set compared with its representation in the control set, using a significance threshold that may be set on the command line.
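
The enrichment test described above can be illustrated with a minimal Python sketch. It counts how many sequences in each set contain a candidate pattern and computes a one-sided Fisher's Exact Test p-value from the hypergeometric distribution. The function names and the toy sequences are assumptions made for illustration; this is not DREME's own code.

```python
from math import comb
import re


def fisher_exact_one_sided(pos_hits, pos_total, neg_hits, neg_total):
    """P-value for seeing at least pos_hits matches in the primary set.

    Uses the hypergeometric tail: given the total number of matching
    sequences, how likely is it that this many fall in the primary set?
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, hits)
    upper = min(pos_total, hits)
    tail = sum(
        comb(pos_total, k) * comb(neg_total, hits - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / denom


def count_matching(sequences, pattern):
    """Count sequences that contain the pattern at least once."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))


# Toy data (hypothetical): the pattern occurs in the primary set only.
primary = ["ACGTGACC", "TTGTGACA", "GGTGACGT", "CCCCAAAA"]
control = ["ACGTACGT", "TTTTAAAA", "GGGGCCCC", "CATGCATG"]

pos_hits = count_matching(primary, "GTGAC")
neg_hits = count_matching(control, "GTGAC")
p_value = fisher_exact_one_sided(pos_hits, len(primary), neg_hits, len(control))
print(pos_hits, neg_hits, round(p_value, 4))
```

With so few sequences the p-value stays well above typical thresholds even though the pattern is absent from the control set, which is one reason larger sequence sets give more useful results.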

DREME achieves its high speed by restricting its search to regular expressions based on the symbols available in the alphabet, and by using a heuristic estimate of generalised motifs' statistical significance.
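
As a rough illustration of searching with regular expressions built from the alphabet's symbols, the sketch below expands a DNA motif written with the standard IUPAC ambiguity codes into a Python regular expression. It also checks the reverse complement, in the spirit of the default behaviour for complementable alphabets. The helper names are hypothetical and the code is not taken from DREME.

```python
import re

# Standard IUPAC DNA codes mapped to the core symbols they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def motif_to_regex(motif):
    """Turn an IUPAC motif such as 'CAGSTG' into a regex like 'CAG[CG]TG'."""
    parts = []
    for symbol in motif:
        letters = IUPAC[symbol]
        parts.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "".join(parts)


def reverse_complement(motif):
    """Reverse-complement a motif, keeping ambiguity codes consistent."""
    return motif.translate(COMPLEMENT)[::-1]


def matches_either_strand(motif, sequence):
    """True if the motif or its reverse complement occurs in the sequence."""
    forward = re.search(motif_to_regex(motif), sequence)
    reverse = re.search(motif_to_regex(reverse_complement(motif)), sequence)
    return bool(forward or reverse)


print(motif_to_regex("CAGSTG"))
print(reverse_complement("GATAAR"))
print(matches_either_strand("GATAAR", "CCTTTATCGG"))
```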

Two versions of DREME are provided: dreme for Python 2.7 and dreme-py3 for Python 3.x. The random number generator changed between Python 2.7 and Python 3.x, so results from the two versions may differ if you do not explicitly provide a control sequence file.
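
The role of the random seed can be sketched as follows. This example uses a plain letter shuffle from Python's random module purely for illustration; it is not DREME's shuffling procedure and makes no attempt to preserve dimer frequencies. The function name and sequences are hypothetical. Within one interpreter the same seed reproduces the same shuffled control, whereas supplying an explicit control file removes the dependence on the generator altogether.

```python
import random


def shuffled_control(sequences, seed=1):
    """Shuffle the letters of each sequence with a seeded generator.

    Illustration only: a plain letter shuffle, not the dimer-preserving
    shuffle that DREME performs.
    """
    rng = random.Random(seed)
    control = []
    for seq in sequences:
        letters = list(seq)
        rng.shuffle(letters)
        control.append("".join(letters))
    return control


primary = ["ACGTGACCTA", "TTGTGACAGG"]
first = shuffled_control(primary, seed=1)
second = shuffled_control(primary, seed=1)
print(first == second)  # same seed, same interpreter: identical controls
```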

Input

Sequence file (primary)

A collection of sequences in FASTA format. The sequences should all be approximately the same length.
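
One way to check that input sequences are of similar length before running DREME is sketched below. The parser handles only simple FASTA text, and the function names and the 10% tolerance are arbitrary assumptions, not something DREME enforces.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def lengths_are_similar(records, tolerance=0.1):
    """True if every length is within `tolerance` of the mean length."""
    lengths = [len(seq) for seq in records.values()]
    if not lengths:
        return False
    mean = sum(lengths) / len(lengths)
    return all(abs(n - mean) <= tolerance * mean for n in lengths)


example = """>seq1 first example
ACGTACGTAC
GTACGTACGT
>seq2
ACGTACGTACGTACGTACG
"""
records = read_fasta(example)
print({name: len(seq) for name, seq in records.items()})
print(lengths_are_similar(records))
```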

Output

DREME outputs its results primarily as an HTML file named dreme.html. DREME also outputs a machine-readable XML version and a plain-text version of its results, named dreme.xml and dreme.txt, respectively.

Additionally, DREME can output motif logos if the -png and/or -eps options are specified.

Options

Each option is listed with its parameter (if any), followed by its description and its default behaviour.

Input/Output

-o dir
    Create a folder called dir and write output files in it. This option is not compatible with -oc as only one output folder is allowed.
    Default: The program behaves as if -oc dreme_out had been specified.

-oc dir
    Create a folder called dir but if it already exists allow overwriting the contents. This option is not compatible with -o as only one output folder is allowed.
    Default: The program behaves as if -oc dreme_out had been specified.

-p primary sequence file
    The name of a file containing the primary sequences in FASTA format.
    Default: Required argument.

-n control sequence file
    The name of a file containing the control sequences in FASTA format.
    Default: DREME will create a set of control sequences by shuffling the primary sequences while preserving dimer frequencies.

-png
    Output motif logo images in portable network graphics (png) format. This format is useful for display on websites.
    Default: Images are not output in png format.

-eps
    Output motif logo images in Encapsulated Postscript (eps) format. This format is useful for inclusion in publications as it is a vector graphics format and can be easily scaled.
    Default: Images are not output in eps format.

Alphabet

-dna
    Use the standard DNA alphabet. This is the default alphabet, so the option exists only for symmetry.
    Default: The standard DNA alphabet is used.

-rna
    Use the standard RNA alphabet.
    Default: The standard DNA alphabet is used.

-protein
    Use the standard protein alphabet. This does not work very well because the protein alphabet has only 3 ambiguous symbols, which do not cover the range of possibilities well.
    Default: The standard DNA alphabet is used.

-alph alphabet file
    If the input sequences are in a non-standard alphabet, specify an alphabet file containing the alphabet definition. Note that DREME works best when there are ambiguous symbols for all likely combinations of core symbols. As DREME is currently implemented it can only access ambiguous symbols that are the union of two other (possibly ambiguous) symbols. Incompatible with options -dna, -rna and -protein.
    Default: The sequences are assumed to be in the DNA alphabet.

General

-norc
    Search only the given primary sequences for motifs.
    Default: Search the given primary sequences and their reverse complements for motifs when the alphabet is complementable.

-g ngen
    Set the number of REs to generalize. Increasing ngen will make the search more thorough at some cost in speed.
    Default: DREME will generalise 100 REs.

-s seed
    Set the seed for the random number generator used to shuffle the sequences. When the -n option is given the control sequences will be used instead of shuffling.
    Default: The random number generator is initialised with a value of 1.

Stopping Conditions

-e e
    Stop searching for motifs when the next motif's E-value is > e.
    Default: Stop discovering motifs if the E-value threshold of 0.05 is exceeded.

-m m
    Stop searching for motifs when m motifs have been found.
    Default: There is no limit on the number of discovered motifs.

-t t
    Stop searching for motifs when t seconds have elapsed.
    Default: There is no limit on the time taken to discover motifs.

Set Core Motif Width

-mink mink
    Set the minimum width of the motif core.
    Default: A minimum core width of 3 is used.

-maxk maxk
    Set the maximum width of the motif core.
    Default: A maximum core width of 7 is used.

-k k
    Set the width of the motif core. This sets minimum width and maximum width to the same number.
    Default: The defaults for minimum and maximum width are used.

Miscellaneous

-desc description
    Include the text description in the HTML output.
    Default: No description in the HTML output.

-dfile desc file
    Include the first 500 characters of text from the file desc file in the HTML output.
    Default: No description in the HTML output.

-verbosity 1|2|3|4|5
    A number that regulates the verbosity level of the output information messages. If set to 1 (quiet) then it will only output error messages, whereas the other extreme, 5 (dump), outputs lots of mostly useless information.
    Default: The verbosity level is set to 2 (normal).

-h
    Display a usage message and exit.
    Default: Run as normal.

-version
    Display the version and exit.
    Default: Run as normal.

Experimental (use at own risk)

-l
    Print list of enrichment of all REs tested.
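
Several of the options above can be combined on one command line. The following Python sketch assembles an argument list in the order given by the usage synopsis and rejects the documented incompatibility between -o and -oc. The function name and the file names are hypothetical; only flags listed above are used, and the call that would actually run the program is left as a comment because it requires DREME to be installed.

```python
def build_dreme_command(primary, control=None, out_dir=None, overwrite_dir=None,
                        e_value=None, max_motifs=None, norc=False,
                        program="dreme"):
    """Build an argument list for dreme (or dreme-py3) from documented flags."""
    if out_dir is not None and overwrite_dir is not None:
        raise ValueError("-o and -oc are incompatible: choose one output folder")
    cmd = [program]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if overwrite_dir is not None:
        cmd += ["-oc", overwrite_dir]
    if e_value is not None:
        cmd += ["-e", str(e_value)]
    if max_motifs is not None:
        cmd += ["-m", str(max_motifs)]
    if norc:
        cmd.append("-norc")
    # The synopsis places the sequence files after the other options.
    cmd += ["-p", primary]
    if control is not None:
        cmd += ["-n", control]
    return cmd


cmd = build_dreme_command("peaks.fasta", control="background.fasta",
                          overwrite_dir="dreme_out", e_value=0.01, max_motifs=5)
print(" ".join(cmd))
# To run it where DREME is installed:
#     import subprocess
#     subprocess.run(cmd, check=True)
```

Passing program="dreme-py3" selects the Python 3.x version with the same arguments.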

Citing

If you use DREME in your research please cite the following paper:
Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011.