centrimo [options] <primary sequence file> <motif file>+
Motif file(s): one or more files containing MEME formatted motifs. Outputs from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.
Primary sequence file: a file containing FASTA formatted sequences, ideally all of the same length. The sequences in this file are referred to as the "primary sequences" when a second set of (control) sequences is provided using the --neg option (see below).
CentriMo outputs an HTML file named centrimo.html that allows interactive selection of which motifs to plot the positional distribution for and control over smoothing and other plotting parameters. CentriMo also outputs two text files: centrimo.txt, a tab delimited version of the results, and site_counts.txt, which lists, for each motif and each offset, the number of sequences where the best match of the motif occurs at the given offset.
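To make the idea behind site_counts.txt concrete, the following minimal Python sketch tallies, for each motif, how many sequences have their best match at each offset. It is only an illustration of the counting described above: the function name `tally_site_counts`, the input structure, and the motif IDs are hypothetical, and it does not read or write CentriMo's actual file format.

```python
from collections import Counter


def tally_site_counts(best_offsets):
    """Count, per motif, how many sequences have their best match at each offset.

    best_offsets maps a motif ID to a list holding one best-match offset
    per sequence. This is a hypothetical in-memory structure, not the
    layout of site_counts.txt.
    """
    return {motif: Counter(offsets) for motif, offsets in best_offsets.items()}


# Hypothetical best-match offsets for two motifs.
example = {"motif_1": [0, 0, -3, 5], "motif_2": [12, 12, 12]}
counts = tally_site_counts(example)

# Print one line per motif and offset, ordered by offset.
for motif, per_offset in counts.items():
    for offset in sorted(per_offset):
        print(motif, offset, per_offset[offset])
```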
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
Input/Output | |||
--neg | control sequence file | Plot the motif distributions in this set (the control sequences) as well. |   |
--disc | | For each enriched region in the primary sequences, the significance of the relative enrichment of the motif in that region in the primary versus control sequences is evaluated using Fisher's exact test. Requires the control sequences to be supplied with the --neg option. | Use the binomial test on the primary sequences to evaluate motif enrichment. |
--motif-pseudo | pseudocount | Add this total pseudocount to the counts in each motif column when converting a frequency matrix to a log-odds score matrix. The pseudocount added to each count is pseudocount times the background frequency of the letter (see option --bfile). Note: Counts are computed from MEME formatted motifs by multiplying the frequency of the letter by the value of nsites given in the motif letter-probability matrix header line. | The program applies a pseudocount of 0.1. |
--motif | ID | Select the motif with the ID for scanning. This option may be repeated to select multiple motifs. | The program scans with all the motifs. |
--seqlen | length | Use only sequences of length length, ignoring all other sequences in the input file(s). | Use sequences with the same length as the first sequence, ignoring all other sequences in the input file(s). |
Scanning | |||
--score | S | The score threshold for PWMs, in bits. Sequences without a match with score ≥ S are ignored. | A score of 5 is used. |
--optimize_score | | Search for the optimal score above the minimum threshold given by the --score option. | The minimum score threshold is used. |
--maxreg | max region | The maximum region size to consider. | Try all region sizes up to the sequence width. |
--minreg | min region | The minimum region size to consider. Must be less than max region. | Try regions 1 bp and larger. |
--norc | | Do not scan with the reverse complement motif. | Scans with the reverse complement motif. |
--flip | | Reverse complement matches appear 'reflected' around sequence centers. | Do not 'flip' the sequence; use the reverse complement of the motif instead. |
--local | | Compute enrichment of all regions. | Compute enrichment of central regions. |
Output filtering | |||
--ethresh | thresh | Limit the results to motifs with an enriched region whose E-value is less than thresh. Enrichment E-values are computed by first adjusting the binomial p-value of a region for the number of regions tested using the Bonferroni correction, and then multiplying the adjusted p-value by the number of motifs in the input to CentriMo. | Include motifs with E-values up to 10. |
Miscellaneous | |||
--desc | description | Include the text description in the HTML output. | No description in the HTML output. |
--dfile | desc file | Include the first 500 characters of text from the file desc file in the HTML output. | No description in the HTML output. |
--noseq | | Do not store sequence IDs in the output of CentriMo. | CentriMo stores a list of the sequence IDs with matches in the best region for each motif. This can potentially make the file size much larger. |
--version | | Display the version and exit. | Run as normal. |
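Two of the computations described in the table, the pseudocount handling of --motif-pseudo and the E-value used by --ethresh, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CentriMo's implementation: the function names are hypothetical, the log-odds score is assumed to be the base-2 logarithm of the adjusted probability divided by the background frequency, and the Bonferroni adjustment is assumed to be the p-value times the number of regions tested, capped at 1.

```python
import math


def column_log_odds(freqs, nsites, background, pseudocount=0.1):
    """Convert one motif column of letter frequencies to log-odds scores.

    Counts are recovered as frequency * nsites, and each count receives
    pseudocount * background frequency of that letter, as the table
    describes. The log2(prob / background) scoring is an assumption.
    With pseudocount=0 a zero frequency would make log2 fail.
    """
    scores = {}
    for letter, freq in freqs.items():
        count = freq * nsites
        adjusted = count + pseudocount * background[letter]
        # Background frequencies sum to 1, so the total added is `pseudocount`.
        prob = adjusted / (nsites + pseudocount)
        scores[letter] = math.log2(prob / background[letter])
    return scores


def enrichment_evalue(p_value, n_regions, n_motifs):
    """Bonferroni-adjust a region p-value, then scale by the number of motifs."""
    adjusted = min(1.0, p_value * n_regions)  # assumed form of the correction
    return adjusted * n_motifs


# Hypothetical inputs: a uniform background and a single motif column.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
column = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
scores = column_log_odds(column, nsites=20, background=uniform)
evalue = enrichment_evalue(p_value=1e-6, n_regions=100, n_motifs=50)
print(scores)
print(evalue)
```

Under these assumptions, a motif would pass the default --ethresh filter whenever the value returned by `enrichment_evalue` is within the threshold of 10.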
If you use CentriMo in your research, please cite the following paper:
Timothy L. Bailey and Philip Machanick,
"Inferring direct DNA binding from ChIP-seq",
Nucleic Acids Research, 40:e128, 2012.