ame [options] <sequence file> <motif file>+
The command-line version of AME supports a wide variety of methods for scoring motif enrichment and many methods of testing the scored motif enrichment for significance. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. If no control sequence set is given, the primary sequences are assumed to be sorted in decreasing order of 'importance' according to some secondary criterion. For each motif, AME determines if the sequences at the top of the list are significantly enriched for matches to it. With no control sequences, by default AME performs unconstrained partition maximization, looking for the pair of thresholds (on motif score and the secondary criterion, respectively) that yields the most significant result.
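The partition search described above can be illustrated with a small sketch. This is not AME's implementation: it assumes each sequence has already been reduced to a hit/no-hit call for one motif (so only the partition position N varies, not the motif score threshold), it uses a one-tailed Fisher's exact test, and it applies no multiple-testing correction. The function names are hypothetical.

```python
from math import comb


def fisher_right_tail(k, n_top, n_pos, n_total):
    """P(X >= k) when n_top sequences are drawn from n_total, n_pos of which are hits."""
    denom = comb(n_total, n_top)
    upper = min(n_top, n_pos)
    numer = sum(
        comb(n_pos, i) * comb(n_total - n_pos, n_top - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def best_partition(is_hit):
    """Find the N whose first N sequences are most enriched for hits.

    is_hit: list of booleans in the sorted order of the primary sequences.
    Returns (N, p_value), uncorrected for the number of partitions tried.
    """
    total = len(is_hit)
    n_pos = sum(is_hit)
    best_n, best_p = None, 1.0
    hits_in_top = 0
    for n in range(1, total):  # leave at least one sequence in the remainder
        hits_in_top += is_hit[n - 1]
        p = fisher_right_tail(hits_in_top, n, n_pos, total)
        if p < best_p:
            best_n, best_p = n, p
    return best_n, best_p


n, p = best_partition([True, True, True, False, False, False])
print(n, p)
```

Because many partitions are tried, the smallest p-value found this way is optimistic and would need to be adjusted before being interpreted.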
A set of (primary) sequences in FASTA format. The sequences must be sorted by increasing value of some secondary criterion (e.g., expression level, peak height, fluorescence score). In this documentation, we refer to this secondary criterion as the "FASTA score". This score can optionally be placed in the FASTA ID line. If present, the FASTA score must come immediately after the sequence ID. For example, if the FASTA ID line is
>seq_1 0.123
then 0.123 is the FASTA score for that sequence.
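As a rough illustration of the ID-line convention, the following sketch pulls the sequence ID and the optional FASTA score out of FASTA header lines. It assumes the score, when present, is the second whitespace-separated field and that a non-numeric second field is ordinary description text; the function name is hypothetical and this is not how AME itself parses its input.

```python
def parse_fasta_scores(lines):
    """Yield (sequence_id, fasta_score) for each FASTA ID line.

    fasta_score is None when no numeric score follows the sequence ID.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence data lines
        fields = line[1:].split()
        if not fields:
            continue  # skip an empty ID line
        seq_id = fields[0]
        score = None
        if len(fields) > 1:
            try:
                score = float(fields[1])
            except ValueError:
                score = None  # second field is description text, not a score
        yield seq_id, score


records = list(parse_fasta_scores([">seq_1 0.123", "ACGT", ">seq_2 promoter"]))
print(records)
```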
File(s) containing MEME formatted motifs. Outputs from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.
AME writes to a directory, ame_out, unless a different directory name is specified on the command line. The output directory contains outputs in two formats, HTML and plain text, in files named ame.html and ame.txt, respectively.
The text output includes a consensus sequence computed for each significant motif.
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
General Options | |||
--control | file | A set of control sequences in FASTA format. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. With this option, the order of the sequences in the primary and control sets does not affect the results. Depending on the biological application, the control sequences might be, for example, randomly chosen genome regions or shuffled sequences (the utility fasta-shuffle-letters can be used to create these from the primary sequences). Note: The control sequences should have (approximately) the same distribution of lengths as the primary sequences or AME may fail to correctly detect enriched motifs and will report inaccurate p-values. | If no control sequences are provided, AME searches for position N in the list of primary sequences (a "partition") such that the given motif is maximally enriched in the first N sequences relative to the remaining primary sequences. Consequently, without a control set of sequences, the order of the sequences in the primary set matters. |
--method | fisher|ranksum|mhg|4dmhg|spearman|linreg | Select the association function for testing motif enrichment significance. mhg and 4dmhg use the alternative hypothesis that high motif scores are enriched in the primary sequences (or among the first sequences if only one set is given). Note that linear regression and Spearman rank correlation tests do not calculate p-values. Please use RAMEN if you desire to use linear regression with p-values. | The Fisher's Exact test (fisher) is used for testing motif enrichment. |
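--scoring | avg|max|sum|totalhits | Method of scoring a single sequence for matches to a motif. The score assigned to a sequence is either the average (avg), the maximum (max), or the sum (sum) of its motif scores, or the total number of motif hits it contains (totalhits). | The totalhits scoring method is used. |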
--bgformat | 0|1|2 | Select the background source. | The background specified in the motif file is used. |
--bgfile | bfile | Use the 0-order frequencies in the specified file in Markov Background Model format as the background model for converting the position-specific probability matrix (PSPM) motifs into position-specific log-odds scoring matrices (PSSM). This option overrides the --bgformat option. | Use frequencies based on the motif file or files. See also --bgformat |
--pvalue-threshold | pvalue | Threshold to consider a single motif hit significant; only valid with the totalhits scoring method. | A p-value threshold of 0.0002 is used. |
--length-correction | | Correct for length bias by making the hit p-value threshold more stringent for longer sequences. Note: This option is only valid with the totalhits scoring method, which is the default scoring method. | The same p-value threshold (set with option --pvalue-threshold) is used for determining hits regardless of sequence length. |
--pvalue-report-threshold | pvalue | Corrected p-value threshold for reporting a motif as significantly enriched. | The threshold of 0.05 is used for reporting a motif. |
--pwm-threshold | score | For the Fisher's exact test only. This is the minimum motif-based sequence score for a sequence to be a 'positive'. | A minimum motif score of 1 is used to call a sequence a 'positive'. |
--fasta-threshold | p-value | For the Fisher's exact test only when --poslist is used. This is the maximum FASTA score to call a sequence a 'positive'. | A maximum FASTA score of 0.001 is used to call a sequence a 'positive'. |
--fix-partition | num | Number of positive sequences; the remainder are the negative sequences. | If no control set is provided, partition maximization is performed over the (sorted) input sequences. The partition with the lowest p-value is used. |
--rsmethod | better|quick | Select how to compute the Wilcoxon rank-sum test. | Use the heuristic version of the test (quick). |
--poslist | pwm|fasta | For partition maximization, test thresholds on either X (motif) or Y (FASTA score). Only applies for partition maximization and for the Ranksum test. Note that this option switches between using X and Y for determining true positives in the contingency matrix, in addition to switching which of X and Y is used for partition maximization. | Use the FASTA score. |
--log-fscores | | For linear regression and Spearman tests only: regress using ln(FASTA score), rather than the score directly. | Use the score directly. |
--log-pwmscores | | For linear regression and Spearman tests only: regress using ln(pwm score), rather than the score directly. | Use the score directly. |
--normalise-affinity | | Normalise motif scores so that motif scores can be compared directly. Only relevant for Spearman and Linear Regression tests, where p-values are not calculated. | |
--linreg-switchxy | | Make the x-points FASTA scores and the y-points motif scores. Only relevant for Spearman and Linear Regression tests. | Keep the original axis. |
--verbose | 1|2|3|4|5 | A number that regulates the verbosity level of the output information messages. If set to 1 (quiet) then it will only output error messages whereas the other extreme 5 (dump) outputs lots of mostly useless information. This option is best placed first. At verbosity level 3, AME will report the significance of each set of each partition of the sequences that it considers. | The verbosity level is set to 2 (normal). |
--help | | Print a usage message and exit. | Run as normal. |
--version | | Display the version and exit. | Run as normal. |
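To show how the options above combine with the usage line, here is a minimal sketch that assembles an AME argument list in Python. It uses only flags listed in the table; the helper name, its parameters, and the file names in the example are hypothetical, and the resulting list is merely printed rather than executed.

```python
VALID_METHODS = {"fisher", "ranksum", "mhg", "4dmhg", "spearman", "linreg"}
VALID_SCORING = {"avg", "max", "sum", "totalhits"}


def build_ame_command(sequences, motif_files, control=None,
                      method="fisher", scoring="totalhits",
                      report_threshold=None):
    """Return an argument list following: ame [options] <sequence file> <motif file>+"""
    if not motif_files:
        raise ValueError("at least one motif file is required")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method}")
    if scoring not in VALID_SCORING:
        raise ValueError(f"unknown scoring: {scoring}")

    cmd = ["ame"]
    if control is not None:
        cmd += ["--control", control]
    cmd += ["--method", method, "--scoring", scoring]
    if report_threshold is not None:
        cmd += ["--pvalue-report-threshold", str(report_threshold)]
    cmd.append(sequences)    # primary sequences come after the options
    cmd.extend(motif_files)  # one or more motif files come last
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_ame_command("peaks.fa", ["motifs.meme"], control="shuffled.fa")
print(" ".join(cmd))
```

If AME is installed, a list built this way could be handed to subprocess.run; passing the arguments as a list avoids shell quoting problems with file names.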
If you use AME in your research, please cite the following paper:
Robert McLeay and Timothy L. Bailey,
"Motif Enrichment Analysis: A unified framework and method evaluation",
BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165.