Usage:

ame [options] <sequence file> <motif file>+

Description

The command-line version of AME supports a wide variety of methods for scoring motif enrichment and many methods of testing the scored motif enrichment for significance. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. If no control sequence set is given, the primary sequences are assumed to be sorted in decreasing order of 'importance' according to some secondary criterion. For each motif, AME determines if the sequences at the top of the list are significantly enriched for matches to it. With no control sequences, by default AME performs unconstrained partition maximization, looking for the pair of thresholds (on motif score and the secondary criterion, respectively) that yields the most significant result.

Input

Sequence File

A set of (primary) sequences in FASTA format. The sequences must be sorted by increasing value of some secondary criterion (e.g., expression level, peak height, fluorescence score). In this documentation, we refer to this secondary criterion as the "FASTA score". This score can optionally be placed in the FASTA ID line. If present, the FASTA score must come immediately after the sequence ID. For example, if the FASTA ID line is

>seq1 0.123

(where seq1 is a placeholder for the sequence ID), then 0.123 is the FASTA score for that sequence.
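
The rule above (the score is the token right after the sequence ID) can be sketched in a few lines of Python. This is only an illustration, not AME's own parser: the function name parse_fasta_id_line and the choice to return None when no numeric score follows the ID are assumptions made for the example.

```python
from typing import Optional, Tuple


def parse_fasta_id_line(line: str) -> Tuple[str, Optional[float]]:
    """Split a FASTA ID line into (sequence ID, FASTA score or None).

    Hypothetical helper: assumes whitespace separates the ID from the score.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA ID line")
    fields = line[1:].split()
    if not fields:
        raise ValueError("FASTA ID line has no sequence ID")
    seq_id = fields[0]
    score = None
    if len(fields) > 1:
        try:
            score = float(fields[1])  # the token immediately after the ID
        except ValueError:
            score = None  # the next token is free text, not a score
    return seq_id, score


with_score = parse_fasta_id_line(">seq1 0.123 some description")
without_score = parse_fasta_id_line(">seq2 peak region")
```

Here with_score is ("seq1", 0.123), while without_score carries None because the token after the ID is not a number.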

Motif File

File(s) containing MEME-formatted motifs. Outputs from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using the conversion scripts available with the MEME Suite.

Output

AME writes to a directory, ame_out, unless a different directory name is specified on the command line. The output directory contains outputs in two formats: HTML and plain text, in files named respectively ame.html and ame.txt. The text output includes a consensus sequence computed for each significant motif.

Options

Each entry below gives the option and its parameter, then a description, then the default behaviour.
General Options
--control file
A set of control sequences in FASTA format. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. With this option, the order of the sequences in the primary and control sets does not affect the results. Depending on the biological application, the control sequences might be, for example, randomly chosen genome regions or shuffled sequences (the utility fasta-shuffle-letters can be used to create these from the primary sequences). Note: The control sequences should have (approximately) the same distribution of lengths as the primary sequences; otherwise AME may fail to detect enriched motifs and may report inaccurate p-values.
Default behaviour: If no control sequences are provided, AME searches for position N in the list of primary sequences (a "partition") such that the given motif is maximally enriched in the first N sequences relative to the remaining primary sequences. Consequently, without a control set of sequences, the order of the sequences in the primary set matters.
--method fisher|ranksum|mhg|4dmhg|spearman|linreg
Select the association function for testing motif enrichment significance.
fisher - one-tailed Fisher's Exact test
ranksum - one-tailed Wilcoxon rank-sum test, also known as the Mann-Whitney U test
mhg and 4dmhg - two-tailed tests described in McLeay and Bailey, "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data", BMC Bioinformatics 11:165, 2010. Note: Motifs enriched in either the primary or control sequences (or at the top or bottom of the primary sequences if only one sequence file is given) are considered significant by these tests.
spearman - Spearman's rank correlation coefficient (ρ) between the motif score and the FASTA score (in the FASTA IDs of the sequences). Not valid with --control. Not valid with --fix-partition; always uses all input sequences.
linreg - mean-squared error of the linear regression of the motif score and the FASTA score (in the FASTA IDs of the sequences). Note that the FASTA ID lines must each contain a FASTA score with this function. Not valid with --control. Using this function, AME reports raw values only. For the full approach used in the paper, please use the tool RAMEN, which is also included in the MEME Suite. RAMEN also supports estimation of p-values.
All statistical tests except mhg and 4dmhg use the alternative hypothesis that high motif scores are enriched in the primary sequences (or among the first sequences if only one set is given). Note that the linear regression and Spearman rank correlation tests do not calculate p-values. Please use RAMEN if you need linear regression with p-values.
Default behaviour: The Fisher's Exact test (fisher) is used for testing motif enrichment.
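
To make the default fisher function concrete, the following Python sketch computes a one-tailed Fisher's Exact test p-value as the upper tail of the hypergeometric distribution. It is an illustration of the statistical test only and does not reproduce AME's code; the function name fisher_one_tailed and the argument names (k, n, K, N) are assumptions chosen for the example.

```python
from math import comb


def fisher_one_tailed(k: int, n: int, K: int, N: int) -> float:
    """One-tailed Fisher's Exact test p-value, P(X >= k).

    N: total number of sequences (primary plus control)
    K: sequences classified 'positive' by their motif score
    n: number of primary sequences
    k: primary sequences that are 'positive'
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent contingency counts")
    # Sum the hypergeometric probabilities of seeing k or more positives.
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return numerator / comb(N, n)


# 20 sequences, 8 motif-positive overall, 10 primary, 7 positives in primary.
p_value = fisher_one_tailed(k=7, n=10, K=8, N=20)
```

A small p_value here indicates that motif-positive sequences are over-represented among the primary sequences, which is the alternative hypothesis described above.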
--scoring avg|max|sum|totalhits
Method of scoring a single sequence for matches to a motif. The score assigned to a sequence is either:
avg - the average motif score of the sequence (note: motif scores are odds scores, not log-odds scores)
max - the maximum motif score of any position in the sequence
sum - the sum of the motif (odds) scores over the sequence
totalhits - the total number of positions in the sequence whose motif score p-value is less than the --pvalue-threshold (see below).
Default behaviour: The totalhits scoring method is used.
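
The four scoring methods differ only in how per-position values are combined into one sequence score. The Python sketch below illustrates this under the assumption that the per-position odds scores and their p-values have already been computed; the function name score_sequence and its argument names are hypothetical, and the code is not taken from AME.

```python
from typing import Sequence


def score_sequence(
    odds: Sequence[float],
    pvalues: Sequence[float],
    method: str = "totalhits",
    pvalue_threshold: float = 0.0002,
) -> float:
    """Combine per-position motif scores into a single sequence score."""
    if not odds or len(odds) != len(pvalues):
        raise ValueError("need one odds score and one p-value per position")
    if method == "avg":
        return sum(odds) / len(odds)
    if method == "max":
        return max(odds)
    if method == "sum":
        return sum(odds)
    if method == "totalhits":
        # Count positions whose match p-value is below the threshold.
        return float(sum(1 for p in pvalues if p < pvalue_threshold))
    raise ValueError(f"unknown scoring method: {method}")


position_odds = [0.5, 4.0, 1.5]
position_pvalues = [0.3, 0.0001, 0.01]
scores = {
    m: score_sequence(position_odds, position_pvalues, m)
    for m in ("avg", "max", "sum", "totalhits")
}
```

With these made-up inputs only the second position passes the 0.0002 threshold, so the totalhits score is 1 while avg, max and sum are 2.0, 4.0 and 6.0.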
--bgformat 0|1|2
Select the background source.
0 - uniform background
1 - motif file
2 - background file (see the --bgfile option)
Default behaviour: The background specified in the motif file is used.
--bgfile bfile
Use the 0-order frequencies in the specified file, which must be in Markov Background Model format, as the background model for converting the position-specific probability matrix (PSPM) motifs into position-specific log-odds scoring matrices (PSSMs). This option overrides the --bgformat option.
Default behaviour: Use frequencies based on the motif file or files. See also --bgformat.
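
The conversion that the background model feeds into can be sketched as follows. This is a generic log-odds calculation, not AME's implementation: the base-2 logarithm, the pseudocount scheme, and the names pspm_to_pssm and pseudocount are all assumptions made for illustration, and the background is passed in as a dictionary rather than read from a file.

```python
from math import log2
from typing import Dict, List


def pspm_to_pssm(
    pspm: List[Dict[str, float]],
    background: Dict[str, float],
    pseudocount: float = 0.1,
) -> List[Dict[str, float]]:
    """Convert motif column probabilities into log-odds scores.

    pspm: one dict per motif column, mapping letter -> probability
    background: 0-order letter frequencies
    pseudocount: hypothetical smoothing weight; with 0.0, a zero
        probability in the motif raises ValueError from log2
    """
    pssm = []
    for column in pspm:
        scores = {}
        for letter, bg in background.items():
            # Blend the column with the background so zeros stay finite.
            p = (column.get(letter, 0.0) + pseudocount * bg) / (1.0 + pseudocount)
            scores[letter] = log2(p / bg)
        pssm.append(scores)
    return pssm


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
motif = [{"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}]
pssm = pspm_to_pssm(motif, uniform, pseudocount=0.0)
```

Against the uniform background, a letter twice as frequent as expected scores 1, a letter at background frequency scores 0, and a letter half as frequent scores -1. Changing the background changes every score, which is why the choice of background source matters.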
--pvalue-threshold pvalue
Threshold for considering a single motif hit significant; only valid with the totalhits scoring method.
Default behaviour: A p-value threshold of 0.0002 is used.
--length-correction
Correct for length bias by making the hit p-value threshold more stringent for longer sequences. Note: This option is only valid with the totalhits scoring method, which is the default scoring method.
Default behaviour: The same p-value threshold (set with option --pvalue-threshold) is used for determining hits regardless of sequence length.
--pvalue-report-threshold pvalue
Corrected p-value threshold for reporting a motif as significantly enriched.
Default behaviour: A threshold of 0.05 is used for reporting a motif.
--pwm-threshold score
For the Fisher's exact test only. This is the minimum motif-based sequence score for a sequence to be a 'positive'.
Default behaviour: A minimum motif score of 1 is used to call a sequence a 'positive'.
--fasta-threshold p-value
For the Fisher's exact test only when --poslist is used. This is the maximum FASTA score to call a sequence a 'positive'.
Default behaviour: A maximum FASTA score of 0.001 is used to call a sequence a 'positive'.
--fix-partition num
Number of positive sequences; the remainder are the negative sequences.
Default behaviour: If no control set is provided, partition maximization is performed over the (sorted) input sequences. The partition with the lowest p-value is used.
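
The default partition search can be pictured as trying every split point N of the sorted sequence list and keeping the one with the lowest p-value. The Python sketch below does this with a one-tailed Fisher's Exact test on a list of booleans (True where a sequence counts as motif-positive). It is a simplified illustration under those assumptions: the names upper_tail and best_partition are hypothetical, and the sketch applies no multiple-testing correction for the number of partitions tried.

```python
from math import comb
from typing import List, Tuple


def upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n of N sequences, K of which are positive."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return total / comb(N, n)


def best_partition(is_positive: List[bool]) -> Tuple[int, float]:
    """Return (N, p-value) for the most significant split of a sorted list.

    is_positive[i] is True when sequence i (in sorted order) has a motif
    score high enough to count as a 'positive'.
    """
    total = len(is_positive)
    positives = sum(is_positive)
    best_n, best_p = 0, 1.0
    seen = 0  # positives among the first n sequences
    for n in range(1, total):
        seen += is_positive[n - 1]
        p = upper_tail(seen, n, positives, total)
        if p < best_p:
            best_n, best_p = n, p
    return best_n, best_p


flags = [True, True, True, False, False, True, False, False]
split, p_value = best_partition(flags)
```

For this toy list the best split is after the third sequence. Fixing the number of positive sequences with --fix-partition corresponds to evaluating a single value of n instead of scanning all of them.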
--rsmethod better|quick
Select how to compute the Wilcoxon rank-sum test:
better - compute the proper test (slow)
quick - use faster heuristic version of test
Default behaviour: Use the heuristic version of the test (quick).
--poslist pwm|fasta
Select whether thresholds are tested on X (motif score) or Y (FASTA score). Only applies to partition maximization and to the rank-sum test.
pwm - Use motif score (X).
fasta - Use FASTA score (Y).
Hint: Be careful when switching the poslist. It switches between using X and Y for determining true positives in the contingency matrix, in addition to switching which of X and Y is used for partition maximization.
Default behaviour: Use the FASTA score.
--log-fscores
For linear regression and Spearman tests only: regress using ln(FASTA score), rather than the score directly.
Default behaviour: Use the score directly.
--log-pwmscores
For linear regression and Spearman tests only: regress using ln(pwm score), rather than the score directly.
Default behaviour: Use the score directly.
--normalise-affinity
Normalise motif scores so that motif scores can be compared directly. Only relevant for Spearman and Linear Regression tests, where p-values are not calculated.
--linreg-switchxy
Make the x-points FASTA scores and the y-points motif scores. Only relevant for Spearman and Linear Regression tests.
Default behaviour: Keep the original axes.
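
How the log and axis-switching options interact with the linreg function can be sketched with an ordinary least-squares fit. This is an illustration only, not AME's or RAMEN's code: the function name linreg_mse and its keyword arguments are hypothetical stand-ins for the options above, and dividing the squared residuals by the number of points is an assumption about how the mean-squared error is defined.

```python
from math import exp, log
from typing import Sequence


def linreg_mse(
    motif_scores: Sequence[float],
    fasta_scores: Sequence[float],
    log_fscores: bool = False,
    log_pwmscores: bool = False,
    switch_xy: bool = False,
) -> float:
    """Mean-squared error of a least-squares line through (x, y).

    By default x is the motif score and y is the FASTA score.
    """
    x = [log(v) if log_pwmscores else v for v in motif_scores]
    y = [log(v) if log_fscores else v for v in fasta_scores]
    if switch_xy:
        x, y = y, x
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # raises ZeroDivisionError if all x are equal
    intercept = mean_y - slope * mean_x
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    return sum(r * r for r in residuals) / n


# FASTA scores that are exponential in the motif score fit a line only
# after taking ln(FASTA score).
motif = [1.0, 2.0, 3.0]
fasta = [exp(1.0), exp(3.0), exp(5.0)]
mse_raw = linreg_mse(motif, fasta)
mse_log = linreg_mse(motif, fasta, log_fscores=True)
```

In this made-up data set mse_log is essentially zero while mse_raw is not, which is the kind of relationship the log options are meant to expose.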
--verbose 1|2|3|4|5
A number that regulates the verbosity level of the output information messages. If set to 1 (quiet), only error messages are output, whereas the other extreme, 5 (dump), outputs lots of mostly useless information. This option is best placed first. At verbosity level 3, AME reports the significance of each partition of the sequences that it considers.
Default behaviour: The verbosity level is set to 2 (normal).
--help
Print a usage message and exit.
Default behaviour: Run as normal.
--version
Display the version and exit.
Default behaviour: Run as normal.

Citing

If you use AME in your research, please cite the following paper:
Robert McLeay and Timothy L. Bailey, "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data", BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165.