AME

The command-line version of AME supports a wide variety of methods for scoring motif enrichment and many methods of testing the scored motif enrichment for significance. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. If no control sequence set is given, the primary sequences are assumed to be sorted in decreasing order of 'importance' according to some secondary criterion. For each motif, AME determines if the sequences at the top of the list are significantly enriched for matches to it. With no control sequences, by default AME performs unconstrained partition maximization, looking for the pair of thresholds (on motif score and the secondary criterion, respectively) that yields the most significant result.
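The partition search described above can be sketched in a few lines. This is a simplified illustration, not AME's implementation. It assumes each sequence has already been reduced to a hit/no-hit flag, so only the partition position is optimised and the motif-score threshold is not. It uses a one-tailed Fisher's exact test and applies no multiple-testing correction. The names `fisher_one_tailed` and `best_partition` are hypothetical.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are (top of list, rest of list); columns are (hit, no hit).
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1 = a + b          # sequences in the top partition
    col1 = a + c          # sequences with a motif hit
    n = a + b + c + d     # all sequences
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


def best_partition(hits):
    """Find N such that hits are most enriched in the first N sequences.

    `hits` is a list of booleans in the sorted order of the primary
    sequences. Returns (N, p_value); N is None if fewer than two sequences.
    """
    n = len(hits)
    total_hits = sum(hits)
    best_n, best_p = None, float("inf")
    top_hits = 0
    for size in range(1, n):  # keep both sides of the partition non-empty
        top_hits += hits[size - 1]
        a = top_hits
        b = size - a
        c = total_hits - a
        d = (n - size) - c
        p = fisher_one_tailed(a, b, c, d)
        if p < best_p:
            best_n, best_p = size, p
    return best_n, best_p


if __name__ == "__main__":
    print(best_partition([True, True, True, False, False, False]))
```

Because the p-value of the best partition is selected from many candidates, it is optimistic on its own; the sketch leaves any correction for that selection out of scope.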

Input

Sequence File

A set of (primary) sequences in FASTA format. The sequences must be sorted by increasing value of some secondary criterion (e.g., expression level, peak height, fluorescence score). In this documentation, we refer to this secondary criterion as the "FASTA score". This score can optionally be placed in the FASTA ID line. If present, the FASTA score must come immediately after the sequence ID. For example, if the FASTA ID line is

>seq_1 0.123

then 0.123 is the FASTA score for that sequence.
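As an illustration of the ID-line convention above, the following sketch extracts the sequence ID and optional FASTA score from each header line. It assumes whitespace-separated fields and treats a non-numeric second field as "no score"; the function name `read_fasta_scores` is hypothetical and is not part of the MEME Suite.

```python
def read_fasta_scores(lines):
    """Return a list of (sequence_id, fasta_score) from FASTA header lines.

    The score is the field immediately after the sequence ID; it is None
    when that field is absent or is not a number.
    """
    records = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence data lines
        fields = line[1:].split()
        if not fields:
            continue  # header with no ID
        score = None
        if len(fields) > 1:
            try:
                score = float(fields[1])
            except ValueError:
                score = None  # second field is free text, not a score
        records.append((fields[0], score))
    return records


if __name__ == "__main__":
    example = [">seq_1 0.123", "ACGTACGT", ">seq_2 no score here", "TTGACA"]
    print(read_fasta_scores(example))
```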

Motif File

File(s) containing MEME formatted motifs. Outputs from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.

Output

AME writes to a directory, ame_out, unless a different directory name is specified on the command line. The output directory contains outputs in two formats: HTML and plain text, in files named respectively ame.html and ame.txt. The text output includes a consensus sequence computed for each significant motif.
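The sketch below assembles an AME command line and the output paths it would be expected to produce, without running anything. It uses only options documented in the table below. The positional order (options first, then the sequence file, then one or more motif files) is an assumption, as are the helper name `build_ame_command` and the placeholder file names.

```python
import shlex
from pathlib import PurePosixPath


def build_ame_command(sequences, motifs, out_dir="ame_out", control=None,
                      method=None, scoring=None):
    """Return (argv, expected_outputs) for one AME run; nothing is executed."""
    argv = ["ame", "--oc", out_dir]
    if control is not None:
        argv += ["--control", control]
    if method is not None:
        argv += ["--method", method]
    if scoring is not None:
        argv += ["--scoring", scoring]
    # Assumed positional order: one sequence file, then the motif file(s).
    argv.append(sequences)
    argv.extend(motifs)
    # The output directory holds an HTML and a plain-text report.
    outputs = [str(PurePosixPath(out_dir) / name)
               for name in ("ame.html", "ame.txt")]
    return argv, outputs


if __name__ == "__main__":
    argv, outputs = build_ame_command(
        "peaks.fasta", ["motifs.meme"],
        control="background.fasta", method="fisher",
    )
    print(shlex.join(argv))
    print(outputs)
```

Passing `--oc` explicitly mirrors the documented default behaviour, so rerunning the command overwrites the previous reports in the same directory.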

Options

Option	Parameter	Description	Default Behaviour
General Options
--o	dir	Create a folder called dir and write output files in it. This option is not compatible with --oc as only one output folder is allowed.	The program behaves as if `--oc ame_out` had been specified.
--oc	dir	Create a folder called dir but if it already exists allow overwriting the contents. This option is not compatible with --o as only one output folder is allowed.	The program behaves as if `--oc ame_out` had been specified.
--control	file	A set of control sequences in FASTA format. If a set of control sequences is provided, AME determines if each motif is enriched in the primary set compared to the control set. With this option, the order of the sequences in the primary and control sets does not affect the results. Depending on the biological application, the control sequences might be, for example, randomly chosen genome regions or shuffled sequences (the utility fasta-shuffle-letters can be used to create these from the primary sequences). Note: The control sequences should have (approximately) the same distribution of lengths as the primary sequences or AME may fail to correctly detect enriched motifs and will report inaccurate p-values.	If no control sequences are provided, AME searches for position N in the list of primary sequences (a "partition") such that the given motif is maximally enriched in the first N sequences relative to the remaining primary sequences. Consequently, without a control set of sequences, the order of the sequences in the primary set matters.
--method	fisher\|ranksum\|mhg\|4dmhg\|spearman\|linreg	Select the association function for testing motif enrichment significance. `fisher` - one-tailed Fisher's Exact test. `ranksum` - one-tailed Wilcoxon rank-sum test, also known as the Mann-Whitney U test. `mhg` and `4dmhg` - two-tailed tests described in McLeay and Bailey, "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data", BMC Bioinformatics 11:165, 2010. Note: Motifs enriched in either the primary or control sequences (or at the top or bottom of the primary sequences if only one sequence file is given) are considered significant by these tests. `spearman` - Spearman's rank correlation coefficient (ρ) between the motif score and the FASTA score (in the FASTA IDs of the sequences). Not valid with `--control` or `--fix-partition`; always uses all input sequences. `linreg` - mean-squared error of the linear regression of the motif score and the FASTA score (in the FASTA IDs of the sequences). With this function, every FASTA ID line must contain a FASTA score. Not valid with `--control`. Using this function, AME reports raw values only. For the full approach used in the paper, please use the tool RAMEN, which is also included in the MEME Suite and supports estimation of p-values. All statistical tests except `mhg` and `4dmhg` use the alternative hypothesis that high motif scores are enriched in the primary sequences (or among the first sequences if only one set is given). The `spearman` and `linreg` functions do not calculate p-values.	The Fisher's Exact test (`fisher`) is used for testing motif enrichment.
--scoring	avg\|max\|sum\|totalhits	Method of scoring a single sequence for matches to a motif. The score assigned to a sequence is one of: `avg` - the average motif score of the sequence (note: motif scores are odds scores, not log-odds scores); `max` - the maximum motif score of any position in the sequence; `sum` - the sum of the motif (odds) scores over the sequence; `totalhits` - the total number of positions in the sequence whose motif score p-value is less than the `--pvalue-threshold` (see below).	The `totalhits` scoring method is used.
--xalph		If the input motifs are in a different alphabet than the input sequences, and the motif alphabet is a subset of the sequence alphabet, you can specify an alphabet file containing the sequence alphabet definition. The input motifs will be converted to this new alphabet, with the probabilities for the new symbols set to zero prior to applying pseudocounts.	Motifs retain the alphabet defined in the motif file.
--bgformat	0\|1\|2	Select the background source: 0 - uniform background; 1 - motif file; 2 - background file (see the --bgfile option).	The background specified in the motif file is used.
--bgfile	bfile	Use the 0-order frequencies in the specified file in Markov Background Model format as the background model for converting the position-specific probability matrix (PSPM) motifs into position-specific log-odds scoring matrices (PSSM). This option overrides the --bgformat option.	Use frequencies based on the motif file or files. See also --bgformat.
--pvalue-threshold	pvalue	Threshold for considering a single motif hit significant; only valid with the `totalhits` scoring method.	A p-value threshold of 0.0002 is used.
--length-correction		Correct for length bias by making the hit p-value threshold more stringent for longer sequences. Note: This option is only valid with the `totalhits` scoring method, which is the default scoring method.	The same p-value threshold (set with option --pvalue-threshold) is used for determining hits regardless of sequence length.
--pvalue-report-threshold	pvalue	Corrected p-value threshold for reporting a motif as significantly enriched.	The threshold of 0.05 is used for reporting a motif.
--pwm-threshold	score	For the Fisher's exact test only. This is the minimum motif-based sequence score for a sequence to be a 'positive'.	A minimum motif score of 1 is used to call a sequence a 'positive'.
--fasta-threshold	p-value	For the Fisher's exact test only when `--poslist` is used. This is the maximum FASTA score to call a sequence a 'positive'.	A maximum FASTA score of 0.001 is used to call a sequence a 'positive'.
--fix-partition	num	Number of positive sequences; the remainder are the negative sequences.	If no control set is provided, partition maximization is performed over the (sorted) input sequences. The partition with the lowest p-value is used.
--rsmethod	better\|quick	Select how to compute the Wilcoxon rank-sum test: `better` - compute the proper test (slow); `quick` - use a faster heuristic version of the test.	Use the heuristic version of the test (`quick`).
--poslist	pwm\|fasta	For partition maximization, test thresholds on either X (motif) or Y (FASTA score). Only applies for partition maximization and for the Ranksum test. `pwm` - Use motif score (X). `fasta` - Use FASTA score (Y). Hint: Be careful when switching the `poslist`. It switches between using X and Y for determining true positives in the contingency matrix, in addition to switching which of X and Y is used for partition maximization.	Use the FASTA score.
--log-fscores		For linear regression and Spearman tests only: regress using ln(FASTA score), rather than the score directly.	Use the score directly.
--log-pwmscores		For linear regression and Spearman tests only: regress using ln(pwm score), rather than the score directly.	Use the score directly.
--normalise-affinity		Normalise motif scores so that motif scores can be compared directly. Only relevant for Spearman and Linear Regression tests, where p-values are not calculated.
--linreg-switchxy		Make the x-points FASTA scores and the y-points motif scores. Only relevant for Spearman and Linear Regression tests.	Keep the original axis.
--verbose	1\|2\|3\|4\|5	A number that regulates the verbosity level of the output information messages. If set to 1 (quiet), AME outputs only error messages, whereas the other extreme, 5 (dump), outputs lots of mostly useless information. This option is best placed first. At verbosity level 3, AME reports the significance of each partition of the sequences that it considers.	The verbosity level is set to 2 (normal).
--help		Print a usage message and exit.	Run as normal.
--version		Display the version and exit.	Run as normal.
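To make the four `--scoring` choices concrete, the sketch below computes each one from per-position values for a single sequence. It assumes the per-position odds scores and their p-values have already been computed elsewhere, and it ignores `--length-correction`. The function name `score_sequence` and its argument names are hypothetical.

```python
def score_sequence(odds_scores, hit_pvalues, scoring="totalhits",
                   pvalue_threshold=0.0002):
    """Score one sequence for matches to one motif.

    odds_scores  -- motif odds score at each position (not log-odds)
    hit_pvalues  -- p-value of the motif score at each position
    """
    if scoring == "avg":
        return sum(odds_scores) / len(odds_scores)
    if scoring == "max":
        return max(odds_scores)
    if scoring == "sum":
        return sum(odds_scores)
    if scoring == "totalhits":
        # Count positions whose p-value is below the hit threshold.
        return sum(1 for p in hit_pvalues if p < pvalue_threshold)
    raise ValueError(f"unknown scoring method: {scoring}")


if __name__ == "__main__":
    odds = [0.5, 4.0, 1.5]
    pvals = [0.5, 0.0001, 0.01]
    for name in ("avg", "max", "sum", "totalhits"):
        print(name, score_sequence(odds, pvals, scoring=name))
```

Note that `totalhits` depends only on the p-values and the threshold, which is why `--pvalue-threshold` applies to that scoring method alone.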
