SpaMo

<sequences>

The name of a FASTA formatted file containing sequences (ideally of about 500bp) centered on a genomic location expected to be relevant to the primary motif. This would typically be generated by expanding either side of a ChIP-seq peak to obtain sequences of about 500 bases in length.

SpaMo scans the central section of each sequence, excluding the margin on either edge, for the primary motif. Because the margin on each edge is excluded, a sequence shorter than two times the margin plus the trimmed length of the primary motif will always be discarded.
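The length rule above can be sketched as a simple pre-filter. This is an illustrative helper, not part of SpaMo: the function name and the example motif length of 8 are assumptions, and the default margin of 150 is taken from the -margin option described later.

```python
def is_usable(seq_len: int, margin: int = 150, trimmed_motif_len: int = 8) -> bool:
    """Return True if a sequence is long enough to be scanned.

    A sequence is discarded when it is shorter than two times the margin
    plus the trimmed length of the primary motif.
    trimmed_motif_len=8 is a hypothetical example value.
    """
    return seq_len >= 2 * margin + trimmed_motif_len


# Example: check a few lengths against the default margin.
for length in (500, 308, 307):
    print(length, is_usable(length))
```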

<primary motif>

The name of a file containing at least one MEME formatted motif. Outputs from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite. The primary motif is the motif for which you are trying to find cofactors. If the file contains more than one motif then the first will be selected by default or another can be selected using the -primary or -primaryi options.

<secondary motifs>

The names of one or more MEME formatted motif files containing DNA motifs (see Primary Motifs, above). The secondary motifs are tested for a significant spacing with the primary motif which might imply they act together. If the motif databases contain motifs which you don't wish to scan, the motifs can be filtered based on their name by using the -inc and -exc options.

SpaMo writes its output to files in a directory named spamo_out, which it creates if necessary. You can change the output directory using the -o or -oc options.

The main output file is an HTML file named spamo.html, which can be viewed with a web browser. A tab-separated values (TSV) output file named spamo.tsv is also generated that contains a single line for each significant primary-secondary motif spacing. Detailed documentation on the meanings of the columns is provided at the bottom of the TSV file.

Additional outputs may be requested using the -dumpseqs, -dumpsigs, -eps and -png options, as described below.

Option Parameter Description Default Behaviour

Input/Output

-o dir Create a folder called dir and write output files in it. This option is not compatible with -oc as only one output folder is allowed. By default, the program behaves as if -oc spamo_out had been specified.

-oc dir Create a folder called dir but if it already exists allow overwriting the contents. This option is not compatible with -o as only one output folder is allowed. By default, the program behaves as if -oc spamo_out had been specified.

-eps Output histograms in Encapsulated PostScript format which can be included in publications. This option can be used with the -png option. Image files are not output by default as the webpage is capable of generating the graphs on demand.

-png Output histograms in Portable Network Graphic format which is good for webpages. This option can be used with the -eps option. Image files are not output by default as the webpage is capable of generating the graphs on demand.

-dumpseqs

Write space separated values in columns, describing the motif matches used to make the histograms, to output files named seqs_<primary_motif>_<secondary_db>_<secondary_motif>.txt. The rows are initially in sequence name order but various command-line tools can be used to sort them on other values. The columns contain:

column(s)	contents
1	Trimmed lowercase sequence with uppercase matches
2	Position of the secondary match within the whole sequence
3	Sequence fragment that the primary matched
4	Strand of the primary match (+/-)
5	Sequence fragment that the secondary matched
6	Strand of the secondary match (+/-)
7	Is the primary match on the same strand as the secondary (s/o)
8	Is the secondary match downstream or upstream (d/u)
9	The gap between the primary and secondary matches
10	The name of the sequence
11	The p-value of the bin containing the match, adjusted for the number of bins
If the sequence names are in Genome Browser position format (e.g., "chr5:36715616-36715623"), the following additional columns appear:
12-14	Position of primary match in BED coordinates
15	Position of primary match in Genome Browser coordinates
16-18	Position of secondary match in BED coordinates
19	Position of secondary match in Genome Browser coordinates

By default, no specific match information is output.
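As a rough illustration of the column layout above, the following sketch parses one line of a -dumpseqs file into a dictionary. The field names, the numeric types and the example line are all assumptions made for illustration; only the column order comes from the table.

```python
# Hypothetical field names for columns 1-11, in the order listed above.
COLUMNS = [
    "sequence", "secondary_pos", "primary_fragment", "primary_strand",
    "secondary_fragment", "secondary_strand", "same_strand", "direction",
    "gap", "name", "adj_pvalue",
]


def parse_dumpseqs_line(line: str) -> dict:
    """Split one space-separated line into named fields."""
    fields = line.split()
    if len(fields) < len(COLUMNS):
        raise ValueError("expected at least 11 columns")
    record = dict(zip(COLUMNS, fields))
    # Assumed types: integer position and gap, floating-point p-value.
    record["secondary_pos"] = int(record["secondary_pos"])
    record["gap"] = int(record["gap"])
    record["adj_pvalue"] = float(record["adj_pvalue"])
    # Columns 12-19 only appear for Genome Browser style sequence names.
    record["extra"] = fields[len(COLUMNS):]
    return record


# Made-up example line, not real SpaMo output.
example = "acgtACGTacgtTTGCaa 13 ACGT + TTGC - o d 4 seq1 0.003"
print(parse_dumpseqs_line(example))
```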

-dumpsigs Same as -dumpseqs, but only secondary matches in significant bins are dumped. Default behaviour is as in -dumpseqs.

Scanning

-numgen seed Specify a number as the seed for initializing the pseudo-random number generator used in breaking scoring ties. The seed is included in the output so experiments can be repeated. If you wish to run multiple experiments with different seeds then you can use the special value 'time' (without the quotes) which sets the seed to the system clock. A seed of 1 is used.

-margin size The distance either side of the primary motif site which makes up the region that can contain the secondary motif site. Additionally it is the minimum gap between the primary motif site and the edge of the sequence. These constraints mean that input sequences shorter than the trimmed length of the primary motif plus two times the margin size cannot be used by SpaMo. A margin of 150 is used. For an input sequence of length 500 this means the central 200 bases are scanned for the best primary motif match and then the 300 bases surrounding the best primary site are scanned for the best secondary site.
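The arithmetic in the worked example can be reproduced with a small helper. This is a sketch only: the function name and the 0-based, half-open coordinates are assumptions, and the secondary region length counts only the two flanks of size margin, as in the example.

```python
def scan_regions(seq_len: int, margin: int = 150) -> dict:
    """Compute the region sizes implied by a given margin."""
    central_len = seq_len - 2 * margin
    if central_len <= 0:
        raise ValueError("sequence is too short for this margin")
    return {
        # Central section scanned for the primary motif (0-based, half-open).
        "primary_window": (margin, seq_len - margin),
        "primary_window_len": central_len,
        # Bases either side of the primary site scanned for the secondary motif.
        "secondary_region_len": 2 * margin,
    }


print(scan_regions(500))
```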

-minscore value The minimum score accepted as a match to either the primary or secondary motif. This value can greatly affect the results of SpaMo. If it is too high, there will be no matches to the primary motif. If too low, sequences with non-significant matches to the primary and/or secondary motif will reduce the effectiveness of the spacing analysis. Note: If value is in the range [-1,0) then the minimum score is set to the absolute value of value times the maximum possible match score. A minimum score of 7 bits is used.
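The rule for values in the range [-1,0) can be written out as follows. The function name is hypothetical, and max_score stands for the maximum possible match score of the motif, which the caller is assumed to know.

```python
def resolve_min_score(value: float, max_score: float) -> float:
    """Turn a -minscore argument into an absolute score threshold.

    Values in [-1, 0) are treated as a fraction of the maximum possible
    match score; any other value is used as given.
    """
    if -1 <= value < 0:
        return abs(value) * max_score
    return value


print(resolve_min_score(7, 20.0))     # absolute threshold, used as given
print(resolve_min_score(-0.5, 20.0))  # half of the maximum possible score
```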

-bin size The size of the bin used to calculate the histogram and p-values. A bin size of 1 is recommended as it gives better output. A bin size of 1 is used.

-range size The distance from the primary motif site for which p-values are calculated to include in significance tests. A small value for range may miss significant peaks but this is a trade-off, as the larger the range the more bins have to be tested, leading to a larger factor used in the Bonferroni correction for multiple tests. A range of 150 is used.

-shared fraction Redundant sequences are removed that have more than this fraction of identical residues. After the primary motif site has been selected in each sequence the sequence is trimmed to only include a region of size margin on either side of the primary motif site. This aligned and trimmed sequence (and its reverse complement) is then compared with all the other sequences and the fraction of shared bases is calculated, not including the bases in the match to the primary motif. If the fraction of shared bases between the sequence (or its reverse complement) is larger than this limit, then the second sequence is eliminated. To disable this feature set the shared fraction to 1. The shared fraction is set to 0.5 which means that the trimmed, aligned sequences must share 50% or more of their bases to be declared redundant.
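A minimal sketch of the comparison described above, assuming each trimmed sequence is laid out as margin, primary site, margin, so that the primary site occupies the same positions in the reverse complement. The function names are hypothetical, and the strict "larger than" comparison follows the wording of the rule; the default description says "50% or more", so the exact boundary behaviour is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_fraction(a: str, b: str, margin: int) -> float:
    """Fraction of identical bases, ignoring the central primary site."""
    if len(a) != len(b):
        raise ValueError("trimmed sequences must have equal length")
    site = range(margin, len(a) - margin)  # positions of the primary match
    flank = [i for i in range(len(a)) if i not in site]
    same = sum(1 for i in flank if a[i].upper() == b[i].upper())
    return same / len(flank)


def is_redundant(a: str, b: str, margin: int, limit: float = 0.5) -> bool:
    """Compare b and its reverse complement against a."""
    best = max(shared_fraction(a, b, margin),
               shared_fraction(a, revcomp(b), margin))
    return best > limit


# Toy sequences with margin 4 and a 2-base primary site.
print(is_redundant("AAAAGGCCCC", "AAAAGGCCCC", margin=4))
```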

-odds odds ratio To speed up the elimination of redundant sequences their positions are compared in a random order and comparison stops whenever the number of matches is so small that the odds ratio is greater than this value. The odds ratio is the probability of the given number of matches given that the sequences were generated by the background model, divided by the same probability given they have at least fraction matching positions (as specified by the option -shared). The odds ratio is set to 20.

Summarizing

-cutoff p-value The p-value cutoff for bins to be considered significant. This is the p-value of the Binomial Test on the number of observed secondary spacings or more falling into the given bin, adjusted for the number of bins tested. Note that the p-value is only calculated and tested for bins within the distance of the primary motif as specified by the option -range. A bin p-value smaller than or equal to 0.05 is considered significant.
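The test described above can be sketched as a one-sided binomial tail followed by a Bonferroni adjustment, which is the correction named under -range. The function names are hypothetical, and the null probability p_bin of a spacing falling into a given bin is left as an input because this page does not say how SpaMo derives it.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def adjusted_bin_pvalue(k: int, n: int, p_bin: float, n_bins_tested: int) -> float:
    """Binomial p-value for a bin, Bonferroni-adjusted for the bins tested."""
    return min(1.0, binomial_tail(k, n, p_bin) * n_bins_tested)


def is_significant(adj_pvalue: float, cutoff: float = 0.05) -> bool:
    """A bin is significant when its adjusted p-value is at or below the cutoff."""
    return adj_pvalue <= cutoff


# Hypothetical numbers: 12 of 200 spacings in one bin, 600 bins tested.
p = adjusted_bin_pvalue(12, 200, 1 / 600, 600)
print(p, is_significant(p))
```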

-evalue E-value The minimum secondary motif E-value for its results to be printed. For each secondary motif, this is the minimum p-value of all tested bins multiplied by the number of secondary motifs. The E-value estimates the expected number of random secondary motifs that would have the given E-value or lower. Results for all secondary motifs with E-value smaller than or equal to 10 are printed.
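The E-value calculation described above is a single multiplication. In this sketch the function names are hypothetical and the p-values are made-up inputs.

```python
def secondary_evalue(bin_pvalues: list, n_secondary_motifs: int) -> float:
    """Minimum tested-bin p-value times the number of secondary motifs."""
    return min(bin_pvalues) * n_secondary_motifs


def is_printed(evalue: float, threshold: float = 10.0) -> bool:
    """Results are printed when the E-value is at or below the threshold."""
    return evalue <= threshold


# Made-up adjusted p-values for one secondary motif, out of 500 motifs scanned.
e = secondary_evalue([0.2, 0.001, 0.05], 500)
print(e, is_printed(e))
```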

-overlap size To determine if two motifs are redundant the most significant bin in the tested range for each of the motifs is compared. For the motifs to be considered redundant it needs to be possible that the sites that got counted in the bin could have overlapped, and this parameter sets the minimum overlap. For a bin size larger than 1 the overlap of the bins cannot be precisely calculated as the actual site positions are not stored, and so the maximum possible overlap is used. A minimum overlap of 2 is required.

-joint fraction To determine if two motifs are redundant the most significant bin in the tested range in each of the motifs is compared. The most significant bin in each motif has the list of sequence identifiers which had a primary and secondary at the correct spacing to go into that bin. To compare the motifs for redundancy this set of sequence identifiers is compared and the size of the intersection is counted. This intersection size is divided by the size of the smaller of the two sequence sets to get the joint sequence fraction. A minimum joint sequence fraction of 0.5 is required for two motifs to be considered redundant.
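The joint sequence fraction described above is the intersection size divided by the size of the smaller set. A minimal sketch, with a hypothetical function name and made-up sequence identifiers:

```python
def joint_fraction(ids_a, ids_b) -> float:
    """Intersection size divided by the size of the smaller identifier set."""
    a, b = set(ids_a), set(ids_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0  # assumption: an empty set shares nothing
    return len(a & b) / smaller


# Made-up sequence identifiers from the most significant bin of two motifs.
print(joint_fraction({"s1", "s2", "s3", "s4"}, {"s2", "s3"}))
print(joint_fraction({"s1", "s2"}, {"s2", "s3", "s4"}))
```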

Motif Loading

-pseudo count The pseudocount added to loaded motifs. A pseudocount of 0.1 is added to loaded motifs.

-bgfile file Specify the source of a background model for converting a frequency matrix to a log-odds score matrix and for use in estimating the p-values of match scores. The value of file is either the path to a file in Markov Background Model Format, or one of the keywords motif-file, --motif-- or --uniform--. The first two keywords cause the 0-order letter frequencies contained in the primary motif file to be used, and --uniform-- causes uniform letter frequencies to be used. The frequencies of the letters in the sequences are used as the background model.

-xalph Convert the alphabet of the secondary motif database(s) to the alphabet of the primary motif assuming the core symbols of the secondary motif alphabet are a subset. The input sequences must be in the alphabet of the primary motif. By default, the primary motif alphabet must be identical to the secondary motif alphabet and to that of the input sequences.

-trim bits Trim the edges of motifs based on the information content. The positions on the edges of the motifs with information content less than bits will not be used in scanning. Positions on the edges of the motifs with information content less than or equal to 0.25 will be trimmed.
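Edge trimming by information content can be sketched as below. The information content here is the standard definition relative to a uniform background (log2 of the alphabet size minus the column entropy); whether SpaMo computes it in exactly this way is an assumption, as are the function names. The comparison uses "less than or equal to", following the default behaviour described above.

```python
from math import log2


def column_ic(column) -> float:
    """Information content in bits of one motif column (uniform background)."""
    return log2(len(column)) + sum(p * log2(p) for p in column if p > 0)


def trim_motif(columns, bits: float = 0.25):
    """Drop low-information columns from both edges only."""
    start, end = 0, len(columns)
    while start < end and column_ic(columns[start]) <= bits:
        start += 1
    while end > start and column_ic(columns[end - 1]) <= bits:
        end -= 1
    return columns[start:end]


# Toy DNA motif: uninformative columns at both edges and one in the middle.
flat = [0.25, 0.25, 0.25, 0.25]
motif = [flat, [1.0, 0.0, 0.0, 0.0], flat, [0.0, 0.0, 1.0, 0.0], flat]
print(len(trim_motif(motif)))
```

Note that the uninformative column in the middle of the toy motif is kept, since only the edges are trimmed.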

-primary name The name of the motif to select as the primary motif. This option is incompatible with -primaryi as only one primary motif can be selected. The first motif in the file is selected.

-primaryi num The index of the motif to select as the primary motif counting from 1. This option is incompatible with -primary as only one primary motif can be selected. The first motif in the file is selected.

-keepprimary If the same file is specified for the primary and secondary motifs then by default the primary motif is excluded but specifying this option keeps it. The primary motif is excluded from the secondaries if the same file is used for the primary and secondary motifs.

-inc pattern Select the motifs with names matching the pattern. The pattern can contain shell-like wildcards (e.g., '*') though they must be escaped or quoted to prevent the shell from auto-expanding them. This option may be repeated and all the patterns will be used. Unless the -exc option has been specified all the motifs are used.

-exc pattern Exclude the motifs with names matching the pattern. The pattern can contain shell-like wildcards (e.g., '*') though they must be escaped or quoted to prevent the shell from auto-expanding them. This option may be repeated and all the patterns will be used. Unless the -inc option has been specified all the motifs are used.
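The effect of wildcard patterns on motif names can be approximated with Python's fnmatch module. This is only a sketch: the function name and motif names are made up, and both the case-sensitive matching and the way -inc and -exc combine when given together are assumptions.

```python
from fnmatch import fnmatchcase


def select_motifs(names, inc=(), exc=()):
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        # With no include patterns, every name is a candidate.
        if inc and not any(fnmatchcase(name, p) for p in inc):
            continue
        if any(fnmatchcase(name, p) for p in exc):
            continue
        selected.append(name)
    return selected


# Made-up motif names.
names = ["MA0001.1", "MA0002.1", "UP0003"]
print(select_motifs(names, inc=["MA*"]))
print(select_motifs(names, exc=["*2.1"]))
```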

Miscellaneous

-help Print out a help message.

-verbosity 1|2|3|4|5 A number that regulates the verbosity level of the output information messages. If set to 1 (quiet) then it will only output error messages whereas the other extreme 5 (dump) outputs lots of mostly useless information. The verbosity level is set to 2 (normal).

-version Display the version and exit. By default, the program runs as normal.

The MEME Suite

Motif-based sequence analysis tools

Spaced Motif Analysis Tool
