Usage:

gomo [options] <go-term database> <scoring file>+

Description

Input

GO Term Database

A collection of GO terms mapped to the sequences in the scoring file. Databases are provided by the web services and are formatted using a simple tab-separated values (TSV) format:

"GO-term" "Sequence identifiers separated by tabs"

The exception to this rule is the first line, which instead contains the URL to an on-line database (if any) containing entries for the gene IDs. The URL should have ampersands (&) replaced with &amp; and the place for the gene ID marked by the token !!GENEID!!. Each gene ID reported in GOMo's output will be linked to the URL with the actual gene ID inserted.
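As an illustration of the layout described above, here is a minimal Python sketch that reads such a database and builds a gene link. It is not part of GOMo; the function names, the example.org URL and the gene and GO identifiers are made up for the example.

```python
from html import unescape


def parse_go_database(lines):
    """Return (url_template, {go_term: [sequence ids]}) from database lines."""
    rows = [line.rstrip("\n") for line in lines]
    # First line: URL template (empty if there is no on-line database).
    # Ampersands are stored as &amp; so they are unescaped here.
    url_template = unescape(rows[0].strip()) if rows else ""
    term_to_genes = {}
    for row in rows[1:]:
        if not row.strip():
            continue  # skip blank lines
        go_term, *gene_ids = row.split("\t")
        term_to_genes[go_term] = [g for g in gene_ids if g]
    return url_template, term_to_genes


def gene_url(url_template, gene_id):
    """Insert a gene ID at the !!GENEID!! placeholder."""
    return url_template.replace("!!GENEID!!", gene_id)


# Hypothetical database content, for illustration only.
example = [
    "http://example.org/lookup?db=demo&amp;id=!!GENEID!!\n",
    "GO:0000001\tgeneA\tgeneB\n",
    "GO:0000002\tgeneC\n",
]
url_template, term_to_genes = parse_go_database(example)
print(term_to_genes["GO:0000001"])
print(gene_url(url_template, "geneA"))
```

In practice the lines would come from an open file handle rather than a list.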

Scoring File

An XML file that contains, for each motif, the sequences and their scores. The XML file uses the CisML schema. When scoring data is available for multiple related species, GOMo can take multiple scoring files in which the true sequence identifiers have been mapped to their orthologs in the reference species for which the GO term database was supplied.

Scoring files may easily be created using the AMA utility that is part of the downloadable MEME Suite. A typical command to create a scoring file named "ama_out/ama.xml" using AMA would be:

	ama ama_out -pvalues <motif_file> <fasta_sequence_file> <background_file>

By default GOMo uses the p-value given for each gene in the CisML file to rank the genes. Any sequence failing to provide a p-value will cause GOMo to exit. The --gs switch causes GOMo to use the gene scores from the CisML file instead for ranking genes.
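The default ranking can be pictured with the following Python sketch. It assumes that each motif is a "pattern" element containing "scanned-sequence" elements with "name", "score" and "pvalue" attributes; these element and attribute names are assumptions that should be checked against your own CisML file. The function is illustrative and is not GOMo's implementation.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def rank_genes_by_pvalue(cisml_text):
    """Return {motif name: [(gene, p-value), ...]} sorted by ascending p-value."""
    root = ET.fromstring(cisml_text)
    ranked = {}
    for pattern in root.iter():
        if local_name(pattern.tag) != "pattern":  # assumed element name
            continue
        genes = []
        for seq in pattern:
            if local_name(seq.tag) != "scanned-sequence":  # assumed element name
                continue
            pvalue = seq.get("pvalue")  # assumed attribute name
            if pvalue is None:
                # Mirrors the documented behaviour: a missing p-value is fatal.
                raise ValueError(f"sequence {seq.get('name')!r} has no p-value")
            genes.append((seq.get("name"), float(pvalue)))
        ranked[pattern.get("name")] = sorted(genes, key=lambda g: g[1])
    return ranked


# Hypothetical, heavily simplified CisML content.
sample = """<cis-element-search>
  <pattern accession="m1" name="motif1">
    <scanned-sequence accession="g1" name="geneA" score="3.2" pvalue="0.04"/>
    <scanned-sequence accession="g2" name="geneB" score="9.1" pvalue="0.001"/>
  </pattern>
</cis-element-search>"""
print(rank_genes_by_pvalue(sample))
```

Ranking by score instead, as with --gs, would read the "score" attribute in place of "pvalue" under the same assumptions.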

Output

GOMo will create a directory, named gomo_out by default. Any existing output files in the directory will be overwritten. The directory will contain GOMo's XML and HTML output files.

The default output directory can be overridden using the --o or --oc options which are described below.

Additionally, the user can suppress the creation of files altogether by specifying the --text option, which causes GOMo to write its results in tab-separated values format to standard output. The tab-separated values format contains the following columns:

	"Motif Identifier" "GO Term Identifier" "GOMo Score" "p-value" "q-value"
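A short Python sketch of how the five columns listed above might be consumed. It assumes one result per line and skips any line that does not have five tab-separated fields with numeric values in the last three; whether GOMo writes a header line is not stated here, so the sketch tolerates one. The function name and the sample rows are made up.

```python
import csv
import io


def significant_terms(stream, max_q=0.05):
    """Yield (motif, go_term, score, p_value, q_value) rows with q-value <= max_q."""
    for fields in csv.reader(stream, delimiter="\t"):
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        motif, go_term = fields[0], fields[1]
        try:
            score, p_value, q_value = (float(x) for x in fields[2:])
        except ValueError:
            continue  # skip a header line, if the output has one
        if q_value <= max_q:
            yield motif, go_term, score, p_value, q_value


# Hypothetical output rows, for illustration only.
sample = io.StringIO(
    "motif1\tGO:0000001\t1.5e-04\t2.0e-04\t0.01\n"
    "motif1\tGO:0000002\t3.0e-02\t4.0e-02\t0.30\n"
)
hits = list(significant_terms(sample))
print(hits)
```

To process real output, pass sys.stdin (or an open file) instead of the in-memory sample.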
        

Options:

Each entry gives the option and its parameter (if any), followed by a description and the default behaviour.

General Options

--text
	Output in tab separated values format to standard output. Will not create an output directory or files.

--motifs <motifs>
	Path to the optional motif file in MEME Motif Format that was used to generate (all of the) scoring file(s). The motifs in this file will be used to generate sequence logos in the GOMo HTML output.
	Default behaviour: No logos are displayed in the HTML output.

--dag <godag>
	Path to the optional Gene Ontology DAG file to be used for identifying the most specific terms in the GOMo XML output so they can be highlighted in the HTML output.

--motif <id>
	Use only the motif identified by id. This option may be repeated.
	Default behaviour: All motifs are used.

--shuffle_scores <n>
	Generate an empirical null by shuffling the sequence-to-score assignments n times. Use the resulting distribution to compute empirical p-values.
	Default behaviour: Shuffle 1000 times.

--t <q>
	Threshold used on the score q-values for reporting results. To show all results use a value of 1.0.
	Default behaviour: A threshold of 0.05 is used.

--gs
	Use the scores contained in the CisML file for ranking genes. Any sequence failing to provide a score will cause GOMo to exit.
	Default behaviour: Use the p-values contained in the CisML file for ranking genes.

--score_E_thresh <E>
	All genes with E-values in the CisML file larger than E are treated as having the maximum possible score (and as having tied worst rank when the genes are sorted for the rank-sum test). The E-values are computed by multiplying the p-values by the number of genes in the CisML file. Setting E to a number less than 1 can reduce the effect of noise. The threshold will be ignored when GOMo is told to use gene scores rather than p-values via the --gs switch.
	Default behaviour: E-values are not thresholded when ranking genes.

--min_gene_count <n>
	Only consider GO terms annotated with at least n genes.
	Default behaviour: A value of 1 is used, which shows all results.

--nostatus
	Suppresses the progress information.

--version
	Display the version and exit.
	Default behaviour: Run as normal.
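The E-value thresholding described for --score_E_thresh can be sketched in Python as follows. This is an illustration, not GOMo's code: the function name and sample p-values are made up, and using a p-value of 1.0 to stand for "the maximum possible score" is an assumption of the sketch.

```python
def threshold_pvalues(pvalues, e_thresh):
    """Map {gene: p-value} to the p-values used for ranking after thresholding."""
    n_genes = len(pvalues)
    adjusted = {}
    for gene, p in pvalues.items():
        # E-value = p-value multiplied by the number of genes in the file.
        e_value = p * n_genes
        # Genes above the threshold all receive the same worst value
        # (assumed here to be 1.0), so they tie for the worst rank.
        adjusted[gene] = 1.0 if e_value > e_thresh else p
    return adjusted


# Hypothetical p-values for four genes.
pvalues = {"geneA": 0.001, "geneB": 0.2, "geneC": 0.6, "geneD": 0.05}
print(threshold_pvalues(pvalues, e_thresh=0.5))
```

With four genes, geneB and geneC have E-values above 0.5 and become tied for the worst rank, while geneA and geneD keep their original p-values.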

Citing

If you use GOMo in your research, please cite the following paper:
Fabian A. Buske, Mikael Bodén, Denis C. Bauer and Timothy L. Bailey, "Assigning roles to DNA regulatory motifs using comparative genomics", Bioinformatics, 26(7), 860-866, 2010.