gomo [options] <go-term database> <scoring file>+
The GO-term database is a collection of GO terms mapped to the sequences
in the scoring file. Databases are provided by the web services and are
formatted using a simple tab-separated values (TSV) format:
"GO-term" "Sequence identifiers separated by tabs"
The exception to this rule is the first line which instead contains the
URL to an on-line database (if any) containing entries for the gene IDs.
The URL should have ampersands (&) replaced with &amp; and the place
for the gene ID marked by the token !!GENEID!!.
Each gene ID reported
in GOMo's output will be linked to the URL with the actual gene ID inserted.
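The database layout described above can be read with a few lines of Python. The following is a minimal sketch, not GOMo's own parser: the function names `read_go_database` and `gene_url`, the example URL, and the GO-term and gene identifiers are all hypothetical, and it assumes the first line is simply empty when no on-line database exists.

```python
import io


def read_go_database(handle):
    """Read a GO-term database from an open text handle.

    Assumed layout (from the description above): the first line holds the
    URL template, every later line is 'GO-term<TAB>id1<TAB>id2...'.
    """
    url_template = handle.readline().rstrip("\n")
    go_to_genes = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        # First field is the GO term, the rest are sequence identifiers.
        go_to_genes[fields[0]] = [f for f in fields[1:] if f]
    return url_template, go_to_genes


def gene_url(url_template, gene_id):
    """Build the link for one gene ID from the URL template."""
    # Undo the ampersand escaping, then insert the actual gene ID.
    return url_template.replace("&amp;", "&").replace("!!GENEID!!", gene_id)


# Hypothetical database contents, for illustration only.
example = io.StringIO(
    "http://example.org/gene?id=!!GENEID!!&amp;view=full\n"
    "GO:0000001\tgeneA\tgeneB\n"
    "GO:0000002\tgeneC\n"
)
template, terms = read_go_database(example)
print(terms["GO:0000001"])
print(gene_url(template, "geneA"))
```

With the made-up input above, the second call yields a URL in which `!!GENEID!!` has been replaced by `geneA` and `&amp;` by a plain ampersand.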
Each scoring file is an XML file that contains, for each motif, the sequences and their scores. The XML file uses the CisML schema. When scoring data is available for multiple related species, GOMo can take multiple scoring files in which the true sequence identifiers have been mapped to their orthologs in the reference species for which the GO-term database was supplied.
Scoring files may easily be created using the AMA utility
that is part of the downloadable MEME Suite. A typical command to
create a scoring file named "ama_out/ama.xml" using AMA would be:
ama ama_out -pvalues <motif_file> <fasta_sequence_file> <background_file>
By default GOMo uses the p-value given for each gene in the CisML file to rank the genes. Any sequence failing to provide a p-value will cause GOMo to exit. The --gs switch causes GOMo to use the gene scores from the CisML file instead for ranking genes.
GOMo will create a directory, named gomo_out by default.
Any existing output files in the directory will be overwritten. The
directory will contain:
gomo.html, providing the results in a human-readable format.
gomo.txt, providing the results in a tab-separated values format.
gomo.xml, providing the results in a machine-readable "GOMo format".
The default output directory can be overridden using the --o or --oc options.
Additionally the user can override the creation of files altogether by
specifying the --text option which causes GOMo to
output its tab-separated values format to standard output. The tab-separated
values format contains the following columns:
"Motif Identifier" "GO Term Identifier" "GOMo Score" "p-value" "q-value"
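The five columns listed above map naturally onto a small record type. The sketch below shows one way to load the tab-separated output in Python; it is an illustration under assumptions, not part of GOMo. The names `GomoResult` and `read_gomo_text` are hypothetical, the sample rows are invented, and the code assumes that any header or comment line starts with `#` and that the five columns appear in the order given.

```python
import csv
import io
from typing import NamedTuple


class GomoResult(NamedTuple):
    motif_id: str
    go_term: str
    score: float
    p_value: float
    q_value: float


def read_gomo_text(handle):
    """Parse tab-separated GOMo results from an open text handle."""
    results = []
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not row or row[0].startswith("#"):
            continue
        motif_id, go_term, score, p_value, q_value = row[:5]
        results.append(
            GomoResult(motif_id, go_term, float(score),
                       float(p_value), float(q_value))
        )
    return results


# Invented rows, for illustration only.
sample = io.StringIO(
    "motif1\tGO:0000001\t1.5e-05\t2.0e-04\t0.01\n"
    "motif1\tGO:0000002\t3.0e-03\t4.0e-02\t0.20\n"
)
rows = read_gomo_text(sample)
# Keep only predictions at or below a q-value of 0.05.
significant = [r for r in rows if r.q_value <= 0.05]
print([r.go_term for r in significant])
```

The same function can read standard output captured from a run with the --text option, or the gomo.txt file in the output directory.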
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
General Options | | | |
--text | | Output in tab-separated values format to standard output. Will not create an output directory or files. | |
--motifs | motifs | Path to the optional motif file in MEME Motif Format that was used to generate (all of the) scoring file(s). The motifs in this file will be used to generate sequence logos in the GOMo HTML output. | No logos are displayed in the HTML output. |
--dag | godag | Path to the optional Gene Ontology DAG file to be used for identifying the most specific terms in the GOMo XML output so they can be highlighted in the HTML output. | |
--motif | id | Use only the motif identified by id. This option may be repeated. | All motifs are used. |
--shuffle_scores | n | Generate an empirical null by shuffling the sequence-to-score assignments n times. Use the resulting distribution to compute empirical p-values. | Shuffle 1000 times. |
--t | q | Threshold used on the score q-values for reporting results. To show all results use a value of 1.0. | A threshold of 0.05 is used. |
--gs | | Use the scores contained in the CisML file for ranking genes. Any sequence failing to provide a score will cause GOMo to exit. | Use the p-values contained in the CisML file for ranking genes. |
--score_E_thresh | E | All genes with E-values in the CisML file larger than E are treated as having the maximum possible score (and as having tied worst rank when the genes are sorted for the rank-sum test). The E-values are computed by multiplying the p-values by the number of genes in the CisML file. Setting E to a number less than 1 can reduce the effect of noise. The threshold is ignored when GOMo is told to use gene scores rather than p-values via the --gs switch. | E-values are not thresholded when ranking genes. |
--min_gene_count | n | Only consider GO terms annotated with at least n genes. | A value of 1 is used, which shows all results. |
--nostatus | | Suppresses the progress information. | |
--version | | Display the version and exit. | Run as normal. |
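To make the --score_E_thresh description concrete, the sketch below works through the arithmetic in Python: each E-value is the p-value multiplied by the number of genes, and genes above the threshold are given a shared worst value so that they tie for the worst rank. This is an illustration only and not GOMo's implementation. The function names, the gene names and p-values, the use of 1.0 as the "maximum possible" p-value, and the use of averaged ranks for ties are all assumptions.

```python
def threshold_pvalues(pvalues, e_thresh):
    """Replace p-values whose E-value exceeds e_thresh by the worst value.

    pvalues maps gene ID -> p-value. The E-value of a gene is its p-value
    times the number of genes. Assumption: 1.0 is the worst p-value.
    """
    n_genes = len(pvalues)
    return {
        gene: (1.0 if p * n_genes > e_thresh else p)
        for gene, p in pvalues.items()
    }


def average_ranks(values):
    """Rank genes from best (smallest value) to worst, averaging ties."""
    order = sorted(values, key=values.get)
    ranks = {}
    i = 0
    while i < len(order):
        # Extend j to the end of the run of genes tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


# Invented p-values for four genes, for illustration only.
pvalues = {"g1": 0.001, "g2": 0.02, "g3": 0.3, "g4": 0.6}
clipped = threshold_pvalues(pvalues, e_thresh=0.5)
print(clipped)
print(average_ranks(clipped))
```

With four genes and a threshold of 0.5, g3 and g4 have E-values above the threshold (about 1.2 and 2.4), so both are set to the worst value and share rank 3.5, while g1 and g2 keep ranks 1 and 2.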
If you use GOMo in your research, please cite the following paper:
Fabian A. Buske, Mikael Bodén, Denis C. Bauer and Timothy L. Bailey,
"Assigning roles to DNA regulatory motifs using comparative genomics",
Bioinformatics, 26(7), 860-866, 2010.