taipale2meme

Usage:

taipale2meme [options]

Description

Convert a file containing the tab-separated columns exported from the spreadsheet of Taipale results into MEME motif format.

Input

Reads a Taipale file from standard input.

A Taipale file describes one or more motifs in terms of a probability matrix in column orientation.

The Taipale file must be exported from the Excel spreadsheet using the following steps:

  1. Open the xls file in OpenOffice (presumably Excel will work the same)
  2. Select "Save As" from the File menu
  3. Change the file name to "sheetX", where X is the number of the sheet, set the output type to "Text CSV (.csv)", and press Save
  4. Leave the character set as UTF-8, change the field delimiter to {Tab}, and leave the text delimiter as "
  5. You should now have a file called sheetX.csv
  6. Close OpenOffice
  7. Repeat from step 1 as needed for other sheets

See the example Taipale file.
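
As a rough illustration of the layout these export settings produce, the sketch below reads tab-delimited, double-quote-delimited text with Python's csv module and collects each motif as a list of per-position probabilities. The function name read_taipale and the rule for spotting a motif (a name row immediately followed by rows labelled A, C, G and T) are assumptions made for this example, not a description of how taipale2meme itself parses the file.

```python
import csv
import io

def read_taipale(text):
    """Return {motif name: [{'A': p, 'C': p, 'G': p, 'T': p}, ...]}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))
    motifs = {}
    for i in range(len(rows) - 4):
        if not rows[i] or not rows[i][0]:
            continue  # blank line, so it cannot be a motif name row
        block = rows[i + 1:i + 5]
        # Assumed rule: a motif is a name row followed by A, C, G, T rows.
        if [r[0] if r else "" for r in block] != ["A", "C", "G", "T"]:
            continue
        name = rows[i][0]
        # Each file column after the first is one motif position.
        values = {r[0]: [float(v) for v in r[1:] if v.strip()] for r in block}
        width = len(values["A"])
        motifs[name] = [{b: values[b][j] for b in "ACGT"} for j in range(width)]
    return motifs

# First three positions of Tcf4 from the example Taipale file.
sample = (
    '"Tcf4"\t1\t2\t3\n'
    '"A"\t0.197\t0.036\t0.056\n'
    '"C"\t0.337\t0.662\t0.103\n'
    '"G"\t0.228\t0.143\t0.003\n'
    '"T"\t0.238\t0.159\t0.837\n'
)
motifs = read_taipale(sample)
print(motifs["Tcf4"][0])
```

Each list element corresponds to one file column, which is what "column orientation" means here: the letters run down the rows and the motif positions run across the columns.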

Output

Writes MEME motif format to standard output.

A probability matrix and, optionally, a log-odds matrix are output for each motif in the file. The probability matrix is computed using pseudocounts consisting of the background frequency (see -bg, below) multiplied by the total pseudocounts (see -pseudo, below). The log-odds matrix uses the background frequencies in the denominator and uses log base 2.
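
The arithmetic described above can be sketched as follows. This is a minimal illustration, assuming each matrix column holds frequencies that sum to 1 and that the column is renormalised after the pseudocounts are added. The function names are hypothetical, and any scaling by site count or rounding that taipale2meme applies is not shown.

```python
import math

def add_pseudocounts(column, background, pseudo):
    """Add pseudo * background to each frequency, then renormalise."""
    raw = {b: column[b] + pseudo * background[b] for b in column}
    total = sum(raw.values())
    return {b: raw[b] / total for b in raw}

def log_odds(column, background):
    """Log base 2 of probability over background frequency."""
    return {b: math.log2(column[b] / background[b]) for b in column}

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# First column of GLI1 from the example Taipale file.
gli1_col1 = {"A": 0.010, "C": 0.001, "G": 0.973, "T": 0.016}

# pseudo=0.1 is an arbitrary value chosen for this example.
probs = add_pseudocounts(gli1_col1, uniform, pseudo=0.1)
scores = log_odds(probs, uniform)
for base in "ACGT":
    print(base, round(probs[base], 4), round(scores[base], 3))
```

In this sketch a frequency of exactly 0 has no finite log-odds score unless a nonzero pseudocount is added first, since the logarithm of 0 is undefined.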

Options

General Options
-nc column
  Repeat this option to specify which file columns should be used to create the motif names by joining with "_".
  Default: The first non-empty column is used as the motif name.
-oc column
  Repeat this option to specify one or more columns that should be empty in the line of information that identifies the motif. If the column is not empty then the motif is skipped.
  Default: All motifs are kept.
-postfix appended text
  Specify text to append to the motif names.
  Default: Motif names are left unchanged.
-strands 1|2
  Specify if the PWM was generated from single or double stranded datasets.
  Default: Assumes the PWM was generated from a double stranded dataset.
-bg background file
  The background file should be a Markov background model. It contains the background frequencies of letters used for assigning pseudocounts. The background frequencies will be included in the resulting MEME file.
  Default: Uses uniform background frequencies.
-pseudo total pseudocounts
  Add total pseudocounts times letter background to each frequency.
  Default: No pseudocount is added.
-logodds
  Include a log-odds matrix in the output. This is not required for versions of the MEME Suite ≥ 4.7.0.
  Default: The log-odds matrix is not included in the output.
-url website
  The provided website URL will be stored with the motif, and MEME Suite programs can use it to provide a direct link to that information in their output. If website contains the keyword MOTIF_NAME, the motif name is substituted in place of MOTIF_NAME in the output.
  For example, if the URL is
  http://big-box-of-motifs.com/motifs/MOTIF_NAME.html
  and the motif name is GLI1, the motif will contain a link to
  http://big-box-of-motifs.com/motifs/GLI1.html
  Default: The output does not include a URL with the motifs.
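
The naming and URL options amount to simple string operations, sketched below. The helper names, the 0-based column indices and the sample row are assumptions for illustration only; how taipale2meme numbers columns for -nc is not stated here.

```python
def motif_name(fields, name_columns, postfix=""):
    """Join the chosen columns with "_" and append the postfix."""
    return "_".join(fields[c] for c in name_columns) + postfix

def motif_url(template, name):
    """Replace the MOTIF_NAME keyword with the motif's name."""
    return template.replace("MOTIF_NAME", name)

row = ["GLI1", "human"]  # hypothetical identifying line with two columns
name = motif_name(row, [0, 1], postfix="_v1")
url = motif_url("http://big-box-of-motifs.com/motifs/MOTIF_NAME.html", name)
print(name)
print(url)
```

A template without the MOTIF_NAME keyword passes through motif_url unchanged, so in this sketch every motif would share the same link.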

Example Taipale file

"TABLE S1"											
"MEASURED BINDING PROFILES (DATA FOR FIG1D)"											
"GLI1"	1	2	3	4	5	6	7	8	9	10	14
"A"	0.010	0.855	0.021	0.000	0.855	0.000	0.124	0.040	0.982	0.287	0.033
"C"	0.001	0.069	0.972	1.000	0.069	1.000	0.876	0.914	0.006	0.518	0.068
"G"	0.973	0.076	0.000	0.000	0.015	0.000	0.000	0.008	0.004	0.097	0.792
"T"	0.016	0.000	0.006	0.000	0.061	0.000	0.000	0.038	0.008	0.098	0.107
											
"GLI2"	1	2	3	4	5	6	7	8	9	10	14
"A"	0.000	0.822	0.041	0.000	0.839	0.000	0.098	0.013	0.997	0.383	0.040
"C"	0.000	0.090	0.959	1.000	0.064	1.000	0.902	0.887	0.000	0.361	0.156
"G"	0.981	0.088	0.000	0.000	0.010	0.000	0.000	0.005	0.000	0.121	0.713
"T"	0.018	0.000	0.000	0.000	0.087	0.000	0.000	0.095	0.003	0.135	0.091
											
"GLI3"	1	2	3	4	5	6	7	8	9	10	14
"A"	0.004	0.785	0.034	0.000	0.767	0.000	0.147	0.054	0.937	0.307	0.052
"C"	0.000	0.097	0.966	1.000	0.106	1.000	0.853	0.846	0.031	0.359	0.133
"G"	0.969	0.119	0.000	0.000	0.024	0.000	0.000	0.013	0.004	0.181	0.652
"T"	0.027	0.000	0.000	0.000	0.104	0.000	0.000	0.088	0.027	0.154	0.163
											
											
"Ci"	1	2	3	4	5	6	7	8	9	10	14
"A"	0.010	0.903	0.012	0.000	0.899	0.000	0.074	0.020	0.968	0.414	0.029
"C"	0.001	0.048	0.985	1.000	0.058	1.000	0.926	0.951	0.013	0.455	0.047
"G"	0.980	0.049	0.000	0.000	0.008	0.000	0.000	0.007	0.006	0.066	0.860
"T"	0.009	0.000	0.003	0.000	0.035	0.000	0.000	0.022	0.013	0.064	0.064
											
											
"Tcf4"	1	2	3	4	5	6	7	8	9		
"A"	0.197	0.036	0.056	0.024	0.058	0.048	0.876	0.194	0.097		
"C"	0.337	0.662	0.103	0.032	0.004	0.165	0.000	0.000	0.366		
"G"	0.228	0.143	0.003	0.029	0.010	0.675	0.022	0.017	0.481		
"T"	0.238	0.159	0.837	0.915	0.929	0.112	0.099	0.782	0.056		
											
											
"cETS1"	1	2	3	4	5	6	7	8	9		
"A"	0.326	0.108	0.002	0.000	0.975	0.375	0.647	0.141	0.315		
"C"	0.345	0.842	0.000	0.002	0.003	0.078	0.046	0.354	0.300		
"G"	0.209	0.044	0.993	0.992	0.003	0.014	0.290	0.136	0.145		
"T"	0.120	0.006	0.005	0.004	0.019	0.533	0.018	0.369	0.240		
											
"Known mismatches with activity in vivo (GLI, Tcf4) are indicated by boxes, the degenerate high affinity sequence derived from semiquantitative SELEX analysis (c-ETS1) is indicated by green backround. References:"											
	"GENE"	"Comment"				"Reference"					
"GLI1, 2, 3"	"FOXA2"	"FOXA2 requires high amounts of Shh to be induced"				"Sasaki H, Hui C, Nakafuku M, Kondoh H. A binding site for Gli proteins is essential for HNF-3beta floor plate enhancer activity in transgenics and can respond to Shh in vitro. Development 12:1313-1322, 1997."					
"Tcf4"	"CDX1"	"two binding sites mutated"				"Jho EH, Zhang T, Domon C, Joo CK, Freund JN, Costantini F. Wnt/beta-catenin/Tcf signaling induces the transcription of Axin2, a negative regulator of the signaling pathway. Mol Cell Biol. 22:1172-1183, 2002."					
"c-ETS1"	"-"	"Consensus from SELEX: (A)CMGGAWRTT"				"Woods DB, Ghysdael J, Owen MJ. Identification of nucleotide preferences in DNA sequences recognised specifically by c-Ets-1 protein. Nucleic Acids Res. 20:699-704, 1992. "