jaspar2meme

Usage:

jaspar2meme [options] <file_name | directory_name>

Description

Convert a file of motifs in JASPAR 2014 or 2016 PFM format, or a directory of JASPAR files in one of the three old JASPAR formats (SITES, PFM or CM), into MEME motif format suitable for use with MEME Suite programs.

Inputs

file_name

The file contains motifs in JASPAR 2014 or 2016 PFM format. In these formats, each motif is preceded by a header line that begins with '>' and is followed by a unique identifier (e.g., 'MA0001.1'). An optional second identifier can follow the first (e.g., 'SEP4'). In JASPAR 2016 PFM format, each line of counts starts with its letter and the counts themselves are enclosed in square brackets.
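The following is a minimal Python sketch of reading a bundle file in either layout. It is not jaspar2meme's own implementation, and the function name parse_jaspar_bundle is hypothetical. It treats unlabelled rows as A, C, G and T in that order (JASPAR 2014), uses the row letter when one is present (JASPAR 2016), and returns every matrix in ACGT order.

```python
import re

DNA = "ACGT"
# A JASPAR 2016 row: letter, then counts in square brackets, e.g. "A  [ 10  12 ]".
ROW_2016 = re.compile(r"^([A-Za-z])\s*\[\s*(.*?)\s*\]$")


def parse_jaspar_bundle(text):
    """Parse JASPAR 2014/2016 PFM text into (id, name_or_None, rows) tuples.

    rows holds one list of counts per letter, always in ACGT order.
    """
    motifs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: unique identifier, then an optional second identifier.
            fields = line[1:].split()
            current = {"id": fields[0],
                       "name": fields[1] if len(fields) > 1 else None,
                       "rows": {}}
            motifs.append(current)
            continue
        if current is None:
            raise ValueError("count row found before any '>' header")
        match = ROW_2016.match(line)
        if match:
            # JASPAR 2016: the row names its own letter.
            letter, values = match.group(1).upper(), match.group(2)
        else:
            # JASPAR 2014: unlabelled rows are taken as A, C, G, T in order.
            letter, values = DNA[len(current["rows"])], line
        current["rows"][letter] = [float(v) for v in values.split()]
    # Emit every matrix in the standard ACGT order, whatever the input order.
    return [(m["id"], m["name"], [m["rows"][b] for b in DNA]) for m in motifs]
```

Applied to the RUNX1 record under Example Input Formats, this sketch returns a single tuple whose identifier is 'MA0002.1', whose name is 'RUNX1', and whose A row begins 10.0, 12.0, 4.0.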

directory_name

A directory containing one or more JASPAR motif files in one of the following three formats.

JASPAR PFM Format

This format describes a motif in terms of a count matrix where the rows correspond to A, C, G and T respectively. The JASPAR count file names are expected to end with the .pfm extension.
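A minimal sketch of reading this format, assuming each .pfm file holds exactly four whitespace-separated rows of counts with no labels. The helper names parse_old_pfm and pfm_files are hypothetical and are not part of the MEME Suite.

```python
from pathlib import Path

DNA = "ACGT"


def parse_old_pfm(text):
    """Parse a legacy JASPAR .pfm count matrix: four unlabelled rows (A, C, G, T)."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != len(DNA):
        raise ValueError(f"expected {len(DNA)} rows, found {len(rows)}")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("rows have different numbers of columns")
    return rows


def pfm_files(directory):
    """List the files in directory whose names end with the .pfm extension."""
    return sorted(Path(directory).glob("*.pfm"))
```

In the Old PFM Format Example, every column sums to 97, which is a quick sanity check that the rows were read in full.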

JASPAR Sites Format

This format describes a motif in terms of a multiple alignment of sites, written in a modified FASTA format. Only upper-case sequence letters are part of the alignment. SITES file names are expected to end with the .sites extension.
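The counting step implied by this format can be sketched as follows. This is an illustrative Python function (sites_to_counts is a hypothetical name): it joins the lines of each FASTA record, drops lower-case letters, and tallies the remaining upper-case letters column by column. It assumes that all aligned sites have the same length and contain only A, C, G and T.

```python
DNA = "ACGT"


def sites_to_counts(text):
    """Count letters per aligned position in JASPAR SITES (FASTA-like) text.

    Only upper-case letters belong to the alignment, so lower-case flanking
    letters are dropped before counting. Returns one row per letter (ACGT).
    """
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            current = []
            records.append(current)
        elif line and current is not None:
            current.append(line)
    aligned = ["".join(ch for ch in "".join(rec) if ch.isupper()) for rec in records]
    if not aligned:
        raise ValueError("no sites found")
    width = len(aligned[0])
    if any(len(site) != width for site in aligned):
        raise ValueError("aligned sites differ in length")
    counts = [[0] * width for _ in DNA]
    for site in aligned:
        for pos, letter in enumerate(site):
            # str.index raises ValueError for letters outside ACGT (e.g. N).
            counts[DNA.index(letter)][pos] += 1
    return counts
```

With the SITES Format Example, the lower-case 'a' in the first record is discarded, so all ten sites contribute exactly eight aligned positions.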

CM Format

This format describes a motif in terms of a count matrix with each row preceded by the letters A|, C|, G| and T|. The CM count file names are expected to end with the .cm extension.
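A sketch of reading the labelled CM rows, assuming the label is everything before the first | on each line (parse_cm is a hypothetical name). Because rows are looked up by label, this sketch does not depend on their order in the file.

```python
DNA = "ACGT"


def parse_cm(text):
    """Parse a JASPAR CM matrix whose rows are labelled 'A|', 'C|', 'G|', 'T|'."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label, sep, values = line.partition("|")
        if not sep:
            raise ValueError(f"missing '|' label: {line!r}")
        rows[label.strip().upper()] = [float(v) for v in values.split()]
    missing = [b for b in DNA if b not in rows]
    if missing:
        raise ValueError(f"missing rows for {missing}")
    # Return the rows in the standard ACGT order.
    return [rows[b] for b in DNA]
```

The CM Format Example carries the same counts as the Old PFM Format Example, so both readers should yield identical matrices for them.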

Outputs

Writes MEME motif format to standard output.
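For orientation, here is a sketch that renders motifs in the MEME Suite's minimal motif format: a version line, the alphabet, the strands, the background frequencies, then one MOTIF block per motif. The function format_meme and its arguments are hypothetical, and the number formatting is a choice of this sketch; jaspar2meme's actual output may differ in detail. The optional url_template mimics the MOTIF_NAME substitution described for the -url option, replacing the keyword with the motif identifier as in the MA0024 example there.

```python
DNA = "ACGT"


def format_meme(motifs, bg=None, both_strands=True, url_template=None):
    """Render motifs as text in MEME minimal motif format.

    motifs: list of (motif_id, name, probs, nsites) tuples, where probs has
    one row per letter in ACGT order (the JASPAR layout) and nsites is the
    number of sites behind the matrix.
    bg: background frequencies keyed by letter; uniform if omitted.
    url_template: optional URL; MOTIF_NAME is replaced by the motif ID.
    """
    bg = bg or {b: 1.0 / len(DNA) for b in DNA}
    out = [
        "MEME version 4", "",
        "ALPHABET= " + DNA, "",
        "strands: + -" if both_strands else "strands: +", "",
        "Background letter frequencies",
        " ".join(f"{b} {bg[b]:.3f}" for b in DNA), "",
    ]
    for motif_id, name, probs, nsites in motifs:
        width = len(probs[0])
        out.append(f"MOTIF {motif_id} {name}" if name else f"MOTIF {motif_id}")
        out.append(f"letter-probability matrix: alength= {len(DNA)} "
                   f"w= {width} nsites= {nsites:.0f} E= 0")
        # MEME writes one line per motif position, so transpose the
        # letter-major JASPAR layout into position-major lines.
        for column in zip(*probs):
            out.append(" ".join(f"{p:.6f}" for p in column))
        if url_template:
            out.append("URL " + url_template.replace("MOTIF_NAME", motif_id))
        out.append("")
    return "\n".join(out)
```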

A probability matrix and, optionally, a log-odds matrix are output for each motif read. The probability matrix is computed using pseudocounts consisting of the background frequency (see -bg, below) multiplied by the total pseudocounts (see -pseudo, below). The log-odds matrix uses the background frequencies in the denominator and base-2 logarithms.
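The arithmetic described here can be written out as follows, where c[a][i] is the count of letter a at position i, N_i the column total, bg[a] the background frequency of a, and pseudo the total pseudocounts: p[a][i] = (c[a][i] + pseudo * bg[a]) / (N_i + pseudo), and the log-odds score is log2(p[a][i] / bg[a]). This is an interpretation of the description, not code taken from jaspar2meme; the function names are hypothetical, and mapping zero probabilities to -inf is an assumption of the sketch.

```python
import math

DNA = "ACGT"


def counts_to_probs(counts, bg=None, pseudo=0.0):
    """Convert a letter-major count matrix (rows A, C, G, T) to probabilities.

    Each letter a receives pseudo * bg[a] extra counts; when the background
    sums to 1 the column total grows by exactly `pseudo`.
    """
    bg = bg or {b: 1.0 / len(DNA) for b in DNA}
    width = len(counts[0])
    probs = [[0.0] * width for _ in DNA]
    for pos in range(width):
        total = sum(row[pos] for row in counts)  # N_i, sites in this column
        for i, letter in enumerate(DNA):
            probs[i][pos] = (counts[i][pos] + pseudo * bg[letter]) / (total + pseudo)
    return probs


def probs_to_log_odds(probs, bg=None):
    """Base-2 log-odds against the background: log2(p[a][i] / bg[a]).

    Zero probabilities (possible when pseudo == 0) map to -inf here; this is a
    choice of the sketch, not documented jaspar2meme behaviour.
    """
    bg = bg or {b: 1.0 / len(DNA) for b in DNA}
    return [[math.log2(p / bg[letter]) if p > 0 else float("-inf") for p in row]
            for letter, row in zip(DNA, probs)]
```

For the first RUNX1 column (A 10, C 2, G 3, T 11, so N = 26) with a uniform background and one total pseudocount, the A probability is (10 + 0.25) / 27.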

Options

General Options

-bundle

Read motifs in JASPAR 2014 or JASPAR 2016 PFM format from the file named file_name. The lines of each matrix may be in any order, but the MEME matrices will be output with the lines in the standard order (e.g., ACGT for DNA).
Default: Read JASPAR SITES files (.sites) from directory_name.

-pfm

Read JASPAR PFM files (.pfm) from directory_name.
Default: Read JASPAR SITES files (.sites) from directory_name.

-cm

Read JASPAR CM files (.cm), whose lines are labelled A|, C|, G| and T|, from directory_name.
Default: Read JASPAR SITES files (.sites) from directory_name.

-strands 1|2

Specify whether a single strand (1) or both strands (2) were considered when creating the motif.
Default: Report that both strands were scanned.

-numbers

Use a number based on the motif's position in the input, instead of the JASPAR ID, as the motif identifier.
Default: The JASPAR ID is used as the motif identifier.

-bg <background file>

The background file should be a Markov background model. It contains the background letter frequencies used for assigning pseudocounts, and these frequencies are included in the resulting MEME file.
Default: Use uniform background frequencies.

-pseudo <total pseudocounts>

Add the total pseudocounts multiplied by the letter's background frequency to each frequency.
Default: No pseudocount is added.

-logodds

Include a log-odds matrix in the output. This is not required for MEME Suite versions 4.7.0 and later.
Default: The log-odds matrix is not included in the output.

-url <website>

The provided website URL is stored with each motif, and MEME Suite programs can use it to provide a direct link to that information in their output. If website contains the keyword MOTIF_NAME, the motif name is substituted in place of MOTIF_NAME. For example, if the URL is

http://big-box-of-motifs.com/motifs/MOTIF_NAME.html

and the motif name is MA0024, the motif will contain a link to

http://big-box-of-motifs.com/motifs/MA0024.html

Default: The output does not include a URL with the motifs.
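The single-letter frequencies in the -bg file are what feed the pseudocount calculation. Below is a minimal sketch of extracting them, assuming the MEME Suite background-file layout of '#' comment lines followed by '<word> <frequency>' lines, where words longer than one letter are the higher-order Markov terms. The name read_background is hypothetical, and ignoring the higher-order entries is a simplification of this sketch.

```python
def read_background(text, alphabet="ACGT"):
    """Return order-0 letter frequencies from a Markov background file.

    Assumes '#' comment lines and one '<word> <frequency>' pair per line.
    Entries for words longer than one letter (higher-order terms) are skipped.
    """
    freqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, value = line.split()[:2]
        if len(word) == 1:
            freqs[word.upper()] = float(value)
    missing = [b for b in alphabet if b not in freqs]
    if missing:
        raise ValueError(f"no frequency for {missing}")
    return freqs
```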

Example Input Formats

New (JASPAR 2014) PFM Format Example

>MA0002.1 RUNX1
10   12    4    1    2    2    0    0    0    8   13
 2    2    7    1    0    8    0    0    1    2    2
 3    1    1    0   23    0   26   26    0    0    4
11   11   14   24    1   16    0    0   25   16    7
      

Old PFM Format Example

 0  3 79 40 66 48 65 11 65  0
94 75  4  3  1  2  5  2  3  3
 1  0  3  4  1  0  5  3 28 88
 2 19 11 50 29 47 22 81  1  6
      

SITES Format Example

>MA0024 E2F     1
aTTTGGCGC
>MA0024 E2F     2
TTTGGCGC
>MA0024 E2F     3
TTTGGCGC
>MA0024 E2F     4
TTTGGCGC
>MA0024 E2F     5
TTTCGCGC
>MA0024 E2F     6
TTTCGCGC
>MA0024 E2F     7
TTTCGCGC
>MA0024 E2F     8
TTTGCCGC
>MA0024 E2F     9
TTTCCCGC
>MA0024 E2F     10
TTTGGCGG
      

CM Format Example

A|  0  3 79 40 66 48 65 11 65  0
C| 94 75  4  3  1  2  5  2  3  3
G|  1  0  3  4  1  0  5  3 28 88
T|  2 19 11 50 29 47 22 81  1  6