These priors allow the user to bias the search for motifs by MEME. They give a position-specific prior distribution on the location of motif sites in sequence(s) in the input dataset.
The MEME Position Specific Priors (PSP) format includes the name of the sequence to which each prior distribution corresponds. Sequences not named in the PSP file are given uniform prior distributions on site locations by MEME.
A PSP must be created for a specific width of motif. This width must be specified for each entry in the PSP file, and must be the same for all entries. If MEME varies the motif width during computation, MEME renormalizes the PSP for each sequence. For motif widths larger than that of the prior, the renormalized prior of a site is the geometric mean of the original priors of all sites that it completely contains, normalized to sum to Pr(site). For motif widths smaller than that of the prior, the renormalized prior of a site is just the original prior, normalized to sum to Pr(>0 sites). MEME computes Pr(>0 sites) for a sequence as the sum of the priors of all potential sites in the sequence; it is 1 for the OOPS model and ≤ 1 for the ZOOPS model.
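The wider-motif case can be illustrated with a short Python sketch. This is not MEME's implementation: the helper name widen_prior is hypothetical, and the sketch assumes that the normalization target is the sum of the original priors, which the text calls Pr(>0 sites). It also assumes that positions where the wider motif cannot start receive a prior of 0.

```python
import math


def widen_prior(priors, old_width, new_width):
    """Sketch: renormalize one sequence's prior for a wider motif.

    Hypothetical helper, not part of MEME. Assumes the result should
    sum to the same total as the original priors.
    """
    if new_width < old_width:
        raise ValueError("this sketch only handles new_width >= old_width")
    length = len(priors)
    total = sum(priors)                # taken here as Pr(>0 sites)
    span = new_width - old_width + 1   # original sites fully inside one wide site
    n_starts = max(length - new_width + 1, 0)

    raw = []
    for i in range(n_starts):
        window = priors[i:i + span]
        if min(window) <= 0.0:
            # A zero anywhere makes the geometric mean zero.
            raw.append(0.0)
        else:
            # Geometric mean computed in log space.
            raw.append(math.exp(sum(math.log(p) for p in window) / span))

    raw_total = sum(raw)
    if raw_total > 0:
        raw = [p * total / raw_total for p in raw]
    # Pad with zeros where a motif of the new width cannot start.
    return raw + [0.0] * (length - len(raw))
```

For example, widen_prior([0.1, 0.2, 0.4, 0.1, 0.0, 0.0], 3, 4) returns six values whose last three are 0 and whose sum equals the original total of 0.8.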
MEME PSP format is similar to FASTA format. Each entry should start with a header line consisting of a sequence name (ID) followed by the width (WIDTH) of the PSP prior. The sequence name must match the name of a sequence in the FASTA file input to MEME. Any other text on the header line after the name and width is ignored by MEME.
The following lines (PRIORS) contain one number for each position in
the identically-named FASTA sequence, where the number gives the prior
probability of a motif site at that position in the sequence (or in the
reverse complement if -revcomp is specified).
All numbers in a PRIORS entry must be in the range (0,1], except
for the last WIDTH - 1 numbers, which should be 0 (the trailing
0.000000 values in the example), since a motif of that width cannot
start at those positions. The numbers in a PRIORS entry must sum to a
number no greater than 1.
If they sum to less than 1 and MEME is run with the -mod oops
switch,
MEME will rescale the numbers so that they sum to 1. With the -mod zoops
switch, MEME does not rescale the numbers to sum to 1; a very small sum represents
a prior belief that the sequence contains no motif sites. The -mod anr
switch to MEME currently does not allow PSPs.
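The constraints above can be checked before a PSP file is handed to MEME. The following Python sketch is one way to do that. The function name check_priors, its parameters, and the tolerance are hypothetical and are not part of MEME. The rescaling step only mimics the behavior described for -mod oops.

```python
def check_priors(priors, width, model="zoops", tol=1e-6):
    """Sketch: validate one PRIORS entry and rescale it for the OOPS model.

    Hypothetical helper, not part of MEME. `model` mirrors the value
    that would be passed to MEME's -mod switch.
    """
    if model == "anr":
        raise ValueError("PSPs are not allowed with -mod anr")
    if model not in ("oops", "zoops"):
        raise ValueError("unknown model: %r" % model)

    cut = len(priors) - (width - 1)   # index of the first trailing position
    if cut < 0:
        raise ValueError("entry is shorter than WIDTH - 1 positions")
    body, tail = priors[:cut], priors[cut:]

    if any(not (0.0 < p <= 1.0) for p in body):
        raise ValueError("priors must lie in (0, 1] where a site can start")
    if any(p != 0.0 for p in tail):
        raise ValueError("the last WIDTH - 1 priors must be 0")

    total = sum(body)
    if total > 1.0 + tol:
        raise ValueError("priors sum to %.6f, which exceeds 1" % total)

    if model == "oops" and total > 0.0:
        # Under OOPS the priors are rescaled to sum to 1.
        return [p / total for p in priors]
    # Under ZOOPS the numbers are left as given.
    return list(priors)
```

Called with the ICYA_MANSE numbers from the example and width 4, the function returns the values unchanged for model="zoops" and a rescaled list summing to 1 for model="oops".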
The format of an entry in a PSP file is:
>ID WIDTH
PRIORS
An example of the PSP format is given below:
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963 0.086047 0.057445 0.000000 0.000000 0.000000
>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499 0.000000 0.000000 0.000000
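A small reader for this format can be sketched in Python. The function name parse_psp and the returned structure are hypothetical. The sketch assumes that each header line starts with ">" and that the priors follow on one or more separate lines, as described above. Text after the name and width on the header line is ignored.

```python
def parse_psp(text):
    """Sketch: parse PSP text into {sequence_name: (width, [priors])}.

    Hypothetical helper, not part of MEME.
    """
    entries = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) < 2:
                raise ValueError("header needs a name and a width: %r" % line)
            # Anything after the name and width is ignored.
            name, width = fields[0], int(fields[1])
            entries[name] = (width, [])
        else:
            if name is None:
                raise ValueError("priors found before any header line")
            entries[name][1].extend(float(tok) for tok in line.split())
    return entries


EXAMPLE = """\
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139
0.032963 0.086047 0.057445 0.000000 0.000000 0.000000
>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854
0.046499 0.000000 0.000000 0.000000
"""

psp = parse_psp(EXAMPLE)
for seq_name, (seq_width, seq_priors) in psp.items():
    print(seq_name, seq_width, len(seq_priors))
```

The number of priors parsed for each entry should equal the length of the identically named FASTA sequence, which is a useful check to run before calling MEME.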
The psp-gen tool can be used to generate position specific priors when supplied with two datasets of sequences: "primary" and "control".
Custom priors can also be built by hand, for example to favor including certain protein letters in motifs. To maximize their effect when used with MEME, a few points should be kept in mind: