mhmm - MEME Suite

Description

This program creates motif-based hidden Markov models (HMMs) of families of related biosequences. It takes as input a set of DNA or protein motif models and produces as output a single HMM containing the given motifs. The program can produce three types of models: linear models, in which the motifs are arranged like beads on a string; completely connected models, which allow motifs to repeat and to appear in any order; and star models. Mhmm writes its output in a format readable by the other MEME Suite programs, mhmms and mhmmscan.

A linear motif-based HMM consists of a sequence of motif models, each separated by one or more tied insert states that represent the spacer region between motifs. A completely connected model, on the other hand, includes transitions from the end of each motif to the beginning of every other motif in the model (with a spacer model along each transition). This more general topology allows for motifs that are repeated, deleted or shuffled. A star model connects a single spacer state to every motif, and each motif returns to that spacer. By default, the program produces a linear model.
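The three topologies differ mainly in which motif-to-motif transitions exist. The sketch below lists those transitions for each topology. `motif_transitions` is a hypothetical helper, not part of mhmm. The labels "BEGIN", "END" and "HUB" (the central spacer of the star model) are illustrative assumptions. So is the choice to let a completely connected model also loop from a motif back to itself, which is one way to allow repeats.

```python
from itertools import product


def motif_transitions(motifs: list[str], topology: str) -> set[tuple[str, str]]:
    """Return the (from, to) inter-motif transitions for a topology.

    Hypothetical illustration: each pair stands for a path from the end of
    one motif (or BEGIN) to the start of another (or END) through a spacer.
    """
    if topology == "linear":
        # Beads on a string: each motif leads only to the next one.
        path = ["BEGIN", *motifs, "END"]
        return set(zip(path, path[1:]))
    if topology == "complete":
        # Any motif may start or end the model, and every motif may follow
        # every motif (including itself in this sketch), so motifs can be
        # repeated, skipped or shuffled.
        edges = {("BEGIN", m) for m in motifs} | {(m, "END") for m in motifs}
        edges |= set(product(motifs, repeat=2))
        return edges
    if topology == "star":
        # A single central spacer ("HUB") leads to every motif, and every
        # motif returns to it.
        edges = {("BEGIN", "HUB"), ("HUB", "END")}
        for m in motifs:
            edges |= {("HUB", m), (m, "HUB")}
        return edges
    raise ValueError(f"unknown topology: {topology}")
```

For example, `motif_transitions(["1", "2", "3"], "complete")` yields 15 transitions: 3 from BEGIN, 3 to END and 9 between motifs. The linear version of the same motifs has only 4.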

Transition probabilities among motifs are derived from the motif occurrence information in the given MEME file. For the completely connected topology, this information is derived from all of the motif occurrences; for the linear topology, it is derived only from the best-scoring sequence. Alternatively, the order and spacing of motifs within a linear model may be specified via the --order option.
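The transition counts behind these probabilities can be pictured as follows. This is a minimal sketch, assuming each sequence's motif occurrences are available as hypothetical `(start_position, motif_id)` pairs. Occurrences are ordered by position, and each adjacent pair counts as one observed transition. Following the description above, a completely connected model would pass in every sequence, and a linear model only the best-scoring one. This is not mhmm's actual code.

```python
from collections import Counter


def count_transitions(occurrences: list[list[tuple[int, str]]]) -> Counter:
    """Count motif-to-motif transitions observed across sequences.

    Each inner list holds hypothetical (start_position, motif_id) hits for one
    sequence. The sequence start and end are labelled "BEGIN" and "END".
    """
    counts: Counter = Counter()
    for hits in occurrences:
        # Order the hits along the sequence, then count each adjacent pair.
        path = ["BEGIN"] + [motif for _, motif in sorted(hits)] + ["END"]
        counts.update(zip(path, path[1:]))
    return counts


# Two sequences: motifs 1 then 2, and motifs 1, 1, then 2.
counts = count_transitions([[(40, "2"), (5, "1")], [(3, "1"), (20, "1"), (60, "2")]])
```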

Input

Motif File

A file containing motifs in MEME format. Output from MEME and DREME is supported, as is the minimal MEME format; conversion scripts are available to convert other motif formats into minimal MEME format. The input motifs should be ones that are likely to appear in the sequences being modeled.

Output

Writes a motif-based hidden Markov model in MHMM format to standard output.

Options

Option Parameter Description Default Behaviour

Input/Output

--motif motif # This option (which may be repeated) allows the user to select a specific motif for inclusion in the HMM. The specified motif number corresponds to the motif index in the MEME file.

--nmotifs n This option is similar to the --motif option, except that it tells mhmm to use the first n motifs in the given MEME file. Default: all motifs are included.

--ethresh ev This option sets an E-value threshold for inclusion of motifs in the model. Motifs with E-values greater than ev are ignored. Default: all motifs are included.
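The three motif-selection options can be modeled as successive filters. This is a hypothetical sketch: the `Motif` record and `select_motifs` function are illustrative names, not part of mhmm. How mhmm combines these options when more than one is given is an assumption here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Motif:
    index: int     # 1-based position of the motif in the MEME file
    evalue: float  # motif E-value reported in the MEME file


def select_motifs(motifs: list[Motif], chosen: set[int] | None = None,
                  nmotifs: int | None = None, ethresh: float | None = None) -> list[Motif]:
    """Hypothetical model of --motif, --nmotifs and --ethresh.

    With no filter set, every motif is kept (the documented default).
    """
    selected = list(motifs)
    if chosen:                    # like --motif (repeatable): keep listed indices
        selected = [m for m in selected if m.index in chosen]
    if nmotifs is not None:       # like --nmotifs n: keep the first n motifs
        selected = [m for m in selected if m.index <= nmotifs]
    if ethresh is not None:       # like --ethresh ev: ignore E-values above ev
        selected = [m for m in selected if m.evalue <= ethresh]
    return selected


motifs = [Motif(1, 1e-10), Motif(2, 0.5), Motif(3, 1e-3)]
```

Here `select_motifs(motifs, ethresh=0.01)` keeps motifs 1 and 3, because motif 2's E-value of 0.5 exceeds the threshold.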

--lowcomp threshold Eliminate low-complexity motifs from the model. Motif complexity is the average K-L distance between the "motif background distribution" and each column of the motif, where the motif background is simply the average distribution of all the columns. The K-L distance measures the difference between two distributions, p and f (here, a motif column and the motif background). It is the same as the information content, where n is the size of the alphabet:

$$D(p \,\|\, f) = \sum_{i=1}^{n} p_i \log\frac{p_i}{f_i}$$

This value increases with increasing complexity.
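Following the definition above, the complexity score can be computed directly from a motif's column distributions. This is a minimal sketch. It assumes the natural logarithm, since the base is not stated. It also assumes columns are plain probability lists (for example over A, C, G, T). `motif_complexity` is a hypothetical name.

```python
import math


def motif_complexity(columns: list[list[float]]) -> float:
    """Average K-L distance between each motif column and the motif background.

    The motif background is the average of all the columns. Natural log is an
    assumption; the documentation does not give the base.
    """
    n_cols = len(columns)
    alphabet_size = len(columns[0])
    background = [sum(col[i] for col in columns) / n_cols for i in range(alphabet_size)]

    def kl(p: list[float], f: list[float]) -> float:
        # sum_i p_i * log(p_i / f_i); letters with p_i == 0 contribute nothing.
        # If p_i > 0 then f_i > 0, because f averages over this same column.
        return sum(pi * math.log(pi / fi) for pi, fi in zip(p, f) if pi > 0)

    return sum(kl(col, background) for col in columns) / n_cols
```

A motif whose columns are all identical scores 0, because each column equals the background. A two-column motif with one column fixed on A and the other on C scores log 2. Presumably --lowcomp drops motifs scoring below threshold, but that comparison is an assumption, not something stated above.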

--order string This option instructs mhmm to build an HMM with linear topology. The given string specifies the order and spacing of the motifs within the model and has the format "s=m=s=m=…=s=m=s", where "s" is the length of a spacer between motifs and "m" is a motif ID. For example, the string "34=3=17=2=5" specifies a two-motif linear model, with motifs 3 and 2 separated by 17 letters and flanked by 34 letters on the left and 5 letters on the right. If the MEME file contains motif occurrences on both strands, each motif ID in the order string should be preceded by "+" or "-" to indicate the strand of the motif. If this option is specified, the --type option must be linear or left unset.
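The order string alternates spacer lengths and motif IDs, so splitting on "=" and taking alternate fields recovers both. `parse_order` below is a hypothetical helper sketching that format, not mhmm's parser. Motif IDs are kept as strings so that strand prefixes such as "+3" and "-2" survive.

```python
def parse_order(order: str) -> tuple[list[int], list[str]]:
    """Split an order string "s=m=s=...=m=s" into spacer lengths and motif IDs.

    Hypothetical helper: spacers are the even-indexed fields, motif IDs the
    odd-indexed ones, so a valid string has an odd number (at least 3) of fields.
    """
    fields = order.split("=")
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError("expected the form s=m=s=...=m=s")
    spacers = [int(s) for s in fields[0::2]]  # spacer lengths in letters
    motifs = fields[1::2]                     # motif IDs, strand prefix kept
    return spacers, motifs
```

Applied to the example above, `parse_order("34=3=17=2=5")` returns the spacers `[34, 17, 5]` and the motifs `["3", "2"]`.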

--type linear|complete|star Select the topology of the HMM:

Value	Name	Description
linear	Linear HMM	A linear motif-based HMM consists of a sequence of motif models, each separated by one or more tied insert states that represent the spacer region between motifs.
complete	Completely Connected HMM	A completely connected model includes transitions from the end of each motif to the beginning of every other motif in the model (with a spacer model along each transition). This more general topology allows for motifs that are repeated, deleted or shuffled.
star	Star HMM	A star-connected model has transitions from a single spacer state to every motif, with each motif returning to the spacer before the model continues.

Default: a linear HMM is created.

--pthresh p-value This option sets a p-value threshold for inclusion of motif occurrences in the transition probability matrix used to construct the HMM.

--nspacer num Normally, mhmm models each spacer using a single insert state. The distribution of spacer lengths produced by a single insert state is exponential in form, whereas a more reasonable distribution would be a bell-shaped curve such as a Gaussian. Modeling the length distribution explicitly is computationally expensive; however, a Gaussian distribution can be approximated by using multiple insert states to represent a single spacer region. The --nspacer option specifies the number of insert states used to represent each spacer. Note that this Gaussian approximation is only effective in conjunction with total probability training and scoring.
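Chaining insert states changes the shape of the spacer length distribution. The total length is the sum of the lengths spent in each state, so its distribution is the convolution of the per-state distributions. The sketch below computes that convolution. It assumes each insert state emits at least one letter and loops with the same probability `p_self`. Those are simplifying assumptions, and `spacer_length_pmf` is a hypothetical name. With one state the result decays from its peak at length 1. With several states it becomes a bell-shaped (negative binomial) curve.

```python
def spacer_length_pmf(n_states: int, p_self: float, max_len: int) -> list[float]:
    """P(spacer length = L) for L = 0..max_len with n_states chained insert states.

    Assumes every state emits at least one letter and self-loops with
    probability p_self, so one state gives a geometric length distribution.
    """
    # One state emits k >= 1 letters with probability p_self**(k-1) * (1 - p_self).
    single = [0.0] + [p_self ** (k - 1) * (1 - p_self) for k in range(1, max_len + 1)]
    pmf = [1.0] + [0.0] * max_len  # with zero states, the length is 0 for certain
    for _ in range(n_states):
        # Convolve: the total length is the sum of the per-state lengths.
        pmf = [sum(pmf[i] * single[length - i] for i in range(length + 1))
               for length in range(max_len + 1)]
    return pmf
```

For instance, with `p_self=0.8`, `spacer_length_pmf(1, 0.8, 50)` is largest at length 1. `spacer_length_pmf(3, 0.8, 200)` is zero below length 3, rises to a peak near length 10, then falls off.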

--transpseudo pseudocount Specify the value of the pseudocount used in converting transition counts to transition probabilities.
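A pseudocount keeps rarely observed transitions from getting probability 0 and smooths the estimates. The sketch below adds a pseudocount to each listed transition and then normalizes each source's outgoing transitions. This is a common convention, assumed here. The documentation does not say exactly which transitions mhmm adds the pseudocount to. `transition_probs` is a hypothetical name.

```python
from collections import defaultdict


def transition_probs(counts: dict[tuple[str, str], int],
                     pseudocount: float) -> dict[tuple[str, str], float]:
    """Turn (source, target) counts into probabilities with a pseudocount.

    Assumption: the pseudocount is added to every listed transition, and each
    source's outgoing probabilities are normalized to sum to 1.
    """
    totals: dict[str, float] = defaultdict(float)
    for (src, _), c in counts.items():
        totals[src] += c + pseudocount
    return {(src, dst): (c + pseudocount) / totals[src]
            for (src, dst), c in counts.items()}


probs = transition_probs({("1", "2"): 3, ("1", "3"): 1}, pseudocount=1.0)
```

With counts of 3 and 1 and a pseudocount of 1, the two transitions out of motif 1 get probabilities 4/6 and 2/6 instead of 3/4 and 1/4.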

--spacerpseudo pseudocount Specify the value of the pseudocount used in converting transition counts to spacer self-loop probabilities.
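A spacer's self-loop probability can be estimated from observed spacer lengths in the same count-plus-pseudocount way. The sketch below assumes a single insert state that emits at least one letter per spacer. A spacer of length L then contributes L - 1 self-loops and one exit. It also assumes the pseudocount is added to both the loop and the exit counts. Both are assumptions about mhmm, and `spacer_self_loop` is a hypothetical name.

```python
def spacer_self_loop(lengths: list[int], pseudocount: float) -> float:
    """Estimate a spacer insert state's self-loop probability.

    Assumes each observed spacer of length L >= 1 makes L - 1 self-transitions
    and one exit, and that the pseudocount is added to both counts.
    """
    if any(length < 1 for length in lengths):
        raise ValueError("this sketch assumes every spacer emits at least one letter")
    loops = sum(length - 1 for length in lengths)  # self-transitions observed
    exits = len(lengths)                           # each spacer exits once
    return (loops + pseudocount) / (loops + exits + 2 * pseudocount)
```

For spacers of lengths 10, 20 and 30 with no pseudocount, the estimate is 57/60 = 0.95. A geometric distribution with that self-loop probability has mean length 1/(1 - 0.95) = 20, which matches the data. For a single spacer of length 1, a pseudocount of 1 gives 1/3 rather than 0.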

--description text Specify descriptive text to be stored as a comment in the MHMM file.

--fim A free-insertion module (FIM) is an insert state with 1.0 probability of self-transition and 1.0 probability of exit transition. Thus, traversing such a state has zero transition cost. Specifying this option causes all spacers to be represented using FIMs.
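The effect of a FIM is easiest to see in log space, where a probability of 1.0 contributes log(1.0) = 0. The sketch below compares the transition cost of a spacer under an ordinary insert state and under a FIM. It assumes a spacer of `length` letters takes `length - 1` self-transitions and one exit, and it ignores emission scores. `spacer_transition_logprob` is a hypothetical name.

```python
import math


def spacer_transition_logprob(length: int, p_self: float, fim: bool = False) -> float:
    """Log-probability of the transitions used to emit `length` spacer letters.

    Assumes length - 1 self-transitions plus one exit; emissions are ignored.
    """
    if length < 1:
        raise ValueError("this sketch assumes at least one spacer letter")
    if fim:
        # FIM: self-transition and exit both have probability 1.0, so cost 0.
        loop, exit_ = 1.0, 1.0
    else:
        loop, exit_ = p_self, 1.0 - p_self
    return (length - 1) * math.log(loop) + math.log(exit_)
```

With `fim=True`, the result is 0.0 whatever the spacer length. With an ordinary insert state, longer spacers pay an increasingly negative log-probability.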

--keep-unused By default, mhmm removes from the transition probability matrix all inter-motif transitions that are not observed in the data. This option retains those transitions. It is only relevant if the model has a completely connected topology.
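The default pruning behaviour amounts to filtering the possible transitions by their observed counts. This is a minimal sketch, assuming transitions are `(source, target)` pairs and counts are a dictionary as in the earlier sketches. `prune_unobserved` is a hypothetical name.

```python
def prune_unobserved(transitions: set[tuple[str, str]],
                     counts: dict[tuple[str, str], int],
                     keep_unused: bool = False) -> set[tuple[str, str]]:
    """Drop transitions never observed in the data unless keep_unused is set.

    Mirrors the documented default of --keep-unused; names are illustrative.
    """
    if keep_unused:
        return set(transitions)
    return {t for t in transitions if counts.get(t, 0) > 0}
```

With transitions 1→2, 2→1 and 1→1, but only 1→2 observed, the default keeps just 1→2. Setting `keep_unused=True` keeps all three.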

--noheader Do not put a header on the output file.

--noparams Do not list the parameters at the end of the output.

--notime Do not print the running time and host name at the end of the output.

--quiet Equivalent to specifying --noheader, --noparams and --notime together, and also sets verbosity to 1.

mhmm (unsupported)
