mhmm [options] <motif file>
This program creates motif-based hidden Markov models (HMMs) of families of related biosequences. The program takes as input a set of DNA or protein motif models and produces as output a single HMM containing the given motifs. The program can produce three types of models: linear models, in which the motifs are arranged like beads on a string, completely connected models, which allow for repetitions of motifs and for motifs to appear in any order, and star models. Mhmm writes its output in a format readable by the other MEME Suite programs, mhmms and mhmmscan.
Three types of models may be produced. A linear motif-based HMM consists of a sequence of motif models, each separated by one or more tied insert states that represent the spacer region between motifs. A completely connected model, on the other hand, includes transitions from the end of each motif to the beginning of every other motif in the model (with a spacer model along each transition). This more general topology allows for motifs that are repeated, deleted or shuffled. A star model is also available. By default, the program produces a linear model.
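
The difference between the linear and completely connected topologies can be illustrated with a small sketch. This is not mhmm's implementation: the function names and the motif labels are hypothetical, and each pair below stands for a transition that passes through a spacer. The star topology is left out because the text above does not define it.

```python
from itertools import permutations


def linear_edges(motifs):
    """Linear model: each motif connects only to the next one in the list."""
    return list(zip(motifs, motifs[1:]))


def complete_edges(motifs):
    """Completely connected model: each motif connects to every other motif."""
    return list(permutations(motifs, 2))


# Hypothetical motif labels, used only for illustration.
motifs = ["m1", "m2", "m3"]
print("linear:  ", linear_edges(motifs))
print("complete:", complete_edges(motifs))
```

With three motifs the linear model has two inter-motif transitions, while the completely connected model has six, which is what lets it represent repeated, deleted or shuffled motifs.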
Transition probabilities among motifs are derived from the motif occurrence information in the given MEME file. For the completely connected topology, this information is derived from all of the motif occurrences. For the linear topology, the information is derived only from the best-scoring sequence. Alternatively, the order and spacing of motifs within a linear model may be specified via the --order option.
A file containing MEME formatted motifs. Outputs from MEME and DREME are supported, as is the minimal MEME format, for which conversion scripts are available to support other formats. The input motifs should be ones that are likely to appear in the sequences.
Writes a motif-based hidden Markov model in MHMM format to standard output.
Option | Parameter | Description | Default Behaviour | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Input/Output | |||||||||||||||
--motif | motif # | This option (which may be repeated) allows the user to select a specific motif for inclusion in the HMM. The specified motif number corresponds to the motif index in the MEME file. | |||||||||||||
--nmotifs | n | This option is similar to the --motif option, except that it tells mhmm to use the first n motifs in the given MEME file. | All motifs are included. | ||||||||||||
--ethresh | ev | This option sets an E-value threshold for inclusion of motifs in the model. Motifs with E-values greater than ev are ignored. | All motifs are included. | ||||||||||||
--lowcomp | threshold | Eliminate low-complexity motifs from the model. Motif complexity is the average K-L distance between the "motif background distribution" and each column of the motif. The motif background is just the average distribution of all the columns. The K-L distance, which measures the difference between two distributions, p and f, is the same as the information content: p1 log(p1/f1) + p2 log(p2/f2) + ... + pn log(pn/fn). This value increases with increasing complexity. | |||||||||||||
--order | string | This option instructs mhmm to build an HMM with linear topology. The given string specifies the order and spacing of the motifs within the model, and has the format "s=m=s=m=…=s=m=s", where "s" is the length of a spacer between motifs, and "m" is a motif ID. Thus, for example, the string "34=3=17=2=5" specifies a two-motif linear model, with motifs 3 and 2 separated by 17 letters and flanked by 34 letters and 5 letters on the left and right. If the MEME file contains motif occurrences on both strands, then the motif IDs in the order string should be preceded by "+" or "-" indicating the strandedness of the motif. If this option is specified then the --type option must either be linear or not set. | |||||||||||||
--type | linear\|complete\|star | This option specifies the topology of the HMM: linear, completely connected, or star. | A linear HMM is created. | ||||||||||||
--pthresh | p-value | This option sets a p-value threshold for inclusion of motif occurrences in the transition probability matrix used to construct the HMM. | |||||||||||||
--nspacer | num | Normally the mhmm program models each spacer using a single insert state. The distribution of spacer lengths produced by a single insert state is exponential in form. A more reasonable distribution would be a bell-shaped curve such as a Gaussian. Modeling the length distribution explicitly is computationally expensive; however, a Gaussian distribution can be approximated by using multiple insert states to represent a single spacer region. The --nspacer option specifies the number of insert states used to represent each spacer. Note that this Gaussian approximation is only effective in conjunction with total probability training and scoring. | |||||||||||||
--transpseudo | pseudocount | Specify the value of the pseudocount used in converting transition counts to transition probabilities. | |||||||||||||
--spacerpseudo | pseudocount | Specify the value of the pseudocount used in converting transition counts to spacer self-loop probabilities. | |||||||||||||
--description | text | Specify descriptive text to be stored as a comment in the MHMM file. | |||||||||||||
--fim | A free-insertion module (FIM) is an insert state with 1.0 probability of self-transition and 1.0 probability of exit transition. Thus, traversing such a state has zero transition cost. Specifying this option causes all spacers to be represented using FIMs. | ||||||||||||||
--keep-unused | By default, mhmm removes from the transition probability matrix all inter-motif transitions that are not observed in the data. This option allows those transitions to be retained. This option is only relevant if the model has a completely connected topology. | ||||||||||||||
--noheader | Do not put a header on the output file. | ||||||||||||||
--noparams | Do not list the parameters at the end of the output. | ||||||||||||||
--notime | Do not print the running time and host name at the end of the output. | ||||||||||||||
--quiet | Combine the previous three flags and set verbosity to 1. |
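
The motif complexity measure described under --lowcomp can be sketched in a few lines. This is an illustration under stated assumptions, not mhmm's code: the function name `motif_complexity` is hypothetical, each motif column is taken to be a list of letter probabilities, and the natural logarithm is used because the text above does not state the base.

```python
import math


def motif_complexity(columns):
    """Average K-L distance between each motif column and the motif background."""
    n_cols = len(columns)
    n_letters = len(columns[0])

    # Motif background: the average distribution over all columns.
    background = [
        sum(col[i] for col in columns) / n_cols for i in range(n_letters)
    ]

    def kl(p, f):
        # Terms with p_i == 0 are skipped (0 * log 0 is treated as 0).
        return sum(pi * math.log(pi / fi) for pi, fi in zip(p, f) if pi > 0)

    return sum(kl(col, background) for col in columns) / n_cols


# Identical columns carry no information relative to their own average.
flat = [[0.25, 0.25, 0.25, 0.25]] * 4
# Columns that each prefer a different letter differ strongly from the average.
varied = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

print(motif_complexity(flat))
print(motif_complexity(varied))
```

A motif whose columns all look alike scores zero, while a motif whose columns differ from one another scores higher, consistent with the statement that the value increases with complexity.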
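
The --order string format can be made concrete with a small parser. This is a sketch of the "s=m=s=m=…=s=m=s" layout described above, not the parser mhmm itself uses; `parse_order` is a hypothetical helper, and motif IDs are kept as strings so that an optional "+" or "-" strand prefix is preserved.

```python
def parse_order(order):
    """Split an order string into spacer lengths and motif IDs."""
    tokens = order.strip().split("=")
    # A valid string alternates spacer, motif, spacer, ..., ending on a spacer,
    # so it has an odd number of tokens and at least one motif.
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError(f"malformed order string: {order!r}")
    spacers = [int(t) for t in tokens[0::2]]  # even positions: spacer lengths
    motifs = tokens[1::2]                     # odd positions: motif IDs
    return spacers, motifs


print(parse_order("34=3=17=2=5"))
# ([34, 17, 5], ['3', '2'])
```

For the example string from the table, the result reads as a 34-letter left flank, motif 3, a 17-letter spacer, motif 2, and a 5-letter right flank.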
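
The motivation for --nspacer can be seen by comparing spacer length distributions. The sketch below assumes a simple chain of insert states that all share one self-loop probability and must each be visited at least once; this arrangement, the function name `spacer_length_pmf`, and the value 0.9 are assumptions for illustration, and mhmm may lay out its spacer states differently. Under that assumption the total length follows a negative binomial distribution.

```python
from math import comb


def spacer_length_pmf(length, n_states, self_loop):
    """P(spacer emits `length` letters) for a chain of tied insert states."""
    if length < n_states:
        return 0.0  # each state emits at least one letter
    stay = length - n_states  # total number of self-loop transitions taken
    return (
        comb(length - 1, n_states - 1)
        * self_loop ** stay
        * (1.0 - self_loop) ** n_states
    )


# Compare one insert state with a chain of five (self-loop value is arbitrary).
for length in (1, 5, 20, 41, 100):
    single = spacer_length_pmf(length, 1, 0.9)
    chained = spacer_length_pmf(length, 5, 0.9)
    print(f"{length:4d}  single={single:.5f}  chained={chained:.5f}")
```

With one state the probability falls off monotonically from the shortest length, which is the exponential form mentioned above. With several states the distribution peaks at an intermediate length, giving the rough bell shape that the option is meant to approximate.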