gendb

Usage:

gendb [options] <sequence count>

Description

gendb generates the specified number of sequences using a Markov model. The sequence lengths are selected uniformly at random within the range specified by --minseq and --maxseq.
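
The behaviour described above can be approximated with a short Python sketch. This is not gendb's implementation: the function name `generate_sequences`, the plain DNA alphabet, the 0-order model (each symbol drawn independently), and the treatment of both length bounds as inclusive are all assumptions made for illustration. Only the default lengths of 50 and 2,000 come from the option descriptions.

```python
import random


def generate_sequences(count, minseq=50, maxseq=2000,
                       alphabet="ACGT", weights=None, seed=None):
    """Return `count` random sequences drawn from a 0-order model.

    Hypothetical helper for illustration only; it is not part of gendb.
    """
    rng = random.Random(seed)
    sequences = []
    for _ in range(count):
        # Length is drawn uniformly; inclusive bounds are an assumption.
        length = rng.randint(minseq, maxseq)
        # 0-order model: every symbol is drawn independently of its
        # neighbours. With weights=None the symbols are equally likely.
        symbols = rng.choices(alphabet, weights=weights, k=length)
        sequences.append("".join(symbols))
    return sequences


if __name__ == "__main__":
    for i, seq in enumerate(generate_sequences(3, minseq=10, maxseq=20, seed=1)):
        print(i, len(seq), seq)
```

A higher-order Markov model would instead condition each symbol on the preceding symbols; the sketch deliberately leaves that out.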

Input

No inputs are required.

Output

Writes the sequences in FASTA format to standard output.
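
For downstream checks, FASTA text such as this can be read back with a few lines of Python. The sketch below is a generic FASTA reader, not something shipped with gendb; the name `parse_fasta` and the sample headers are hypothetical, and the exact header text gendb writes is not assumed. A header that is not followed by any sequence lines comes back with an empty sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration only; it is not part of gendb.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap; join them later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up sample input; real gendb headers may look different.
    sample = ">header_only\n>seq_a\nACGT\nAC\n>seq_b\nGG\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```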

Options

Option Parameter Description Default Behaviour
General Options
--alph <file> Generate random sequences using the alphabet defined in <file>, an alphabet definition file. Note that this overrides the --type option. Default: protein sequences are generated unless overridden using the --type option.
--ambig <ambig fraction> Sets the fraction of symbols that will be ambiguous (overrides --type). Default: depends on the --type option.
--bfile <file> Sets the background model used to generate the sequences from a file in background model format. Default: for the standard DNA and protein alphabets, a built-in 0-order background is used. If a non-standard alphabet is provided without a background, a uniform frequency distribution is used.
--order <n> Load the background model up to order <n>. Default: load the background model completely.
--type 0|1|2|3|4 Allowed types are:
  • 0 = Protein with 1% ambiguous symbols (default)
  • 1 = DNA with 1% ambiguous symbols
  • 2 = codons (ignores --bfile)
  • 3 = DNA without ambiguous symbols
  • 4 = Protein without ambiguous symbols
Default: if an alphabet is not specified with the --alph option, protein sequences are generated.
--minseq <min> Minimum sequence length. Default: 50.
--maxseq <max> Maximum sequence length. Default: 2,000.
--dummy  Print a "dummy" sequence record before the generated sequences. The "dummy" sequence record is a FASTA header line that lists the gendb parameters and is not followed by any sequence lines.
--seed <seed> Seed for random number generator.
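
As a rough illustration of how these options combine, the following Python sketch assembles a gendb argument list following the usage line (options first, then the sequence count). The helper `build_gendb_command` is hypothetical, it covers only a subset of the options, and it assumes a `gendb` executable is available on the PATH. It only builds the argument list and does not run anything.

```python
def build_gendb_command(count, seq_type=None, minseq=None, maxseq=None,
                        seed=None, dummy=False):
    """Build an argument list of the form: gendb [options] <sequence count>.

    Hypothetical helper for illustration only. Options left as None are
    omitted, so gendb's own defaults apply.
    """
    args = ["gendb"]
    if seq_type is not None:
        args += ["--type", str(seq_type)]
    if minseq is not None:
        args += ["--minseq", str(minseq)]
    if maxseq is not None:
        args += ["--maxseq", str(maxseq)]
    if seed is not None:
        args += ["--seed", str(seed)]
    if dummy:
        args.append("--dummy")
    args.append(str(count))  # the sequence count comes last
    return args


if __name__ == "__main__":
    # Ten DNA sequences without ambiguous symbols, 100 to 200 symbols long.
    print(" ".join(build_gendb_command(10, seq_type=3, minseq=100,
                                       maxseq=200, seed=42)))
```

If gendb is installed, the resulting list could be handed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` to collect the FASTA output from standard output.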