fasta-shuffle-letters

Usage:

fasta-shuffle-letters [options] <sequence file> [<output file>]

Description

The program fasta-shuffle-letters creates a shuffled version of a FASTA file. The letters in each sequence in the input file are shuffled so that k-mer frequencies are exactly preserved, where k is 1 by default but may be set with the -kmer option. If an alphabet is specified via -dna, -rna, -protein or -alph, aliased symbols are converted to their core symbols before shuffling. If no alphabet is specified, aliased symbols are not converted and are treated as distinct letters, which may not be what you want.
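The effect of alias conversion can be illustrated with a minimal Python sketch. The alphabet below is a toy assumption made for illustration (core symbols ACGT, wildcard N, lowercase letters as aliases of their uppercase forms); it is not the alphabet definition used by the program, and the names CORE, WILDCARD, ALIASES and to_core are hypothetical.

```python
from collections import Counter

# Toy alphabet for illustration only; NOT the program's alphabet definition.
CORE = set("ACGT")
WILDCARD = "N"
# Assumption: lowercase (soft-masked) letters are aliases of the uppercase core symbols.
ALIASES = {"a": "A", "c": "C", "g": "G", "t": "T"}


def to_core(seq: str) -> str:
    """Map aliases to core symbols and unknown symbols to the wildcard."""
    out = []
    for ch in seq:
        if ch in CORE or ch == WILDCARD:
            out.append(ch)
        elif ch in ALIASES:
            out.append(ALIASES[ch])
        else:
            out.append(WILDCARD)  # unknown symbol
    return "".join(out)


soft_masked = "ACGTacgtACxT"
converted = to_core(soft_masked)

# Without conversion, "a" and "A" count as different letters.
print(len(Counter(soft_masked)), "distinct letters before conversion")
print(len(Counter(converted)), "distinct letters after conversion")
```

In this sketch, treating lowercase and uppercase as distinct letters inflates the number of symbols that the shuffle has to keep track of, which is the situation the description warns about when no alphabet is given.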

The underlying implementation uses uShuffle.
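The k-mer preservation property described above can be checked with a short Python sketch. This is not uShuffle: the hypothetical shuffle_letters function below is a plain random permutation, which corresponds only to the k = 1 case, and kmer_counts is a helper introduced here for illustration.

```python
import random
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping window of k symbols in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_letters(seq: str, seed: int) -> str:
    """Randomly permute the letters of seq; preserves 1-mer counts only."""
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


original = "ACGTACGTTTGACA"
shuffled = shuffle_letters(original, seed=1)

# 1-mer counts always match after a permutation; 2-mer counts usually do not.
same_1mers = kmer_counts(original, 1) == kmer_counts(shuffled, 1)
same_2mers = kmer_counts(original, 2) == kmer_counts(shuffled, 2)
print("1-mer counts preserved:", same_1mers)
print("2-mer counts preserved:", same_2mers)
```

A shuffle that preserves counts for k greater than 1 needs a dedicated algorithm, which is the role uShuffle plays in the program; the same kmer_counts comparison could be applied to its output as a sanity check.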

Input

Sequences in FASTA format from a file.

Output

Writes the shuffled sequences in FASTA format to the output file, or to standard output if no output file is specified.

Options

General Options

-kmer num
    Description: Shuffle the sequences so that groups of num symbols appear with exactly the same frequencies. Choosing this number is a trade-off: the larger the number, the more realistic the shuffled sequence will be, but if it is too large the sequence will not be shuffled at all. A value of 2 is used by MEME-ChIP. For values larger than 1, specifying the alphabet is highly recommended because it allows aliases to be translated, which may be important in some cases, such as soft-masked sequence.
    Default: A value of 1 is used, which allows the shuffle to be completely random.

-copies num
    Description: The number of shuffled copies to create for each sequence in the source.
    Default: A single shuffled sequence is created for each sequence in the source.

-dna
    Description: As if the -alph option was specified with the standard DNA alphabet as input.

-rna
    Description: As if the -alph option was specified with the standard RNA alphabet as input.

-protein
    Description: As if the -alph option was specified with the standard protein alphabet as input.

-alph file
    Description: Alias symbols are converted to their core representation in the given alphabet definition prior to shuffling. Any unknown symbols will be converted to the alphabet's wildcard.
    Default: When no alphabet is specified, all symbols (including aliases) are treated as distinct symbols while shuffling.

-line num
    Description: The sequences will be output with a maximum of num symbols per line.
    Default: A line length of 100 is used.

-tag text
    Description: The name of the sequence will have text appended to it.
    Default: The name of the sequence will have "_shuf" appended to it.

-seed num
    Description: Set the seed of the random number generator to num.
    Default: The random number generator is seeded from the computer as randomly as possible.

-help
    Description: Display a brief help message and exit.
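
The output conventions controlled by the line-length and tag options can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's code: format_record is a hypothetical helper, its defaults mirror the documented defaults (a tag of "_shuf" and 100 symbols per line), and the handling of multiple copies or of any description text after the sequence name is deliberately left out.

```python
def format_record(name: str, seq: str, tag: str = "_shuf", line: int = 100) -> str:
    """Return one FASTA record with tag appended to the name and wrapped lines."""
    header = f">{name}{tag}"
    # Split the sequence into rows of at most `line` symbols.
    rows = [seq[i:i + line] for i in range(0, len(seq), line)]
    return "\n".join([header, *rows]) + "\n"


# A 240-letter sequence wraps into rows of 100, 100 and 40 symbols.
record = format_record("seq1", "ACGT" * 60)
print(record, end="")
```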