fasta-shuffle-letters [options] <sequence file> [<output file>]
The program fasta-shuffle-letters creates a shuffled version of a FASTA file. The letters of each sequence in the input file are shuffled so that k-mer frequencies are preserved exactly, where k defaults to 1 and can be set with the -kmer option. If an alphabet is specified via -dna, -rna, -protein or -alph, any aliased symbols are first converted to their core symbol before shuffling. Otherwise, aliased symbols are not converted to their core symbols and are treated as distinct letters, which may not be what you want.
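To make the alias handling concrete, here is a minimal Python sketch of the conversion step: aliases become their core symbol, and, as the -alph option describes, unknown symbols become the alphabet's wildcard. The toy alphabet (core ACGT, lowercase soft-masked letters as aliases, N as the wildcard) and the function name `to_core` are hypothetical illustrations, not the MEME suite's actual alphabet definition or code.

```python
# Hypothetical toy alphabet for illustration only; this is NOT the MEME
# suite's DNA alphabet definition.
CORE = "ACGT"
WILDCARD = "N"
# Assumption: lowercase (soft-masked) letters are aliases of their uppercase forms.
ALIASES = {c.lower(): c for c in CORE + WILDCARD}


def to_core(seq: str) -> str:
    """Map aliases to core symbols and unknown symbols to the wildcard."""
    out = []
    for ch in seq:
        if ch in CORE or ch == WILDCARD:
            out.append(ch)           # already a core symbol or the wildcard
        elif ch in ALIASES:
            out.append(ALIASES[ch])  # alias -> its core symbol
        else:
            out.append(WILDCARD)     # unknown symbol -> wildcard
    return "".join(out)


print(to_core("ACgtxN"))  # ACGTNN
```

Without this step, "a" and "A" would be counted as different letters, so a soft-masked region would keep its own letter statistics during the shuffle.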
The underlying implementation uses uShuffle.
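uShuffle is a separate library and its code is not reproduced here. As a rough illustration of how a shuffle can preserve k-mer counts exactly, the following Python sketch uses the random Euler-trail construction that k-let shuffling algorithms such as uShuffle are based on: each (k-1)-mer is a vertex, each overlapping k-mer is an edge, and a random Eulerian trail through that graph spells a new sequence with the same k-mer counts. The names `kmer_shuffle` and `kmer_counts` are hypothetical, and the sketch is not claimed to reproduce uShuffle's output for any seed.

```python
import random
from collections import Counter, defaultdict


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_shuffle(seq: str, k: int, rng: random.Random) -> str:
    """Return a random rearrangement of seq with identical k-mer counts."""
    n = len(seq)
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        letters = list(seq)
        rng.shuffle(letters)  # any permutation preserves single-letter counts
        return "".join(letters)
    if n <= k:
        return seq  # at most one k-mer, so nothing can be rearranged
    # Vertices are (k-1)-mers; each overlapping k-mer is an edge prefix -> suffix.
    graph = defaultdict(list)
    for i in range(n - k + 1):
        graph[seq[i:i + k - 1]].append(seq[i + 1:i + k])
    out_edges = dict(graph)
    start, end = seq[:k - 1], seq[n - k + 1:]

    # Wilson's algorithm: pick a random "last exit" edge for every vertex so
    # that the last exits form a tree directed towards the end vertex.
    in_tree = {end}
    last_exit = {}
    for u in out_edges:
        v = u
        while v not in in_tree:  # random walk; later visits overwrite earlier ones
            last_exit[v] = rng.randrange(len(out_edges[v]))
            v = out_edges[v][last_exit[v]]
        v = u
        while v not in in_tree:  # retrace the loop-erased path into the tree
            in_tree.add(v)
            v = out_edges[v][last_exit[v]]

    # Randomly order each vertex's edges, keeping its last exit edge last.
    for u, targets in out_edges.items():
        if u in last_exit:
            final = targets.pop(last_exit[u])
            rng.shuffle(targets)
            targets.append(final)
        else:
            rng.shuffle(targets)  # the end vertex has no last exit constraint

    # Follow the edges in their new order; this traces an Eulerian trail.
    used = Counter()
    out = [start]
    current = start
    for _ in range(n - k + 1):
        nxt = out_edges[current][used[current]]
        used[current] += 1
        out.append(nxt[-1])  # each step adds the last letter of the next (k-1)-mer
        current = nxt
    return "".join(out)


print(kmer_shuffle("ACGTTGCAACGTAGCTAGGCTA", 2, random.Random(42)))
```

Passing a seeded `random.Random` makes the output reproducible, in the same spirit as the -seed option. For k of 2 or more, the sketch always keeps the first and last k-1 letters in place, because the walk starts at the first (k-1)-mer and can only finish at the last one.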
The input is a file of sequences in FASTA format.
The output is written in FASTA format to the optional output file, or to standard output if no output file is given.
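The default output conventions listed in the options table ("_shuf" appended to each sequence name, at most 100 symbols per line) can be illustrated with a minimal FASTA reader and writer in Python. The function names are hypothetical, and treating the first word of a header as the sequence name (with any description kept after the tagged name) is an assumption about how headers are handled, not documented behaviour of fasta-shuffle-letters.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        yield header, "".join(chunks)


def format_record(header: str, seq: str, tag: str = "_shuf", width: int = 100) -> str:
    """Append tag to the sequence name and wrap the sequence at width symbols."""
    # Assumption: the name is the first word; any description follows it.
    name, _, desc = header.partition(" ")
    title = name + tag + (" " + desc if desc else "")
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + title] + rows) + "\n"
```

For example, `format_record("seq1 test", "ACGTAC", width=4)` yields a header line `>seq1_shuf test` followed by the lines `ACGT` and `AC`.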
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
General Options | |||
-kmer | num | Shuffle the sequences so that groups of num symbols appear with exactly the same frequencies as in the original. Choosing num is a trade-off: the larger the number, the more realistic the shuffled sequence will be, but if it is too large the sequence will not be shuffled at all. MEME-ChIP uses a value of 2. For values larger than 1, specifying the alphabet is highly recommended because it allows aliases to be translated, which may be important in some cases, such as soft-masked sequence. | A value of 1 is used, which allows the shuffle to be completely random. |
-copies | num | The number of shuffled copies to create for each sequence in the source. | A single shuffled sequence is created for each sequence in the source. |
-dna | | As if the -alph option was specified with the standard DNA alphabet as input. | |
-rna | | As if the -alph option was specified with the standard RNA alphabet as input. | |
-protein | | As if the -alph option was specified with the standard protein alphabet as input. | |
-alph | file | Aliased symbols are converted to their core representation in the given alphabet definition before shuffling. Any unknown symbols are converted to the alphabet's wildcard. | When no alphabet is specified, symbols (including aliases) are treated as distinct symbols while shuffling. |
-line | num | The sequences will be output with a maximum of num symbols per line. | A line length of 100 is used. |
-tag | text | The name of the sequence will have text appended to it. | The name of the sequence will have "_shuf" appended to it. |
-seed | num | Set the seed of the random number generator to num. | The random number generator is seeded as randomly as the computer allows. |
-help | | Display a brief help message and exit. | |
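The warning in the -kmer description, that too large a value leaves the sequence unshuffled, can be made concrete with a simple check. The Python sketch below uses a hypothetical function name and tests a sufficient (but not necessary) condition: if every (k-1)-mer along a sequence is distinct, its k-mers link those (k-1)-mers into a single unbranched chain, so no other sequence has the same k-mer counts and any k-mer-preserving shuffle must return the original.

```python
def kmers_fix_sequence(seq: str, k: int) -> bool:
    """Return True if seq's k-mer counts are guaranteed to determine it uniquely.

    Sufficient but not necessary: a False result only means this check is
    inconclusive, not that a different shuffle is actually possible.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # The (k-1)-mers visited along seq, in order (overlapping windows).
    nodes = [seq[i:i + k - 1] for i in range(len(seq) - k + 2)]
    # All distinct => the k-mers form one chain with no branch points.
    return len(set(nodes)) == len(nodes)


for k in (2, 3, 4):
    print(k, kmers_fix_sequence("ACGTTGCA", k))
# prints one line per k: "2 False", "3 True", "4 True"
```

Real sequences are much longer than this example, so the k at which the check starts returning True is correspondingly higher, but the same effect limits how large a useful -kmer value can be.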