fasta-subsample

Usage:

fasta-subsample <sequences> <count> [options]

Description

Create a random subset of the sequences in a FASTA formatted file. The random seed is fixed by default, so every run of the program outputs the same subset unless a different seed is set explicitly (see the -seed option).
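The effect of a fixed seed can be illustrated with a short Python sketch. This is not the program's own implementation: the helper names read_fasta and subsample are hypothetical, and Python's random number generator will not select the same sequences as fasta-subsample for a given seed. It only shows why a fixed seed makes the selection repeatable.

```python
import random

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def subsample(records, count, seed=1):
    """Pick `count` records at random; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return rng.sample(records, count)
```

Calling subsample twice with the same records, count and seed returns the same selection. Passing a different seed may return a different one.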

Input

Sequences

A file of sequences in FASTA format.

Count

The number of sequences to randomly select for inclusion in the output.

Output

Writes a FASTA formatted file to standard output containing the specified subsample of the original file. If -rest file is specified, any leftover sequences are written to file, which is useful for cross-validation.
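The split between selected and leftover sequences can be sketched in Python as follows. The function name split_sample is hypothetical, and keeping both lists in input order is an assumption of this sketch, not documented behaviour of fasta-subsample.

```python
import random

def split_sample(records, count, seed=1):
    """Return (selected, rest): `count` randomly chosen records and the remainder."""
    rng = random.Random(seed)
    # Sample indices rather than records so duplicates in the input stay distinct.
    chosen = set(rng.sample(range(len(records)), count))
    selected = [r for i, r in enumerate(records) if i in chosen]
    rest = [r for i, r in enumerate(records) if i not in chosen]
    return selected, rest
```

For cross-validation, the selected list would play the role of the held-out set written to standard output, and the rest list the role of the training set written to the -rest file, or the other way round.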

Options

General Options

-seed <random seed>
    Description: Seed for the random number generator used to select the sequences.
    Default Behaviour: A seed of 1 is used.

-rest <file>
    Description: Name of the file to receive the sequences not selected for the output.
    Default Behaviour: The unselected sequences are discarded.

-off <offset>
    Description: The offset within each sequence at which printing starts.
    Default Behaviour: The sequence is output from its beginning.

-len <len>
    Description: The maximum length to which printed sequences are constrained.
    Default Behaviour: The sequence is output until its end.
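
The combined effect of an offset and a maximum length on a single sequence can be sketched in Python. The function name trim is hypothetical, and the sketch assumes the offset counts from 0; whether fasta-subsample counts offsets from 0 or 1 is not stated here.

```python
def trim(sequence, off=0, length=None):
    """Return the part of `sequence` starting at `off`, at most `length` long."""
    # With no length limit, the slice runs to the end of the sequence.
    end = None if length is None else off + length
    return sequence[off:end]
```

With both arguments left at their defaults, the sequence is returned unchanged, matching the default behaviour of printing each sequence from its beginning to its end.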