On Tue, Nov 8, 2011 at 2:07 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

On Tue, Nov 8, 2011 at 9:57 PM, Austin Paul <austinpa@usc.edu> wrote:
> Hi,
>
> I am curious if anyone knows how to select random lines from a fastq file.
> There is a select random lines tool in text manipulation tools, but it does
> not treat fastq files specifically, so it will not group quality lines with
> sequence lines. And if I turn the fastq file to tabular form in order to
> select lines, I can no longer return it to fastq form. Anyone know a way to
> do this in Galaxy? Otherwise, perhaps another program? Thanks.
>
> Austin

How big are your FASTQ files (can they be indexed in memory)?

And are you willing to program? If you like Python, Biopython's
Bio.SeqIO.index(...) or Bio.SeqIO.index_db(...) functions would
let you do this easily. Have a look at the "Getting the raw data
for a record" example in the tutorial, and please ask if you'd like
a little more help:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
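
To give a feel for the approach, here is a rough, dependency-free sketch of
the same "index, then fetch the raw record" idea. It is not the Biopython
code from the tutorial: the function names index_fastq and sample_fastq are
made up for illustration, and it assumes strict 4-line FASTQ records (no
line-wrapped sequences, no blank lines). With Biopython, the index returned
by Bio.SeqIO.index(filename, "fastq") and its get_raw(key) method would
take the place of the byte-offset list built here.

```python
import random


def index_fastq(path):
    """Return a list of (offset, length) byte spans, one per record.

    Assumes every record is exactly four lines: @title, sequence, +, quality.
    """
    spans = []
    with open(path, "rb") as handle:
        while True:
            start = handle.tell()
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:
                break  # clean end of file
            if (not lines[3]
                    or not lines[0].startswith(b"@")
                    or not lines[2].startswith(b"+")):
                raise ValueError("not a 4-line FASTQ record at byte %d" % start)
            spans.append((start, handle.tell() - start))
    return spans


def sample_fastq(in_path, out_path, count, seed=None):
    """Copy `count` randomly chosen records, unmodified, to out_path."""
    spans = index_fastq(in_path)
    rng = random.Random(seed)
    # Sample without replacement, then sort so the output keeps file order.
    chosen = sorted(rng.sample(spans, count))
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for offset, length in chosen:
            src.seek(offset)
            dst.write(src.read(length))  # raw bytes: title, seq, +, quality
    return len(chosen)
```

Only the offsets are held in memory, so this should scale to fairly large
files. Calling sample_fastq("reads.fastq", "subset.fastq", 1000) would
write 1000 randomly chosen records (both file names are placeholders), and
random.sample raises ValueError if you ask for more records than the file
contains.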

Regards,

Peter