Issue with saving 'manipulate fastq' in workflow; and request for advice dealing with barcoded 454 data

24 May 2010

Hi,

I'm a new user, learning how to use Galaxy while I wait for my 454 results.
So I'm not actually playing with any data yet but I'm trying to set up a
draft workflow as practice. Two issues:

Issue 1.

I am having trouble with the 'manipulate fastq' command. Without this, my
workflow saves quickly and seems fine, but when I include even a (seemingly
simple) 'manipulate fastq' step, it tries to save for many minutes,
unsuccessfully, until I get sick of it and close the window.

Issue 2.

Well this isn't really an issue, just a request for advice! My dataset will
be a barcoded amplicon library, containing 8 different gene regions (which I
can recognise from the amplicon-specific primer sequences) amplified in 64
different individuals (which I can recognise by an individual-specific
barcode sequence). I thought I'd set up a workflow with the following steps:

1) Convert to FASTQ format.
2) Groom and filter to remove short reads etc.
3) 'Manipulate FASTQ' to match all sequences containing one of the
   eight reverse primer sequences, and reverse-complement them.
4) FASTQ-to-tabular format conversion.
5) Eight separate 'select' steps, each selecting sequences with a match to
   either the forward primer or the reverse-complemented reverse primer of
   the desired gene region.
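
To make the logic of steps 3 and 5 concrete, here is a minimal sketch in plain Python, outside Galaxy. It is not how the Galaxy tools are implemented. The primer sequences, the gene names and the function names are all hypothetical placeholders, only two of the eight regions are shown, and matching is exact (no mismatches allowed), which real 454 reads would probably need to relax.

```python
# Hypothetical primer pairs: gene name -> (forward primer, reverse primer).
# Real primers would replace these made-up sequences.
PRIMERS = {
    "geneA": ("AAGGCCTA", "GATTACAG"),
    "geneB": ("CCATTGGA", "TTCGAGCA"),
}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq, qual):
    """Step 3: if the read contains any reverse primer, flip it.

    The quality string is reversed along with the sequence so that
    each score stays attached to its base.
    """
    for _fwd, rev in PRIMERS.values():
        if rev in seq:
            return reverse_complement(seq), qual[::-1]
    return seq, qual


def assign_region(seq):
    """Step 5: name the gene region of an already-oriented read.

    A read matches a region if it contains the forward primer or the
    reverse complement of the reverse primer. Returns None if no
    region matches.
    """
    for gene, (fwd, rev) in PRIMERS.items():
        if fwd in seq or reverse_complement(rev) in seq:
            return gene
    return None
```

Calling `orient` on each read and then `assign_region` on the result gives one gene label per read, which corresponds to the eight 'select' outputs.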

My question is: does this seem sensible? Is there a more efficient way to do
this that I haven't discovered yet? I was thinking I'd then set up another
workflow to label barcoded individuals, for which I could use each of the
eight gene 'output files' in turn as input.
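
The barcode-labelling step could be sketched the same way, again in plain Python rather than as a Galaxy workflow. The barcodes, their length and the individual names below are hypothetical. The sketch also assumes the barcode sits at the very start of the read, so reads that were reverse-complemented earlier would need the barcode looked up before flipping, or at the other end.

```python
from collections import defaultdict

# Hypothetical barcode -> individual mapping; a real run would list all 64.
BARCODES = {
    "ACGAGTGC": "ind01",
    "ACGCTCGA": "ind02",
}
BARCODE_LEN = 8  # assumed fixed barcode length


def label_individual(seq):
    """Return the individual whose barcode starts the read, or None."""
    return BARCODES.get(seq[:BARCODE_LEN])


def split_by_individual(reads):
    """Group read names by individual.

    `reads` is an iterable of (name, sequence) pairs, e.g. the rows of
    one gene's tabular output. Reads with no known barcode are
    collected under "unassigned".
    """
    groups = defaultdict(list)
    for name, seq in reads:
        groups[label_individual(seq) or "unassigned"].append(name)
    return dict(groups)
```

Running `split_by_individual` once per gene output file would give up to 64 groups per gene region.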

Thanks so much for this service! The screencasts are especially great.

Pip Griffin
University of Melbourne, Australia
