Hello Jen,
Apologies for the delay in responding; I got side-tracked with other projects. Now back to Galaxy!
Is metadata being set externally?
I have it commented out in universe_wsgi.ini:
#set_metadata_externally = False
> 1 - If you could load some sample input files into a history on Galaxy
> main and share the link, that would be helpful. Just a sample of
> sequences that are representative of the entire dataset.
http://main.g2.bx.psu.edu/u/iphan/h/iphan-test
This contains the first 100 fastq sequences of the raw file (before grooming).
> 2 - Note the specific filter options used in the fastq_filter tool.
> We can scale the data up, run with your filters, and try to see what is
> causing the problem.
Analysis steps:
- groom with fastq_groomer (default options)
- fastq_filter: trim 3 bases from the 5' end, set minimum quality to 30
1M sequences run fine; 15M sequences crash fastq_filter as described below.
The FASTX tools work, but they require running two separate tools (i.e. they produce one
extra intermediate file), which does not scale with the number and size of files we are
dealing with. I'd rather use fastq_filter, if possible.
We are using a local install of Galaxy for exploratory analysis of NGS data and so far are
very happy with it, kudos to your team.
Isabelle
--
Isabelle Phan, DPhil
Seattle Biomedical Research Institute
+1(206)256 7113
> -----Original Message-----
> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
> Sent: Friday, October 01, 2010 12:00 PM
> To: Isabelle Phan
> Cc: 'galaxy-dev(a)lists.bx.psu.edu'
> Subject: Re: [galaxy-dev] Fastq_filter fails on large files
>
> Hi Isabelle,
>
> There are no known limits, but perhaps you have found something new. We
> can explore two areas:
>
> Your Galaxy instance config:
> Is metadata being set externally? Specifically, we are wondering whether
> you have optional metadata configured to not count fastq blocks if
> the file is larger than a specified size, or similar.
>
> Example data & filter options:
> 1 - If you could load some sample input files into a history on Galaxy
> main and share the link, that would be helpful. Just a sample of
> sequences that are representative of the entire dataset.
> 2 - Note the specific filter options used in the fastq_filter tool.
> We can scale the data up, run with your filters, and try to see what is
> causing the problem.
>
> We look forward to your reply,
>
> Jen
> Galaxy team
>
>
> On 9/13/10 3:31 PM, Isabelle Phan wrote:
> > Hello,
> >
> > The tool fastq_filter worked on 1M reads, but fails (hangs) on 15M reads.
> > I had to kill the job after the user let it run for a whole day. The
> > debug.txt file containing a python function "fastq_read_pass_filter" is
> > created in the files/000/dataset_xxx_files directory. I am getting no
> > error from the galaxy server.
> >
> > I wonder what could cause fastq_filter to fail? The fastx equivalent
> > tool works, but it misses all the options of fastq_filter.
> >
> > I'd be grateful for any hints to help me get fastq_filter to work on
> > large fastq files.
> >
> > Thanks
> >
> > Isabelle
> >
> >
> >
> > _______________________________________________
> > galaxy-dev mailing list
> > galaxy-dev(a)lists.bx.psu.edu
> > http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> --
> Jennifer Jackson
>
http://usegalaxy.org