We are working with some very large sequence libraries (paired-end, 70M+ reads per end, roughly 15 GB × 2). We already know what the file types are and that they are appropriate for the pipeline. A large amount of processing effort is expended after the completion of each step in the workflow analysing the files and determining their attributes. This cost scales with the size of the files (which are large at every step) and is of no practical use to us, except perhaps on the final step.
Is there any way to suppress these post-processing steps and simply accept the file as specified in the tool's output tags? How can we reduce or eliminate verification/indexing on metadata tags, and what implications should we be aware of?
Thanks
dennis
Dennis Gascoigne wrote:
We are working with some very large sequence libraries (paired-end, 70M+ reads per end, roughly 15 GB × 2). We already know what the file types are and that they are appropriate for the pipeline. A large amount of processing effort is expended after the completion of each step in the workflow analysing the files and determining their attributes. This cost scales with the size of the files (which are large at every step) and is of no practical use to us, except perhaps on the final step.
Is there any way to suppress these post-processing steps and simply accept the file as specified in the tool's output tags? How can we reduce or eliminate verification/indexing on metadata tags, and what implications should we be aware of?
Hi Dennis,
To help us determine how best to address this, can you tell us, for the datatypes you're using, specifically which metadata is unnecessary?
In the coming week or two we'll be making things like line/sequence count administratively optional, which would probably solve much of this.
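For illustration, the line/sequence count in question amounts to a full streaming pass over each output file. This is a rough sketch of that kind of pass (not Galaxy's actual implementation; the function name is hypothetical):

```python
def count_fastq_reads(path):
    """Count records in a FASTQ file (4 lines per record).

    Illustrative only: a metadata pass like this has to read every
    byte of the file, so on a 15 GB dataset it adds a substantial
    amount of pure I/O after every workflow step.
    """
    lines = 0
    with open(path, "rb") as fh:  # binary mode: no decoding overhead
        for _ in fh:
            lines += 1
    return lines // 4
```

On paired ~15 GB files this scan alone means rereading both files end to end once per step, which is why making it optional everywhere but the final step is attractive.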
--nate
galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev