I'd like to get a better understanding of the purpose of the database/build attribute, and to ask when it is appropriate to set it.

In our case at the Jackson Laboratory, the most common build is NCBI37/MM9. However, many folks here feel that it should not be set on our FASTQ files. The only place we really run into trouble is with Cufflinks: if you haven't set the database by the time you get to Cufflinks, you'll get an error.

We suggest one of two options:

1) Tophat sets the database of its output files based on the genome that was selected for alignment.
2) A module that can be plugged into a workflow sets the database of a file before it is passed to Cufflinks (or any other tool that requires the database attribute to be set).

We are curious whether anyone else is running into this issue, and how it is being solved. We're thinking about hacking the Tophat wrapper, but I wanted to check with others before doing so.

Thanks,
Dave
In reply to Dave Walton's message of May 17, 2011, at 10:57 AM:

I'll just chime in quickly to agree that FASTQ files should not have dbkeys set. They don't yet belong to a build/reference genome version. Some tools/workflows may currently require a FASTQ file to have the dbkey set, but this should be considered a work-around for a defect in the tool's XML.

I'll let someone else on the team address Tophat and the suggestions specifically.

Thanks for using Galaxy,
Dan
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Dave,

The fact that the Tophat wrapper was not setting the genome based on the alignment genome was actually a bug, but I just fixed it in changeset 5570:0c1251f25c6b.

Let us know if you have further questions.

Regards,
Kelly
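The behavior discussed in this thread (reads carry no build, and the aligner's output takes its build from the genome selected for alignment) can be illustrated with a small sketch. This is not Galaxy's code or API: the `Dataset` class, the `align` and `assemble` functions, and the use of "?" as the "no build assigned" marker are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Marker for "no build assigned"; an assumption of this sketch.
UNSPECIFIED = "?"


@dataclass
class Dataset:
    """Hypothetical stand-in for a history item with a build attribute."""
    name: str
    ext: str
    dbkey: str = UNSPECIFIED


def align(reads: Dataset, reference_dbkey: str) -> Dataset:
    """Hypothetical aligner step.

    The output's build comes from the reference chosen for alignment,
    not from the input reads, which may have no build at all.
    """
    return Dataset(name=f"{reads.name}.accepted_hits", ext="bam",
                   dbkey=reference_dbkey)


def assemble(alignments: Dataset) -> Dataset:
    """Hypothetical downstream step that refuses input without a build."""
    if alignments.dbkey == UNSPECIFIED:
        raise ValueError(f"{alignments.name}: no genome build set")
    return Dataset(name=f"{alignments.name}.transcripts", ext="gtf",
                   dbkey=alignments.dbkey)


# The FASTQ reads are left without a build; only the aligned output gets one.
reads = Dataset(name="sample1", ext="fastq")
hits = align(reads, reference_dbkey="mm9")
transcripts = assemble(hits)
```

With this arrangement the downstream step never needs the reads themselves to carry a build, which matches the view in the thread that FASTQ files should stay unassigned.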
participants (3)
- Daniel Blankenberg
- Dave Walton
- Kelly Vincent