Looking for brave testers. New barcode splitter outputting split datasets directly to the history.
Hi,
I was wondering if I could get someone to test this new barcode splitter I wrote. The main reason for duplicating the already great FASTX-toolkit-based splitter is so I can use Galaxy's multiple output capabilities.
You can find this tool in the Test Tool Shed for now (after some more testing I will move it to the main Tool Shed): http://testtoolshed.g2.bx.psu.edu/view/cjav/split_by_barcode
Hopefully I got Galaxy's tool dependency system right (it works on my box, not that this says much), and installing this tool should be quite easy.
I have to say a big thanks to Biopython and this[1] anonymous soul for making it quite easy to write the actual code doing the heavy lifting.
[1] https://gist.github.com/dgrtwo/3725741
Cheers, Carlos
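For anyone wondering what that "heavy lifting" involves: the core of a barcode splitter is binning each read by the barcode at its start. The following is a minimal sketch in plain Python, not the tool's actual Biopython-based code. It assumes simple 4-line FASTQ records, barcodes at the 5' end of each read, and a hypothetical `max_mismatches` threshold; reads that match no barcode, or more than one, go to an `unmatched` bin, and the barcode is not trimmed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (header, sequence, quality) for one FASTQ record
Record = Tuple[str, str, str]


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from simple 4-line FASTQ (assumes no wrapped sequence lines)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_by_barcode(records: Iterable[Record],
                     barcodes: Dict[str, str],
                     max_mismatches: int = 1) -> Dict[str, List[Record]]:
    """Bin reads by the barcode at the start of the sequence (barcode is not trimmed)."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for header, seq, qual in records:
        hits = [name for name, bc in barcodes.items()
                if len(seq) >= len(bc)
                and hamming(seq[:len(bc)], bc) <= max_mismatches]
        # Assign only unambiguous matches; everything else goes to "unmatched".
        key = hits[0] if len(hits) == 1 else "unmatched"
        bins[key].append((header, seq, qual))
    return dict(bins)
```

Typical use would be `split_by_barcode(read_fastq(open("reads.fastq").readlines()), {"sample1": "ACGT", "sample2": "TGCA"})`, where the file name and barcodes are placeholders. Requiring exactly one matching barcode keeps a read with a sequencing error from being counted in two samples; whether the real tool treats ambiguous reads the same way has not been checked here.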
Cool, I got a tweet about this tool from @GalaxyProject[1].
I realized not everybody might know what using "Multiple Output Files", and specifically "Number of Output datasets cannot be determined until tool run"[2], entails, so let me explain further what I'm trying to accomplish here.
The current Barcode Splitter available on Galaxy Main, based on the FASTX-toolkit by Assaf Gordon, makes all output files accessible through HTML links. This is not very convenient: if you want to use these outputs in a downstream analysis inside Galaxy (and you probably do), your only option right now is to download the files linked in the HTML output and manually re-import them into Galaxy. The tool I wrote includes the option of writing the output files (the split FASTA or FASTQ files) with a naming convention that Galaxy's "Multiple Output Files" support recognizes, so all result files are automatically added to your history.
As far as I can tell, tools without a known number of outputs can't be used upstream in workflows, so I believe you still can't easily use this tool upstream in a workflow. I do think you can accomplish some automation using the API, although I haven't tested this yet.
[1] https://twitter.com/galaxyproject/status/377497531745595392
[2] http://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_o...
Best, Carlos
In reply to Carlos Borroto <carlos.borroto@gmail.com>, Fri, Sep 6, 2013 at 1:58 PM.
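A rough illustration of the naming convention mentioned above. As I understand the wiki page [2], a tool that cannot know its output count in advance writes its extra files into the directory Galaxy passes as `$__new_file_path__`, named `primary_<id>_<designation>_visible_<ext>` (optionally followed by `_<dbkey>`), where `<id>` is the id of the tool's first declared output (for example `$output1.id`). The sketch below assumes that pattern. The helper names, the `fastqsanger` default, and the replacement of underscores in designations (because the name fields appear to be split on `_`) are assumptions for illustration, not taken from the tool.

```python
import os
import re
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def primary_output_name(output_id: int, designation: str, ext: str, dbkey: str = "") -> str:
    """Build a file name following the assumed pattern
    primary_<id>_<designation>_visible_<ext>[_<dbkey>]."""
    # Assumption: Galaxy splits the name on "_", so keep underscores out of the designation.
    safe = re.sub(r"[^A-Za-z0-9.-]", "-", designation)
    name = f"primary_{output_id}_{safe}_visible_{ext}"
    if dbkey:
        name += f"_{dbkey}"
    return name


def write_bins(bins: Dict[str, List[Record]], output_id: int,
               new_file_path: str, ext: str = "fastqsanger") -> List[str]:
    """Write each barcode bin as FASTQ into new_file_path; return the paths written."""
    paths = []
    for designation, records in sorted(bins.items()):
        path = os.path.join(new_file_path, primary_output_name(output_id, designation, ext))
        with open(path, "w") as handle:
            for header, seq, qual in records:
                handle.write(f"{header}\n{seq}\n+\n{qual}\n")
        paths.append(path)
    return paths
```

On the tool side, the command line would pass the output id and the new-file directory to the script; the wiki page [2] is the place to confirm the exact variables and any further requirements before relying on this.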
Hi Carlos,
Nice to see this being worked on.
Just wanted to mention a quick tip for loading barcode splitter output:
1- Right-click the output name and copy the link.
2- Open the 'Get Data -> Upload File' tool and paste the link into the box. Execute.
3- Repeat for all outputs.
The data will load as a dataset, with no downloading/uploading required. I find that pasting all the links into a temporary text file, then copying/pasting them all at once into the Upload tool, speeds this up, since everything then loads in one go.
Not perfect, but maybe helpful for now.
Best, Jen
Galaxy team
In reply to Carlos Borroto <carlos.borroto@gmail.com>, Sep 10, 2013, at 11:49 AM.
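Jen's tip can be partly scripted: if the HTML output of the existing splitter is saved, its links can be pulled out and written one per line, ready to paste into the Upload tool in one go. This is a minimal sketch using only Python's standard library. It assumes the page uses ordinary `<a href>` links, possibly relative to the page's own address (passed here as a hypothetical `base_url`); that assumption has not been verified against the FASTX splitter's actual HTML.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address (assumed base_url).
                self.links.append(urljoin(self.base_url, href))


def links_for_upload(html: str, base_url: str) -> str:
    """Return the links one per line, de-duplicated in page order, for pasting into Upload."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(dict.fromkeys(parser.links))
```

The returned string plays the role of the temporary text file in the tip: paste it into the Upload tool's box as-is.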
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Jennifer,
That's true. I completely forgot about this option, which is more convenient than downloading and re-importing. I still think I need the Multiple Output Files route, because I really want to automate the full processing of the data using the API.
Thanks for pointing it out, Carlos
In reply to Jennifer Jackson <jen@bx.psu.edu>, Tue, Sep 10, 2013 at 3:59 PM.
participants (2)
- Carlos Borroto
- Jennifer Jackson