Hi, I hope I'm writing to the right area (if not, please point me in the right direction and I'll go on my way).
I have a set of tools that our team have built and been using for functional genomics analysis, which I would like to integrate into Galaxy.
My first concern is with the tests. Our code is unit tested but one script is tasked with downloading gene sets from Ensembl. We have a local copy of much of the Ensembl DB (5TB worth) and even this is fairly slow to pull from (5-10 mins). Pulling from the publicly available Ensembl site would be horrifically slow. Since it's stated here: http://wiki.galaxyproject.org/Admin/Tools/Writing%20Tests that functional tests are required, I'm looking for suggestions as to how to meet this requirement without taking up an hour of the test runner's time (for one tool).
My second question: if a file is produced by a script and given a specific name based on its inputs, how should I (if at all) define the output of this script?
My last question is "how much functional testing is required?". I have a script here whose job is to produce plots of data. There are about 30 command-line options and many of these have multiple choices available to them. Do I need to provide functional tests for every one of these? I can imagine having to provide 100 PNG files to compare my output against....
I have searched for answers to the above (made much more tricky thanks to Samsung), I don't wish to waste anyone's time.
Many thanks in advance, Cameron
On Jan 7, 2013, at 10:19 PM, Cameron Jack wrote:
Hi, I hope I'm writing to the right area (if not, please point me in the right direction and I'll go on my way).
I have a set of tools that our team have built and been using for functional genomics analysis, which I would like to integrate into Galaxy.
My first concern is with the tests. Our code is unit tested but one script is tasked with downloading gene sets from Ensembl. We have a local copy of much of the Ensembl DB (5TB worth) and even this is fairly slow to pull from (5-10 mins). Pulling from the publicly available Ensembl site would be horrifically slow. Since it's stated here: http://wiki.galaxyproject.org/Admin/Tools/Writing%20Tests that functional tests are required, I'm looking for suggestions as to how to meet this requirement without taking up an hour of the test runner's time (for one tool).
Hi Cameron,
Tests should be self-contained so that they can be run without requiring external network access. You may want to consider providing your Ensembl data as locally cached reference data rather than having the tool download it itself. If the tool needs that locally cached data to run tests, what we generally do is take the smallest possible subset of the reference data that will allow you to test proper tool operation and provide this with the tool as test data.
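As a rough illustration of cutting reference data down to a test-sized subset, here is a minimal Python sketch. It assumes the cached gene sets are a tab-separated file with a header row and a `gene_id` column; the file layout, the column name, and the function `subset_gene_table` are all hypothetical and would need adapting to whatever format the real tool reads.

```python
import csv


def subset_gene_table(src_path, dest_path, keep_ids, id_column="gene_id"):
    """Copy the header plus only the rows whose ID is in keep_ids.

    Returns the number of data rows written. The tab-separated layout and
    the 'gene_id' column name are assumptions made for this sketch.
    """
    keep = set(keep_ids)
    kept = 0
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dest, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            if row[id_column] in keep:
                writer.writerow(row)
                kept += 1
    return kept
```

Run once against the full local copy, this would leave a small file that could ship with the tool as test data, so the functional test never touches the network or the 5TB database.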
My second question: if a file is produced by a script and given a specific name based on its inputs, how should I (if at all) define the output of this script?
Does the tool work from the UI? For Galaxy to find a tool's output, it has to be written to the output dataset path provided by Galaxy when the job is run. For tools that hardcode output names there are a few options: the 'from_work_dir' attribute on the output dataset tag, or having a wrapper script around the tool that does the appropriate move after the wrapped tool has finished executing.
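For the wrapper-script option, a minimal Python sketch might look like the following. The function name `run_and_collect` and its arguments are hypothetical; the only thing taken from the reply above is the idea of running the tool and then moving its self-named file to the output path Galaxy supplies.

```python
import shutil
import subprocess
from pathlib import Path


def run_and_collect(command, produced_name, galaxy_output, workdir="."):
    """Run the wrapped tool, then move its self-named output file to the
    path Galaxy provided for the output dataset.

    command        list of program + arguments for the wrapped tool
    produced_name  file name the tool chooses for itself (assumed known)
    galaxy_output  output dataset path handed to the wrapper by Galaxy
    """
    # check=True raises CalledProcessError if the tool exits non-zero,
    # so a failed run is not silently reported as success.
    subprocess.run(command, cwd=workdir, check=True)
    produced = Path(workdir) / produced_name
    if not produced.is_file():
        raise FileNotFoundError(f"expected output not found: {produced}")
    shutil.move(str(produced), str(galaxy_output))
    return Path(galaxy_output)
```

This only works if the wrapper can predict the file name from the same inputs the tool uses. Where the name is fixed or predictable, the 'from_work_dir' attribute may be the simpler route since it needs no extra script.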
My last question is "how much functional testing is required?". I have a script here whose job is to produce plots of data. There are about 30 command-line options and many of these have multiple choices available to them. Do I need to provide functional tests for every one of these? I can imagine having to provide 100 PNG files to compare my output against....
Coverage of all of the possible option combinations is not necessary. Most tools have 2-4 tests; some will have more, but usually fewer than 10. I would suggest testing the option combinations most likely to be used.
--nate
I have searched for answers to the above (made much more tricky thanks to Samsung), I don't wish to waste anyone's time.
Many thanks in advance, Cameron