Greetings,

I'm looking to add a tool that works with a custom datatype and dynamically generates its input parameter options based on the dataset's metadata. For example, a dataset of type foo contains metadata as follows:

descfields = ['label', 'description']
quantfields = ['qualityscore', 'othernumericvalue']

These values are parsed directly out of the dataset and stored in the metadata by the foo datatype class. However, the number of values in each list can vary among datasets of type foo. I'd like to configure a tool that generates an input parameter for each of the values in descfields, and likewise for each of the values in quantfields.

I understand this may be outside the scope of the current tool syntax, but if anyone could provide some direction on how tools can be made more 'dynamic' using their metadata, it would be greatly appreciated. One idea was to dynamically generate the <tool>.xml and load it on request, but I'm not sure how well that would integrate.

Thanks for your feedback!
-AR
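[To make the setup concrete, here is a rough standalone sketch of what the foo datatype's metadata-extraction step might do. The header format and the rule for classifying columns are invented for illustration; in Galaxy this logic would live in the datatype's set_meta() hook and the two lists would be declared as MetadataElements.]

```python
# Sketch: how a hypothetical "foo" datatype might split the column
# names found in a dataset header into descriptive vs. quantitative
# metadata lists. The tab-separated header format and the
# classification rule (membership in a known set of numeric names)
# are assumptions for this example, not Galaxy behavior.

def parse_foo_header(header_line, quantitative_names):
    """Split tab-separated column names into (descfields, quantfields)."""
    fields = header_line.rstrip("\n").split("\t")
    quantfields = [f for f in fields if f in quantitative_names]
    descfields = [f for f in fields if f not in quantitative_names]
    return descfields, quantfields

descfields, quantfields = parse_foo_header(
    "label\tdescription\tqualityscore\tothernumericvalue",
    quantitative_names={"qualityscore", "othernumericvalue"},
)
# descfields -> ['label', 'description']
# quantfields -> ['qualityscore', 'othernumericvalue']
```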
Hi Aaron,

Unfortunately, I don't have a complete answer to your question, but I can offer some suggestions and information that may help. The next few paragraphs are an attempt at practical help; the rest are musings and potential theoretical help, specifically concerning the last idea in your email.

First, the practical. My first thought is to dig into the implementation of the "Get Data -> Upload File" tool (tool_id=upload1), specifically to examine how it handles composite datasets. I believe the parameters there (such as multiple file uploads or setting metadata values) are generated automatically from the datatype's MetadataElements; in particular, see exactly how the "set_in_upload" argument to MetadataElement works. I haven't had time to dig into how the tool interface is created in that case, so I can't promise an answer is there, though I think it's very likely. I also suspect that if there is an answer, it may be non-trivial and/or messy to generalize to your case.

My second thought is that you will need to add new tool config tags. I've been looking into how to add a couple of my own, to allow the interactive behavior of a tool's form to be more dynamic in a controlled way.
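[To make the declaration pattern I mean concrete, here is a sketch with a stand-in MetadataElement class so it runs outside Galaxy. The real class lives under lib/galaxy/datatypes/, and its argument list differs from this stub; only the general shape and the role of set_in_upload are what I'm illustrating.]

```python
# Stand-in for Galaxy's MetadataElement, just to show the declaration
# pattern on a datatype class; this is NOT the real implementation.
class MetadataElement:
    def __init__(self, name, default=None, desc="", set_in_upload=False, **kwds):
        self.name = name
        self.default = default
        self.desc = desc
        # In real Galaxy, set_in_upload=True makes the upload tool
        # expose this element as a form input -- that auto-generation
        # path is the one worth studying for dynamic parameters.
        self.set_in_upload = set_in_upload

class Foo:
    """Hypothetical custom datatype declaring its variable-length
    metadata lists as class-level MetadataElements."""
    descfields = MetadataElement(name="descfields", default=[],
                                 desc="Descriptive fields",
                                 set_in_upload=True)
    quantfields = MetadataElement(name="quantfields", default=[],
                                  desc="Quantitative fields",
                                  set_in_upload=True)
```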
So far, I've identified the following areas I would have to modify to implement new tags:

- lib/galaxy/tools/__init__.py : update_state, parse_input_elem, check_and_update_param_values_helper, handle_unvalidated_param_values_helper, build_param_dict
- lib/galaxy/tools/actions/__init__.py : DefaultToolAction : execute, wrap_values
- lib/galaxy/tools/parameters/__init__.py : visit_input_values
- lib/galaxy/tools/parameters/ : which files in this directory you need to modify will depend on your tag

To support testing, also (probably more than):

- lib/galaxy/tools/test.py : ToolTestBuilder

To support workflows, also (probably more than):

- lib/galaxy/workflow/modules.py

Hopefully that information is of some use, at least as a place to start. The Rgenetics / Rexpression tools may also be worth examining, as they use metadata a fair bit, though not quite in the way you've described.

And now, the theoretical. I'm intrigueded by your idea of generating a tool definition file on the fly. JIT tools, heh. I suppose one way to accomplish this would be to have a primary tool that uses the conventional mechanisms to gather just enough information (such as the datasets whose metadata your secondary tool would be a function of) to bootstrap and generate the secondary tool from that metadata. The primary tool could then trigger Galaxy to load the secondary tool and (optimally) transparently redirect the user's browser to it. This approach could be iterated if necessary.

This is just an idea, though. Implementing it would be harder than it sounds, because you'd have to find a way to get your generated tool into Galaxy's "toolbox" in the first place. Each invocation of the primary tool would have to produce a secondary tool with a different path and tool_id, in order to avoid race conditions when two users run the primary tool at once.
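[As a toy sketch of the "generate a tool definition on the fly" step, here is one way to build a minimal <tool> document from the metadata lists using only the standard library. The tag and attribute names mirror common Galaxy tool-config conventions, but this is not validated against the real tool schema, and the text/float type choice for the two lists is my assumption.]

```python
import xml.etree.ElementTree as ET

def build_tool_xml(tool_id, descfields, quantfields):
    """Generate a minimal <tool> definition with one <param> per
    metadata field: text params for descriptive fields, float params
    for quantitative ones. Illustrative sketch only -- not a complete
    or schema-validated Galaxy tool config."""
    tool = ET.Element("tool", id=tool_id, name="Generated tool", version="0.1")
    inputs = ET.SubElement(tool, "inputs")
    for f in descfields:
        ET.SubElement(inputs, "param", name=f, type="text", label=f)
    for f in quantfields:
        ET.SubElement(inputs, "param", name=f, type="float", label=f)
    return ET.tostring(tool, encoding="unicode")

xml_text = build_tool_xml(
    "foo_generated",
    ["label", "description"],
    ["qualityscore", "othernumericvalue"],
)
```

The primary tool would write this string to a file and then somehow convince the ToolBox to load it, which is exactly the hard part discussed below.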
Even if that is solved satisfactorily, there is still a potential race condition and/or scaling issue. The ToolBox is a single entity, global to the Galaxy instance, so there may be a race condition on the addition / removal of secondary tools. Perhaps this is taken care of by the ORM or some other part of the existing design (I don't know enough about the ToolBox's implementation here), but even with concurrency-safe ToolBox operations, there may be a scaling issue, since the ToolBox is accessed pretty frequently.

Next, there are the related issues of whether and how to "clean up" these generated tools once they've been run, and how to prevent them from cluttering up the global toolbox namespace for the whole Galaxy instance. Is there any kind of permissions mechanism for tools (as there is for libraries, for instance) that could prevent each user's generated tools from appearing in every other user's "Tools" menu? Perhaps one could be written. At first glance, I imagine it would be best for the generated tools to be "use once and throw away": private to the user who ran the primary tool, or simply not accessible to any user except via the primary tool's one-time redirect.

Working with autogenerated tools, you'd also have to be very precise and careful about versioning the primary tool and all of its dependencies, whether data, library, or executable. Otherwise, standard debugging, as well as triage of (in particular, reported) bugs, will over time become somewhere between a huge pain and completely infeasible.

In the end, the JIT tool approach is probably going to be a lot more difficult and a lot more work than just augmenting Galaxy internals to provide the features you're looking for. On the other hand, I expect that such a modification of Galaxy's core code would have to be extensive and involve central / foundational code, thereby dramatically raising the likelihood of difficulty integrating Galaxy updates in the future.
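[The "different path and tool_id per invocation" requirement is easy to sketch, even though it's the smallest part of the problem. Something like the following would at least guarantee that two concurrent users never collide on a file name or tool id; actually registering the result with the ToolBox, and cleaning it up later, is the hard part and is not shown. The naming scheme here is an invented convention.]

```python
import os
import uuid

def unique_tool_identity(tool_dir):
    """Mint a unique tool_id and a matching XML file path for one
    invocation of the primary tool, so that concurrent invocations
    never share a generated tool definition. Loading the file into
    Galaxy's ToolBox (and later removal) is deliberately out of scope."""
    tool_id = "generated_" + uuid.uuid4().hex
    return tool_id, os.path.join(tool_dir, tool_id + ".xml")

first = unique_tool_identity("/path/to/generated_tools")
second = unique_tool_identity("/path/to/generated_tools")
# first and second carry distinct tool_ids and file paths
```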
The JIT approach may be a bit more respectful of the Galaxy core, though just how much depends on how invasively you may need to modify the ToolBox to support online adding / removing of tools and internal control of user-based tool permissions. Some of this may already be in the works to support the ToolShed. Imho, the JIT approach is inherently cooler, even if potentially more challenging to get right.

Best of luck, and let us know how it goes :)

Best,
Eric

________________________________
From: galaxy-dev-bounces@lists.bx.psu.edu [galaxy-dev-bounces@lists.bx.psu.edu] on behalf of Rodriguez, Aaron (NIH/NCI) [C] [rodriguezaa@mail.nih.gov]
Sent: Thursday, November 17, 2011 10:32 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Dynamic tool configuration
participants (2)
- Paniagua, Eric
- Rodriguez, Aaron (NIH/NCI) [C]