On Fri, Jun 27, 2014 at 9:30 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Fri, Jun 27, 2014 at 3:13 PM, John Chilton <jmchilton@gmail.com> wrote:
On Fri, Jun 27, 2014 at 5:16 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Wed, Jun 18, 2014 at 12:14 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
John - that Trello issue you logged, https://trello.com/c/0XQXVhRz "Generic infrastructure to let deployers specify limits for tools based on input metadata (number of sequences, file size, etc...)"
Would it be fair to say this is not likely to be implemented in the near future? i.e. should we consider implementing the BLAST query limit approach as a short-term hack?
It would be good functionality - but I don't foresee myself or anyone on the core team getting to it in the next six months, say.
...
I am now angry with myself though, because I realized that dynamic job destinations are a better way to implement this in the meantime (the environment stuff was very fresh when I responded, so I think I just jumped there). You can build a flexible infrastructure locally that is largely decoupled from the tools, and that may (?) work around the task-splitting problem Peter brought up.
Outline of the idea: <snip>
Hi John,
So the idea is to define a dynamic job mapper which checks the query input size and, if it is too big, raises an error; otherwise it passes the job on to the configured job handler (e.g. an SGE cluster).
See https://wiki.galaxyproject.org/Admin/Config/Jobs
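Something like this (untested) sketch perhaps - the 10 MB cap, the function name, and the "sge_default" destination id are just placeholders, and the input name "query" assumes the BLAST+ wrappers:

# Minimal dynamic rule, e.g. saved as lib/galaxy/jobs/rules/blast_limits.py
# and wired up in job_conf.xml with something like:
#
#   <destination id="blast_limited" runner="dynamic">
#       <param id="type">python</param>
#       <param id="function">blast_query_limit</param>
#   </destination>
#
# with the BLAST+ tools mapped to the "blast_limited" destination.
from galaxy.jobs.mapper import JobMappingException

MAX_QUERY_BYTES = 10 * 1024 * 1024  # placeholder 10 MB cap


def blast_query_limit(job):
    # "query" is the input parameter name used by the BLAST+ wrappers
    for ida in job.input_datasets:
        if ida.name == "query" and ida.dataset is not None:
            if ida.dataset.get_size() > MAX_QUERY_BYTES:
                raise JobMappingException(
                    "Query file too large for this server, "
                    "please split it up and submit the pieces separately.")
    # Otherwise hand the job to a normal destination defined in job_conf.xml
    return "sge_default"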
It sounds like this ought to be possible right now, but you are suggesting that, since this seems quite a general use case, helper code for building dynamic mappers around things like file size (in bytes) or number of sequences could be added to Galaxy?
Yes, it is possible right now, and everything could just be stuck right in the rule file itself. I was just suggesting that sharing some of the helpers with the community might ease the process for future deployers.
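For instance, one such shared helper might count FASTA sequences, preferring the dataset's metadata and only scanning the file as a fallback. Hypothetical code, not an existing Galaxy API - although the "sequences" metadata element is set by Galaxy's FASTA datatype:

def fasta_sequence_count(dataset, scan_limit=None):
    """Return the number of sequences in a FASTA dataset.

    Uses the "sequences" metadata element if Galaxy has set it,
    otherwise falls back to counting ">" header lines on disk.
    """
    count = getattr(dataset.metadata, "sequences", None)
    if count:
        return int(count)
    count = 0
    with open(dataset.file_name) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
                if scan_limit is not None and count > scan_limit:
                    break  # already over the limit, no need to keep counting
    return count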
This approach would need the Galaxy admin to set up a custom job mapper for BLAST (one which knows to look at the query file), but it taps into an existing Galaxy framework. By providing a reference implementation this ought to be fairly easy to set up, and it can be extended to be more clever about the limits.
Yes. As you mention, this can be much more expressive than an XML-based fixed set of limit types. In addition to static limits, you could combine inputs as you mentioned, allow local users of the public resource to run as much as they want, allow larger jobs on the weekend when things are slow, etc. I recently added a high-level utility for looking at job metrics in these rules, so you can, say, restrict or expand the limit based on how many jobs the user has run in the last month or how many core hours they have consumed: https://bitbucket.org/galaxy/galaxy-central/commits/9a905e98e1550314cf821a99...
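As a rough sketch of that kind of flexibility - the e-mail domain, the thresholds, and the destination id below are invented, and user_email is one of the arguments Galaxy will pass into a rule by name:

import datetime

from galaxy.jobs.mapper import JobMappingException


def blast_flexible_limit(job, user_email):
    limit = 10 * 1024 * 1024  # placeholder base limit
    if user_email and user_email.endswith("@example.org"):
        return "sge_default"  # local users can run whatever they like
    if datetime.date.today().weekday() >= 5:  # Saturday or Sunday
        limit *= 10  # allow bigger jobs while the cluster is quiet
    for ida in job.input_datasets:
        if ida.name == "query" and ida.dataset is not None:
            if ida.dataset.get_size() > limit:
                raise JobMappingException(
                    "Query exceeds the current size limit on this server.")
    return "sge_default"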
e.g. for BLAST, we should consider both the number (and length) of the queries and the size of the database.
Thanks for clarifying and providing some context for my (in retrospect) seemingly random Python scripts :).
Regards,
Peter