dear colleagues,
at the university of oslo, we develop a galaxy-based portal for natural language processing (LAP: Language Analysis Portal). jobs are submitted to a compute cluster via DRMAA and SLURM. current development is against the galaxy release of march 2015.
i am wondering about fine-grained control of job resources. our goal is that most users need not look past the ‘Use default job resource parameters’ toggle in the job configuration dialogue.
as i understand it, we can populate the ‘nativeSpecification’ parameter in ‘job_conf.xml’ with SLURM-specific command-line options to set defaults, for example the project, maximum run-time, number of cores, memory per core, and such. i assume these defaults will be combined with and overwritten by ‘custom’ job resource parameters, in case any are specified in the job configuration dialogue?
i tried to track the flow of information from ‘lib/galaxy/jobs/runners/drmaa.py’ via ‘scripts/drmaa_external_runner.py’ into the drmaa-python egg, but i could not easily work out where the merging of ‘nativeSpecification’ and custom resource parameters happens; presumably at about the same time as an actual job script file is created, for submission to SLURM? could someone point me in the right direction here?
more importantly, maybe: we would like to establish per-tool resource defaults. for example, some of our tools require substantially more memory per core than others. i cannot easily find a way of associating resource defaults with individual tools. i looked at the tool configuration syntax, ‘job_conf.xml.sample_advanced’, and ‘job_resource_params_conf.xml.sample’, as well as at the following documentation pages:
https://wiki.galaxyproject.org/Admin/Config/Jobs
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
i am hoping i am overlooking something :-). is there a way to define job resource defaults on a per-tool basis?
with warmest thanks in advance, oe
Hi Stephan, I will just quickly answer your last question, because I’m not sure I understood the first part of your message (or took the time to) :P
i am hoping i am overlooking something :-). is there a way to define job resource defaults on a per-tool basis?
Perhaps I didn’t understand your message at all: in your tool wrapper, you can use "\${GALAXY_SLOTS:-8}" to set the resources dynamically according to the settings in job_conf.xml. Here the job will take 8 CPUs by default (personally, I find this a trap when the administrator, i.e. me, overlooks the default value; I prefer to set the default to 1).
<tool id="my_amazing_wrapper" name="My Amazing" >
    <command>
        my_amazing_tool -query "$query" […] -num_threads "\${GALAXY_SLOTS:-8}" […]
    </command>
In your job_conf.xml, you can set a destination per tool. Thus you can specify the number of CPUs/slots, the memory needed, the queue, the nodes ...
<destinations default="sge_default">
    <destination id="thread4-men_free10" runner="sge">
        <param id="nativeSpecification">-V -w n -q galaxy.q -R y -pe thread 4 -l mem_free=10G </param>
    </destination>
</destinations>
<tools>
    <tool id="my_amazing_wrapper" destination="thread4-men_free10"/>
</tools>
I hope this helps
Cheers
Gildas
-----------------------------------------------------------------
Gildas Le Corguillé - Bioinformatician/Bioanalyste
Plateform ABiMS (Analyses and Bioinformatics for Marine Science)
http://abims.sb-roscoff.fr
Member of the Workflow4Metabolomics project
http://workflow4metabolomics.org
Station Biologique de Roscoff - UPMC/CNRS - FR2424
Place Georges Teissier 29680 Roscoff FRANCE
tel: +33 2 98 29 23 81
------------------------------------------------------------------
On 12 March 2016 at 10:36, Stephan Oepen <oe@ifi.uio.no> wrote:
dear colleagues,
[…]
with warmest thanks in advance, oe
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
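For a tool that is itself a Python script, the shell idiom "\${GALAXY_SLOTS:-8}" from the wrapper above can be mirrored inside the tool. A minimal sketch, assuming only that the GALAXY_SLOTS environment variable is set for the job as Gildas describes; the helper name galaxy_slots is made up for illustration, and the default of 1 follows his stated preference rather than any Galaxy convention.

```python
import os


def galaxy_slots(default=1, environ=None):
    """Return the slot count, mimicking the shell expansion ${GALAXY_SLOTS:-default}.

    Like ':-' in the shell, the default is used when the variable is
    unset or set to the empty string. `environ` exists only to make the
    function easy to try out; it defaults to the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get("GALAXY_SLOTS", "")
    if raw == "":
        return default
    slots = int(raw)  # raises ValueError on non-numeric input
    if slots < 1:
        raise ValueError("GALAXY_SLOTS must be at least 1, got %d" % slots)
    return slots


if __name__ == "__main__":
    print("running with %d thread(s)" % galaxy_slots())
```

Falling back to 1 keeps a misconfigured destination from silently oversubscribing a node, which is the trap mentioned in the reply.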
many thanks for taking the time to answer my query, gildas!
In your job_conf.xml, you can set per tool a destination.
i had realized that much (sending some of our tools to SLURM, running others on the local node), but i had failed to realize that one can of course have /multiple/ SLURM destinations, which all send to the same cluster but differ in their default resource parameters.
thanks again, oe
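The arrangement described here, several destinations that share one DRMAA runner but differ in their ‘nativeSpecification’, might look roughly like the sketch below. It holds a ‘job_conf.xml’ fragment as a string and resolves a tool id to its SLURM options with the standard library only. The destination ids, the tool id, and the resource values are invented for illustration; --ntasks, --mem-per-cpu (megabytes) and --time are ordinary sbatch options, but whether the local DRMAA library accepts each of them should be checked on site. The lookup function is a stand-in for reading the file, not Galaxy's own resolution code.

```python
import xml.etree.ElementTree as ET

# Hypothetical job_conf.xml: two SLURM destinations on the same runner,
# differing only in their default resources.
JOB_CONF = """\
<job_conf>
    <plugins>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <destinations default="slurm_small">
        <destination id="slurm_small" runner="drmaa">
            <param id="nativeSpecification">--ntasks=1 --mem-per-cpu=2048 --time=01:00:00</param>
        </destination>
        <destination id="slurm_bigmem" runner="drmaa">
            <param id="nativeSpecification">--ntasks=1 --mem-per-cpu=16384 --time=04:00:00</param>
        </destination>
    </destinations>
    <tools>
        <tool id="hypothetical_parser" destination="slurm_bigmem"/>
    </tools>
</job_conf>
"""


def native_specification(job_conf_xml, tool_id):
    """Return the nativeSpecification a tool would be submitted with."""
    root = ET.fromstring(job_conf_xml)
    destinations = root.find("destinations")
    # Tools without an explicit mapping fall back to the default destination.
    dest_id = destinations.get("default")
    for tool in root.iter("tool"):
        if tool.get("id") == tool_id:
            dest_id = tool.get("destination")
            break
    for dest in destinations.iter("destination"):
        if dest.get("id") == dest_id:
            for param in dest.iter("param"):
                if param.get("id") == "nativeSpecification":
                    return param.text.strip()
    raise KeyError("no nativeSpecification for destination %r" % dest_id)


if __name__ == "__main__":
    for tool_id in ("hypothetical_parser", "some_other_tool"):
        print(tool_id, "->", native_specification(JOB_CONF, tool_id))
```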
hallo again, fellow galaxy users and developers,
as an extension to my original query, i am now wondering how the parameters in ‘job_resource_params_conf.xml’ map onto SLURM options? for example, i assume <param ... name="processors" ...> maps onto something like ‘--ntasks’ (or maybe ‘--ntasks-per-node’). are the ‘name’ values in the definition of job resource parameters standard keys defined for DRMAA, and does drmaa-python know how to map these into SLURM parameters? or is there an explicit specification of that mapping somewhere?
we have succeeded in establishing per-tool defaults by putting these into the ‘nativeSpecification’ of multiple variants of the DRMAA destination. but now we would also like to customize the valid range and initial value that is presented to users when they decide to use the ‘custom’ job resource form in the tool configuration dialogue. in other words, we would like to do something like the following in ‘job_resource_params_conf.xml’:
<param label="Memory" name="memory1" type="integer" size="2" min="1" max="16" value="1" ... />
<param label="Memory" name="memory4" type="integer" size="2" min="4" max="24" value="4" ... />
<param label="Memory" name="memory6" type="integer" size="2" min="6" max="24" value="6" ... />
and then associate a specific memory parameter with individual tools in ‘job_conf.xml’. but for that to work, i would have to understand the mapping to SLURM options and make it so that ‘memory1’ to ‘memory6’ all map to ‘--mem’ (or maybe ‘--mem-per-cpu’).
once i understand things better, i would of course be happy to contribute a summary for the galaxy wiki. for all i can see, current documentation does not cover job configuration and job resources in full detail.
with thanks in advance, oe
On Sun, Mar 13, 2016 at 2:31 PM, Stephan Oepen <oe@ifi.uio.no> wrote:
many thanks for taking the time to answer my query, gildas!
[…]
thanks again, oe
This mapping is not automatic - you need to write a small Python method to take these parameters specified by the user and map them to your cluster parameters. These methods are called dynamic job destinations and are described on the wiki at:
https://wiki.galaxyproject.org/Admin/Config/Jobs#Dynamic_Destination_Mapping
If your method takes a keyword argument called "resource_params", Galaxy will build a dictionary from the user-supplied parameters and send it to your function. So in your case {"memory1": 300} or something like that - and the method should build a destination with a native specification that uses this information.
Hope this helps.
-John
On Fri, Mar 18, 2016 at 5:45 PM, Stephan Oepen <oe@ifi.uio.no> wrote:
hallo again, fellow galaxy users and developers,
[…]
with thanks in advance, oe
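The dynamic job destination that John describes could be sketched as follows. This is an illustration under assumptions, not something tested against the March 2015 release: the function name, the destination id, the runner id "drmaa", the bounds, and the choice of --ntasks and --mem-per-cpu (in megabytes) are made up for the example, and the resource parameter names ‘memory’ (in gigabytes) and ‘processors’ are assumed to be defined in ‘job_resource_params_conf.xml’. Inside a Galaxy server the rule would return galaxy.jobs.JobDestination; the small stand-in class exists only so the sketch runs on its own.

```python
try:
    from galaxy.jobs import JobDestination  # only importable inside Galaxy
except ImportError:
    class JobDestination(object):
        """Minimal stand-in so the sketch can run outside Galaxy."""

        def __init__(self, id=None, runner=None, params=None):
            self.id = id
            self.runner = runner
            self.params = params or {}


# Hypothetical site policy: defaults and limits for the custom form.
DEFAULT_MEMORY_GB = 1
DEFAULT_PROCESSORS = 1
MAX_MEMORY_GB = 24
MAX_PROCESSORS = 16


def slurm_with_resources(resource_params=None):
    """Build a SLURM destination from user-supplied resource parameters.

    Galaxy is assumed to pass the values from the job resource form as a
    dictionary; missing keys fall back to the site defaults above.
    """
    params = resource_params or {}
    memory_gb = int(params.get("memory", DEFAULT_MEMORY_GB))
    processors = int(params.get("processors", DEFAULT_PROCESSORS))
    if not 1 <= memory_gb <= MAX_MEMORY_GB:
        raise ValueError("memory out of range: %d GB" % memory_gb)
    if not 1 <= processors <= MAX_PROCESSORS:
        raise ValueError("processors out of range: %d" % processors)
    # This is where the mapping onto SLURM options is made explicit.
    native = "--ntasks=%d --mem-per-cpu=%d" % (processors, memory_gb * 1024)
    return JobDestination(
        id="slurm_custom",
        runner="drmaa",
        params={"nativeSpecification": native},
    )


if __name__ == "__main__":
    dest = slurm_with_resources({"memory": "4", "processors": 2})
    print(dest.params["nativeSpecification"])
```

Because the function owns the translation, differently named form fields such as ‘memory1’ or ‘memory4’ could all be folded into the same SLURM option inside it, with per-tool ranges still enforced by the form definitions.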
participants (3)
- Gildas Le Corguillé
- John Chilton
- Stephan Oepen