[galaxyproject/galaxy] aba31b: [17.01] Restrict workflow scheduling within a hist...
Branch: refs/heads/master
Home: https://github.com/galaxyproject/galaxy

Commit: aba31bc0d484f3729b46a10b2b357b92fa826fae
https://github.com/galaxyproject/galaxy/commit/aba31bc0d484f3729b46a10b2b357...
Author: John Chilton <jmchilton@gmail.com>
Date: 2017-03-29 (Wed, 29 Mar 2017)

Changed paths:
M config/galaxy.ini.sample
M lib/galaxy/config.py
M lib/galaxy/jobs/__init__.py
M lib/galaxy/workflow/scheduling_manager.py
A test/integration/test_workflow_handler_configuration.py
A test/integration/workflow_handler_configuration_job_conf.xml

Log Message:
-----------
[17.01] Restrict workflow scheduling within a history to a fixed, random handler.

Let's revisit the problem that scheduling workflows in the background (the default UI behavior as of 16.10) makes it easier for histories to contain datasets interleaved from different workflow invocations under certain reasonable conditions (https://github.com/galaxyproject/galaxy/issues/3474).

Considering only a four-year-old workflow and tool feature set (no collection operations, no dynamic dataset discovery, only tool and input workflow modules), all workflows can and will fully schedule on the first scheduling iteration. Under those circumstances, this solution is functionally equivalent to history_local_serial_workflow_scheduling introduced in #3520 - but should be more performant, because all such workflows fully schedule in the first iteration and the double loop introduced here https://github.com/galaxyproject/galaxy/pull/3520/files#diff-d7e80a366f39657... is avoided for each workflow invocation for each iteration. This addresses both concerns I outlined [here](https://github.com/galaxyproject/galaxy/issues/3816#issuecomment-289323288).

For workflows that use certain classes of newer tools or newer workflow features - I'd argue this approach will not degrade as harshly as enabling history_local_serial_workflow_scheduling.
For instance, imagine a workflow with a dynamic dataset collection output step (such as used by IUC tools Deseq2, Trinity, Stacks, and various Mothur tools) halfway through that takes 24 hours of queue time to reach. Now imagine a user running 5 such workflows at once.

- Without this and without history_local_serial_workflow_scheduling, the 5 workflows will each run as fast as possible and the UI will show as much of each workflow as can be scheduled, but the order of the datasets may be shuffled. The workflows will be complete for the users in 48 hours.
- With history_local_serial_workflow_scheduling enabled, only 1 workflow will be scheduled, and only halfway, for the first 24 hours, and the user will be given no visual indication of why the other workflows are not running for 1 day. The final workflow output will take nearly a week to be complete for the users.
- With this enabled - the new default in this commit - each workflow will be scheduled in two chunks, but these chunks will be contiguous and it should be fairly clear to the user what tool caused the discontinuity of the datasets in the history. So things are still mostly ordered, but the drawbacks of history_local_serial_workflow_scheduling are avoided entirely. Namely, the other four workflows aren't hidden from the user without a UI indication, and the workflows will still only take 48 hours to be complete and have outputs ready for the user.

The only drawback of this new default behavior is that it gives up the performance improvements you could potentially see by scheduling multiple workflow invocations within one history in parallel - but this was never a design goal in my mind when implementing background scheduling, and under typical Galaxy use cases I don't think this would be worth the UI problems. So, the older behavior can be re-enabled by setting parallelize_workflow_scheduling_within_histories to True in galaxy.ini, but it won't be on by default or really recommended if the Galaxy UI is being used.
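The idea of a "fixed, random handler" per history can be illustrated with a minimal sketch. This is not the code from lib/galaxy/workflow/scheduling_manager.py; the function name `pick_workflow_handler`, its parameters, and the handler names are hypothetical, and the sketch assumes histories are identified by integer ids. Only the flag's meaning mirrors the parallelize_workflow_scheduling_within_histories option described above.

```python
import random


def pick_workflow_handler(history_id, handlers, parallelize_within_histories=False):
    """Pick the handler that schedules a workflow invocation (hypothetical helper)."""
    if not handlers:
        raise ValueError("at least one handler is required")
    if parallelize_within_histories:
        # Older behavior: invocations in one history may land on different handlers.
        return random.choice(handlers)
    # Seed a private generator with the history id: the choice is spread
    # pseudo-randomly across histories but always the same for one history,
    # so that history's invocations are scheduled by a single handler.
    rng = random.Random(history_id)
    return handlers[rng.randrange(len(handlers))]
```

Under these assumptions, calling `pick_workflow_handler(42, ["handler0", "handler1", "handler2"])` repeatedly returns the same handler every time, while different history ids are distributed over the available handlers.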
Commit: 07887be3a162678859e32c063dae024f0a8da25c
https://github.com/galaxyproject/galaxy/commit/07887be3a162678859e32c063dae0...
Author: Dannon Baker <dannon.baker@gmail.com>
Date: 2017-03-31 (Fri, 31 Mar 2017)

Changed paths:
M config/galaxy.ini.sample
M lib/galaxy/config.py
M lib/galaxy/jobs/__init__.py
M lib/galaxy/workflow/scheduling_manager.py
A test/integration/test_workflow_handler_configuration.py
A test/integration/workflow_handler_configuration_job_conf.xml

Log Message:
-----------
Merge pull request #3820 from jmchilton/fixed_history_handler

[17.01] Restrict workflow scheduling within a history to a fixed, random handler.

Commit: dd439530509f12ede24f8e9763f2bdb5314e2c13
https://github.com/galaxyproject/galaxy/commit/dd439530509f12ede24f8e9763f2b...
Author: Dannon Baker <dannon.baker@gmail.com>
Date: 2017-03-31 (Fri, 31 Mar 2017)

Changed paths:
M config/galaxy.ini.sample
M lib/galaxy/config.py
M lib/galaxy/jobs/__init__.py
M lib/galaxy/workflow/scheduling_manager.py
A test/integration/test_workflow_handler_configuration.py
A test/integration/workflow_handler_configuration_job_conf.xml

Log Message:
-----------
Merge remote-tracking branch 'upstream/release_17.01'

Compare: https://github.com/galaxyproject/galaxy/compare/52b5e62de498...dd439530509f