auto removal of datasets when number of outputs cannot be determined until tool is run
Hi all,

So I am using this documentation (http://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run) to create a tool that has a variable number of output files based upon the input file. I have gotten it to work; however, the intermediate files that I create in the $__new_file_path__ directory (which are then copied into the database as history files) are not being deleted upon tool completion. Is there a way to get Galaxy to delete these files automatically, or is there some programmatic way to do it?

- Nik.

--
Nikhil Joshi
Bioinformatics Analyst/Programmer
UC Davis Bioinformatics Core
http://bioinformatics.ucdavis.edu/
najoshi -at- ucdavis -dot- edu
530.752.2698 (w)
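For context, the mechanism from that wiki page has the tool write files named primary_<output id>_<designation>_visible_<extension> into $__new_file_path__, which Galaxy then collects into the history when the job finishes. A rough sketch of such a tool script, where the two-column input format, the script name, and the argument order are all made up for illustration:

    #!/usr/bin/env python
    # split_by_column.py - hypothetical tool script illustrating the wiki's
    # variable-output mechanism. Called from the tool XML as, e.g.:
    #   split_by_column.py $input $output1.id $__new_file_path__
    import os
    import sys

    def main():
        input_path, output_id, out_dir = sys.argv[1], sys.argv[2], sys.argv[3]
        outputs = {}
        for line in open(input_path):
            key = line.split("\t", 1)[0]  # first column picks the output file
            if key not in outputs:
                # Galaxy collection pattern: primary_<id>_<designation>_visible_<ext>;
                # the designation (key here) should be letters/digits/underscores only.
                name = "primary_%s_%s_visible_txt" % (output_id, key)
                outputs[key] = open(os.path.join(out_dir, name), "w")
            outputs[key].write(line)
        for handle in outputs.values():
            handle.close()

    if __name__ == "__main__":
        main()

The catch Nik hits is that Galaxy copies these files into the history but leaves the originals sitting in $__new_file_path__.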
Hi Nikhil,

Many things other than this variable-number-of-outputs mechanism use new_file_path, and Galaxy doesn't really "manage"/clean anything put in that directory - it should be thought of as temp space (like /tmp), but temp space that is shared over the cluster (if you have one). For this reason, ideally one would set up a cron job to clean files older than a few weeks out of this directory - that said, we never did this when I was working at MSI, where we had a fairly large Galaxy instance. That directory would grow to a few hundred gigabytes, but on a multi-petabyte file system automating that cleanup never seemed like a priority.

Looking at this comment in universe_wsgi.ini though, I think it would be better to just create these files in the job's working directory, which will get cleaned up by Galaxy (unless you set cleanup_job in universe_wsgi.ini to "never"):

    # Tools with a number of outputs not known until runtime can write these
    # outputs to a directory for collection by Galaxy when the job is done.
    # Previously, this directory was new_file_path, but using one global directory
    # can cause performance problems, so using job_working_directory ('.' or cwd
    # when a job is run) is encouraged. By default, both are checked to avoid
    # breaking existing tools.
    #collect_outputs_from = new_file_path,job_working_directory

I will try to find some time to test this out and update the documentation on the wiki.

Thanks for the post,
-John
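For the cron-based cleanup John describes, a minimal sketch of a script that cron could run nightly; the directory path, the 14-day cutoff, and the script name are assumptions to adapt to the new_file_path setting in your universe_wsgi.ini:

    #!/usr/bin/env python
    # clean_new_file_path.py - hypothetical cleanup helper, not part of Galaxy.
    # Deletes plain files in new_file_path older than MAX_AGE_DAYS.
    import os
    import sys
    import time

    NEW_FILE_PATH = "/galaxy-dist/database/tmp"  # assumption: your new_file_path
    MAX_AGE_DAYS = 14                            # assumption: "a few weeks"

    def main():
        cutoff = time.time() - MAX_AGE_DAYS * 86400
        for name in os.listdir(NEW_FILE_PATH):
            path = os.path.join(NEW_FILE_PATH, name)
            try:
                if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                    os.remove(path)
            except OSError as e:
                sys.stderr.write("could not remove %s: %s\n" % (path, e))

    if __name__ == "__main__":
        main()

This could be wired up with a crontab entry along the lines of 0 3 * * * python /path/to/clean_new_file_path.py. The job_working_directory alternative John quotes is simpler still: the tool writes the same primary_* files to its current working directory (no $__new_file_path__ argument needed), and Galaxy collects them and then removes the whole working directory when the job finishes - provided cleanup_job is not set to "never" - which gives exactly the automatic cleanup Nik is asking for.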