Dataset Collections status
Greetings,
I started pulling Galaxy code from the dev branch a few months ago to take advantage of the (then just emerging) dataset collections feature. However, it is not clear to me from the latest release notes if the data collections are now fully merged into master, or if I should continue to use the code in the dev branch to take advantage of bleeding edge code. I would like to move back to the master branch as soon as feasible.
When running workflows over dataset collections I will frequently see errors like:
/bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy
From what I can tell, this occurs when one process is trying to modify/delete a file open in another process. While the error seems to be repeatable, it also seems random, as the errors do not occur in the same places if I run the workflow multiple times.
Given that I am working from the dev branch, I don't want to open/raise issues on features still in development. But if this is unexpected, then I can do some more investigating and file a proper bug report.
Cheers,
Keith
------------------------------
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY
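For readers hitting the same message: POSIX lists ETXTBSY ("Text file busy") as an exec failure when the file being executed is currently open for writing by some process, which is a narrower condition than "modify/delete". The Python sketch below tries to reproduce that condition on a typical Linux system by executing a freshly written shell script while its write handle is still open. The function name and the temporary script are hypothetical illustrations; this is not a claim about where Galaxy's job runner leaves its job scripts open, and results may differ on other operating systems, kernel versions, or on a /tmp mounted noexec.

```python
import errno
import os
import stat
import subprocess
import tempfile


def exec_while_open_for_writing() -> int:
    """Try to execute a script whose write handle is still open.

    Returns 0 if the script ran, otherwise the errno of the OSError raised
    by the exec attempt (errno.ETXTBSY is what typical Linux systems give).
    """
    # Hypothetical stand-in for a freshly written job script.
    fd, path = tempfile.mkstemp(suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
            handle.flush()
            os.chmod(path, stat.S_IRWXU)  # make it executable by the owner
            try:
                # `handle` is still open for writing while we exec the file.
                subprocess.run([path], check=True)
            except OSError as exc:
                return exc.errno
        # Only reached if the exec succeeded despite the open writer.
        return 0
    finally:
        os.unlink(path)


if __name__ == "__main__":
    code = exec_while_open_for_writing()
    print(errno.errorcode.get(code, str(code)) if code else "ran OK")
```

Run directly, it prints the symbolic error name (such as ETXTBSY) or "ran OK" if the system allowed the exec. Closing the write handle before executing the script avoids the condition in this sketch.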
On Fri, Aug 7, 2015 at 8:01 PM, Keith Suderman <suderman@cs.vassar.edu> wrote:
Greetings,
I started pulling Galaxy code from the dev branch a few months ago to take advantage of the (then just emerging) dataset collections feature. However, it is not clear to me from the latest release notes if the data collections are now fully merged into master, or if I should continue to use the code in the dev branch to take advantage of bleeding edge code. I would like to move back to the master branch as soon as feasible.
It is an ongoing effort, but the master branch as of now contains essentially everything in the dev branch (https://github.com/galaxyproject/galaxy/tree/master). I need to put together some release notes for 15.07 before there can be an announcement, but there are a few new collection-related things in the release. In some senses collections have been fully usable for over a year, and in other senses there is a lot of work left to do. It depends on what you are doing.
When running workflows over dataset collections I will frequently see errors like:
/bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy
I don't think this is related to collections per se; I suspect it is more likely a file system problem. Are you using a local job runner or a cluster manager? Is the file system mounted over a slow NFS connection?
From what I can tell, this occurs when one process is trying to modify/delete a file open in another process. While the error seems to be repeatable, it also seems random, as the errors do not occur in the same places if I run the workflow multiple times.
Given that I am working from the dev branch I don't want to open/raise issues on features still in development. But if this is unexpected then I can do some more investigating and file a proper bug report.
I would say always report bugs in dev. Maybe check the existing ones on GitHub and Trello first, but ideally we would like to catch bugs as early as possible, and we don't usually commit half-baked code to dev; it should be bug free (though maybe missing features).
Hope this helps,
-John
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi John,
I will try what we have against master. I just went through my old emails, and it looks like the developer branch was recommended in response to a UI issue I experienced with large dataset collections, not the collections themselves. For reference, the UI issue occurred when I inadvertently created 4K history items and jQuery kept timing out trying to update all the checkboxes being created.
The "Text file busy" error occurred on my developer machine (OS X 10.9.5, Python 2.7.9) with no job runner, cluster manager, or NFS.
I will run more tests and file proper bug reports for both issues if I can still recreate them.
Cheers,
Keith
On Aug 19, 2015, at 9:48 AM, John Chilton <jmchilton@gmail.com> wrote:
participants (3)
- John Chilton
- Keith Suderman
- Suderman Keith