Andrew,
Thanks for investigating this. I changed the subject and sent to the galaxy dev list.
I've had a number of tools quit working recently, particularly tools that inspect the extra_files_path when setting metadata: Defuse, Rsem, SnpEff.
I think there was a change in the Galaxy framework: the extra_files_path, when referenced from an input or output in the cheetah template sections of the tool config xml, is now relative to the job working directory rather than the files location.
Yesterday I changed a few of the tools on my server from: <param_name>.extra_files_path to: <param_name>.dataset.extra_files_path and they now work again.
Dan or John, is that the right way to handle this?
Thanks,
JJ
On 10/13/14, 9:29 PM, Andrew Lonie wrote:
Hi Jim. I am probably going about this the wrong way, but I am not clear on how to report tool errors (if in fact this is a tool error!)
I've been trialling your snpeff wrapper from the test toolshed and getting a consistent error with the SnpEff Download and SnpEff sub tools (the SnpSift dbNSFP works fine). The problem seems to be with an attribute declaration and manifests during database download as:
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 564, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 1107, in finish
    dataset.datatype.set_meta( dataset, overwrite=False ) # call datatype.set_meta directly for the initial set_meta call during dataset creation
  File "/mnt/galaxy/shed_tools/testtoolshed.g2.bx.psu.edu/repos/iuc/snpeff/1938721334b3/snpeff/lib/galaxy/datatypes/snpeff.py", line 21, in set_meta
    data_dir = dataset.files_path
AttributeError: 'HistoryDatasetAssociation' object has no attribute 'files_path'
We fiddled around with the wrapper, eventually replacing 'dataset.files_path' with 'dataset.extra_files_path' in snpeff.py, which fixed the download bug, but then the SnpEff subtool itself threw a similar error when I tried to use that database from the history.
I chased this up a bit more but cannot understand the various posts on files_path vs extra_files_path.
I've shared a history with both of these errors here: http://130.56.251.62/galaxy/u/alonie/h/unnamed-history
Maybe this is a problem with our Galaxy image?
Any help appreciated!
Andrew
A/Prof Andrew Lonie University of Melbourne
JJ,
Arg, this is a mess. I am very sorry about this - I still don't understand extra_files_path versus files_path myself. There are open questions on Peter's blast repo, and no one ever followed up on my object store questions about this with Bjoern's issues a couple of release cycles ago. We need to get these to work, write documentation explicitly declaring best practices we can all agree on, and then write some tests to ensure things don't break in the future.
When you say your tools broke recently - can you say for certain which release broke these - the August14, October14, something older?
I'll try to do some more research and get back to you.
-John
-- James E. Johnson Minnesota Supercomputing Institute University of Minnesota ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Okay - so this is what broke things:
https://bitbucket.org/galaxy/galaxy-central/commits/d781366bc120787e201b73a4...
My feeling with the commit was that wrappers and tools should never be accessing paths explicitly through input.dataset.*. I think this would circumvent options like outputs_to_working_directory and break remote job execution through Pulsar. I think it also breaks the object store abstraction, which is why I made the change for Bjoern, I guess.
I did not realize (and this was stupid on my part) that datatype code would be running on the remote host and accessing these model properties directly, outside the abstractions set up by the wrappers supplied to cheetah code, and so the two have been out of sync as of that commit.
I am thinking that somehow changing what the datatype code gets is the right approach, rather than fixing things by circumventing the wrapper and accessing properties directly on the dataset. You will find that doing so breaks things for Bjoern's object store, and it could probably never run on usegalaxy.org, say, for the same reason.
Too many different competing deployment options all being incompatible with each other :(.
Will keep thinking about this and respond again.
-John
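To make the mismatch described in this message concrete, here is a minimal sketch. The classes Dataset and DatasetWrapper and all paths are hypothetical stand-ins, not Galaxy's real classes; the only thing taken from the thread is the idea that the wrapper given to the cheetah template may report a path under the job working directory while the model object reports the files location.

```python
import os


class Dataset(object):
    """Hypothetical stand-in for the model object that datatype code receives."""

    def __init__(self, dataset_id, files_root):
        self.id = dataset_id
        self.files_root = files_root

    @property
    def extra_files_path(self):
        # The model derives the directory from the files location.
        return os.path.join(self.files_root, "dataset_%d_files" % self.id)


class DatasetWrapper(object):
    """Hypothetical stand-in for the wrapper handed to the cheetah template."""

    def __init__(self, dataset, job_working_directory):
        self.dataset = dataset
        self.job_working_directory = job_working_directory

    @property
    def extra_files_path(self):
        # The wrapper is free to rewrite the path, here into the job working
        # directory, without the model object knowing about it.
        return os.path.join(self.job_working_directory,
                            "dataset_%d_files" % self.dataset.id)


model = Dataset(7, "/galaxy/database/files")
wrapped = DatasetWrapper(model, "/galaxy/jobs/000/123/working")

tool_sees = wrapped.extra_files_path    # what the command line is built from
datatype_sees = model.extra_files_path  # what set_meta-style code would read

print(tool_sees)
print(datatype_sees)
print(tool_sees == datatype_sees)
```

In this sketch the tool writes to one directory and the datatype code looks in another, which is the kind of disagreement that reaching through to .dataset papers over on a local install but cannot fix for remote execution.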
I agree with you about the inadvisable use of: input.dataset.*.
I'm looking at:
lib/galaxy/model/__init__.py

class Dataset( object ):
    ...
    def __init__( self, id=None, state=None, external_filename=None, extra_files_path=None, file_size=None, purgable=True, uuid=None ):
        ...
        self._extra_files_path = extra_files_path
        ...
    @property
    def extra_files_path( self ):
        return self.object_store.get_filename( self, dir_only=True, extra_dir=self._extra_files_path or "dataset_%d_files" % self.id )
I'm trying to see when self._extra_files_path gets set. Otherwise, would this return the path relative to the current file location of the dataset?
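A minimal sketch of the fallback in that property, for anyone following along. The `or` expression is copied from the snippet above; FakeObjectStore, its root directory, and the example values are assumptions made up for illustration, and what Galaxy's real object store does with extra_dir is not shown in this thread.

```python
import os


class FakeObjectStore(object):
    """Hypothetical disk-style store; not Galaxy's object store API."""

    def __init__(self, root):
        self.root = root

    def get_filename(self, dataset, dir_only=False, extra_dir=None):
        # Only the dir_only=True call from the quoted property is modelled:
        # the directory is resolved relative to the store's root.
        return os.path.join(self.root, extra_dir)


class Dataset(object):
    def __init__(self, id=None, extra_files_path=None, object_store=None):
        self.id = id
        self._extra_files_path = extra_files_path
        self.object_store = object_store

    @property
    def extra_files_path(self):
        # Same expression as the quoted model code: an explicit value wins,
        # otherwise the directory name is derived from the dataset id.
        return self.object_store.get_filename(
            self, dir_only=True,
            extra_dir=self._extra_files_path or "dataset_%d_files" % self.id)


store = FakeObjectStore("/galaxy/files")
default_path = Dataset(id=42, object_store=store).extra_files_path
explicit_path = Dataset(id=42, extra_files_path="job_work/extra",
                        object_store=store).extra_files_path

print(default_path)
print(explicit_path)
```

Under these assumptions, a dataset whose _extra_files_path was never set resolves to a dataset_<id>_files directory chosen by the store, so the result depends on the object store rather than on wherever the job happened to write.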
Hey JJ,
I opened a pull request to stable with my best guess at the right way to proceed, and hopefully a best practice recommendation we can all get behind. Do you want to try it out and let me know if it fixes snpeff? (It does fix the velvet datatypes you contributed to Galaxy.)
https://bitbucket.org/galaxy/galaxy-central/pull-request/532/fix-for-datatyp...
Dan, Bjoern - does this make sense? Can we move forward with this approach ($input.extra_files_path for inputs and $output.files_path for outputs) as the best practice for how to reference these directories?
-John
Hi John,
Glad to see this is getting some attention!
On 15.10.2014 at 19:05, John Chilton wrote:
Dan, Bjoern - does this make sense? Can we move forward with this approach ($input.extra_files_path for inputs and $output.files_path for outputs) as the best practice for how to reference these directories?
I'm not sure why we need this distinction. Can we not simply choose one for both inputs and outputs? Otherwise we need to explain very well why it is needed, and I would vote to rename it to reflect that files_path can only be used by $outputs ...
Salve, Bjoern
I sympathize with you that this adds complexity - I really do. But if we do anything else, we restrict the range of Galaxy versions these tools can target even further, and we still have to maintain backward compatibility on all of this junk anyway, which is really weighing down the wrapper code and now the metadata code as well.
If you want input.files_path to work, that is fine. I wouldn't be eager for the change given the complexity it would add to the implementation, but I would probably accept a pull request for that. If you want $input.input_files_path and $output.output_files_path to work, I would probably accept pull requests for those as well, but I would not be excited. Finally, I don't personally want to put the time in given my reservations, and I don't think the benefits would be that great: given the range of Galaxy versions people run, it would be a while before we could really recommend those as best practices anyway.
How about we reach an agreement that in a fictitious Tool 2.0 spec (https://trello.com/c/AWVobyv1), where we fix all the problems, we will not grant access to $input.dataset directly and we will uniformly allow only $input.files_path and $output.files_path?
-John
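A rough sketch of what that uniform rule could look like from the template's point of view. Everything here is hypothetical: PathOnlyWrapper, render, the tool name, and the paths are invented for illustration, and Python's str.format stands in for cheetah only because it also resolves dotted attribute references.

```python
class PathOnlyWrapper(object):
    """Hypothetical wrapper: exposes files_path and nothing else."""

    # __slots__ means no other attributes (such as .dataset) can exist.
    __slots__ = ("files_path",)

    def __init__(self, files_path):
        self.files_path = files_path


def render(command_template, **params):
    # str.format resolves "{input.files_path}" via attribute lookup, which is
    # enough to imitate "$input.files_path" for this sketch.
    return command_template.format(**params)


command = render(
    "mytool --db {input.files_path} --out {output.files_path}",
    input=PathOnlyWrapper("/job/inputs/dataset_1_files"),
    output=PathOnlyWrapper("/job/outputs/dataset_2_files"),
)
print(command)

# Reaching through to the model object is simply not possible.
try:
    render("{input.dataset.extra_files_path}",
           input=PathOnlyWrapper("/job/inputs/dataset_1_files"))
    dataset_reachable = True
except AttributeError:
    dataset_reachable = False
print(dataset_reachable)
```

The point of the design is that inputs and outputs share one attribute name, and the framework alone decides which directory that name maps to for a given deployment.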
Salve, Bjoern
-John
On Wed, Oct 15, 2014 at 11:44 AM, Jim Johnson johns198@umn.edu wrote:
I agree with you about the inadvisable use of: input.dataset.*.
I'm looking at:
lib/galaxy/model/__init__.py class Dataset( object ): ... def __init__( self, id=None, state=None, external_filename=None, extra_files_path=None, file_size=None, purgable=True, uuid=None ): ... self._extra_files_path = extra_files_path ... @property def extra_files_path( self ): return self.object_store.get_filename( self, dir_only=True, extra_dir=self._extra_files_path or "dataset_%d_files" % self.id )
I'm trying to see when self._extra_files_path gets set. If it is never set, would this return the path relative to the current file location of the dataset?
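To make the fallback in that property concrete, here is a minimal, self-contained sketch. FakeObjectStore, its base directory and its file layout are hypothetical stand-ins, not Galaxy's real object store; only the `self._extra_files_path or "dataset_%d_files" % self.id` expression is taken from the model code quoted above.

```python
import os


class FakeObjectStore(object):
    """Hypothetical stand-in for Galaxy's object store (not the real API)."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def get_filename(self, dataset, dir_only=False, extra_dir=None):
        # Assumption: a simple disk layout where extra_dir is joined
        # onto the store's base directory.
        path = self.base_dir
        if extra_dir:
            path = os.path.join(path, extra_dir)
        if not dir_only:
            path = os.path.join(path, "dataset_%d.dat" % dataset.id)
        return path


class Dataset(object):
    def __init__(self, id, object_store, extra_files_path=None):
        self.id = id
        self.object_store = object_store
        self._extra_files_path = extra_files_path

    @property
    def extra_files_path(self):
        # Same fallback expression as in the quoted model code:
        # an explicit value wins, otherwise the name is derived from the id.
        extra_dir = self._extra_files_path or "dataset_%d_files" % self.id
        return self.object_store.get_filename(self, dir_only=True, extra_dir=extra_dir)


store = FakeObjectStore("/galaxy/files")
default_path = Dataset(42, store).extra_files_path
explicit_path = Dataset(42, store, extra_files_path="custom_dir").extra_files_path
```

In this sketch, an unset _extra_files_path gives a directory named after the dataset id under wherever the store keeps its files. Whether the real object store resolves it the same way is not confirmed here.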
On 10/15/14, 9:36 AM, John Chilton wrote:
Okay - so this is what broke things:
https://bitbucket.org/galaxy/galaxy-central/commits/d781366bc120787e201b73a4...
My feeling with the commit was that wrappers and tools should never be accessing paths explicitly through input.dataset.*. I think this would circumvent options like outputs_to_working_directory and break remote job execution through Pulsar. It also breaks the object store abstraction I think - which is why I made the change for Bjoern I guess.
I did not (and this was stupid on my part) realize that datatype code would be running on the remote host and accessing these model properties directly, outside the abstractions set up by the wrappers supplied to cheetah code, and so they have become out of sync as of that commit.
I am thinking that somehow changing what the datatype code gets is the right approach, rather than fixing things by circumventing the wrapper and accessing properties directly on the dataset. You will find that doing this breaks things for Bjoern's object store, and it could probably never run on usegalaxy.org, say, for the same reason.
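A small sketch of why going around the wrapper is fragile. The classes and the path_rewrites mapping below are hypothetical and much simpler than the real wrappers; the assumption is only that a wrapper may remap a path (into a job working directory, or onto a remote host) in a way the model object knows nothing about.

```python
import os


class Dataset(object):
    """Hypothetical model object that knows only the server-side path."""

    def __init__(self, id, extra_files_path):
        self.id = id
        self.extra_files_path = extra_files_path


class DatasetWrapper(object):
    """Hypothetical wrapper handed to tool templates (assumption)."""

    def __init__(self, dataset, path_rewrites=None):
        self.dataset = dataset
        self._path_rewrites = path_rewrites or {}

    @property
    def extra_files_path(self):
        real = self.dataset.extra_files_path
        # Use the rewritten location when the deployment supplies one.
        return self._path_rewrites.get(real, real)


server_path = os.path.join("/galaxy/files", "dataset_7_files")
job_path = os.path.join("/jobs/123/working", "dataset_7_files")

wrapped = DatasetWrapper(Dataset(7, server_path), {server_path: job_path})

via_wrapper = wrapped.extra_files_path        # follows the rewrite
via_model = wrapped.dataset.extra_files_path  # bypasses the rewrite
```

Here the two lookups disagree as soon as a rewrite is configured, which is roughly the out-of-sync situation described above.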
Too many different competing deployment options all being incompatible with each other :(.
Will keep thinking about this and respond again.
-John
On Wed, Oct 15, 2014 at 9:39 AM, John Chilton jmchilton@gmail.com wrote:
JJ,
Arg this is a mess. I am very sorry about this - I still don't understand extra_files_path versus files_path myself. There are open questions on Peter's blast repo and no one ever followed up on my object store questions about this with Bjoern's issues a couple release cycles ago. We need to get these to work - write documentation explicitly declaring best practices we can all agree on and then write some tests to ensure things don't break in the future.
When you say your tools broke recently - can you say for certain which release broke these - the August14, October14, something older?
I'll try to do some more research and get back to you.
-John
On Tue, Oct 14, 2014 at 6:04 AM, Jim Johnson johns198@umn.edu wrote:
Andrew,
Thanks for investigating this. I changed the subject and sent to the galaxy dev list.
I've had a number of tools quit working recently. Particularly tools that inspect the extra_files_path when setting metadata, Defuse, Rsem, SnpEff.
I think there was a change in the galaxy framework: The extra_files_path, when referenced from an input or output in the cheetah template sections of the tool config xml, will be relative to the job working directory rather than the files location. I've just changed a few of my tools on my server yesterday from: <param_name>.extra_files_path to: <param_name>.dataset.extra_files_path and they now work again.
Dan or John, is that the right way to handle this?
Thanks,
JJ
On 10/13/14, 9:29 PM, Andrew Lonie wrote:
Hi Jim. I am probably going about this the wrong way, but I am not clear on how to report tool errors (if in fact this is a tool error!)
I've been trialling your snpeff wrapper from the test toolshed and getting a consistent error with the SnpEff Download and SnpEff sub tools (the SnpSift dbNSFP works fine). The problem seems to be with an attribute declaration and manifests during database download as:

    Traceback (most recent call last):
      File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 564, in finish_job
        job_state.job_wrapper.finish( stdout, stderr, exit_code )
      File "/mnt/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 1107, in finish
        dataset.datatype.set_meta( dataset, overwrite=False ) # call datatype.set_meta directly for the initial set_meta call during dataset creation
      File "/mnt/galaxy/shed_tools/testtoolshed.g2.bx.psu.edu/repos/iuc/snpeff/1938721334b3/snpeff/lib/galaxy/datatypes/snpeff.py", line 21, in set_meta
        data_dir = dataset.files_path
    AttributeError: 'HistoryDatasetAssociation' object has no attribute 'files_path'

We fiddled around with the wrapper, eventually replacing 'dataset.files_path' with 'dataset.extra_files_path' in snpeff.py, which fixed the download bug, but then SnpEff subtool itself threw a similar error when I tried to use that database from the history.
I chased up a bit more but cannot understand the various posts on files_path vs extra_files_path
I've shared a history with both of these errors here: http://130.56.251.62/galaxy/u/alonie/h/unnamed-history
Maybe this is a problem with our Galaxy image?
Any help appreciated!
Andrew
A/Prof Andrew Lonie
University of Melbourne
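The AttributeError in the traceback above comes from datatype code assuming one attribute name on whatever object it is handed. Purely as an illustration, and not the fix adopted in this thread, datatype code could look the directory up defensively. resolve_extra_files_dir and the two stub classes are hypothetical names made up for this sketch.

```python
def resolve_extra_files_dir(dataset):
    """Return the extra-files directory from whichever attribute exists.

    Hypothetical helper: tries extra_files_path first, then files_path.
    """
    for attr in ("extra_files_path", "files_path"):
        path = getattr(dataset, attr, None)
        if path:
            return path
    raise AttributeError("dataset exposes neither extra_files_path nor files_path")


class OnlyExtraFilesPath(object):
    """Stub standing in for an object that only has extra_files_path."""
    extra_files_path = "/data/dataset_1_files"


class OnlyFilesPath(object):
    """Stub standing in for an object that only has files_path."""
    files_path = "/jobs/1/dataset_1_files"


from_model = resolve_extra_files_dir(OnlyExtraFilesPath())
from_wrapper = resolve_extra_files_dir(OnlyFilesPath())
```

This only papers over the naming difference. It does not address which of the two paths is actually valid on the host where the code runs.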
--
James E. Johnson
Minnesota Supercomputing Institute
University of Minnesota
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Looks good, John.
I tested with: https://testtoolshed.g2.bx.psu.edu/view/jjohnson/snpsift_dbnsfp_datatypes
lib/galaxy/datatypes/converters/tabular_to_dbnsfp.xml
reverting from hack:
    <command interpreter="python">tabular_to_dbnsfp.py $input $dbnsfp.dataset.extra_files_path/dbNSFP.gz</command>
back to:
    <command interpreter="python">tabular_to_dbnsfp.py $input $dbnsfp.files_path/dbNSFP.gz</command>
On 10/15/14, 12:05 PM, John Chilton wrote:
Hey JJ,
Opened a pull request to stable with my best guess at the right way to proceed and hopefully a best practice recommendation we can all get behind. Do you want to try it out and let me know if it fixes snpeff? (It does fix the velvet datatypes you contributed to Galaxy.)
https://bitbucket.org/galaxy/galaxy-central/pull-request/532/fix-for-datatyp...
Dan, Bjoern - does this make sense? Can we move forward with this approach ($input.extra_files_path for inputs and $output.files_path for outputs) as the best practice for how to reference these directories?
-John
galaxy-dev@lists.galaxyproject.org