writing datatypes

Eric Rasche

14 Jul 2014

8:31 p.m.

I'm trying to add a new datatype to my Galaxy instance for GenBank files, but I'm running into various issues. I've followed the tutorial (https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes); however, that example subclasses tabular, and I'd like to subclass Text, as these are plain text files. I'd also like to be able to define a sniffer for them (not possible if your type=galaxy.datatypes.data:Text).

I figured the call ought to be something like

<datatype extension="gb" type="galaxy.datatypes.data:Genbank" subclass="True" />

however, everything I try fails with

...

Error importing datatype module galaxy.datatypes.data: 'module' object has no attribute 'Genbank'

To avoid this particular issue, I tried writing a separate datatype just for genbank files (type="galaxy.datatypes.genbank:Genbank"), however that fails with the same error:

...

galaxy.datatypes.registry ERROR 2014-07-14 13:23:23,100 Error importing datatype module galaxy.datatypes.genbank: 'module' object has no attribute 'genbank'
Traceback (most recent call last):
  File "/home/hxr/work/galaxy-central/lib/galaxy/datatypes/registry.py", line 206, in load_datatypes
    module = getattr( module, mod )
AttributeError: 'module' object has no attribute 'genbank'

Here's what my lib/galaxy/datatypes/genbank.py looks like:

...

import pkg_resources
pkg_resources.require( "bx-python" )
import logging
from galaxy.datatypes import data
log = logging.getLogger(__name__)

class Genbank( data.Text ):
    file_ext = "gb"

    def sniff( self, filename ):
        header = open(filename).read(5)
        return header == 'LOCUS'

To debug this, I've tried copying the tabular datatype module completely, removing all the classes other than Tabular, and renaming it "Genbank"; however, this fails too with the same error.

Can anyone offer some insight?

Cheers,
Eric
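For readers following along: the first error above is what one would expect if a type string of the form module:Class is resolved by importing the module and then looking the class up by name, since galaxy.datatypes.data has no Genbank class. The following is a minimal sketch under that assumption only. `resolve_datatype` is a hypothetical helper, not Galaxy's registry code, and a standard-library module stands in for galaxy.datatypes.data so the example runs without Galaxy installed.

```python
import importlib


def resolve_datatype(type_string):
    """Resolve a 'package.module:ClassName' string to a class object.

    Hypothetical helper for illustration; not Galaxy's actual registry code.
    """
    module_name, class_name = type_string.split(":", 1)
    # Raises ImportError if the module itself cannot be imported.
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module defines no such class.
    return getattr(module, class_name)


# A class that exists in the named module resolves normally.
decoder_cls = resolve_datatype("json.decoder:JSONDecoder")

# A class name the module does not define fails with an AttributeError,
# analogous to asking galaxy.datatypes.data for a Genbank class.
try:
    resolve_datatype("json.decoder:Genbank")
    missing_error = None
except AttributeError as exc:
    missing_error = str(exc)
```

Under this reading, the class named after the colon has to exist in the named module already; the subclass attribute in the XML does not create it there.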


Peter Cock

15 Jul

10:14 a.m.

Hi Eric,

There is already a genbank format in the EMBOSS datatypes (although there is talk of defining this and others in a set of smaller repositories defined as its dependencies for more modularity). Note it uses "genbank" not "gb" as the name!

https://toolshed.g2.bx.psu.edu/view/devteam/emboss_datatypes

However that doesn't answer your question :(

Peter

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

Eric Rasche

5:03 p.m.

Hi Peter,

I saw that in an initial search for genbank modules; however, it didn't meet our requirements (lack of features, and "heavy" by requiring all of EMBOSS). And, you are correct, it doesn't fix the problem. Thanks for the suggestion.

Cheers,
Eric


Björn Grüning

16 Jul

2:47 p.m.

Hi Eric,

please have a look at:

https://github.com/bgruening/galaxytools/blob/master/datatypes/msa_datatypes...

You need something like:

<datatype extension="genbank" type="galaxy.datatypes.data:Text" subclass="True" />

Let's try to split the EMBOSS datatypes a little bit into small chunks, e.g. sequences_datatypes, msa_datatypes, and so on.

Cheers,
Bjoern
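As a rough illustration of what such an XML-level subclass amounts to, here is a sketch that assumes the registry derives a new, otherwise empty class from the named parent and only changes the extension. That behaviour is an assumption, not something confirmed in this thread. `Text` is a hypothetical stand-in for galaxy.datatypes.data.Text, and `datatype_from_xml` is an illustrative helper, not a Galaxy function.

```python
import xml.etree.ElementTree as ET


class Text:
    """Hypothetical stand-in for galaxy.datatypes.data.Text."""
    file_ext = "txt"

    def sniff(self, filename):
        # Assumed for illustration: the generic text type claims no file.
        return False


def datatype_from_xml(xml_snippet, known_classes):
    """Build a datatype class from one <datatype> element (illustrative only)."""
    elem = ET.fromstring(xml_snippet)
    extension = elem.get("extension")
    class_name = elem.get("type").split(":", 1)[1]
    base = known_classes[class_name]
    if elem.get("subclass", "False").lower() == "true":
        # Derive an empty subclass: only file_ext differs from the parent.
        return type(extension.capitalize(), (base,), {"file_ext": extension})
    return base


snippet = '<datatype extension="genbank" type="galaxy.datatypes.data:Text" subclass="True" />'
Genbank = datatype_from_xml(snippet, {"Text": Text})
```

In this sketch the derived class carries a new extension but inherits every method from its parent unchanged, so there is nowhere to put format-specific logic such as a sniffer.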


Peter Cock

3:34 p.m.

Indeed - ideally (once working) we can upload under the IUC ToolShed as a community maintained resource rather than under a personal account which becomes a single point of failure (the bus factor).

We (the IUC) have previously discussed doing this so that the EMBOSS datatypes could become more of a meta-entry depending on other smaller, specific datatype-defining ToolShed repositories. But it hasn't reached the top of my personal TODO list yet ;)

Peter


Eric Rasche

4:02 p.m.

Forgive me, I'm not 100% clear on the custom plugin system used by Galaxy, but if I "subclass" from the text datatype, will sniffers I implement override Text's and function? The lack of being able to add an entry to the sniffer section (unlike with the tabular example) led me to believe my genbank datatype wouldn't be sniffed.

Additionally, I'd still like to be able to add completely new datatypes. Do you know of any working examples of this? As mentioned in my original post, duplicating an existing datatype and changing the names in it surprisingly doesn't work.

It'd be lovely to have the EMBOSS datatypes split out.

Cheers,
Eric


Björn Grüning

4:20 p.m.

Hi Eric,

> Forgive me, I'm not 100% clear on the custom plugin system used by galaxy, but if I "subclass" from the text data type, will sniffers I implement override text's and function? The lack of being able to add an entry to the sniffer section (unlike with the tabular example) led me to believe my genbank datatype wouldn't be sniffed.

That's true: if you want to override functions, you need to subclass at the Python level, not at the XML level.

> Additionally, I'd still like to be able to add completely new datatypes, do you know of any working examples of this? As mentioned in my original post, duplicating an existing datatype and changing names on it surprisingly doesn't work.

https://github.com/bgruening/galaxytools/tree/master/datatypes/msa_datatypes
https://github.com/bgruening/galaxytools/blob/master/chemicaltoolbox/datatyp...

Is that enough to get started?

> It'd be lovely to have the emboss datatypes split out.

OK, then let's start :) I will try to fork emboss into my galaxytools/datatypes repository and try to split them. You will get commit access and can improve your genbank datatype (and a few more ;)). Finally, we will talk to the devteam to rewrite EMBOSS to depend on our separate datatype repositories. OK?

Ciao,
Bjoern
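A Python-level subclass, by contrast, can override sniff directly. The sketch below mirrors the Genbank class from the original post, with two changes that are not in the original: the file is opened in a with block so it gets closed, and a hypothetical `Text` stand-in replaces galaxy.datatypes.data.Text so the example runs without Galaxy. The `_write_temp` helper and the sample file contents are also made up for the demonstration.

```python
import os
import tempfile


class Text:
    """Hypothetical stand-in for galaxy.datatypes.data.Text."""
    file_ext = "txt"


class Genbank(Text):
    """Python-level subclass, so methods such as sniff can be overridden."""
    file_ext = "genbank"

    def sniff(self, filename):
        # GenBank flat files begin with a LOCUS line; check the first 5 characters.
        with open(filename, "r") as handle:
            return handle.read(5) == "LOCUS"


def _write_temp(contents):
    """Write contents to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
    return path


genbank_path = _write_temp("LOCUS       EXAMPLE  10 bp\n")
other_path = _write_temp(">seq1\nACGT\n")
is_genbank = Genbank().sniff(genbank_path)
is_other = Genbank().sniff(other_path)
os.remove(genbank_path)
os.remove(other_path)
```

How such a class is then registered with Galaxy (the type string and any sniffer entry in the datatypes configuration) is not settled in this thread, so it is left out of the sketch.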


Eric Rasche

4:40 p.m.

Hi Björn,

On 07/16/2014 09:20 AM, Björn Grüning wrote:

>> Forgive me, I'm not 100% clear on the custom plugin system used by galaxy, but if I "subclass" from the text data type, will sniffers I implement override text's and function? The lack of being able to add an entry to the sniffer section (unlike with the tabular example) led me to believe my genbank datatype wouldn't be sniffed.
>
> That's true: if you want to override functions, you need to subclass at the Python level, not at the XML level.

Okay, good, as I figured then.

>> Additionally, I'd still like to be able to add completely new datatypes, do you know of any working examples of this? As mentioned in my original post, duplicating an existing datatype and changing names on it surprisingly doesn't work.
>
> https://github.com/bgruening/galaxytools/tree/master/datatypes/msa_datatypes
> https://github.com/bgruening/galaxytools/blob/master/chemicaltoolbox/datatyp...
>
> Is that enough to get started?

Those absolutely should be, thank you. I'll probably strip them down and post a minimal working example and code to the wiki as well, for future reference.

>> It'd be lovely to have the emboss datatypes split out.
>
> OK, then let's start :) I will try to fork emboss into my galaxytools/datatypes repository and try to split them. You will get commit access and can improve your genbank datatype (and a few more ;)). Finally, we will talk to the devteam to rewrite EMBOSS to depend on our separate datatype repositories. OK?

Ja, sounds good! Happy to help.

...
Cheers, Eric

On July 16, 2014 8:34:55 AM CDT, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...
Indeed - ideally (once working) we can upload under the IUC ToolShed as a community maintained resource rather than under a personal account which becomes a single point of failure (the bus factor).

We (the ICU) have previously discussed doing this so that the EMBOSS datatypes could become more of a meta-entry depending on other smaller specific datatype defining ToolShed repositories. But it hasn't reached the top of my personal TODO list yet ;)

Peter

On Wed, Jul 16, 2014 at 1:47 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
Hi Eric,

please have a look at:

https://github.com/bgruening/galaxytools/blob/master/datatypes/msa_datatypes...

...
You need somthing like: <datatype extension="genbank" type="galaxy.datatypes.data:Text" subclass="True" />

Lets try to split the EMBOSS datatypes a little bit into small

chunks. E.g.

...
sequences_datatypes, msa_datatypes ... and so on ...

Cheers, Bjoern

...
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/


--
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
esr@tamu.edu
rasche.eric@yandex.ru

John Chilton

10:52 p.m.

Is this going to work? I get that this would be a better design if done from the beginning, but what happens if you install an emboss repository upgrade (on an existing install) that brings in conflicting types from other repositories that already exist and have been previously installed? Does the tool shed have a mechanism to handle that?

-John

On Wed, Jul 16, 2014 at 9:20 AM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...

Hi Eric,

...
Forgive me, I'm not 100% clear on the custom plugin system used by galaxy, but if I "subclass" from the text data type, will sniffers I implement override text's and function? The lack of being able to add an entry to the sniffer section (unlike with the tabular example) led me to believe my genbank datatype wouldn't be sniffed.

That's true: if you want to override functions, you need to subclass at the Python level, not at the XML level.
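A minimal sketch of what subclassing at the Python level could look like, based on the class from the original post. The `Text` base class here is a simplified stand-in so the example runs without Galaxy; in a real instance the class would inherit from galaxy.datatypes.data.Text and still need to be registered in the datatypes configuration.

```python
import os
import tempfile


class Text(object):
    """Stand-in for Galaxy's Text base class (hypothetical, simplified)."""
    file_ext = "txt"


class Genbank(Text):
    """GenBank flat file, recognised by its leading LOCUS keyword."""
    file_ext = "gb"

    def sniff(self, filename):
        # Read only the first five characters and close the file afterwards.
        with open(filename) as handle:
            return handle.read(5) == "LOCUS"


def write_temp(text):
    """Write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


gb_path = write_temp("LOCUS       TEST0001   10 bp    DNA\n")
fasta_path = write_temp(">seq1\nACGT\n")
is_genbank = Genbank().sniff(gb_path)
is_fasta_genbank = Genbank().sniff(fasta_path)
os.remove(gb_path)
os.remove(fasta_path)
print(is_genbank, is_fasta_genbank)
```

The only change from the posted sniffer is the `with` block, which closes the file handle instead of leaving it open.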

...
Additionally, I'd still like to be able to add completely new datatypes, do you know of any working examples of this? As mentioned in my original post, duplicating an existing datatype and changing names on it surprisingly doesn't work.

https://github.com/bgruening/galaxytools/tree/master/datatypes/msa_datatypes https://github.com/bgruening/galaxytools/blob/master/chemicaltoolbox/datatyp...

Is that enough, to get started?
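The traceback in the original post fails at `module = getattr( module, mod )`. One possible reading, which nobody in this thread confirms, is that the registry imports only the top-level package and then walks the dotted path with getattr; in that case a new module file would have to be imported somewhere before it can be found. The sketch below shows only the underlying Python behaviour, using a throwaway package with the made-up name `demo_pkg`.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/__init__.py and demo_pkg/genbank.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "genbank.py"), "w") as handle:
    handle.write("class Genbank(object):\n    file_ext = 'gb'\n")
sys.path.insert(0, root)

# Importing only the top-level package does not load its submodules,
# so getattr(top, "genbank") would raise AttributeError at this point.
top = __import__("demo_pkg")
found_before = hasattr(top, "genbank")

# Once the submodule has been imported, it becomes an attribute of the package.
importlib.import_module("demo_pkg.genbank")
found_after = hasattr(top, "genbank")
print(found_before, found_after)
```

If this reading applies, it would also explain why copying an existing datatype module under a new name fails in the same way.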

...
It'd be lovely to have the emboss datatypes split out.

OK, then let's start :) I will try to fork emboss into my galaxytools/datatypes repository and try to split them. You will get commit access and can improve your genbank datatype (and a few more ;)). Finally, we will talk to the devteam to rewrite EMBOSS to depend on our separate data type repositories. OK?

Ciao, Bjoern


Greg Von Kuster

17 Jul 17 Jul

12:42 a.m.

Assuming this comment:

...

...
Finally, we will talk to the devteam to rewrite EMBOSS to depend on our separate data type repositories.

refers to the emboss_5 repository owned by devteam, then what is being proposed should work (although I may not be fully understanding what is being proposed).

If the emboss datatypes are split up and the emboss_5 repository's repository_dependencies.xml file is altered, then a new installable revision of the emboss_5 repository will be created. This implies that any previous installation cannot be updated to include the split-up emboss datatypes. Instead, a new installation of the emboss_5 repository will be required.

This new installation may depend on emboss datatypes that conflict with those in the older emboss_5 installation, and the second version of the conflicting datatypes will not be loaded into the Galaxy datatypes registry. However, if the datatypes are the same, this shouldn't be a problem since the first version will have been loaded.

On Jul 16, 2014, at 4:52 PM, John Chilton <jmchilton@gmail.com> wrote:

...

Is this going to work? I get that this would be a better design if done from the beginning, but what happens if you install an emboss repository upgrade (on an existing install) that brings in conflicting types from other repositories that already exist and have been previously installed? Does the tool shed have a mechanism to handle that?

-John


John Chilton

6:24 p.m.

Even more out of office than normal, so maybe I don't have the throughput to process this, but it sounds like it won't work then. If the new types aren't going to be loaded, then we cannot evolve the datatypes with new functionality in new repositories.

Perhaps I am missing something, but in the abstract it seems Galaxy would have no way of knowing which types are new and which types are old in this scenario.

Not really my business, so obviously feel free to proceed however you like - just letting you know that it makes me nervous. I will try to find the time to understand how the tool shed handles data types so I can speak with less ignorance in the future.

-John
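The concern can be illustrated with a toy registry. This is only a sketch of the first-definition-wins behaviour Greg describes, not Galaxy's registry code; the class, its method and the datatype descriptions are invented for illustration, and the repository names are borrowed from this thread.

```python
class DatatypeRegistry(object):
    """Toy registry keyed by extension; the first definition wins (assumed)."""

    def __init__(self):
        self.by_extension = {}
        self.skipped = []

    def load(self, extension, definition, source):
        if extension in self.by_extension:
            # A later repository redefining the same extension is ignored.
            self.skipped.append((extension, source))
            return False
        self.by_extension[extension] = (definition, source)
        return True


registry = DatatypeRegistry()
registry.load("genbank", "plain Text subclass", "emboss_datatypes")
registry.load("genbank", "Genbank class with sniffer", "sequences_datatypes")

print(registry.by_extension["genbank"])
print(registry.skipped)
```

In this model the improved definition never becomes active, and the registry has no notion of which of the two is newer.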

Björn Grüning

6:45 p.m.

Hi,

I think you are right, John. Datatypes have many issues in that regard, as far as I can tell from a few bug reports. IMHO datatypes should be handled like "Tool dependency definitions": there should be only one "installable revision".

But that aside, the emboss datatypes are already broken. For example, asn1 was added into Galaxy but it still exists in emboss_datatypes.

Moreover, how do we add a proper genbank datatype with sniffer, split and merge functions? Ideally, every datatype should have its own repository, but that is an overhead I would like to avoid ... any other ideas?

I would love to discuss that issue further, maybe a hangout with Greg and Peter?

Thanks John for your input,
Bjoern

On 17.07.2014 18:24, John Chilton wrote:

...

Even more out of office than normal so maybe I don't have the throughput to process this but it sounds like it won't work then. If the new types aren't going to be loaded then we cannot evolve the datatypes with new functionality in new repositories.

Perhaps I am missing something, but in the abstract it seems Galaxy would have no way of knowing which types are new and which types are old in this scenario.

Not really my business so feel free to proceed however obviously - just letting you know that it makes me nervous. I will try to find the time to understand how the tool shed handles data types so I can speak with less ignorance in the future.

-John

Peter Cock

6:51 p.m.

On Thu, Jul 17, 2014 at 5:45 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...

Hi,

I think you are right John. Datatypes have many issues in that regard as I can tell, from a few bug reports. Imho datatypes should be handled like "Tool dependency definitions". There should be only one "installable revision".

But that aside, emboss datatypes are already broken. For example asn1 was added into Galaxy but it still exists in emboss_datatypes.

Moreover, how to add a proper genbank datatype with sniffer, split and merge functions? Ideally, every datatype should have its own repository, but that is an overhead I would like to omit ... any other ideas?

I would love to discuss that issue further, maybe a hangout with Greg and Peter?

Thanks John for your input, Bjoern

This could be high level, e.g. "other sequence file formats" repository covering GenBank, EMBL, SwissProt plain text, UniProt XML, etc; one for multiple sequence alignments; one for EMBOSS' own output...

But it wouldn't be that much more work to do one ToolShed repo per additional file format, would it?

One reason I have been meaning to do some of these is familiarity with many of these formats from looking after/writing parsers in Biopython.

Having this done sooner rather than later ought to head off too many incompatible datatype names which worries me. Is it too late to adopt something like the EDAM ontology for the datatypes within Galaxy?

Peter

Björn Grüning

7:10 p.m.

On 17.07.2014 18:51, Peter Cock wrote:

...


This could be high level, e.g. "other sequence file formats" repository covering GenBank, EMBL, SwissProt plain text, UniProt XML, etc; one for multiple sequence alignments; one for EMBOSS' own output...

That was my initial idea. Starting point is here: https://github.com/bgruening/galaxytools/tree/master/datatypes

...

But it wouldn't be that much more work to do one ToolShed repo per additional file format, would it?

Uploading and creating descriptions in the ToolShed will take most of the time :) Let's see if I can use a train trip to do that ... but the problem will stay the same ... one repository can have multiple versions ...

...

One reason I have been meaning to do some of these is familiarity with many of these formats from looking after/writing parsers in Biopython.

Having this done sooner rather than later ought to head off too many incompatible datatype names which worries me. Is it too late to adopt something like the EDAM ontology for the datatypes within Galaxy?

Peter

Eric Rasche

7:28 p.m.

On 07/17/2014 12:10 PM, Björn Grüning wrote:

...

On 17.07.2014 18:51, Peter Cock wrote:

...
On Thu, Jul 17, 2014 at 5:45 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
Hi,

I think you are right John. Datatypes have many issues in that regard as I can tell, from a few bug reports. Imho datatypes should be handled like "Tool dependency definitions". There should be only one "installable revision".

But that aside, emboss datatypes are already broken. For example asn1 was added into Galaxy but it still exists in emboss_datatypes.

Moreover, how to add a proper genbank datatype with sniffer, split and merge functions? Ideally, every datatype should have its own repository, but that is an overhead I would like to omit ... any other ideas?

We could use something like what I do, CI scripts and hidden .yaml files to manage which folders get pushed to which toolshed repositories and when. My initial version of that blindly updates things when there are changes, but I'm working to add support for things like "create a new versioned toolshed repository on major version # changes". That would remove a lot of the overhead for maintaining that many repositories.
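A hedged sketch of the kind of decision such a CI script might make. Everything here is hypothetical: the mapping would live in the hidden .yaml files rather than a Python dict, and the version numbers and the naming of versioned repositories are invented for illustration.

```python
# Hypothetical folder-to-repository mapping; a plain dict stands in for the
# .yaml configuration so the sketch needs only the standard library.
REPOSITORIES = {
    "datatypes/msa_datatypes": {"repo": "msa_datatypes", "published": "1.4.0"},
    "datatypes/sequences_datatypes": {"repo": "sequences_datatypes", "published": None},
}


def major(version):
    """Return the major component of a dotted version string."""
    return int(version.split(".")[0])


def plan_push(folder, new_version, files_changed):
    """Decide what a CI run should do for one folder."""
    entry = REPOSITORIES[folder]
    published = entry["published"]
    if published is None:
        return "create repository %s" % entry["repo"]
    if not files_changed:
        return "skip %s" % entry["repo"]
    if major(new_version) > major(published):
        # A major bump gets its own versioned repository instead of an update.
        return "create repository %s_%d" % (entry["repo"], major(new_version))
    return "update %s" % entry["repo"]


minor_bump = plan_push("datatypes/msa_datatypes", "1.5.0", files_changed=True)
major_bump = plan_push("datatypes/msa_datatypes", "2.0.0", files_changed=True)
first_push = plan_push("datatypes/sequences_datatypes", "1.0.0", files_changed=True)
print(minor_bump, major_bump, first_push, sep="\n")
```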

...

...
...
I would love to discuss that issue further, maybe a hangout with Greg and Peter?

Thanks John for your input, Bjoern

This could be high level, e.g. "other sequence file formats" repository covering GenBank, EMBL, SwissProt plain text, UniProt XML, etc; one for multiple sequence alignments; one for EMBOSS' own output...

That was my initial idea. Starting point is here: https://github.com/bgruening/galaxytools/tree/master/datatypes

...
But it wouldn't be that much more work to do one ToolShed repo per additional file format, would it?

Uploading and creating descriptions in the ToolShed will take most of the time :) Let's see if I can use a train trip to do that ... but the problem will stay the same ... one repository can have multiple versions ...

And how to solve that? You're right, datatypes shouldn't have multiple revisions since the file format should not be changing. I don't have an answer for this either unfortunately :/

...

...
One reason I have been meaning to do some of these is familiarity with many of these formats from looking after/writing parsers in Biopython.

Peter, similar case here with BioPerl. All of my tools can output the full range of Bio::SeqIO output formats, so having datatypes would be great. Happy to contribute there.

...

...
Having this done sooner rather than later ought to head off too many incompatible datatype names which worries me. Is it too late to adopt something like the EDAM ontology for the datatypes within Galaxy?

Peter


--
Eric Rasche

Peter Cock

7:31 p.m.

On Thu, Jul 17, 2014 at 6:28 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:

...

...
On 17.07.2014 18:51, Peter Cock wrote:

...
One reason I have been meaning to do some of these is familiarity with many of these formats from looking after/writing parsers in Biopython.

Peter, similar case here with BioPerl. All of my tools can output the full range of Bio::SeqIO output formats, so having datatypes would be great. Happy to contribute there.

Sounds good. The EMBOSS, BioPerl and Biopython projects have tried to adopt consistent file format names (pre-dating the EDAM ontology), but unfortunately the names adopted in Galaxy sometimes diverge :(

Peter

Eric Rasche

7:41 p.m.

On 07/17/2014 12:31 PM, Peter Cock wrote:

...

On Thu, Jul 17, 2014 at 6:28 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:

...
...
On 17.07.2014 18:51, Peter Cock wrote:

...
One reason I have been meaning to do some of these is familiarity with many of these formats from looking after/writing parsers in Biopython.

Peter, similar case here with BioPerl. All of my tools can output the full range of Bio::SeqIO output formats, so having datatypes would be great. Happy to contribute there.

Sounds good. The EMBOSS, BioPerl and Biopython projects have tried to adopt consistent file format names (pre-dating the EDAM ontology), but unfortunately the names adopted in Galaxy sometimes diverge :(

The EDAM format looks interesting, as does BioXSD for datatype conversion. Given that EDAM largely covers what's already in Galaxy in terms of formats, it'd be brilliant to have that on top of datatypes.

...

Peter

--
Eric Rasche

Peter Cock

7:30 p.m.

On Thu, Jul 17, 2014 at 6:10 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...

... but the problem will stay the same ... one [datatype definition] repository can have multiple versions ...

I like your idea that, like tool dependency definitions, this should be a special repository type on the ToolShed:

Earlier, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...

Imho datatypes should be handled like "Tool dependency definitions". There should be only one "installable revision".

This is something Greg will have to comment on - there may be ramifications I'm not seeing.

Peter

Greg Von Kuster

7:35 p.m.

This would be easy to implement, but could adversely affect reproducibility. If a repository containing datatypes always had only a single installable revision (i.e., the changelog tip), then any datatypes defined in an early changeset revision that are removed in a later changeset revision would no longer be available.

Greg

On Jul 17, 2014, at 1:30 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...

On Thu, Jul 17, 2014 at 6:10 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
... but the problem will stay the same ... one [datatype definition] repository can have multiple versions ...

I like your idea that like tool dependency definitions, this should be a special repository type on the ToolShed:

Earlier, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
Imho datatypes should be handled like "Tool dependency definitions". There should be only one "installable revision".

This is something Greg will have to comment on - there may be ramifications I'm not seeing.

Peter
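Greg's reproducibility concern can be made concrete with a small sketch. The revision contents below are invented purely for illustration (the names `foo1` to `foo4` and the helper `lost_at_tip` are hypothetical); the only point is that if the tip is the sole installable revision, any datatype dropped before the tip can no longer be installed.

```python
# Hypothetical repository history: each changeset revision maps to the set of
# datatype extensions it defines. These names are made up for illustration.
revisions = [
    {"foo1", "foo2", "foo3"},  # an early changeset revision
    {"foo1", "foo4"},          # the changelog tip
]


def lost_at_tip(history):
    """Return datatypes defined in some revision but absent from the tip."""
    tip = history[-1]
    ever_defined = set().union(*history)
    return ever_defined - tip


print(sorted(lost_at_tip(revisions)))
```

Any workflow that still relies on one of the returned extensions would have nothing to install under a tip-only policy.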

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

Peter Cock

7:38 p.m.

Good point Greg.

Let's refine this slightly then, a new special ToolShed repository type for a *single* datatype definition. That avoids this problem :)

(This does not help with suites of very closely related datatypes - like different kinds of BLAST database.)

Peter

On Thu, Jul 17, 2014 at 6:35 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:

...


Greg Von Kuster

7:43 p.m.

Here's a Trello card for this: https://trello.com/c/s8tfbW4x

On Jul 17, 2014, at 1:38 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...


Björn Grüning

20 Jul 20 Jul

7:23 p.m.

Hi,

single datatype definitions only work if you haven't defined any converters. Let's assume I have a datatype X and want to ship an X -> Y converter (Y -> X is also possible): won't we end up with a dependency loop? The X repository will depend on the Y repository, but Y depends on X, because we want to include a Y -> X converter.

Any idea how to solve that?

How should versions of datatypes be handled? Extra repositories for stockholm 1.0 and 1.1? If so, the associated Python file (sniffing, splitting ...) should also be versioned, right? What happens if I have two stockholm.py files in my system?

@Peter, can we create a stripped-down, Python-only biopython egg? All parsers should be included; Bio.SeqIO should be sufficient, I think.

Ciao, Bjoern

On 17.07.2014 19:43, Greg Von Kuster wrote:

...


Peter Cock

9:22 p.m.

On Sun, Jul 20, 2014 at 6:23 PM, Björn Grüning <bjoern@gruenings.eu> wrote:

...

Hi,

single datatype definitions only work if you haven't defined any converters. Let's assume I have a datatype X and want to ship an X -> Y converter (Y -> X is also possible): won't we end up with a dependency loop? The X repository will depend on the Y repository, but Y depends on X, because we want to include a Y -> X converter.

Any idea how to solve that?

Excellent example!

...

How should versions of datatypes be handled? Extra repositories for stockholm 1.0 and 1.1? If so, the associated Python file (sniffing, splitting ...) should also be versioned, right? What happens if I have two stockholm.py files in my system?

Potentially you might need/want to define those as two different Galaxy datatypes?

...

@Peter, can we create a stripped-down, Python-only biopython egg? All parsers should be included; Bio.SeqIO should be sufficient, I think.

Right now, yes in principle (and this is fine from the licence point of view), but in practice this is a fair chunk of work. However, we are looking at this - see https://github.com/biopython/biopython/issues/349

Peter

Greg Von Kuster

21 Jul 21 Jul

3:12 p.m.

Please see my comments below.

On Jul 20, 2014, at 3:22 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...

On Sun, Jul 20, 2014 at 6:23 PM, Björn Grüning <bjoern@gruenings.eu> wrote:

...
Hi,

single datatype definitions only work if you haven't defined any converters. Let's assume I have a datatype X and want to ship an X -> Y converter (Y -> X is also possible): won't we end up with a dependency loop? The X repository will depend on the Y repository, but Y depends on X, because we want to include a Y -> X converter.

Any idea how to solve that?

I don't see a problem here, so I'm hoping I'm correctly understanding the issue.

If we have:

repo_x contains the single datatype X
repo_y contains the single datatype Y
repo_x_to_y_converter contains a tool that converts datatype X to datatype Y (this repository also defines 2 dependency relationships, one to repo_x and another to repo_y)
repo_y_to_x_converter contains a tool that converts datatype Y to datatype X (this repository also defines 2 dependency relationships, one to repo_x and another to repo_y)

Now if we want to install both the repo_x_to_y_converter and the repo_y_to_x_converter automatically whenever either one is installed, we have 2 options:

1) Define a 3rd dependency relationship for repo_x_to_y_converter on repo_y_to_x_converter and, similarly, a 3rd dependency relationship for repo_y_to_x_converter on repo_x_to_y_converter. This does indeed create a circular repository dependency relationship, but the Tool Shed installation process will handle it correctly, installing all 4 repositories with proper dependency relationships created between them.

2) Instead of creating a circular dependency relationship between repo_x_to_y_converter and repo_y_to_x_converter, create an additional suite_definition_x_y repository (of type "repository_suite_definition") that defines relationships to repo_x_to_y_converter and repo_y_to_x_converter, ultimately installing all 4 repositories, but without defining any circular dependency relationships.

Either of the above 2 scenarios will correctly install the 4 repositories.

Let me know if I'm missing something here.

Thanks!

Greg
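A minimal sketch of why a circular relationship between the two converter repositories need not break installation: if an installer walks the dependency graph and remembers which repositories it has already visited, a cycle simply ends that branch of the walk. This is not the Tool Shed's actual code. The graph below only encodes option 1 for the four repositories described in this message, and `collect_install_set` is a hypothetical helper.

```python
# Dependency graph for option 1. The mapping and the helper below are
# illustrative assumptions, not Tool Shed code.
DEPENDENCIES = {
    "repo_x": [],
    "repo_y": [],
    "repo_x_to_y_converter": ["repo_x", "repo_y", "repo_y_to_x_converter"],
    "repo_y_to_x_converter": ["repo_x", "repo_y", "repo_x_to_y_converter"],
}


def collect_install_set(start, graph):
    """Return every repository reachable from `start`, tolerating cycles."""
    seen = set()
    stack = [start]
    while stack:
        repo = stack.pop()
        if repo in seen:
            continue  # already scheduled; this check is what breaks the cycle
        seen.add(repo)
        stack.extend(graph[repo])
    return seen


print(sorted(collect_install_set("repo_x_to_y_converter", DEPENDENCIES)))
```

Starting from either converter repository yields all four repositories, while starting from `repo_x` alone yields only `repo_x`.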


Björn Grüning

23 Jul 23 Jul

12:01 a.m.

New subject: [galaxy-iuc] writing datatypes

Hi Greg, thanks for the clarification. Please see my comments below.

...


Now if we want to install both the repo_x_to_y_converter and the repo_y_to_x_converter automatically whenever either one is installed, we have 2 options:

1) Define a 3rd dependency relationship for repo_x_to_y_converter on repo_y_to_x_converter and, similarly, a 3rd dependency relationship for repo_y_to_x_converter on repo_x_to_y_converter. This does indeed create a circular repository dependency relationship, but the Tool Shed installation process will handle it correctly, installing all 4 repositories with proper dependency relationships created between them.

Does that mean circular dependencies will be no problem at all? Do you consider including the converters with the datatypes best practice? (These converters are implicit Galaxy converters.) I would then have only two repositories, with circular dependencies between them.

...

2) Instead of creating a circular dependency relationship between repo_x_to_y_converter and repo_y_to_x_converter, create an additional suite_definition_x_y repository (of type "repository_suite_definition") that defines relationships to repo_x_to_y_converter and repo_y_to_x_converter, ultimately installing all 4 repositories, but without defining any circular dependency relationships.

repo_x_to_y_converter and repo_y_to_x_converter would have dependencies on datatypes X and Y, so I do not see the need for a suite_definition ... or it is some collection like the emboss_datatypes ...

My scenario is rather that the converters are not tools: they are implicit converters and should _not_ be displayed in the tool panel. As far as I know they need to be defined inside the datatypes_conf.xml file.

I think if circular dependencies are not a problem I will try to implement a proof of concept. EMBOSS is now split: https://github.com/bgruening/galaxytools/tree/master/datatypes/emboss_dataty...

Thanks Greg!
Bjoern


_______________________________________________ galaxy-iuc mailing list galaxy-iuc@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-iuc

Greg Von Kuster

1:19 a.m.

New subject: [galaxy-iuc] writing datatypes

Hi Björn,

On Jul 22, 2014, at 6:01 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...


Does that mean, circular dependencies will be no problem at all?

Yes, the Tool Shed handles circular dependency definitions of any variety, so circular dependency definitions pose no problem.

...

Do you consider including the converters with the datatypes best practice? (These converters are implicit Galaxy converters.) I would then have only two repositories, with circular dependencies between them.

Yes, however, there are some current limitations in the framework detailed on this Trello card: https://trello.com/c/Ho3ra4b9/206-add-support-for-datatype-converters-and-di...

Tag sets like the following that are defined in a datatypes_conf.xml file contained in a repository should be correctly loaded into the in-memory datatypes registry when the repository is installed into Galaxy. However, it has been quite a while since I've worked in this area, so let me know if you encounter any issues. The current best practice is probably that the converters themselves would each individually be in separate repositories (just like all Galaxy tools), but this can certainly be discussed if appropriate. Community thoughts are welcome here!

<datatype extension="bam" type="galaxy.datatypes.binary:Bam" mimetype="application/octet-stream" display_in_upload="true">
    <converter file="bam_to_bai.xml" target_datatype="bai"/>
    <converter file="bam_to_bigwig_converter.xml" target_datatype="bigwig"/>
    <display file="ucsc/bam.xml" />
    <display file="ensembl/ensembl_bam.xml" />
    <display file="igv/bam.xml" />
    <display file="igb/bam.xml" />
</datatype>
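To see what such a tag set declares, the fragment can be read with Python's standard `xml.etree.ElementTree`. This is only an illustration of the structure, not how Galaxy's datatypes registry actually loads the file, and `list_converters` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The <datatype> tag set quoted above, trimmed to the converter entries.
FRAGMENT = """
<datatype extension="bam" type="galaxy.datatypes.binary:Bam"
          mimetype="application/octet-stream" display_in_upload="true">
    <converter file="bam_to_bai.xml" target_datatype="bai"/>
    <converter file="bam_to_bigwig_converter.xml" target_datatype="bigwig"/>
</datatype>
"""


def list_converters(xml_text):
    """Return (source_extension, target_datatype, tool_file) per converter."""
    datatype = ET.fromstring(xml_text)
    source = datatype.get("extension")
    return [
        (source, conv.get("target_datatype"), conv.get("file"))
        for conv in datatype.findall("converter")
    ]


for source, target, tool_file in list_converters(FRAGMENT):
    print(f"{source} -> {target} via {tool_file}")
```

Each `<converter>` element names a tool XML file and a target datatype, which is why a repository shipping the datatype also has to know about the target type.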

...

...
2) Instead of creating a circular dependency relationship between repo_x_to_y_converter and repo_y_to_x_converter, create an additional suite_definition_x_y repository (of type "repository_suite_definition") that defines relationships to repo_x_to_y_converter and repo_y_to_x_converter, ultimately installing all 4 repositories, but without defining any circular dependency relationships.

repo_x_to_y_converter and repo_y_to_x_converter would have dependencies on datatype X and Y, so I do not see the need for a suite_definition ... or it is some collection like the emboss_datatypes …

I agree.

...

My scenario is more that the converters are not tools, they are implicit converters and should _not_ be displayed in the tool panel. As far as I know they need to be defined inside the datatypes_conf.xml file.

Yes, they must be defined inside the datatypes_conf.xml file. However, converters are just special Galaxy Tools (they are "special" in the same way that Data Manager tools are special). They are loaded into the in-memory Galaxy tools registry, but not displayed in the tool panel.

...

I think if circular dependencies are not a problem I will try to implement a proof of concept. EMBOSS is now splitted:

Sounds good - circular dependencies should pose no problems.

...

https://github.com/bgruening/galaxytools/tree/master/datatypes/emboss_dataty...

Thanks Greg! Bjoern


Greg Von Kuster

1:27 a.m.

New subject: [galaxy-iuc] writing datatypes

Before we go too much further down this path with datatypes, I'm wondering if some of us should put together a spec of some kind that allows us all to agree on the direction. For example, I'm wondering if datatypes should be versioned and have a name-spaced identifier much like the Tool Shed's guid identifier for tools. I haven't thought too much about whether this would pose backward compatibility issues or not. Discussion is welcomed on this.

Greg Von Kuster

On Jul 22, 2014, at 7:19 PM, Greg Von Kuster <greg@bx.psu.edu> wrote:

...

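As a starting point for the spec discussion Greg suggests, here is a rough sketch of what a versioned, name-spaced datatype identifier could look like. Everything in it is an assumption made for illustration: the `DatatypeId` class, its field layout (loosely modelled on a host/owner/repository/name/version path) and the placeholder values are hypothetical, and nothing in this thread says Galaxy has such an identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatatypeId:
    """Hypothetical name-spaced, versioned datatype identifier (not a Galaxy API)."""

    shed: str        # Tool Shed host
    owner: str       # repository owner
    repository: str  # repository name
    extension: str   # datatype extension, e.g. "stockholm"
    version: str     # datatype version

    def guid(self):
        """Serialise the identifier as a slash-separated string."""
        return "/".join(
            [self.shed, "repos", self.owner, self.repository,
             self.extension, self.version]
        )

    @classmethod
    def parse(cls, guid):
        """Inverse of guid(); raises ValueError on malformed input."""
        shed, marker, owner, repository, extension, version = guid.split("/")
        if marker != "repos":
            raise ValueError(f"not a datatype guid: {guid!r}")
        return cls(shed, owner, repository, extension, version)


# All values below are invented placeholders.
old = DatatypeId("shed.example.org", "someone", "stockholm_datatype", "stockholm", "1.0")
new = DatatypeId("shed.example.org", "someone", "stockholm_datatype", "stockholm", "1.1")
print(old.guid())
print(new.guid())
```

With identifiers like these, two versions of the same extension would compare as distinct values, which is the property the versioning question earlier in the thread is asking for. Whether existing histories and tools could keep using bare extensions is exactly the backward-compatibility question left open above.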

John Chilton

18 Jul 18 Jul

5:59 p.m.

Does the current implementation really handle datatypes in a reproducible manner? Suppose I have a repo which in revision 1 defines foo1 as a text subtype, foo2 as a tabular type and foo3 as a new type in foo.py, and then in revision 2 foo1 is defined as a binary subtype, foo2 and foo3 disappear and foo4 is a new type in foo.py (which no longer defines foo3). How could you possibly resolve that in a "reproducible" manner? Some of your tools are going to expect foo1 to be one thing, others something else. You are only going to place 1 copy of foo.py on the PYTHONPATH, right (or at least Python will only load one)? Is it going to define foo3 or foo4?

In addition to lacking reproducibility within one instance: if you are somehow trying to preserve all the datatypes a repository has ever defined, I feel like after a long stream of such updates the behavior of the datatypes is going to vary from one installation to another that installed different repository versions. Hence reproducibility across instances is subtly broken as well?

None of this is a solution of course - this problem strikes me as being very difficult. That said, I think correctness and reproducibility across instances is more important than reproducibility within the same instance over time, so for that reason I think there only being one installable revision of datatypes might be a big step forward relative to the status quo. Intuitively, if we are not namespacing/versioning datatypes, there should only be one definition and it should be the most recently installed one, right?

It would also resolve this https://trello.com/c/oTq2Kewd problem, where unsniffable binary datatypes are treated as sniffable if there was ever an installed version that was some sniffable datatype.

-John

On Jul 17, 2014 12:35 PM, "Greg Von Kuster" <greg@bx.psu.edu> wrote:

...

This would be easy to implement, but could adversely affect reproducibility. If a repository containing datatypes always had only a single installable revision (i.e., the chagelog tip), then any datatypes defined in an early changeset revision that are removed in a later changeset revision would no longer be available.

Greg

On Jul 17, 2014, at 1:30 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...
On Thu, Jul 17, 2014 at 6:10 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
... but the problem will stay the same ... one [datatype definition] repository can have multiple versions ...

I like your idea that like tool dependency definitions, this should be a special repository type on the ToolShed:

Earlier, Björn Grüning <bjoern.gruening@gmail.com> wrote:

...
Imho datatypes should be handled like "Tool dependency definitions". There should be only one "installable revision".

This is something Greg will have to comment on - there may be ramifications I'm not seeing.

Peter

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

Greg Von Kuster

21 Jul

3:33 p.m.

Hi John,

The general question, I think, is whether reproducibility is important. If it is, then we should not introduce new behavior that adversely impacts it. There are undoubtedly scenarios where reproducibility is not currently absolutely guaranteed, but those areas of weakness should be corrected (as time and resources allow) when they are discovered, if reproducibility is one of the desired features. Please see my inline comments too.

On Jul 18, 2014, at 11:59 AM, John Chilton <jmchilton@gmail.com> wrote:

...

Does the current implementation really handle datatypes in a reproducible manner? If I have a repo which in revision 1 defines foo1 as a text subtype, foo2 as a tabular type, and foo3 as a new type in foo.py, and then in revision 2 foo1 is defined as a binary subtype, foo2 and foo3 disappear, and foo4 is a new type in foo.py (which no longer defines foo3), how could you possibly resolve that in a "reproducible" manner?

So you have:

repo_a revision 1:
  foo1 datatype as text subtype
  foo2 datatype as tabular
  foo3 new datatype in foo.py

repo_a revision 2:
  foo1 datatype as binary subtype
  foo4 new type in foo.py

I would say that this is an example of a "bad practice" on the part of the repository owner, but, of course, this scenario can certainly occur. In this case, the current implementation creates 2 separate installable revisions of repo_a which are loaded into the datatypes registry in a specific order. If repo_a revision 1 was installed first, then it will always be loaded first, and the foo1 and foo4 datatypes contained in repo_a revision 2 will not be loaded because they are currently considered conflicting datatypes. So currently, reproducibility is ensured, but the versions of foo1 and foo4 in revision 2 cannot be used. This may not be ideal, but in order to allow both versions to be used, more than the datatype extension will be needed to differentiate datatypes (i.e., some namespaced identifier similar to the Tool Shed's guid for tools).
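A minimal sketch of the "first installed revision wins" behaviour described above, for readers who want to see the consequence of load order. This is not Galaxy's registry code: `Datatype`, `load_in_install_order`, and the `module_file` field are hypothetical names. In particular, the rule that a later revision's copy of foo.py counts as a conflict is an assumption made here to reproduce the stated outcome (foo4 skipped); the thread does not say how the conflict is detected.

```python
from collections import namedtuple

# Hypothetical record: module_file is None for subclasses of built-in types.
Datatype = namedtuple("Datatype", ["extension", "kind", "module_file"])

# The two revisions of repo_a from the example in the thread.
REVISION_1 = [
    Datatype("foo1", "text subtype", None),
    Datatype("foo2", "tabular", None),
    Datatype("foo3", "new type", "foo.py"),
]
REVISION_2 = [
    Datatype("foo1", "binary subtype", None),
    Datatype("foo4", "new type", "foo.py"),
]


def load_in_install_order(revisions):
    """Register datatypes revision by revision, skipping anything that conflicts."""
    registry = {}        # extension -> (revision label, Datatype)
    loaded_modules = {}  # module file -> label of the revision that supplied it
    skipped = []
    for label, datatypes in revisions:
        for dt in datatypes:
            extension_taken = dt.extension in registry
            # Assumption: only one copy of a custom module can be loaded, so a
            # later revision's copy of the same file is treated as a conflict.
            module_taken = (
                dt.module_file is not None
                and loaded_modules.get(dt.module_file, label) != label
            )
            if extension_taken or module_taken:
                skipped.append((label, dt.extension))
                continue
            registry[dt.extension] = (label, dt)
            if dt.module_file is not None:
                loaded_modules[dt.module_file] = label
    return registry, skipped


if __name__ == "__main__":
    for order in ([("rev1", REVISION_1), ("rev2", REVISION_2)],
                  [("rev2", REVISION_2), ("rev1", REVISION_1)]):
        registry, skipped = load_in_install_order(order)
        print([label for label, _ in order], sorted(registry), skipped)
```

Running it with the two install orders shows both points made in the thread: within one instance the result is stable, but two instances that installed the revisions in a different order end up with different datatypes.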

...


Eric Rasche

17 Jul

5:31 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

For those reading this thread from the future, there's a secret to adding completely new datatypes locally (and not through a toolshed).

You have to manually edit lib/galaxy/datatypes/registry.py and import the module you've written at the top of the file.

For instance, if you add a new "gbk.py" datatype, you'll need to add "import gbk" to the top of registry.py. This will cause your errors to go away and your datatype to be loaded on startup.

Thanks to John Chilton for answering this on IRC.

Cheers, Eric

On 07/16/2014 09:02 AM, Eric Rasche wrote:

...

Forgive me, I'm not 100% clear on the custom plugin system used by galaxy, but if I "subclass" from the text data type, will sniffers I implement override text's and function? The lack of being able to add an entry to the sniffer section (unlike with the tabular example) led me to believe my genbank datatype wouldn't be sniffed.

Additionally, I'd still like to be able to add completely new datatypes, do you know of any working examples of this? As mentioned in my original post, duplicating an existing datatype and changing names on it surprisingly doesn't work.

It'd be lovely to have the EMBOSS datatypes split out.

Cheers, Eric

On July 16, 2014 8:34:55 AM CDT, Peter Cock <p.j.a.cock@googlemail.com> wrote:

Indeed - ideally (once working) we can upload under the IUC ToolShed as a community maintained resource rather than under a personal account which becomes a single point of failure (the bus factor).

We (the IUC) have previously discussed doing this so that the EMBOSS datatypes could become more of a meta-entry depending on other smaller specific datatype defining ToolShed repositories. But it hasn't reached the top of my personal TODO list yet ;)

Peter

On Wed, Jul 16, 2014 at 1:47 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:

Hi Eric,

please have a look at:

https://github.com/bgruening/galaxytools/blob/master/datatypes/msa_datatypes...

You need something like: <datatype extension="genbank" type="galaxy.datatypes.data:Text" subclass="True" />

Let's try to split the EMBOSS datatypes a little bit into small chunks. E.g. sequences_datatypes, msa_datatypes ... and so on ...

Cheers, Bjoern

On 14.07.2014 20:31, Eric Rasche wrote:

I'm trying to add a new datatype to my Galaxy instance for GenBank files; however, I'm running into various issues. I've followed the tutorial (https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes); however, that example subclasses tabular, and I'd like to subclass Text as they're plain text files, and I'd like to be able to define a sniffer for them (not possible if your type=galaxy.datatypes.data:Text)

I figured the call ought to be something like

<datatype extension="gb" type="galaxy.datatypes.data:Genbank" subclass="True" />

however, everything I try fails with

Error importing datatype module galaxy.datatypes.data: 'module' object has no attribute 'Genbank'

To avoid this particular issue, I tried writing a separate datatype just for genbank files (type="galaxy.datatypes.genbank:Genbank"), however that fails with the same error:

galaxy.datatypes.registry ERROR 2014-07-14 13:23:23,100 Error importing datatype module galaxy.datatypes.genbank: 'module' object has no attribute 'genbank'
Traceback (most recent call last):
  File "/home/hxr/work/galaxy-central/lib/galaxy/datatypes/registry.py", line 206, in load_datatypes
    module = getattr( module, mod )
AttributeError: 'module' object has no attribute 'genbank'

Here's what my lib/galaxy/datatypes/genbank.py looks like:

    import pkg_resources
    pkg_resources.require( "bx-python" )
    import logging
    from galaxy.datatypes import data

    log = logging.getLogger(__name__)


    class Genbank( data.Text ):
        file_ext = "gb"

        def sniff( self, filename ):
            # GenBank records start with a LOCUS line, so compare the first
            # five characters of the file against that keyword.
            with open( filename ) as handle:
                header = handle.read( 5 )
            return header == 'LOCUS'

To debug this, I've tried copying the tabular datatype completely, removing all the classes other than Tabular, and renaming it "Genbank"; however, this fails too with the same error.

Can anyone offer some insight?

Cheers, Eric


-- Sent from my Android device with K-9 Mail. Please excuse my brevity. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJTx+w/AAoJEMqDXdrsMcpVXqgP/0IaiufJwoP5gKS1suQ8fLJz U/V/9ysgZsW0NUfZR7sCuPP/h6x+HlhRM41IweoYwqDI5qJHClrDIHahYNM4rJ76 OyP1qgpQdlZE8R/kveKRUIEh1YpzAHsIZlFUAnuuFrEeJN2QGrmffsuDEQ/E5AoS tvLxcFrJ1gY45KhfhUr9OLgsTX1pt30jlgswzlG7I6ii3hmWgex/EKh+Xf0CRJHD fIS0qc3RNzrxvUmfFtXlFLn6WM/ZbJyLMB4qE8B2S2hLvIQa14KlsziCs9n13GtW qr0o+6E05LpqbKYCFvINEbasyxjVpFKoccRYWsZNu8UP3taiyw14COTgqvlnyXJQ QlM7a8NlmG2wnOpuwY2uEnqbAKeaUbtawz0jIlRGbVs4x7TkC/O8UrL8VTcqOt+0 s5Ix2Rf5qevt5jKIvLxHxjwXvP3mP8gZSWJjMG31Kq3vQjErNn/bczb3WKgfVCW7 h39bjt0nALam5bLcHcCvzS39/ea0M7NlvJqUA1b/a/ViqIxru3IPL927fWsvACe/ 1Cfep6gFc/tmJHZM8hZEtgiOnh8pqkGOiuEevM4NAaBLWsrT1a/oijq9xQEyoG2+ vEyFmzGF1DmqELRdh97AD7MWqlZSB3V+TfsuEboF+67sB1p0MLv5HQthtP63k9eH 0xstC2V6X0LHoTMUDiwa =ClA/ -----END PGP SIGNATURE-----

Peter Cock

6:46 p.m.

On Thu, Jul 17, 2014 at 4:31 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

For those reading this thread from the future, there's a secret to adding completely new datatypes locally (and not through a toolshed).

You have to manually edit lib/galaxy/datatypes/registry.py and import the module you've written at the top of the file.

For instance, if you add a new "gbk.py" datatype, you'll need to add "import gbk" to the top of registry.py. This will cause your errors to go away and your datatype to be loaded on startup.

Thanks to John Chilton for answering this on IRC.

Cheers, Eric

Indeed - sorry I hadn't spotted that complication.

The README files for these datatype extensions may help:

https://github.com/peterjc/galaxy_blast/tree/master/datatypes/blast_datatype...
https://github.com/peterjc/pico_galaxy/tree/master/datatypes/mira_datatypes

I have to do this manually with some sed magic in my TravisCI automated test setup, see:

http://blastedbio.blogspot.co.uk/2013/09/using-travis-ci-for-testing-galaxy-...

Peter
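For readers wondering why an extra import in registry.py makes the error go away, here is a small sketch of the underlying Python behaviour: a submodule is not an attribute of its parent package until something imports it. This is one plausible reading of the `getattr( module, mod )` failure in the traceback, not a confirmed description of registry.py, which the thread does not show. The package name `demo_datatypes` is a hypothetical stand-in for galaxy.datatypes.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_datatypes/__init__.py and
# demo_datatypes/gbk.py. Both names are hypothetical stand-ins.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_datatypes")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write("")  # empty __init__.py: it imports none of its submodules
with open(os.path.join(pkg_dir, "gbk.py"), "w") as handle:
    handle.write("class Genbank(object):\n    file_ext = 'gb'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

package = importlib.import_module("demo_datatypes")

# gbk.py exists on disk, but the package object has no 'gbk' attribute yet,
# so getattr(package, "gbk") would raise AttributeError at this point.
before = hasattr(package, "gbk")

# Importing the submodule binds it as an attribute of the parent package.
importlib.import_module("demo_datatypes.gbk")
after = hasattr(package, "gbk")

print(before, after)  # prints: False True
```

Under that reading, adding "import gbk" at the top of registry.py works because it forces the submodule to be imported before the attribute lookup happens.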

Eric Rasche

6:55 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Not a problem Peter, it's a somewhat subtle bug to have, and there isn't a lot of documentation on the wiki about writing new datatypes (though I plan to fix that soon).

That particular error message could stand to be a bit more explicit. (e.g., "Did you forget to add import mylib to registry.py?").

Also, thanks for sharing the blog post. Since we develop all of our tools internally, I may adapt and publish your post with similar instructions for jenkins, if that's all right by you.

Cheers, Eric

On 07/17/2014 11:46 AM, Peter Cock wrote:

...


-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAEBAgAGBQJTx//oAAoJEMqDXdrsMcpVY40P/2KI2RniuGgsl7w0Mt3by4wP XIWAsRRYUL/I4pTqEgtg3/aMn/9J2PFfPTvJMJbwCboT7Bn/4q0vc4qW7MDPSsjR 1V1XZ/5dEi0Q/gjXQYZmib2uSBgnRR58XR8/ae2UUKDINJv2BsToIB7Z60bB2XAI a/b7qLXgq37NOFaZmBsqCse1yf7D9qD20Gf3c2uNYRPdARbkTVNMfjNoCzbNkMiJ QyPt0c7ZetrKUseEgKoBa4EtO/y8uU7EHdYo2WxtmymZFdeIzTit9XKk/l6V0p2G pqwcc504r0AsKA46/5BY5g9MpboEk36CRG0u+CG3vWv958MKxKMblKYE7qexqq9p 6UrsdxvHohX4IlTMU4GEwCMvks+jn2JwMqYGUOpk8yQLkTALxRUfJcheN3RtMvfF jRT2xzUm0s3dwKCHX5v7dePYIYLRvpig8CwRtL2FQZTntxJh2FAvwnL6ViUi/jGL +FYjfGFDMRvqqY81nAqUh7dfjEOVf8J5lTAL2YTzZ8y8sLDtZNaeCdNj+4IUOYJT 5QEDpKH/TR4W8MnlmE5gLFZC0Yf0v951pikjMR+rI2mYVf1uYT1UVeWpPT2JZXdw gbNOt/Gu9gcK2GTAmd223bCy3zPZGkVW3JVJlTo1wiyx7Bx3umQGLQEDu3aGpOEm b2DJ01ovMrEsr9X83v9i =TZls -----END PGP SIGNATURE-----

Peter Cock

6:59 p.m.

On Thu, Jul 17, 2014 at 5:55 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:

...

Not a problem Peter, it's a somewhat subtle bug to have, and there isn't a lot of documentation on the wiki about writing new datatypes (though I plan to fix that soon).

That particular error message could stand to be a bit more explicit. (e.g., "Did you forget to add import mylib to registry.py?").
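As a rough illustration of what a more explicit message could look like, here is a sketch of a loader for "package.module:ClassName" strings of the kind used in the `type` attribute. `resolve_datatype` is a hypothetical helper and not Galaxy's loader; it imports the module by its full dotted name and reports which step failed.

```python
import importlib


def resolve_datatype(type_string):
    """Resolve a 'package.module:ClassName' string to a class object.

    Hypothetical helper for illustration only.
    """
    module_name, _, class_name = type_string.partition(":")
    if not module_name or not class_name:
        raise ValueError(
            "expected 'package.module:ClassName', got %r" % type_string
        )
    try:
        # Importing by full dotted name loads the submodule itself, so no
        # attribute walk over the parent package is needed.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "Could not import datatype module %r (%s). Is the file in place "
            "and importable?" % (module_name, exc)
        )
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise AttributeError(
            "Module %r was imported but defines no class named %r"
            % (module_name, class_name)
        )


if __name__ == "__main__":
    # Standard-library names are used here only so the example runs anywhere.
    print(resolve_datatype("collections:OrderedDict"))
```

The design choice is simply to separate "the module could not be imported" from "the module has no such class", since the original message reported both situations the same way.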

Also, thanks for sharing the blog post. Since we develop all of our tools internally, I may adapt and publish your post with similar instructions for jenkins, if that's all right by you.

Cheers, Eric

Please do :)

Peter

P.S. I know Saket is using this approach too now: https://github.com/saketkc/galaxy_tools

Saket Choudhary

8:59 p.m.

On 17 July 2014 22:29, Peter Cock <p.j.a.cock@googlemail.com> wrote:

...

On Thu, Jul 17, 2014 at 5:55 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:

...
Not a problem Peter, it's a somewhat subtle bug to have, and there isn't a lot of documentation on the wiki about writing new datatypes (though I plan to fix that soon).

That particular error message could stand to be a bit more explicit. (e.g., "Did you forget to add import mylib to registry.py?").

Also, thanks for sharing the blog post. Since we develop all of our tools internally, I may adapt and publish your post with similar instructions for jenkins, if that's all right by you.

Cheers, Eric

Please do :)

Peter

P.S. I know Saket is using this approach too now: https://github.com/saketkc/galaxy_tools

Indeed, Travis has been really helpful. Thanks Peter for that blogpost :-)

...



participants (7)

Björn Grüning
Björn Grüning
Eric Rasche
Greg Von Kuster
John Chilton
Peter Cock
Saket Choudhary