Hi Björn,
On Jul 22, 2014, at 6:01 PM, Björn Grüning <bjoern.gruening@gmail.com> wrote:
Hi Greg,
thanks for the clarification. Please see my comments below.
On Jul 20, 2014, at 3:22 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Sun, Jul 20, 2014 at 6:23 PM, Björn Grüning <bjoern@gruenings.eu> wrote:
Hi,
Single datatype definitions only work if you haven't defined any converters. Let's assume I have a datatype X and want to ship an X -> Y converter (Y -> X is also possible). We will end up with a dependency loop, won't we? The X repository will depend on the Y repository, but Y depends on X, because we want to include a Y -> X converter.
Any idea how to solve that?
I don't see a problem here, so I'm hoping I'm correctly understanding the issue.
If we have:
- repo_x contains the single datatype X
- repo_y contains the single datatype Y
- repo_x_to_y_converter contains a tool that converts datatype X to datatype Y (this repository also defines 2 dependency relationships, one to repo_x and another to repo_y)
- repo_y_to_x_converter contains a tool that converts datatype Y to datatype X (this repository also defines 2 dependency relationships, one to repo_x and another to repo_y)
Now if we want to install both the repo_x_to_y_converter and the repo_y_to_x_converter automatically whenever either one is installed, we have 2 options:
1) Define a 3rd dependency relationship for repo_x_to_y_converter to depend on repo_y_to_x_converter and, similarly, a 3rd dependency relationship for repo_y_to_x_converter on repo_x_to_y_converter. This does indeed create a circular repository dependency relationship, but the Tool Shed installation process will handle it correctly, installing all 4 repositories with proper dependency relationships created between them.
Does that mean circular dependencies will be no problem at all?
Yes, the Tool Shed handles circular dependency definitions of any variety, so circular dependency definitions pose no problem.
Do you consider including the converters in the datatypes as best practice? (These converters are implicit Galaxy converters.) I would have only two repositories with circular dependencies.
Yes, however, there are some current limitations in the framework, detailed on this Trello card: https://trello.com/c/Ho3ra4b9/206-add-support-for-datatype-converters-and-di...
Tag sets like the following that are defined in a datatypes_conf.xml file contained in a repository should be correctly loaded into the in-memory datatypes registry when the repository is installed into Galaxy. However, it has been quite a while since I've worked in this area, so let me know if you encounter any issues.
The current best practice is probably that the converters themselves would each individually be in separate repositories (just like all Galaxy tools), but this can certainly be discussed if appropriate. Community thoughts are welcome here!
<datatype extension="bam" type="galaxy.datatypes.binary:Bam" mimetype="application/octet-stream" display_in_upload="true">
    <converter file="bam_to_bai.xml" target_datatype="bai"/>
    <converter file="bam_to_bigwig_converter.xml" target_datatype="bigwig"/>
    <display file="ucsc/bam.xml" />
    <display file="ensembl/ensembl_bam.xml" />
    <display file="igv/bam.xml" />
    <display file="igb/bam.xml" />
</datatype>
2) Instead of creating a circular dependency relationship between repo_x_to_y_converter and repo_y_to_x_converter, create an additional suite_definition_x_y repository (of type "repository_suite_definition") that defines relationships to repo_x_to_y_converter and repo_y_to_x_converter, ultimately installing all 4 repositories, but without defining any circular dependency relationships.
repo_x_to_y_converter and repo_y_to_x_converter would have dependencies on datatypes X and Y, so I do not see the need for a suite_definition ... unless it is some collection like the emboss_datatypes …
I agree.
My scenario is more that the converters are not tools, they are implicit converters and should _not_ be displayed in the tool panel. As far as I know they need to be defined inside the datatypes_conf.xml file.
Yes, they must be defined inside the datatypes_conf.xml file. However, converters are just special Galaxy Tools (they are "special" in the same way that Data Manager tools are special). They are loaded into the in-memory Galaxy tools registry, but not displayed in the tool panel.
I think if circular dependencies are not a problem I will try to implement a proof of concept. EMBOSS is now split:
Sounds good - circular dependencies should pose no problems.
https://github.com/bgruening/galaxytools/tree/master/datatypes/emboss_dataty...
Thanks Greg! Bjoern
Either of the above 2 scenarios will correctly install the 4 repositories.
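As an illustration only, here is a minimal Python sketch of why both scenarios can end up with the same 4 repositories. It is not the Tool Shed's actual implementation: the install_order function and the dictionaries are hypothetical, and the only idea shown is that a walk over the dependency definitions which remembers what it has already visited cannot loop on a circular definition.

```python
from typing import Dict, List, Set


def install_order(start: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return every repository reachable from `start`, each listed once.

    Hypothetical helper: the visited set stops the walk from looping
    when two repositories depend on each other.
    """
    visited: Set[str] = set()
    order: List[str] = []

    def visit(repo: str) -> None:
        if repo in visited:
            return  # already scheduled, so a cycle ends here
        visited.add(repo)
        for dep in deps.get(repo, []):
            visit(dep)
        order.append(repo)  # appended after its not-yet-visited dependencies

    visit(start)
    return order


# Scenario 1: the two converters also depend on each other (circular).
circular = {
    "repo_x": [],
    "repo_y": [],
    "repo_x_to_y_converter": ["repo_x", "repo_y", "repo_y_to_x_converter"],
    "repo_y_to_x_converter": ["repo_x", "repo_y", "repo_x_to_y_converter"],
}

# Scenario 2: a suite definition pulls in both converters, with no cycle.
suite = {
    "repo_x": [],
    "repo_y": [],
    "repo_x_to_y_converter": ["repo_x", "repo_y"],
    "repo_y_to_x_converter": ["repo_x", "repo_y"],
    "suite_definition_x_y": ["repo_x_to_y_converter", "repo_y_to_x_converter"],
}

four = {"repo_x", "repo_y", "repo_x_to_y_converter", "repo_y_to_x_converter"}
from_circular = install_order("repo_x_to_y_converter", circular)
from_suite = install_order("suite_definition_x_y", suite)
```

In this sketch, starting from either converter in scenario 1, or from the suite definition in scenario 2, reaches the same 4 repositories; the suite walk additionally lists suite_definition_x_y itself.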
Let me know if I'm missing something here.
Thanks!
Greg
Excellent example!
How should versions of datatypes be handled? Extra repositories for stockholm 1.0 and 1.1? If so, the associated Python file (sniffing, splitting ...) should also be versioned, shouldn't it? What happens if I have two stockholm.py files in my system?
Potentially you might need/want to define those as two different Galaxy datatypes?
@Peter, can we create a stripped-down, Python-only Biopython egg? All parsers should be included; Bio.SeqIO should be sufficient, I think.
Right now, yes in principle (and this is fine from the licence point of view), but in practice this is a fair chunk of work. However, we are looking at this - see https://github.com/biopython/biopython/issues/349
Peter