Hi Björn and Nicola,

You can expect NLP tools on the ToolShed sometime in the (near?) future.

Our project [1] shares funding sources with Galaxy (NSF SI2) and the NSF loves this kind of cross-domain collaboration.  I also know of at least one other NLP project [2] that uses Galaxy that would likely share their tools as well.

I am going to break the rest of my reply into separate threads so hopefully they will get a little more visibility.

Cheers,
Keith

REFERENCES

1. http://www.nsf.gov/awardsearch/showAward?AWD_ID=1147944
2. http://galaxy.alveo.edu.au

On Apr 24, 2015, at 9:30 AM, Bjoern Gruening <bjoern.gruening@gmail.com> wrote:

However, does the bioinformatics community really want a bunch of NLP tools in their tool shed?

Yes, Yes, Yes!
By the "toolbox" do you mean adding my converters to the tool_conf.xml file so they are available on the Tools menu?  I have done that and I can add the converters to a workflow manually. I was just hoping the workflow editor could detect when it could perform the conversion and insert the converters as needed; it seems this is not possible.

Maybe someone else can jump in here; I do not see why this shouldn't be possible. Maybe this is just a UI issue?!

I am going to break this into its own thread so hopefully it gets more visibility.


Setting the metadata in the tool wrapper is fine, and after grepping through some of the other wrappers I think I need something like:

  <!-- Output from a tokenizer -->
  <outputs>
    <data name="output" format="xml" label="Output">
        <actions>
            <action type="metadata" name="tokens">True</action>
        </actions>
    </data>
  </outputs>

  <!-- Input to a part of speech tagger -->
  <inputs>
    <param name="input" type="data" format="xml">
        <validator type="expression" message="Please run a tokenizer first.">metadata.tokens is not None</validator>
    </param>
  </inputs>

That is, the input validator simply checks if some value has been set in the metadata, and the output sets a value in the metadata.  The above does not work, but at least Galaxy stopped complaining about the tool XML with this.  However, the documentation for <option/> <filter/> and <action/> does not match up with what existing wrappers (in the dev branch) are doing so I am having problems with the exact syntax.

Can you try:

  <action type="metadata" name="tokens" default="True"/>

You can also filter your inputs in the speech tagger:

  <options options_filter_attribute="metadata.tokens">
    <filter type="add_value" value="True"/>
  </options>



Do you have pointers to any documentation on data collections?  My searches haven't turned up much beyond some tantalizing references [1],
and my experiments trying to return a data collection from a tool have been unsuccessful.

https://wiki.galaxyproject.org/Histories?highlight=%28collection%29#Dataset_Collections

https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax -> data_collection

And have a look at:
https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools

Success!  I was running the code from master, so I suspect that was part of my problem. 

Nice!

However, my browser is still complaining about long running scripts.

Can you put this in a different thread?

A script on this page may be busy, or it may have stopped responding. You can stop the script now, open the script in the debugger, or let the script continue.


I accidentally left visible="true" when creating the dataset collection and ended up with 1500+ items in my history; the above message kept popping up while the workflow was running (at least until I selected "Don't show this again").  Deleting 1500+ datasets from the history is also very slow, but that is a different issue. On the bright side, at least I had 1500+ items in the history to delete.

More than 1500 different elements is a lot for a history; for usability we should try to use collections here. No one wants to deal with such an amount of history objects :)
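
To make the collection idea concrete, here is a minimal sketch of the request body an external client (not the tool itself) could send to group existing history datasets into one flat "list" collection. The field names follow the dataset collection API as it is commonly documented (a POST to /api/histories/<history_id>/contents) and should be treated as assumptions to verify against your Galaxy version; the function name and the sample ids are hypothetical.

```python
import json


def list_collection_payload(name, elements):
    """Build a request body describing a flat 'list' dataset collection.

    `elements` is an iterable of (element_name, dataset_id) pairs, where each
    dataset_id is assumed to refer to a dataset already in the history ("hda").
    """
    return {
        "type": "dataset_collection",
        "collection_type": "list",
        "name": name,
        "element_identifiers": [
            {"name": element_name, "src": "hda", "id": dataset_id}
            for element_name, dataset_id in elements
        ],
    }


if __name__ == "__main__":
    # Hypothetical dataset ids, for illustration only.
    payload = list_collection_payload(
        "tokenized documents",
        [("doc1", "id-0001"), ("doc2", "id-0002")],
    )
    print(json.dumps(payload, indent=2))
```

The point of the design is that the history then shows one collection entry instead of one entry per document.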


I have also been trying John Chilton's blend4j and managed to populate a data library, and this is almost what I want,
but I would like a tool that can be included in a workflow as the data from the library may not necessarily be the first step.   
I have no problem calling the Galaxy API from my tools, except that between the bioinformatics lingo and Python (I'm a Java programmer) it's slow going.

If at all possible you should avoid this, but as a last resort it is probably an option.

Out of curiosity, what exactly should I avoid; making calls to the Galaxy REST/API from inside a tool, using blend4j, or populating a data library from inside a tool?  I can see myself doing all three in the near future.

* making calls to the Galaxy REST/API from inside a tool

Think big! Your tools will run in large cluster environments, on job schedulers and cloud infrastructures. You don't know if your job is allowed to connect to your Galaxy instance, security-wise. Also, you need to authenticate, which brings more issues ...
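
If a tool does end up calling back to Galaxy as a last resort, one way to soften the connectivity concern is a preflight check that fails with a clear message when the compute node cannot reach the server. The sketch below only tests whether a TCP connection can be opened; it says nothing about authentication. The function name and the idea of passing the Galaxy URL to the tool are assumptions for illustration, not something Galaxy provides.

```python
import socket
from urllib.parse import urlparse


def galaxy_reachable(galaxy_url, timeout=3.0):
    """Return True if a TCP connection to the host in galaxy_url can be opened."""
    parsed = urlparse(galaxy_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    url = "http://localhost:8080"  # hypothetical Galaxy URL
    if not galaxy_reachable(url):
        raise SystemExit(f"Cannot reach Galaxy at {url} from this node.")
```

A tool would run this before any API call and exit with a readable error instead of hanging on a cluster node that has no route to the Galaxy server.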

Ciao,
Bjoern

Cheers,
Keith



REFERENCES

1. https://wiki.galaxyproject.org/Learn/API#Collections
2. https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run


Oh yes this is supported out of the box!
See here for some brief documentation:
https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox#supported-filetypes

Here is an example of how you can write your own datatypes:

https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/datatypes

I feel like I must be missing the obvious.  Here is the relevant section of my datatypes_conf.xml (you can see the full file at https://github.com/oanc/Galaxy/blob/master/config/datatypes_conf.xml)

  <datatype extension="lif" type="galaxy.datatypes.text:Json" display_in_upload="True">
    <converter file="convert.json2gate_2.0.0.xml" target_datatype="gate"/>
  </datatype>
  <datatype extension="gate" type="galaxy.datatypes.xml:GenericXml" mimetype="application/xml" display_in_upload="true">
    <converter file="convert.gate2json_2.0.0.xml" target_datatype="lif"/>
  </datatype>

Is there anything I need to do beyond defining the datatypes for implicit conversions to take place?

I guess you need to place your converters under
https://github.com/oanc/Galaxy/tree/master/lib/galaxy/datatypes/converters/

And get rid of the 'convert.' prefix in your datatypes_conf.xml, at least if you are not using the Tool Shed (TS).
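
A quick way to check both points is to parse datatypes_conf.xml, list every declared converter, and see which converter files are missing from the converters directory. This is only a sketch: the sample XML is a trimmed copy of the two datatypes quoted above, wrapped in the <datatypes><registration> elements the config file normally uses, and the function names are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Trimmed copy of the two datatypes quoted in this thread.
SAMPLE_CONF = """
<datatypes>
  <registration>
    <datatype extension="lif" type="galaxy.datatypes.text:Json" display_in_upload="True">
      <converter file="convert.json2gate_2.0.0.xml" target_datatype="gate"/>
    </datatype>
    <datatype extension="gate" type="galaxy.datatypes.xml:GenericXml" mimetype="application/xml" display_in_upload="true">
      <converter file="convert.gate2json_2.0.0.xml" target_datatype="lif"/>
    </datatype>
  </registration>
</datatypes>
"""


def list_converters(conf_xml):
    """Return (source_ext, target_ext, converter_file) for every <converter>."""
    root = ET.fromstring(conf_xml)
    found = []
    for datatype in root.iter("datatype"):
        for converter in datatype.findall("converter"):
            found.append((
                datatype.get("extension"),
                converter.get("target_datatype"),
                converter.get("file"),
            ))
    return found


def missing_converter_files(converters, converters_dir):
    """Return the converter file names that do not exist in converters_dir."""
    return [
        filename
        for (_source, _target, filename) in converters
        if not os.path.isfile(os.path.join(converters_dir, filename))
    ]


if __name__ == "__main__":
    for source, target, filename in list_converters(SAMPLE_CONF):
        print(f"{source} -> {target}: {filename}")
```

Pointing missing_converter_files at lib/galaxy/datatypes/converters/ in a checkout would show whether the file names in the config match what is actually on disk.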

Hope this helps you a little bit more,
Bjoern

Thanks,
Keith Suderman



4. OAuth 2.0 / OpenID Connect:

I need to be able to fetch documents from data providers that require an OAuth 2.0 access token. Currently, I use a separate service to go
through the OAuth authentication/authorization process and then have the user copy/paste their access token into a text field in Galaxy.   
Is there a way to perform the OAuth authentication dance required by the remote service inside Galaxy itself?   

I don't think so, but maybe someone else has an idea here.

I’ve looked at the Trello site for Galaxy and see that both OAuth 2.0 and OpenID Connect are on the radar; hopefully this use case is being considered as well.
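
For reference, the "dance" in question is the standard OAuth 2.0 authorization code flow: send the user to the provider's authorization endpoint, receive a code at a redirect URI, then exchange that code for an access token. The sketch below only builds the two requests using the parameter names from RFC 6749; the endpoint URLs, client id and redirect URI are placeholders, and nothing here is an existing Galaxy feature.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri,
                            scope, state=None):
    """Return (url, state) for step one of the authorization code flow."""
    # The state value is echoed back by the provider and guards against CSRF.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def token_request_body(code, client_id, redirect_uri):
    """Return the form-encoded body that exchanges the code for a token."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    })


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    url, state = build_authorization_url(
        "https://provider.example/authorize",
        "galaxy-client",
        "https://galaxy.example/oauth/callback",
        "read",
    )
    print(url)
```

The missing piece inside Galaxy is whatever would serve the redirect URI and store the resulting token for the user, which is why the copy/paste workaround is needed today.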

I’m sure to have more questions after working through some visualization examples, but this should keep me busy for now.

Hope you are busy now :)
Cheers and keep us up to date!
Bjoern

Sincerely,
Keith Suderman

REFERENCES

1. https://wiki.galaxyproject.org/Admin/Tools/AddingTools

------------------------------
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY




