As far as I know, it's best to write tool wrappers as if they were meant to be called outside of Galaxy. In other words, it's best not to try to get Galaxy dataset ids within the tool code.
If we zoom out and take a higher-level view of the problem: is the JSON file primarily used to launch the visualization, or does it have another use as well? Can the JSON be passed to IGV directly in JavaScript (or is it already), or does IGV need to read it from the file separately, independent of the visualization mako and its associated JavaScript?
If the JSON data is only for the visualization, doesn't need to be 100% correct in the file, and can be passed within JavaScript, you can alter the JSON data directly in the visualization mako/js by decorating it with the bam ids before passing it to IGV. In that case, the previous code (or a cleaner version of it) will begin to get you there. Unfortunately, the previous code should work in the visualization mako only. (With the correction: from galaxy import model instead of import model.)
The trans is an object describing the current WebTransaction (request/response). It allows access to a sqlalchemy (sa) database session: trans.sa_session
On Fri, Aug 21, 2015 at 11:23 AM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
Hi Carl,
The visualization comes into play after a lab-implemented tool in Galaxy is run; I am not using Galaxy's workflows. Is tracing back ids still possible in this case with the test case you wrote earlier?
Also I couldn't figure out what "trans" refers to in your previous message.
Thank you, Asma
On Thu, Aug 20, 2015 at 3:03 PM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
---------- Forwarded message ----------
From: Carl Eberhard <carlfeberhard@gmail.com>
Date: Thu, Aug 20, 2015 at 2:46 PM
Subject: Re: [galaxy-dev] Get dataset/API ids for a dataset
To: Asma Riyaz <asmariyaz23@gmail.com>
Cc: galaxy-dev <galaxy-dev@lists.galaxyproject.org>
If I understand correctly, this begins to sound less like something the visualization level can do and more something that needs to be handled at your tool level.
Let me repeat back what I understand to be the process:
1. Your pipeline is activated by the user and some initial step in the pipeline creates the JSON file that will configure your visualization
2. Some indeterminate number of bam files are created
3. The pipeline finishes and at this point the encoded ids of all the bam files created by the pipeline should be used in urls added to the JSON file from step 1
4. The user then clicks on one of the outputs (the JSON file? Yes) from the pipeline to launch the visualization and the JSON file is read
Do I have that right? When you say pipeline does that mean a Galaxy workflow?
---> Yes, this is exactly what I want to do. No, not a Galaxy workflow, but a pipeline written in the lab which was then converted to a Galaxy tool. Is 'hda' available at the tool level as well? I have read through the BioBlend API but couldn't figure out a way to query for only those datasets that are being worked on by Galaxy in the current history.
If so, you can access the workflow using the ORM by tracing up from the dataset that invoked the visualization and then back down to the bam files that were created in the workflow steps:
<%
    # get the bam datasets created by the workflow that created 'hda'
    # where 'hda' is the dataset the visualization launched from
    from galaxy import model

    # find the workflow step whose job created 'hda', then go up to its invocation
    w = ( trans.sa_session.query( model.WorkflowInvocationStep )
        .filter( model.WorkflowInvocationStep.job == hda.creating_job )
        .one().workflow_invocation )
    # flat list of ids for every bam output of every step that ran a job
    ids = [ d.dataset.id
            for s in w.steps if s.job
            for d in s.job.output_datasets if d.dataset.ext == 'bam' ]
    urls = [ ... ]
%>
(The above is really horrible code, but sketches one way you could get the ids from the visualization mako)
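For what it's worth, the list-building part of that snippet can be pulled into a plain function and tried outside Galaxy. This is only a sketch: collect_bam_ids is a made-up name, and the SimpleNamespace objects merely stand in for the invocation, step, job and dataset objects, mirroring just the attributes used in the mako (steps, job, output_datasets, dataset.id, dataset.ext).

```python
from types import SimpleNamespace as NS


def collect_bam_ids(invocation, ext="bam"):
    """Return the ids of every output dataset with the given extension,
    across all steps of the invocation that actually ran a job."""
    return [
        assoc.dataset.id
        for step in invocation.steps
        if step.job
        for assoc in step.job.output_datasets
        if assoc.dataset.ext == ext
    ]


# Stand-in objects (NOT real Galaxy model instances), for illustration only.
def _output(dataset_id, ext):
    return NS(dataset=NS(id=dataset_id, ext=ext))


fake_invocation = NS(steps=[
    NS(job=NS(output_datasets=[_output(1, "json")])),
    NS(job=None),  # a step that ran no job is skipped
    NS(job=NS(output_datasets=[_output(2, "bam"), _output(3, "bam")])),
])

print(collect_bam_ids(fake_invocation))
```

Inside the mako, the same function would presumably be handed the real workflow invocation instead of the stand-in.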
If it's not a workflow but a pipeline being run from within a Galaxy tool wrapper, then the tool wrapper code should be writing the ids to the JSON file. Is that the case instead?
--> Yes, the tool wrapper code is where the IDs need to be inserted. I will try using "model" and see where I get with it.
-Thank you, Asma
On Thu, Aug 20, 2015 at 1:28 PM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
Hi Carl,
Thank you for your reply. This definitely helps me get started, my question being:
trans.history will get all the dataset ids in the user's history regardless of which run the datasets are associated with. Hence, if the user has multiple bams loaded in the history, there will be no way of distinguishing them.
Here is a rough idea of what I am envisioning my pipeline to do:
Galaxy pipeline runs -> while it is running, the dataset ids that are generated should be retrieved for each output (in my case bams and the JSON file) -> when the main pipeline finishes, the ids are updated within the JSON file -> all the outputs are fed to the user's history.
This way there will be no ambiguity as to which bams are being accessed for viz. Is this intermediate way of getting dataset ids possible?
Thank you -Asma
On Wed, Aug 19, 2015 at 4:27 PM, Carl Eberhard <carlfeberhard@gmail.com> wrote:
Hi, Asma
If you're looking through datasets via the mako part of your visualization, you can use:
    users_current_history = trans.history
    dataset_ids = [ trans.security.encode_id( d.id ) for d in users_current_history.datasets ]
(or similar) to build the ids needed for the urls.
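A rough sketch of that url-building step, runnable outside Galaxy. The function name bam_display_urls is hypothetical, the url pattern is the /datasets/<id>/display?to_ext=bam form used elsewhere in this thread, and the name and ext attributes on each dataset are assumptions about what the history datasets expose. The fake encoder only stands in for trans.security.encode_id.

```python
from types import SimpleNamespace as NS


def bam_display_urls(datasets, encode_id, ext="bam"):
    """Map dataset name -> display url for each dataset with the given ext."""
    return {
        d.name: "/datasets/%s/display?to_ext=%s" % (encode_id(d.id), ext)
        for d in datasets
        if d.ext == ext
    }


# Stand-ins for illustration: in the mako these would be
# trans.history.datasets and trans.security.encode_id.
fake_datasets = [
    NS(id=1, name="config.json", ext="json"),
    NS(id=2, name="dna.bam", ext="bam"),
]


def fake_encode_id(raw_id):
    # NOT Galaxy's real encoding, just a recognisable placeholder
    return "enc%d" % raw_id


print(bam_display_urls(fake_datasets, fake_encode_id))
```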
If you want to get the info via javascript, you can use something like the python above and template into a js var:
var url = "/datasets/${ dna_dataset_id }/display?to_ext=bam"
...or encode and template the history id and use ajax and the api after the page is served:
    var historyId = "${ trans.security.encode_id( trans.history.id ) }";
    jQuery.ajax( galaxy_config.root + 'api/histories/' + historyId + '/contents' )
        .done( function( response ){
            /* will contain summary json for each dataset including encoded ids for each */
        });
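Picking the bam ids out of that response is a small filtering step. It is sketched here in Python, to match the other snippets, rather than in the callback itself. The key names "id", "extension" and "deleted" are assumptions about the summary json and should be checked against what your instance actually returns; bam_ids_from_contents is a made-up name.

```python
def bam_ids_from_contents(contents, ext="bam"):
    """Pick the encoded ids of non-deleted datasets with the given extension
    out of a parsed history-contents response (a list of summary dicts)."""
    return [
        item["id"]
        for item in contents
        if item.get("extension") == ext and not item.get("deleted", False)
    ]


# Made-up response for illustration; real summaries carry more keys.
sample_contents = [
    {"id": "36ddb788a0f14eb3", "extension": "bam", "deleted": False},
    {"id": "aaaaaaaaaaaaaaaa", "extension": "json", "deleted": False},
    {"id": "bbbbbbbbbbbbbbbb", "extension": "bam", "deleted": True},
]

print(bam_ids_from_contents(sample_contents))
```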
Let me know if that's not what you were looking for or if you find any problems with it.
On Wed, Aug 19, 2015 at 11:01 AM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
Hello Galaxy-dev,
I thank you so much for all the help you have given me.
I have a question about data set ids in galaxy. As a background, I am running my own galaxy instance on a server. A pipeline implemented in the lab produces the following files in the history:
1) 2 BAM files
2) A JSON file
My goal is to use this JSON file to pass the paths/URLs of the BAM files into a custom JS we wrote for visualization purposes.
This JSON file contains, among many other details, the paths/URLs to the above BAM files. I am using JSON filetypes to send data to the JS visualization within Galaxy. To do this, I have my own JS which loads a BAM file from the provided URL into an IGV.js track. IGV.js, which is responsible for making the tracks, expects a valid URL, which is updated in the JSON file in this manner:
1) Extract the API_key and history id from a loaded BAM file
2) Edit the JSON file to reflect the BAM file's dataset id to be something like this:
{ "CLL-HL_pilot.r1.fastq": { "DNA": "/datasets/36ddb788a0f14eb3/display?to_ext=bam", ...
This works fine if I know the API key for the BAM files. When a pipeline executes, dataset ids are generated for each output. I want to access these ids, include them in the JSON file, and load the updated JSON file into the history with the BAMs. Is there a way to get the ids from the history in this manner?
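A minimal sketch of the JSON update described here, assuming the file has the sample -> track -> url shape of the fragment above and that the encoded ids are already in hand. The function name add_bam_urls and the sample-to-id mapping argument are hypothetical.

```python
import json


def add_bam_urls(config, encoded_ids, track="DNA"):
    """Return a copy of config with config[sample][track] set to the
    display url for each sample that has an encoded id."""
    updated = {sample: dict(entry) for sample, entry in config.items()}
    for sample, encoded_id in encoded_ids.items():
        updated.setdefault(sample, {})[track] = (
            "/datasets/%s/display?to_ext=bam" % encoded_id
        )
    return updated


# Illustrative input only: the real file holds many other details per sample.
config = json.loads('{"CLL-HL_pilot.r1.fastq": {"DNA": ""}}')
updated = add_bam_urls(config, {"CLL-HL_pilot.r1.fastq": "36ddb788a0f14eb3"})
print(json.dumps(updated, indent=2))
```

The input dict is left unmodified, so the original file contents can still be written back or compared if needed.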
Thank you,
Asma