Re: [galaxy-dev] Get dataset/API ids for a dataset

21 Aug 2015

As far as I know, it's best to write tool wrappers as if they were meant to
be called outside of Galaxy. In other words, it's best not to try to get
Galaxy dataset ids from within the tool code.

If we zoom out and take a higher-level view of the problem: is the JSON file
primarily used to launch the visualization, or does it have another use as
well?

Can the JSON be (or is it already) passed to IGV directly in JavaScript? Or
does it need to be read from the file separately by IGV, independently of the
visualization mako and its associated JavaScript?

If the JSON data is only for the visualization, doesn't need to be 100%
correct in the file, and can be passed within JavaScript, you can alter the
JSON data directly in the visualization mako/js by decorating it with the bam
ids before passing it to IGV. In that case, the previous code (or a cleaner
version of it) will begin to get you there.

Unfortunately, the previous code will only work in the visualization mako,
and only with this correction:
from galaxy import model
instead of
import model
trans is an object describing the current WebTransaction
(request/response). Among other things, it gives access to a SQLAlchemy (sa)
database session:
trans.sa_session

On Fri, Aug 21, 2015 at 11:23 AM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
Hi Carl,
The visualization comes into play after a lab-implemented tool is run in
Galaxy; I am not using Galaxy's workflow system. Is tracing back ids still
possible in this case with the test case you wrote earlier?
Also, I couldn't figure out what "trans" refers to in your previous
message.
Thank you,
Asma
On Thu, Aug 20, 2015 at 3:03 PM, Asma Riyaz <asmariyaz23@gmail.com> wrote:
---------- Forwarded message ----------
From: Carl Eberhard <carlfeberhard@gmail.com>
Date: Thu, Aug 20, 2015 at 2:46 PM
Subject: Re: [galaxy-dev] Get dataset/API ids for a dataset
To: Asma Riyaz <asmariyaz23@gmail.com>
Cc: galaxy-dev <galaxy-dev@lists.galaxyproject.org>
If I understand correctly, this begins to sound less like something the
visualization level can do and more like something that needs to be handled
at your tool level.
Let me repeat back what I understand to be the process:
1. Your pipeline is activated by the user and some initial step in the
pipeline creates the JSON file that will configure your visualization
2. Some indeterminate number of bam files are created
3. The pipeline finishes and at this point the encoded ids of all the bam
files created by the pipeline should be used in urls added to the JSON file
from step 1
4. The user then clicks on one of the outputs (the JSON file? --> Yes) from
the pipeline to launch the visualization, and the JSON file is read
Do I have that right? When you say pipeline, does that mean a Galaxy
workflow?
---> Yes, this is exactly what I want to do. No, not a Galaxy workflow,
but a pipeline written in the lab which is then converted to a Galaxy tool.
Is 'hda' available at the tool level as well? I have read through the
BioBlend API but couldn't figure out a way to query for only those datasets
that are being worked on by Galaxy in the current history.
If so, you can access the workflow using the ORM by tracing up from the
dataset that invoked the visualization and then back down to the bam files
that were created in the workflow steps:
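<%
# get the bam datasets created by the workflow that created 'hda'
# where 'hda' is the dataset the visualization launched from
from galaxy import model
# the workflow invocation whose step ran the job that created 'hda'
w = ( trans.sa_session.query( model.WorkflowInvocationStep )
      .filter( model.WorkflowInvocationStep.job == hda.creating_job )
      .one().workflow_invocation )
# ids of every 'bam' output of every step in that invocation that ran a job
# (encode each with trans.security.encode_id before building urls)
ids = [ d.dataset.id
        for s in w.steps if s.job
        for d in s.job.output_datasets if d.dataset.ext == 'bam' ]
urls = [ ... ]
%>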
(The above is really horrible code, but sketches one way you could get
the ids from the visualization mako)
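The list comprehension in that snippet is meant to walk every workflow step that has a job and keep only the outputs whose dataset extension is 'bam'. The sketch below exercises that nested-comprehension logic outside Galaxy, using small hypothetical stand-in classes whose attribute names (steps, job, output_datasets, dataset, ext, id) mirror the ones in the mako snippet; they are not Galaxy's real model classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins that only mimic the attribute names used in the mako snippet.
@dataclass
class Dataset:
    id: int
    ext: str


@dataclass
class OutputAssociation:
    dataset: Dataset


@dataclass
class Job:
    output_datasets: List[OutputAssociation] = field(default_factory=list)


@dataclass
class Step:
    job: Optional[Job] = None


@dataclass
class Invocation:
    steps: List[Step] = field(default_factory=list)


def bam_ids(invocation):
    """Collect ids of 'bam' outputs across all steps that ran a job."""
    return [
        d.dataset.id
        for s in invocation.steps if s.job  # skip steps without a job
        for d in s.job.output_datasets if d.dataset.ext == "bam"
    ]
```

The outer "for s" clause comes first and the inner "for d" clause second, which is the order Python needs for a flattened nested comprehension.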
If it's not a workflow but a pipeline being run from within a Galaxy tool
wrapper, then the tool wrapper code should be writing the ids to the JSON
file. Is that the case instead?
--> Yes, the tool wrapper code is where the IDs need to be inserted. I will
try using "model" and see where I get with it.
-Thank you,
Asma
On Thu, Aug 20, 2015 at 1:28 PM, Asma Riyaz <asmariyaz23@gmail.com>
wrote:
Hi Carl,
Thank you for your reply. This definitely helps me get started. My
question is:
trans.history will get all the dataset ids in the user's history, regardless
of which run the datasets are associated with. Hence, if the user has
multiple bams loaded in the history, there will be no way of distinguishing them.
Here is a rough idea of what I am envisioning my pipeline to do:
Galaxy pipeline runs -> while it is running, the dataset ids that are
generated should be retrieved for each output (in my case the bams and the
JSON file) -> when the main pipeline finishes, the ids are updated within the
JSON file -> all the outputs are fed to the user's history.
This way there will be no ambiguity as to which bams are being accessed
for the viz. Is this intermediate way of getting dataset ids possible?
Thank you
-Asma
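The "ids are updated within the JSON file" step of that plan could look roughly like the sketch below, once the encoded ids are known. How a tool gets hold of those ids is exactly the open question in this thread, so ids_by_sample is a hypothetical input here; update_json_file and the "DNA" key are assumptions based on the JSON shown in the first message.

```python
import json
import os
import tempfile

# URL pattern taken from this thread: /datasets/<encoded id>/display?to_ext=bam
DISPLAY_URL = "/datasets/{encoded_id}/display?to_ext=bam"


def update_json_file(path, ids_by_sample, track_key="DNA"):
    """Rewrite the JSON config at path, pointing each sample's track at its bam.

    ids_by_sample: hypothetical mapping of sample name -> encoded bam dataset id
    """
    with open(path) as fh:
        config = json.load(fh)
    for sample, encoded_id in ids_by_sample.items():
        config.setdefault(sample, {})[track_key] = DISPLAY_URL.format(encoded_id=encoded_id)
    # Write to a temp file in the same directory, then swap it in, so a reader
    # never sees a half-written config.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
    with os.fdopen(fd, "w") as fh:
        json.dump(config, fh, indent=2)
    os.replace(tmp_path, path)
    return config
```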
On Wed, Aug 19, 2015 at 4:27 PM, Carl Eberhard <carlfeberhard@gmail.com>
wrote:
Hi, Asma
If you're looking through datasets via the mako part of your
visualization, you can use:
...
users_current_history = trans.history
dataset_ids = [ trans.security.encode_id( d.id ) for d in
users_current_history.datasets ]
(or similar) to build the ids needed for the urls.
If you want to get the info via javascript, you can use something like
the python above and template into a js var:
...
var url = "/datasets/${ dna_dataset_id }/display?to_ext=bam"
...or encode and template the history id and use ajax and the api after
the page is served:
...
var historyId = "${ trans.security.encode_id( trans.history.id ) }";
jQuery.ajax( galaxy_config.root + 'api/histories/' + historyId + '/contents' )
    .done( function( response ){
        /* will contain summary json for each dataset including encoded ids for each */
    });
Let me know if that's not what you were looking for or if you find any
problems with it.
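For reference, the same request URL and response filtering can be sketched in Python, for example to check what the contents endpoint returns for a history. It assumes galaxy_root ends with "/" (as galaxy_config.root is used in the snippet) and that each summary item is a dict with "id", "extension" and "deleted" keys; those field names are assumptions to verify against your Galaxy version. contents_url and bam_ids_from_contents are hypothetical helpers, and fetching the URL (with whatever authentication your instance needs) is left out.

```python
def contents_url(galaxy_root, history_id):
    """Same URL the jQuery snippet requests: <root>api/histories/<id>/contents."""
    return galaxy_root + "api/histories/" + history_id + "/contents"


def bam_ids_from_contents(items):
    """Pick encoded ids of non-deleted bam datasets out of a decoded response.

    Assumes each item is a dict with 'id', 'extension' and optionally
    'deleted' keys; verify these field names against your Galaxy version.
    """
    return [
        item["id"]
        for item in items
        if item.get("extension") == "bam" and not item.get("deleted", False)
    ]
```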
On Wed, Aug 19, 2015 at 11:01 AM, Asma Riyaz <asmariyaz23@gmail.com>
wrote:
Hello Galaxy-dev,
Thank you so much for all the help you have given me.
I have a question about dataset ids in Galaxy. As background, I am
running my own Galaxy instance on a server. A pipeline implemented in
the lab produces the following files in the history:
1) 2 BAM files
2) A JSON file
My goal is to use this JSON file to pass the path/URL of the bam files
into a custom JS we wrote for visualization purposes.
This JSON file contains, among many other details, the paths/URLs to the
above bam files. I am using the JSON filetype to send data to the JS
visualization within Galaxy. To do this, I have my own JS which loads a BAM
file from the provided URL into an IGV.js track. IGV.js, which is
responsible for making the tracks, expects a valid URL, which is updated in
the JSON file in this manner:
1) Extract the API_key and history id from a loaded BAM file
2) Edit the JSON file to reflect the BAM file's dataset id to be
something like this:
{
  "CLL-HL_pilot.r1.fastq": {
    "DNA": "/datasets/36ddb788a0f14eb3/display?to_ext=bam",
    ...
This works fine if I know the API (encoded dataset) ids of the bam files.
When a pipeline executes, dataset ids are generated for each output. I want
to access these ids, include them in the JSON file, and load the updated
JSON file into the history along with the bams. Is there a way to get the
ids from the history in this manner?
Thank you,
Asma
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/