Hello all,

 

I am trying to write a tool that traces dataset creation for a given Galaxy history id.

Basically, I am trying to start with one dataset and recursively trace back its ancestors.

To this end, I communicate with Galaxy using the great BioBlend Python package. In particular, I’m using the “show_dataset(history_id, dataset_id)” method.  

 

While it looks pretty straightforward, I got stuck on the following situation:

Let’s say I have a dataset named “25: Reorder SAM/BAM on data 21: reordered bam”. Going by this name, the ancestor of this dataset is “data 21”.
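
A minimal sketch of pulling the referenced numbers out of such a name. The helper name `ancestor_numbers` is hypothetical, and the regular expression assumes that names follow the “on data N” pattern shown above, which is a naming convention rather than something Galaxy guarantees (a renamed dataset would break it):

```python
import re

def ancestor_numbers(name):
    """Return the numbers that follow the word 'data' in a dataset name."""
    # Assumed pattern: "<tool> on data 21: ..." or "<tool> on data 3 and data 7".
    return [int(n) for n in re.findall(r"\bdata (\d+)", name)]

print(ancestor_numbers("25: Reorder SAM/BAM on data 21: reordered bam"))  # [21]
```

This only yields the number that appears in the name; it says nothing yet about which dataset in the history that number belongs to.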

But which dataset is “data 21”?

Isn’t it just the 21st dataset in the history contents list?

 

Well, unfortunately, not necessarily: if some history files were deleted (and subsequently purged), say datasets #17-19, they are indeed removed from the history list, but the number in the dataset name (i.e., “21”) isn’t modified accordingly, so I cannot just “pull out” the 21st dataset from the history contents list…
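
To make the mismatch concrete, here is a toy sketch with made-up numbers. No Galaxy call is involved, and `nth_remaining` is a hypothetical helper: it just lists the numbers 1 to 25, drops the purged ones, and takes the entry at a given position.

```python
def nth_remaining(total, purged, position):
    """Return the dataset number found at a 1-based position after purging."""
    # Numbers still listed once the purged datasets are gone.
    remaining = [n for n in range(1, total + 1) if n not in purged]
    return remaining[position - 1]

# With datasets 17-19 purged, the 21st entry is no longer dataset 21.
print(nth_remaining(25, {17, 18, 19}, 21))  # 24
```

So positional indexing only works while nothing before the dataset of interest has been purged.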

 

That is, I am not able to find the history id number of the dataset using “show_dataset(history_id, dataset_id)” or any other API command…

To be clear, when I say “history id” I mean the following: for the dataset “25: Reorder SAM/BAM on data 21: reordered bam”, “25” is the history id of the dataset “Reorder SAM/BAM on data 21: reordered bam”.

 

Any help will be much appreciated!

 

Liram