Hello all,

I am trying to write a tool that traces dataset creation for a given Galaxy history id.

Basically, I am trying to start with one dataset and recursively trace back its ancestors.

To this end, I communicate with Galaxy using the great BioBlend Python package. In particular, I’m using the “show_dataset(history_id, dataset_id)” method.

While it looks pretty straightforward, I got stuck on the following situation:

Let’s say I have a dataset named “25: Reorder SAM/BAM on data 21: reordered bam”. Going by this name, the ancestor of this dataset is “data 21”.
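To make the name-based step concrete, here is a minimal sketch of how I pull the ancestor number out of a dataset name. It assumes names follow the “... on data N ...” pattern from my example; the function name `referenced_numbers` and the regular expression are only my own illustration, not part of BioBlend or Galaxy.

```python
import re

# Matches "data 21" style references inside a dataset name.
# The naming pattern is an assumption based on the example above.
DATA_REF = re.compile(r"\bdata (\d+)")


def referenced_numbers(name):
    """Return the numbers mentioned as 'data N' in a dataset name."""
    return [int(n) for n in DATA_REF.findall(name)]


print(referenced_numbers("Reorder SAM/BAM on data 21: reordered bam"))  # prints [21]
```

This only recovers the number 21 from the name. It does not tell me which dataset id that number belongs to, which is exactly where I get stuck.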

But which dataset is “data 21”?

Is it not just the 20th dataset on the history contents list?

Well, unfortunately, not necessarily. If some history items were deleted (and subsequently purged), say datasets #17-19, they are removed from the history contents list, but the number embedded in the dataset name (i.e., “21”) is not adjusted accordingly. So I cannot just “pull out” the 20th dataset on the history contents list.
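A small worked example of the mismatch, using plain numbers rather than real API output. It assumes a history that originally held items 1 to 25, from which items 17, 18 and 19 were purged; the values are invented purely for illustration.

```python
# Numbers originally assigned to the items of a hypothetical 25-item history.
numbers = list(range(1, 26))

# Items 17-19 were deleted and purged, so they no longer appear in the list.
purged = {17, 18, 19}
remaining = [n for n in numbers if n not in purged]

# Where does item 21 actually sit in the remaining list (0-based)?
print(remaining.index(21))  # prints 17

# What do I get if I naively take position 20 (0-based) instead?
print(remaining[20])  # prints 24
```

So once anything has been purged, the position in the list no longer matches the number that appears in the dataset name.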

That is, I am not able to find the history id number of the dataset using “show_dataset(history_id, dataset_id)” or any other API command…

To be clear, by “history id” I mean the number shown in front of the dataset name: for the dataset “25: Reorder SAM/BAM on data 21: reordered bam”, “25” is the history id of the dataset “Reorder SAM/BAM on data 21: reordered bam”.

Any help will be much appreciated!

Liram