Re: [galaxy-dev] HOW TO RETRIEVE DATA FROM HISTORY??!!
Is there a way to directly move/copy data from your Galaxy history to a given location in the filesystem of the same Galaxy server? Said differently: there is a nice way to import data from the server into Galaxy, so is it possible to do the reverse?
So far, I am obliged to download the file from Galaxy to my client machine and then upload it back to the server! With huge BAM files of 3 GB that is not so convenient!
OK, I found a better way: (a) go to the admin panel, push the 'add a new dataset in the library' button and select the one needed from the current history; (b) move the selected dataset from the library to a location mounted to Galaxy. That is OK for me. However, if someone has a better solution, any advice is welcome.
Thanks,
colin
Assaf Gordon wrote (in reply to colin molter, 08/05/2011 01:54 PM):
Hi Colin,
If you feel comfortable meddling directly with your galaxy database, the following query will give you the dataset file numbers for each dataset in a given history.
========
select dataset.id as "DatasetFileNumber",
       lpad(trunc(dataset.id/1000), 3, '0') as "Directory",
       'dataset_' || dataset.id || '.dat' as "Filename",
       history_dataset_association.hid as "Dataset Number in History",
       history_dataset_association.name as "Dataset Name"
from history, galaxy_user, history_dataset_association, dataset
where dataset.id = history_dataset_association.dataset_id
  and history.id = history_dataset_association.history_id
  and history.user_id = galaxy_user.id
  and galaxy_user.email = 'gordon@cshl.edu'
  and history.name = 'K9981_het'
order by history_dataset_association.hid
=======
Just change the "history.name" and the "galaxy_user.email" in the query to get the correct history (assuming your histories' names are unique).
This SQL is for PostgreSQL, but MySQL should be similar.
The first column (DatasetFileNumber) is the file number associated with the galaxy dataset (e.g. 2456). The second is the directory (e.g. "002" for dataset 2456). The third is the file name (e.g. "dataset_2456.dat").
A simple shell script should be able to construct the correct path. Note that for dataset numbers larger than 100,000 there's going to be an extra directory of "000/" (e.g. "000/101/dataset_101123.dat").
As always, direct database access is never recommended...
hope this helps,
-gordon
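The path layout described above can also be sketched in Python instead of shell. This is only an illustration: the function name `dataset_path` and the default `files_root` are assumptions (the root is borrowed from the example output later in this thread and will differ per installation), the exact boundary at 100,000 is inferred from the "larger than 100,000" remark, and IDs of 1,000,000 or more are rejected because the thread does not describe their layout.

```python
import posixpath


def dataset_path(dataset_id, files_root="/galaxy/database/files"):
    """Build the on-disk path for a dataset file number.

    files_root is an assumed default; use your own installation's file path.
    """
    if not 0 <= dataset_id < 1_000_000:
        # The thread only describes the layout for smaller IDs.
        raise ValueError("layout for dataset %d is not covered here" % dataset_id)

    # One directory per 1000 datasets, zero-padded to three digits.
    directory = "%03d" % (dataset_id // 1000)

    parts = [files_root]
    if dataset_id >= 100_000:
        # Extra "000/" level for large IDs (boundary is an assumption).
        parts.append("000")
    parts.append(directory)
    parts.append("dataset_%d.dat" % dataset_id)
    return posixpath.join(*parts)


if __name__ == "__main__":
    for example_id in (2456, 101123):
        print(dataset_path(example_id))
```

Check the result against a file you can already locate before relying on it for anything destructive such as a move.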
Thanks Assaf,
This has come up enough times that I've just committed a script that will return the filename if provided a numeric or encoded HDA. I've posted it to the list before as galaxythinger.py, it's in the source as galaxy-dist/scripts/helper.py as of 5919:0f878ea61e98.
It will also decode and encode IDs. More functionality for common sysadmin tasks would be welcomed.
--nate
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Hi Nate,
Could you provide more info on how to run this script? I have a history and dataset name and I want to find the file stored on the server.
Thanks,
Shaun
-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Shaun,
we solved this quite easily with a simple tool that we added to the toolbox. Just use the XML below as a new tool and you can get the file path of any file you like.
Hope it helps,
Alex

<tool id="filelocator" name="Locate Data in Galaxy Database" version="1.0.1">
  <description></description>
  <command>echo $input > $output</command>
  <inputs>
    <param name="input" type="data" label="Data file from history" />
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>
  <help>
**What it does**

This tool gives the name of the data file from the history as it is in the galaxy database. This information can be helpful with large files, which can then be downloaded directly, through e.g. ftp from the server.
  </help>
</tool>
________________________________________
From: galaxy-dev-bounces@lists.bx.psu.edu [galaxy-dev-bounces@lists.bx.psu.edu] on behalf of SHAUN WEBB [swebb1@staffmail.ed.ac.uk]
Sent: Thursday, 20 October 2011 17:43
To: Nate Coraor
CC: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] HOW TO RETRIEVE DATA FROM HISTORY??!!
Hi Nate. Could you provide more info on how to run this script. I have a history and dataset name and I want to find the file stored on the server. Thanks Shaun
Hi Shaun,
You'll need the dataset ID, which you can see as part of the URL of many of the links in the history item. For example, the URL linked by the "eye" icon is:
http://server/datasets/1cd8e2f6b131e891/display/?preview=True
Using the ID of 1cd8e2f6b131e891, run:
% python helper.py --hda=1cd8e2f6b131e891
HDA "3" is Dataset "3" at: /galaxy/database/files/000/dataset_3.dat
--nate
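If there are many history links to process, pulling the encoded ID out of each URL can be scripted. The following is a minimal sketch, assuming the URL has the same shape as the example above (the ID is the path segment right after `datasets`). The function name `encoded_id_from_url` is made up for illustration and is not part of helper.py.

```python
from urllib.parse import urlparse


def encoded_id_from_url(url):
    """Return the path segment that follows 'datasets' in a history item link."""
    # Split the URL path into its non-empty segments.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        return segments[segments.index("datasets") + 1]
    except (ValueError, IndexError):
        raise ValueError("no dataset ID found in %r" % url)


if __name__ == "__main__":
    url = "http://server/datasets/1cd8e2f6b131e891/display/?preview=True"
    # Build the helper.py command line shown in the message above.
    print("python helper.py --hda=%s" % encoded_id_from_url(url))
```

The sketch only extracts the ID; decoding it and looking up the file is still left to helper.py.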
participants (5)
- Assaf Gordon
- Bossers, Alex
- colin molter
- Nate Coraor
- SHAUN WEBB