Re: [galaxy-dev] [galaxy-user] Uploading large archive histories
After discussing this with Getiria, I think what is needed here is a way to use FTP to upload compressed histories from the import screen, as is already available in the upload data tool. The problem we're currently experiencing is that importing a compressed history fails whenever the compressed file is larger than 2GB. This prevents us from having users effectively archive data out of Galaxy to reduce our disk usage.
The problem can't be coming from the 2GB browser upload limit, because the import tool uses wget to fetch the archived history before importing it. The best way to debug failed history imports is to use SQL to look at the Galaxy job table for the error: select * from job where tool_id='__IMPORT_HISTORY__'; There is a Trello card to improve import/export functionality: https://trello.com/c/qCfAWeYU
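For convenience, here is a slightly narrower form of that query, limited to the columns that are usually enough to see why an import failed. The column names (state, info, create_time) are what a typical Galaxy job table uses and may differ between releases:

    -- Show recent history-import jobs, newest first, with their state and
    -- any error message Galaxy recorded for them. Column names assume a
    -- typical Galaxy 'job' table; adjust if your schema differs.
    SELECT id, create_time, state, info
    FROM job
    WHERE tool_id = '__IMPORT_HISTORY__'
    ORDER BY create_time DESC;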
Additionally, it would be helpful if someone could provide a better understanding of how exported histories are treated on the Galaxy server, e.g. how long the compressed histories stick around on disk. We would like to have users archive their data on their own systems and permanently delete their Galaxy histories when an analysis is complete, but if the compressed history persists on the Galaxy server, the utility of this is somewhat reduced.
Archived histories are Galaxy datasets so that they're easy to read and write. History archive datasets are connected to histories via the JobExportHistoryArchive table rather than the HistoryDatasetAssociation table (in order to avoid recursive archiving). They can be cleaned up and removed with the pgcleanup.py script. Finally, I've moved this thread to the galaxy-dev list because it concerns local instance configuration and usage. Best, J.
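For anyone who wants to check which export archives are still on disk before running pgcleanup.py, a query along these lines is a reasonable starting point. The table and column names (job_export_history_archive, dataset) reflect the usual database mapping of the JobExportHistoryArchive model and may vary between Galaxy versions:

    -- List export archives per history along with the dataset row that holds
    -- the archive on disk, its size, and whether it has been purged yet.
    -- Table/column names are assumptions based on the standard Galaxy schema.
    SELECT jeha.history_id, jeha.job_id, d.id AS dataset_id, d.file_size, d.purged
    FROM job_export_history_archive jeha
    JOIN dataset d ON d.id = jeha.dataset_id
    ORDER BY jeha.history_id;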