Is there an existing method for finding files in $GALAXY_DIR/database/files that have been orphaned due to errors, leftover from testing, etc.? I'm pretty sure ours is bigger than it ought to be, and I'd really like to look at a little audit of the files there vs. the ones registered in Galaxy.

Thanks!

-- Brian Claywell, Systems Analyst/Programmer
Fred Hutchinson Cancer Research Center
bclaywel@fhcrc.org
Hi Brian,

Have a look at purging histories and datasets: https://wiki.galaxyproject.org/Admin/Config/Performance/Purge%20Histories%20...

Thanks for using Galaxy,
Dan

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Daniel,

I've been using the purge scripts in cron jobs for a long time now; unless I missed something on that page, what I'm looking for is something slightly different -- rather than purging datasets marked as old/deleted/whatever in the Galaxy db, I want to find files that *aren't* in the Galaxy db but still exist on the filesystem.

-- Brian Claywell, Systems Analyst/Programmer
Fred Hutchinson Cancer Research Center
bclaywel@fhcrc.org
Hi Brian

It is difficult to imagine where such files would come from, as each file in ~/database/files/*/ has a corresponding entry in the 'dataset' table. If you have such files, then something went horribly wrong, e.g. two different databases were used on the same file system. To double-check, manually compare the list of files on the file system with the contents of the 'dataset' table.

Also, have a look at the ~/database/job_working_directory/ and ~/database/tmp/ directories. You might have some old, big files there using a lot of space.

Regards, Hans-Rudolf
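As a sketch of that manual comparison, assuming the default disk layout where dataset N lives at database/files/XXX/dataset_N.dat (with XXX being N divided by 1000, zero-padded to three digits), the expected path for each id in the 'dataset' table could be computed like this. The layout and the helper name are assumptions for illustration, not Galaxy's actual API:

```python
import os

def expected_path(files_dir, dataset_id):
    # Assumed default layout: dataset 1 -> database/files/000/dataset_1.dat,
    # dataset 1234 -> database/files/001/dataset_1234.dat; the subdirectory
    # is the id divided by 1000, zero-padded to three digits.
    subdir = "%03d" % (dataset_id // 1000)
    return os.path.join(files_dir, subdir, "dataset_%d.dat" % dataset_id)

print(expected_path("database/files", 1))     # database/files/000/dataset_1.dat
print(expected_path("database/files", 1234))  # database/files/001/dataset_1234.dat
```

Paths computed this way for every id from the 'dataset' table could then be compared against a find(1)-style listing of database/files.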
I think Brian is right -- there is nothing to stop tools writing temporary files under ~/database/files/*/. The most obvious examples would be index files named after the input dataset being indexed (e.g. the *.bai and *.fai files normally found next to a *.bam or *.fasta). Similarly, file format conversions may also be generated next to the input file (although here using $TMP seems preferable). Tools/wrappers may or may not clean up this sort of thing, and in the event of a job failure, stray files are more likely to remain.

So I agree, it would be good to have a script audit what Galaxy thinks is under ~/database/files/*/ (tracked in its database) against what extra files have appeared.

Regards, Peter
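Such an audit script could be sketched as a set difference between what is on disk and what the database knows about. This is only an illustration: `known_paths` stands in for the file paths recovered from the 'dataset' table (however your site derives them), and everything not in that set, including stray *.bai/*.fai companions, would be reported:

```python
import os

def find_orphans(files_dir, known_paths):
    """Return files under files_dir that are not in known_paths
    (the set of paths Galaxy's 'dataset' table accounts for)."""
    known = {os.path.abspath(p) for p in known_paths}
    orphans = []
    for root, _dirs, names in os.walk(files_dir):
        for name in names:
            path = os.path.abspath(os.path.join(root, name))
            if path not in known:
                orphans.append(path)
    return sorted(orphans)
```

A dry-run report (with sizes and mtimes) rather than deletion seems prudent, since a path missed when building `known_paths` would make a live dataset look orphaned.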
participants (4)
- Brian Claywell
- Daniel Blankenberg
- Hans-Rudolf Hotz
- Peter Cock