On Tue, Mar 25, 2014 at 11:42 AM, Hans-Rudolf Hotz wrote:
Hi Brian
It is difficult to imagine where such files could come from, as each file in ~/database/files/*/ has a corresponding entry in the 'dataset' table. If you have such files, then something went horribly wrong, e.g. by using two different databases on the same file system. To double-check, manually compare the list of files on the file system with the contents of the 'dataset' table.
Also, have a look at the ~/database/job_working_directory/ and ~/database/tmp/ directories. You might have some old, big files there using a lot of space.
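A minimal sketch of how one might look for such files, assuming plain Python and nothing Galaxy-specific. The function name large_old_files and the size and age thresholds are hypothetical, chosen only for illustration:

```python
import os
import time


def large_old_files(root, min_bytes, min_age_days, now=None):
    """Return (size, path) pairs under root, largest first, for files that
    are at least min_bytes in size and not modified for min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file disappeared while we were scanning
            if info.st_size >= min_bytes and info.st_mtime <= cutoff:
                hits.append((info.st_size, path))
    return sorted(hits, reverse=True)
```

For example, large_old_files(os.path.expanduser("~/database/tmp"), 100 * 1024**2, 30) would list files of 100 MB or more that have not been touched in 30 days. It only reports; nothing is deleted.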
Regards, Hans-Rudolf
I think Brian is right - there is nothing to stop tools writing temp files under ~/database/files/*/ - the most obvious examples would be index files named after the input dataset they index (e.g. the *.bai and *.fai files normally found next to a *.bam or *.fasta file). Similarly, the output of file format conversions may also be written next to the input file (although here using $TMP seems preferable). Tools/wrappers may or may not clean up this sort of thing, and in the event of a job failure, stray files are more likely to remain.
So I agree, it would be good to have a script that audits which files under ~/database/files/*/ Galaxy tracks in its database, and which extra files have appeared.
Regards, Peter
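A rough sketch of what such an audit could look like, under some loudly stated assumptions: that tracked files are named dataset_<id>.dat, that per-dataset extra files live in dataset_<id>_files directories, and that known_ids has already been fetched from the 'dataset' table (for instance with a query along the lines of SELECT id FROM dataset, where the id column name is itself an assumption). The function name find_untracked is hypothetical, and none of this is an existing Galaxy script:

```python
import os
import re

# Assumed naming conventions; adjust to match the actual installation.
DATASET_RE = re.compile(r"^dataset_(\d+)\.dat$")
EXTRA_DIR_RE = re.compile(r"^dataset_(\d+)_files$")


def _tracked(pattern, name, known_ids):
    """True if name matches pattern and its numeric id is in known_ids."""
    match = pattern.match(name)
    return match is not None and int(match.group(1)) in known_ids


def find_untracked(files_root, known_ids):
    """Return sorted paths under files_root that do not correspond to a
    dataset id in known_ids (e.g. stray *.bai or *.fai index files)."""
    untracked = []
    for dirpath, dirnames, filenames in os.walk(files_root):
        # Do not descend into extra-files directories of tracked datasets.
        dirnames[:] = [d for d in dirnames
                       if not _tracked(EXTRA_DIR_RE, d, known_ids)]
        for name in filenames:
            if not _tracked(DATASET_RE, name, known_ids):
                untracked.append(os.path.join(dirpath, name))
    return sorted(untracked)
```

This only lists candidates. Anything it reports should be checked by hand before deleting, since a file may simply belong to a job that is still running.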