Re: [galaxy-dev] [galaxy-user] operating on, and getting at, large files in galaxy...
Hi Nick,

Yes, these nextgen reads files are huge and getting bigger every quarter! But there will be storage issues no matter whether you use Galaxy or not. In fact, I think users are more likely to clean up files and histories in Galaxy than they are to clean up NFS folders -- out of sight, out of mind!

Firstly, I think unnecessary intermediate files are more of a problem than whether or not the file is compressed. Indeed, just transferring these files back and forth from the cluster takes a while, not to mention the delay in waiting to be rescheduled for each step. And so I created a tool which does the job of fastq groomer, end-trimmer, process pairs, and a few other simple tasks -- all in one shot. I haven't uploaded it to the toolshed yet but I will. I hate to duplicate existing tools, but I have a lot of seq data. I will also create a fastqilluminabz2 datatype and include it with the tool.

For getting files into Galaxy, I created a simple tool which allows staff to enter NFS paths, with the option to either copy the file or symlink it if the location is considered stable. I allowed only certain folders (e.g. /home, /storage) and added a password, for security.

Similarly, for getting a file out, all you need is a dinky tool for users to provide a destination path. Since I've got Galaxy running as a special galaxy user in a special galaxy group, file access is restricted (as it should be), so I tell users to create a dropbox folder in their homedir (and chmod 777). By creating a tool like this, you don't need to care how Galaxy names the files. I deliberately try to not mess around under the hood.

I can upload these to the Galaxy toolshed, but like I said, there isn't much to them.

Ed

On Wed, Feb 9, 2011 at 4:17 AM, Nick Schurch <N.Schurch@dundee.ac.uk> wrote:
Hi all,
I've recently encountered a few problems when trying to use Galaxy which are really driving me away from using it as a bioinformatics platform for NGS. I was wondering if there are any simple solutions that I've missed...
Firstly, it seems that while there are a few solutions for getting large files (a few GB) into a local install of Galaxy without going through HTTP, many tools that operate on these files produce multiple large, uncompressed files which quickly eat up the disk allocation. This is particularly significant in a workflow that has multiple processing steps which each leave behind a large file. With no way to compress or archive files produced by intermediate steps in a workflow, and no desire to delete them outright since I may need to go back to them and they can take hours to re-run, the only remaining option seems to be to save them elsewhere and then delete them from Galaxy.
And this brings me to the second problem: getting large files out of Galaxy. The only way to save large files from Galaxy (that I can see) is the save icon, which downloads the file via HTTP. This takes *ages* for a large file and also causes big headaches for my Firefox browser. I've taken a quick peek at the Galaxy file system to see if I could just copy a file, but it's almost completely indecipherable if you want to find out which file in the file system corresponds to a file saved from a tool. Is there some way to get the location of a particular file on the Galaxy file system, so that I can just copy it?
-- Cheers,
Nick Schurch
Data Analysis Group (The Barton Group), School of Life Sciences, University of Dundee, Dow St, Dundee, DD1 5EH, Scotland, UK
Tel: +44 1382 388707 Fax: +44 1382 345 893
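For readers wondering what the import tool described in Ed's reply might look like, here is a minimal Python sketch of the core idea: accept a path only if it resolves to somewhere under an approved folder, then either copy it or symlink it. This is not Ed's actual tool. The names `ALLOWED_ROOTS`, `is_allowed` and `import_file` are hypothetical, the folder list simply reuses the /home and /storage examples from the reply, and the password check and Galaxy tool wrapper are left out.

```python
import os
import shutil

# Assumed whitelist, reusing the example folders mentioned in the reply.
ALLOWED_ROOTS = ("/home", "/storage")


def is_allowed(path, allowed_roots=ALLOWED_ROOTS):
    """Return True if path lies under an allowed root once symlinks and '..' are resolved."""
    real = os.path.realpath(path)
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        # commonpath compares whole path components, so /homework does not match /home.
        if os.path.commonpath([real, real_root]) == real_root:
            return True
    return False


def import_file(src, dest, symlink=False, allowed_roots=ALLOWED_ROOTS):
    """Copy src to dest, or make dest a symlink to src when the source location is stable."""
    if not is_allowed(src, allowed_roots):
        raise PermissionError(f"{src} is outside the allowed folders")
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    if symlink:
        # Link to the resolved path so the link does not depend on intermediate symlinks.
        os.symlink(os.path.realpath(src), dest)
    else:
        shutil.copyfile(src, dest)
    return dest
```

Resolving the path before comparing it is the important step here: a plain string prefix check would accept something like /home/../etc/passwd. An export tool would be the mirror image, copying a dataset to a user-supplied destination after a similar check.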
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
participants (1)
-
Edward Kirton