Ah. Then this is more subtle... are you using the library import option where Galaxy just symlinks to existing files? I thought that was not possible with gzipped files (for the reasons given below). Perhaps this is not being blocked, leading to the confused state you're seeing?
Peter
On Mon, Jan 12, 2015 at 4:52 PM, Ryan G ngsbioinformatics@gmail.com wrote:
Galaxy is not decompressing the file. The file is linked to on the filesystem.
On Mon, Jan 12, 2015 at 10:28 AM, Peter Cock p.j.a.cock@googlemail.com wrote:
Hi Ryan,
The problem isn't Galaxy stripping the extension, rather Galaxy is actually decompressing the file as part of the upload process.
Unfortunately (and there is an open Trello enhancement request on this), Galaxy does not support storing any of the defined datatypes in compressed form UNLESS they are defined that way (like BAM files).
This has led some Galaxy Admins to define a new datatype lgzippedfastq (or similar - I'd have to check my old emails for the exact name used as a gzipped alternative to the Galaxy sangerfastq datatype) and then modify many/all their tools to handle this. That is a lot of work, but does offer big disk savings for this key datatype.
The Galaxy team instead use a compressed file system, so for usegalaxy.org ALL their data files are compressed but Galaxy can ignore this complexity.
Peter
On Mon, Jan 12, 2015 at 3:15 PM, Ryan G ngsbioinformatics@gmail.com wrote:
Hi all - I've got a bunch of fastq files uploaded into a data library in Galaxy. The underlying files are gzipped; however, Galaxy strips the .gz from the filename and displays it as .fastq. When the python wrapper rgFastQC.py gets called, it correctly sees the fastq.gz file. The wrapper creates a symbolic link to the .gz file in a tmp directory. The link's name ends in .fastq. When FastQC tries to read this file, it fails because it's compressed. So one of two things is going wrong here:
- It looks like the wrapper is incorrectly renaming the file, but it's using the name given to it in Galaxy.
- When the file is uploaded into the data library, Galaxy is stripping off the .gz extension.
I think #2 is the more correct problem. How can I keep Galaxy from stripping the .gz extension?
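One possible workaround on the wrapper side, sketched here under stated assumptions: instead of trusting the name Galaxy displays, the wrapper could check the first two bytes of the underlying file for the gzip magic number (0x1f 0x8b) and only then choose the symlink name. The helper names is_gzipped and link_with_matching_extension are hypothetical and are not taken from rgFastQC.py; the sketch also assumes FastQC decides how to read a file from its extension, which is worth confirming against the FastQC version in use.

```python
import os

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def link_with_matching_extension(src, tmp_dir, display_name):
    """Symlink `src` into `tmp_dir` under a name that reflects its real content.

    `display_name` is the name Galaxy shows (for example "reads.fastq").
    If the underlying file is gzipped and the name lacks ".gz", the suffix
    is appended so a downstream tool can recognise the compression.
    Hypothetical helper, not part of the actual rgFastQC.py wrapper.
    """
    name = os.path.basename(display_name)
    if is_gzipped(src) and not name.endswith(".gz"):
        name += ".gz"
    link = os.path.join(tmp_dir, name)
    os.symlink(os.path.abspath(src), link)
    return link
```

With this, a gzipped library file shown as "reads.fastq" would be linked as "reads.fastq.gz", while an uncompressed file keeps the name Galaxy gave it.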