Hi,
On Aug 26, 2010, at 3:22 PM, Fanny Coffin wrote:
Hi,
I'm evaluating whether we can use Galaxy in our production environment for NGS data.
I have a question about data storage. NGS produces huge files, which we store on our servers in a specific folder organisation. To use them in Galaxy, these files have to be uploaded (so that the database is filled in with information such as the first lines, the fields...). But I'm wondering whether the files necessarily have to be imported into the Galaxy workspace, or whether they can just be linked. I ask because we absolutely want to avoid duplicating data.
Could you please enlighten me about that?
AFAIK most of the data will be duplicated when uploading/importing. I suggest deploying Galaxy on a filesystem that has deduplication capabilities. I've successfully installed Galaxy on Nexenta CP3 + ZFS (waiting for Illumos). Recent ZFS builds support deduplication and compression.
HTH
d
/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro@ifom-ieo-campus.it
*/
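For anyone who wants a rough idea of how much whole-file duplication already sits in a storage tree before deciding whether a deduplicating filesystem is worth it, here is a minimal Python sketch. It is only an illustration and not part of Galaxy or ZFS: the names `file_digest` and `duplicate_bytes` are made up for this example, and it compares whole files by SHA-256, whereas ZFS deduplicates at block level, so the figure it returns is only an approximation.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Return the bytes taken by redundant whole-file copies under root.

    Hypothetical helper: files with identical content are grouped by
    digest, and every copy after the first counts as redundant.
    """
    sizes_by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; they hold no data of their own.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            sizes_by_digest[file_digest(path)].append(os.path.getsize(path))
    return sum(sum(sizes[1:]) for sizes in sizes_by_digest.values())
```

Calling `duplicate_bytes("/path/to/ngs/data")` (a placeholder path) on a copy of the data tree returns the number of bytes that identical files occupy beyond their first copy. Hashing large NGS files is I/O bound, so expect this to be slow on a full archive.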