Hi, 

On Aug 26, 2010, at 3:22 PM, Fanny Coffin wrote:

Hi,

I'm evaluating whether we could use Galaxy in our production
environment for NGS data.

I have a question about data storage. NGS produces huge files, which
we store on our servers in a specific folder organisation. To use them
in Galaxy, these files have to be uploaded (so that the database gets
filled with information such as the first lines, the fields...). But I'm
wondering whether these files necessarily have to be imported into the
Galaxy workspace, or whether they can just be linked. I'm asking because
we absolutely want to avoid duplicating data.

Could you please enlighten me about that?


AFAIK most of the data will be duplicated when uploading/importing. I suggest you deploy Galaxy on a filesystem with deduplication capabilities.
I've successfully installed Galaxy on Nexenta CP3 + ZFS (waiting for Illumos). Recent ZFS builds support deduplication and compression.
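To illustrate why deduplication removes the cost of the imported copy, here is a toy sketch of block-level, content-addressed storage: each block is identified by a hash of its contents, so writing the same bytes a second time stores no new blocks. This is a simplified model, not how ZFS is implemented; the tiny block size, the SHA-256 choice and the file paths are hypothetical, picked only to keep the example readable (real filesystems use much larger blocks).

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


class DedupStore:
    """Toy content-addressed block store: identical blocks are stored once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (physical storage)
        self.files = {}   # file name -> list of digests (logical files)

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if an identical one is not already present.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self):
        # Size as seen by users: every file counted in full.
        return sum(len(self.read(n)) for n in self.files)

    def physical_bytes(self):
        # Size actually stored: each unique block counted once.
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
reads = b"ACGTACGTTTGACCAA"
store.write("/data/run1/reads.fastq", reads)       # hypothetical original path
store.write("/galaxy/files/dataset_1.dat", reads)  # hypothetical imported copy
print(store.logical_bytes(), store.physical_bytes())  # prints "32 12"
```

The two files add up to 32 logical bytes, but only 12 bytes are stored: the import costs nothing extra, and repeated blocks within a single file are shared as well.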
HTH
d


/*
Davide Cittaro

Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy

tel.: +39(02)574303007
e-mail: davide.cittaro@ifom-ieo-campus.it
*/