Re: [galaxy-user] Disk Quota

6 Feb 2012

Hello Zack,

There are two options:
  - use the public server Main, but run the data through in batches
  - use a local or cloud instance, as a whole or in batches
    http://getgalaxy.org

To decide how to batch the data, perhaps use just one of your datasets 
to make some estimates about how much disk each will use as it moves 
through your planned analysis pipeline. Then you can decide how many to 
run at a time. You may also want to consider how much disk needs to be 
reserved for summary data to be pooled at the end for final steps. It 
would be good to know if the public instance is going to be the right 
resource for your project overall before you start.
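As a rough sketch of that estimate (in Python), the arithmetic might look 
like the following. Only the file size (~50G) and count (8) come from your 
message. The quota, the peak factor, and the reserve are placeholder 
numbers to be replaced with what you measure on one test dataset and with 
the quota shown for your account.

```python
import math

def datasets_per_batch(quota_gb, peak_gb_per_dataset, reserve_gb):
    """How many datasets fit at once, leaving reserve_gb for pooled summary data."""
    usable_gb = quota_gb - reserve_gb
    if usable_gb < peak_gb_per_dataset:
        return 0  # not even one dataset fits under these assumptions
    return int(usable_gb // peak_gb_per_dataset)

def batches_needed(n_datasets, per_batch):
    """Number of batches required to process all datasets."""
    if per_batch <= 0:
        raise ValueError("no dataset fits in the quota; consider a local or cloud instance")
    return math.ceil(n_datasets / per_batch)

# From the original message: 8 files of roughly 50 GB each.
n_datasets = 8
raw_gb = 50

# Assumptions for illustration only (measure these on one test dataset):
peak_factor = 2.5   # assumed: peak disk per dataset = 2.5 x raw size
quota_gb = 250      # assumed placeholder quota, not an official figure
reserve_gb = 20     # assumed space kept free for final summary data

per_batch = datasets_per_batch(quota_gb, raw_gb * peak_factor, reserve_gb)
print("datasets per batch:", per_batch)
print("batches needed:", batches_needed(n_datasets, per_batch))
```

If the result is zero or one dataset per batch, that is a hint that a 
local or cloud instance may be the better fit for the project.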

Often a workflow will have many intermediate steps where the same 
dataset is changed in small ways as it is transformed. These 
intermediate files are likely not needed and can be purged (permanently 
deleted) once the processing is past a certain point and confirmed to be 
OK. Then there are steps where very large raw data is reduced to much 
smaller summary data - the large data can often be archived then 
permanently deleted (purged).
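To see how much purging intermediates can lower the peak disk use, here 
is a small Python sketch. The step names and all sizes other than the 
~50G raw file are made up for illustration, and it models the bookkeeping 
only. It does not call Galaxy. It also assumes the raw data has been 
archived elsewhere before it is purged.

```python
def peak_disk_gb(steps):
    """Return the peak disk held while running a pipeline.

    steps is a list of (name, output_gb, purge_after) tuples, where
    purge_after lists the earlier outputs purged once this step is
    finished and confirmed to be OK.
    """
    held = {}   # dataset name -> size in GB currently counted against quota
    peak = 0
    for name, output_gb, purge_after in steps:
        held[name] = output_gb
        peak = max(peak, sum(held.values()))
        for old in purge_after:
            held.pop(old, None)  # purged data no longer counts
    return peak

# Hypothetical pipeline for one dataset (sizes in GB are assumptions).
with_purging = [
    ("raw", 50, []),
    ("trimmed", 45, []),
    ("filtered", 40, ["trimmed"]),
    ("mapped", 30, ["filtered"]),
    ("summary", 1, ["raw", "mapped"]),  # raw archived elsewhere first
]
# Same steps, but nothing is ever purged.
without_purging = [(name, gb, []) for name, gb, _ in with_purging]

print("peak with purging:", peak_disk_gb(with_purging))
print("peak without purging:", peak_disk_gb(without_purging))
```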

The optimal workflow would include some phases where processing occurs 
in a streaming mode and some phases where it pauses to allow for data 
review and cleanup, to make space for downstream analysis steps or more 
analysis cycles.

Before purging data from the public Galaxy instance, you can first save 
it to your own computer, servers, or cloud resource, either as individual 
files or as entire histories. Saved histories can serve as reviewable 
archives or as working environments in local or cloud instances, to be 
used later locally or loaded back into the public Galaxy instance.

For more about data management, please see the guidelines and tips on 
this wiki. Most relevant for your case (after Quotas) will be the last 
section, about "delete" (recoverable, counts towards disk quota) and 
"permanently delete" (a.k.a. purged, non-recoverable, does not count 
towards disk quota):

http://wiki.g2.bx.psu.edu/Learn/Managing%20Datasets#Data_size_and_disk_Quota...

We realize that this is not a simple solution. Quotas were a very 
difficult, but necessary, project decision. What we can do is try to 
provide as much support as possible to help our community effectively 
manage their data. So, if you need more help or have questions, please 
let us know.

Take care,

Jen
Galaxy team

On 1/30/12 7:42 AM, Zack Liu wrote:
...
Dear galaxy admins,
I am working on a hi-seq project, where each file is ~ 50G.  I have 8 of
them.  After I uploaded my files, I realized that I have reached my quota
limit. Literally, I can't do any mapping or actions on these files since
I have no disk space for it.
I was wondering if there's any way I can get more disk storage on galaxy.
Thanks!
Zack Liu
-- 
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support