Dear Galaxy developers,

Our institution is trying to solve our storage problem (we need lots, especially for NGS data, and someone needs to fund it). What we would like to be able to do is control, based on some criteria, the location where a file gets written to disk. The criteria could be an individual user, a role or group they belong to, or a project the file is associated with.

What we'd like to know are the following 3 things:

1) Is anyone already working on something like this?
2) Are there other institutions that would be interested in this type of functionality?
3) If we were to attempt to implement this ourselves, would anyone be interested in giving us some input on how to implement it and how to make it generic enough to meet the needs of most institutions?

If we're going to do it, we'll need to be able to produce an estimate of the effort involved so that we can get institutional funding to develop the functionality.

Thanks for any input you can provide.

Dave

--
Dave Walton
Computational Sciences
The Jackson Laboratory
Bar Harbor, Maine
Hi Dave,

We do something similar already: we store the NGS data outside of the Galaxy directory tree in our NGS repository, and our (self-written) NGS tools in Galaxy know where to find the data in the repository and where to put it (see 'key insight 3' and '4' in my presentation at this year's Galaxy Conference, http://wiki.g2.bx.psu.edu/Events/GCC2011 ).

This is very easy to implement. In the worst case you need an additional wrapper around your tools with the information about the location of the data.

However, you need to keep in mind a few 'features' or drawbacks:

- You can't look at your data. The Galaxy history might just be a log file saying something like: "the alignment was successful".
- In our setup, everything is owned by the user galaxy. But you can take advantage of the pre-defined $userEmail variable for access control (see 'key insight 5' in my presentation). And you can extend your wrappers: you can use this not only to control access, but also to read from and write to different storage locations.

I agree, this is a quick and dirty solution... but it is simple! It also offers you the possibility to access and manipulate your data without the Galaxy framework.

Regards, Hans

On 08/11/2011 04:44 PM, Dave Walton wrote:
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
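The wrapper idea in Hans's reply can be sketched roughly as follows. This is a minimal illustration, assuming the tool wrapper is handed the user's email (for example from the $userEmail variable mentioned above) as an argument. The ACCESS table, the storage paths, and the function name resolve_storage are hypothetical and not part of Galaxy.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from user email to the storage root that user may use.
# A real setup would more likely read this from a database or config file.
ACCESS = {
    "alice@example.org": "/ngs/project_a",
    "bob@example.org": "/ngs/project_b",
}


def resolve_storage(user_email: str, dataset_name: str) -> str:
    """Return the path a wrapper should read from or write to for this user.

    Raises PermissionError when the user has no storage location assigned.
    """
    root = ACCESS.get(user_email.strip().lower())
    if root is None:
        raise PermissionError(f"no storage location for {user_email!r}")
    # Keep only the final path component so a crafted name cannot leave the root.
    safe_name = PurePosixPath(dataset_name).name
    if safe_name in ("", ".", ".."):
        raise ValueError(f"invalid dataset name: {dataset_name!r}")
    return str(PurePosixPath(root) / safe_name)
```

A wrapper script could call resolve_storage with the email and the requested file name before reading or writing, so the same lookup serves both access control and the choice of storage location.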
Dave,

At the University of Chicago, we are working on designing and implementing a storage solution that addresses the criteria you describe. It is still at a very early stage, but we will be happy to work with you and gather your requirements.

On Aug 11, 2011, at 9:44 AM, Dave Walton wrote:
-- Ravi K Madduri The Globus Alliance | Argonne National Laboratory | University of Chicago http://www.mcs.anl.gov/~madduri
Dave Walton wrote:
What we'd like to know are the following 3 things: 1) Is anyone already working on something like this?
Hi Dave,

We're working on an abstraction layer which will allow Galaxy data to live in multiple places instead of the single-point "files_path" that is currently used. Enis Afgan wrote the initial implementation and I am hoping to complete it within the next few months.

This won't have any per-user logic, but it should provide a piece of what you are hoping to do.

--nate
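To make the idea of such an abstraction layer concrete, here is a small sketch of data living in several locations behind one interface. It is not the Galaxy implementation Nate refers to: the class names DiskStore and MultiStore, the file naming scheme, and the "first store receives new data" policy are all assumptions made for illustration.

```python
import os


class DiskStore:
    """One storage location backed by a directory (hypothetical class)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def path_for(self, dataset_id: int) -> str:
        return os.path.join(self.root, f"dataset_{dataset_id}.dat")

    def exists(self, dataset_id: int) -> bool:
        return os.path.exists(self.path_for(dataset_id))

    def write(self, dataset_id: int, data: bytes) -> None:
        os.makedirs(self.root, exist_ok=True)
        with open(self.path_for(dataset_id), "wb") as handle:
            handle.write(data)

    def read(self, dataset_id: int) -> bytes:
        with open(self.path_for(dataset_id), "rb") as handle:
            return handle.read()


class MultiStore:
    """Presents several stores as one; callers never see the real location."""

    def __init__(self, stores) -> None:
        self.stores = list(stores)
        if not self.stores:
            raise ValueError("at least one store is required")

    def _find(self, dataset_id: int):
        # Return the first store that already holds the dataset, if any.
        for store in self.stores:
            if store.exists(dataset_id):
                return store
        return None

    def write(self, dataset_id: int, data: bytes) -> None:
        # Update the dataset in place if it exists; otherwise use the first store.
        store = self._find(dataset_id) or self.stores[0]
        store.write(dataset_id, data)

    def read(self, dataset_id: int) -> bytes:
        store = self._find(dataset_id)
        if store is None:
            raise FileNotFoundError(f"dataset {dataset_id} not found in any store")
        return store.read(dataset_id)
```

Per-user or per-project placement, which Nate notes is out of scope for the planned layer, would amount to replacing the "first store" choice in write with a policy lookup.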
Nate,

I brought this issue up at the users conference and I wanted to bring it up again. How does somebody like us keep track of new developments like this, and how can we contribute?

Regards

On Aug 22, 2011, at 11:57 AM, Nate Coraor wrote:
Ravi Madduri wrote:
Nate I brought this issue up at the users conference and I wanted to bring it up again. How does somebody like us keep track of new development like this and how can we contribute?
Hi Ravi,

The best way is probably to ask on the dev list whether we are working on something or have an interest in working on it. I do agree that it can be difficult to know what we're working on, but part of the reason (in my own case, anyway) is that not everything I work on sees the light of day in a timely manner, so I tend not to make a lot of noise about it until it's well along.

--nate
An easy and immediate solution may be to:

(a) Create a "Link data" tool. The user specifies the data ID, and your tool queries a db to find the location and creates a symlink to the data, which is stored on different groups'/projects' disks. While subsequent data files will still be stored in the common database/files folder, at least the big raw data files will be stored outside of Galaxy's database/files folder.

(b) Use user storage quotas to manage the Galaxy database/files storage. For example, create a group for project X, which has 10TB that it contributes to Galaxy. If there are 10 users associated with that project, use quotas to add 1TB of additional storage to each of their limits.

It may not be as elegant as some alternatives, but you could implement this today.

On Mon, Aug 22, 2011 at 12:51 PM, Nate Coraor <nate@bx.psu.edu> wrote:
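Option (a) above, the "Link data" tool, could look roughly like the sketch below. It assumes a lookup table named datasets with columns data_id and location, held in SQLite purely for illustration; the table, the column names, and the function link_dataset are hypothetical, and a real site would query whatever database already tracks its raw data.

```python
import os
import sqlite3


def link_dataset(db: sqlite3.Connection, data_id: str, link_path: str) -> str:
    """Create a symlink at link_path pointing to the file registered for data_id.

    Returns the target path. Raises KeyError for an unknown data ID and
    FileNotFoundError when the registered file is missing on disk.
    """
    row = db.execute(
        "SELECT location FROM datasets WHERE data_id = ?", (data_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown data ID: {data_id}")
    target = row[0]
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    # Assumption: a placeholder output file may already exist at link_path,
    # so replace it rather than fail with FileExistsError.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return target
```

The large raw file stays on the project's own disk, and only the symlink occupies space under the shared database/files folder.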
participants (5)
- Dave Walton
- Edward Kirton
- Hans-Rudolf Hotz
- Nate Coraor
- Ravi Madduri