Re: [galaxy-dev] Installing Galaxy and Hooking into a SGE Cluster
Thanks Assaf.
Would anyone know about my question 1? If I install a local version of Galaxy and connect it to our cluster, where is each user's (uploaded?) data stored? How will the cluster jobs be able to access the data?
If anyone has installed galaxy and hooked it up to Sun Grid engine, I'd love to hear from you.
Thanks,
Greg
On Thu, Aug 23, 2012 at 6:29 PM, Assaf Gordon <gordon@cshl.edu> wrote:
(Moving to Galaxy-dev, seems more appropriate).
John Jones wrote, On 08/23/2012 05:29 PM:
His original question was about getting Galaxy to recognise LDAP authentication and personal storage space rather than shared storage space as is usual with Galaxy. Licensing only came into it because Greg wanted LDAP authentication to track individual user usage (for billing) and I questioned the legality of billing for the software used by Galaxy, as I'm sure some components have a non-commercial usage licence clause.
1. Technically (billing on an SGE cluster):
There's a script in "./contrib" called "collect_sge_job_timings.sh". Details are here: http://dev.list.galaxyproject.org/Detailed-SGE-timing-information-about-gala...
Assuming you're using PostgreSQL + SGE, it will give you what you want (a breakdown of SGE job timings and Galaxy users).
With a little adaptation, you can use it to extract the SGE JobID and Galaxy Tool-ID / user ID.
If you want to extract it directly from the Galaxy database, then the following SQL will get you started:
===
select email as "user",
       tool_id as "tool",
       job_runner_external_id as "SGE_ID"
from job, galaxy_user
where job.user_id = galaxy_user.id
  and job_runner_name like 'drmaa%' ;
===
"job_runner_external_id" will be the SGE-ID, and once you have the list (for each Galaxy user), you can use "qacct" to get the information for each job, then calculate the charges.
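As a rough sketch of that last step (this is not the contrib script), the Python below takes the (user, tool, SGE_ID) rows returned by the query, looks up each job with "qacct -j", and totals a charge per user. It assumes that "qacct -j" prints one "field value" pair per line and reports wallclock time in a field called "ru_wallclock"; the function names and the flat hourly rate are made up for illustration, and array jobs (several records per job ID) are not handled.

```python
import subprocess
from collections import defaultdict


def parse_qacct(text):
    """Parse the text printed by `qacct -j <id>` into {field: value}.

    Assumes one "field   value" pair per line; separator lines of '='
    and blank lines are skipped. With several records in the text
    (e.g. array jobs), later values overwrite earlier ones.
    """
    record = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("="):
            continue
        key, _, value = line.partition(" ")
        record[key] = value.strip()
    return record


def job_wallclock(sge_id):
    """Return the wallclock seconds SGE recorded for one finished job."""
    out = subprocess.run(
        ["qacct", "-j", str(sge_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    # Some versions may append a unit suffix such as "s"; strip it if present.
    return float(parse_qacct(out)["ru_wallclock"].rstrip("s"))


def charges_per_user(rows, wallclock, rate_per_hour):
    """Sum charges per user.

    rows          -- (user, tool, sge_id) tuples from the SQL query
    wallclock     -- function mapping an SGE job ID to seconds used
    rate_per_hour -- hypothetical flat rate per wallclock hour
    """
    totals = defaultdict(float)
    for user, _tool, sge_id in rows:
        totals[user] += wallclock(sge_id) / 3600.0 * rate_per_hour
    return dict(totals)
```

On a machine where "qacct" works, charges_per_user(rows, job_wallclock, 2.0) would bill every job at 2.0 per wallclock hour. Passing the lookup as a function also means the totals can be checked against canned numbers without touching the cluster.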
2. Legally: IANAL, but to the best of my (limited) understanding:
1. Galaxy itself (server code, etc.) - completely legal to run it commercially, even charge money for it - it is licensed under a BSD/MIT-style license.
2. Individual tools: if the tools are GPL'd (e.g. bowtie, tophat, bwa, cufflinks) or BSD/MIT - completely legal to run them commercially, and even charge money for them. If the tools use any special license that restricts commercial use (commonly stated as "free for personal, academic, non-commercial use", e.g. Jim Kent's UCSC Genome Browser tools) - then you can't use them in your commercial company without buying a special license (not even internally).
It's really not complicated at all, and most (if not all) tools that you compile from source will have a file called "license" or "copying" that tells you what the license is. If the tools didn't come with source code, they are probably not free/open-source for commercial use (e.g. novoalign).
3. Morally: Anyone who thinks of free/open-source as "charity" is missing the point. There is no "charity", and it's perfectly legal, moral, and even recommended to charge money when offering services that are based on free/open-source tools - the requirement (when using GPL) is to give the source code to users when they ask for it (and some other complications, don't want to start a flame war here) - so if they don't want to pay $1,000 for your service - they are more than welcome to buy their own cluster and storage and servers and run the program themselves - that is the whole point of free/open source. There is no charity.
When people write (and publish) software - they explicitly decide upon the license they prefer, and they should know what the effects of the license are. For example, the Galaxy team released Galaxy under a permissive BSD/MIT-style license, so they were completely aware of the fact that not only can Galaxy be incorporated into a commercial product, but the license doesn't even require the commercial company to release any code changes they make to Galaxy. It's a conscious decision, not an afterthought, and not charity.
(end of my rambling...)
-gordon
On Fri, Aug 24, 2012 at 1:25 PM, mailing list <margeemail@gmail.com> aka Greg wrote:
Thanks Assaf.
Would anyone know about my question 1? If I install a local version of Galaxy and connect it to our cluster, where is each user's (uploaded?) data stored? How will the cluster jobs be able to access the data?
By default, all the data is under a single folder, belonging to the galaxy Linux user account. We use /mnt/galaxy/galaxy-dist for this. In order that the cluster jobs can access the data, they must also be able to see this mount. Galaxy does have support for staging files between file systems, but the simpler approach of unified storage is recommended. See http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Cluster
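One way to confirm that a compute node really sees the same mount is a small check like the one below, run once on the Galaxy server and once on a node (for example inside a short test job). It is only a sketch: the function name is hypothetical, and it tests nothing beyond whether the directory exists and is readable and writable by the account running it.

```python
import os
import tempfile


def check_shared_dir(path):
    """Return a list of problems that would stop a job from using `path`."""
    if not os.path.isdir(path):
        return ["%s is not a directory on this host (is it mounted?)" % path]
    problems = []
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append("%s is not readable by this user" % path)
    try:
        # Create and automatically remove a scratch file to prove write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append("cannot write to %s: %s" % (path, exc))
    return problems
```

Calling check_shared_dir("/mnt/galaxy/galaxy-dist") as the galaxy user should return an empty list on every host that will run Galaxy jobs; any message it returns points at a missing mount or a permissions mismatch.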
If anyone has installed galaxy and hooked it up to Sun Grid engine, I'd love to hear from you.
We're using Galaxy and SGE. You could run Galaxy on the SGE head node, but in our setup the Galaxy machine is just an SGE submit node (so it can submit jobs and check on them, i.e. you can use qsub and qstat from our Galaxy server).
Peter
P.S. It is very confusing that your emails give your name as "mailing list" rather than Greg.
On Fri, Aug 24, 2012 at 9:56 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Fri, Aug 24, 2012 at 1:25 PM, mailing list <margeemail@gmail.com> aka Greg wrote:
Thanks Assaf.
Would anyone know about my question 1? If I install a local version of Galaxy and connect it to our cluster, where is each user's (uploaded?) data stored? How will the cluster jobs be able to access the data?
By default, all the data is under a single folder, belonging to the galaxy Linux user account. We use /mnt/galaxy/galaxy-dist for this. In order that the cluster jobs can access the data, they must also be able to see this mount.
So can one user see another's files in your setup? I'd prefer if each user could only see and use his own files.
How do users get data into Galaxy in your system?
Thanks,
Greg
Galaxy does have support for staging files between file systems, but the simpler approach of unified storage is recommended. See http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Cluster
If anyone has installed galaxy and hooked it up to Sun Grid engine, I'd love to hear from you.
We're using Galaxy and SGE. You could run Galaxy on the SGE head node, but in our setup the Galaxy machine is just an SGE submit node (so it can submit jobs and check on them, i.e. you can use qsub and qstat from our Galaxy server).
Peter
P.S. It is very confusing that your emails give your name as "mailing list" rather than Greg.
I think I fixed this?
On Friday, August 24, 2012, greg wrote:
On Fri, Aug 24, 2012 at 9:56 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Fri, Aug 24, 2012 at 1:25 PM, mailing list <margeemail@gmail.com> aka Greg wrote:
Thanks Assaf.
Would anyone know about my question 1? If I install a local version of Galaxy and connect it to our cluster, where is each user's (uploaded?) data stored? How will the cluster jobs be able to access the data?
By default, all the data is under a single folder, belonging to the galaxy Linux user account. We use /mnt/galaxy/galaxy-dist for this. In order that the cluster jobs can access the data, they must also be able to see this mount.
So can one user see another's files in your setup? I'd prefer if each user could only see and use his own files.
No - they only access their data via the website, which has its own user controls and prevents user A from seeing the files of user B (unless explicitly shared).
How do users get data into Galaxy in your system?
Mostly via the "upload" tool, or sometimes the remote data tools (e.g. from UCSC). Some commonly used files are also set up on our system as shared libraries within Galaxy.
Thanks,
Greg
P.S. It is very confusing that your emails give your name as "mailing list" rather than Greg.
I think I fixed this?
Yes :)
On Fri, Aug 24, 2012 at 4:30 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Would anyone know about my question 1? If I install a local version of Galaxy and connect it to our cluster, where is each user's (uploaded?) data stored? How will the cluster jobs be able to access the data?
By default, all the data is under a single folder, belonging to the galaxy Linux user account. We use /mnt/galaxy/galaxy-dist for this. In order that the cluster jobs can access the data, they must also be able to see this mount.
So can one user see another's files in your setup? I'd prefer if each user could only see and use his own files.
No - they only access their data via the website, which has its own user controls and prevents user A from seeing the files of user B (unless explicitly shared).
How do users get data into Galaxy in your system?
Mostly via the "upload" tool, or sometimes the remote data tools (e.g. from UCSC). Some commonly used files are also set up on our system as shared libraries within Galaxy.
It seems weird to ask them to upload something that's already in their home directory and available to the cluster jobs otherwise. I wonder if there's a way they can copy files to the Galaxy data directory? I guess they'd have to let Galaxy know about the new data somehow?
Thanks,
Greg
P.S. It is very confusing that your emails give your name as "mailing list" rather than Greg.
I think I fixed this?
Yes :)
On Mon, Aug 27, 2012 at 2:25 PM, greg <margeemail@gmail.com> wrote:
How do users get data into Galaxy in your system?
Mostly via the "upload" tool, or sometimes the remote data tools (e.g. from UCSC). Some commonly used files are also set up on our system as shared libraries within Galaxy.
It seems weird to ask them to upload something that's already in their home directory and available to the cluster jobs otherwise.
I was answering about *our* Galaxy, where only a few of the users have a Linux account. In most cases their home directory is on Windows... and not easily accessed from our Linux cluster if at all.
I wonder if there's a way they can copy files to the Galaxy data directory? I guess they'd have to let Galaxy know about the new data somehow?
This is available to a Galaxy administrator for shared libraries of files made available within Galaxy, see: http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Uploading%20Library%20Files
Perhaps something like Alban Lermine's upload_local_file or Edward Kirton's data_nfs tool on the ToolShed would work for you?
Peter
participants (4): Assaf Gordon, greg, mailing list, Peter Cock