use of public server vs local installation
Dear Galaxy team:

I was recently introduced to the Galaxy framework by my colleagues in Singapore. I am very impressed with it and want to start using it right away. Now the question is whether I should use the public server or attempt a local installation. I understand this depends on how we intend to use the system.

What I would like your advice on is the security of using the public server. If I set up my login and start using the system for private data, say high-throughput sequencing data, who else could possibly see my data without my knowing it?

As you know, pharmaceutical companies traditionally tend to set up everything locally if possible. With the recent data explosion, it is becoming more and more unrealistic to maintain internal copies of public tools and data. More importantly, I feel it could be a waste of resources and could introduce unnecessary data provenance problems.

From the server maintenance standpoint, how much effort is needed to keep the framework up and running? Do you encourage pharmaceutical companies to use your public server? Have you thought about carving out a section of your server for private users for a fee?

I anticipate one or two people using the server for a period of time when we first get the data, and occasionally afterwards. I hope to avoid a local installation if possible.

Best regards,
-Jian
_____________________________________________________________
Jian Wang, PhD
Informatics
Eli Lilly & Co.
Phone: 317 655 3496
E-mail: jian.wang@lilly.com

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
Hello Jian, see my comments inline below. Thanks very much for your interest in using Galaxy.

Greg Von Kuster
Galaxy Development Team

Jian WJ Wang wrote:

> Dear Galaxy team:
> I was recently introduced to the Galaxy framework by my colleagues in Singapore. I am very impressed with it and want to start using it right away. Now the question is whether I should use the public server or attempt a local installation.

Flexible RBAC (role-based access control) security is available in the public Galaxy environment as well as in the Galaxy distribution, so either approach should work for you.

> I understand this depends on how we intend to use the system. What I would like your advice on is the security of using the public server. If I set up my login and start using the system for private data, say high-throughput sequencing data, who else could possibly see my data without my knowing it?

If you choose to use the public instance, we can create a Group for you that contains only the users that you would like to be able to access your data. You can secure your histories so that they are accessible only to you or to the Group: from the main menu bar, select "User -> Preferences", then "Change default permissions for new histories", and then associate your private Role with the "access" permission on the history to make the history private to you. If you would like us to create a Group for you, then you can associate this Group with the "access" permission on your histories. You and the members of your Group can share histories with each other.

Another approach would be to use the Galaxy Data Libraries, which can be secured in a similar way. We can create one or more Data Libraries for you on our public instance, securing them such that only you or your Group can access them. If you can access a dataset in a Data Library, then you can "import" it into your history for analysis. One of the very nice things about Data Libraries is that only one file exists on disk, no matter how many times the file is "imported" into a history.

We have recently made significant improvements to our upload utility for uploading very large datasets into a history. However, the corresponding improvements for uploading very large datasets into a library are still under development, so if you choose to use Data Libraries to secure your data and your datasets are very large, we'll work with you on options for uploading them.
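As a rough illustration of the access model described above, here is a minimal Python sketch. It is not Galaxy's code: the names `Role`, `User`, `Dataset`, `History`, `can_access` and `import_from_library` are all hypothetical, and the rule that holding any one listed role is enough is a simplifying assumption that may not match how Galaxy actually evaluates the "access" permission. It only shows the two ideas from the reply: an item restricted to a private or Group role, and a library dataset that is referenced rather than copied when it is imported.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """Hypothetical role: a private role has one holder, a Group role has many."""
    name: str


@dataclass
class User:
    email: str
    roles: set = field(default_factory=set)


@dataclass
class Dataset:
    """Stands for one file on disk; histories hold references to it."""
    path: str
    access_roles: set = field(default_factory=set)  # empty means unrestricted


@dataclass
class History:
    name: str
    access_roles: set = field(default_factory=set)  # empty means unrestricted
    datasets: list = field(default_factory=list)


def can_access(user, item):
    """Assumed rule: unrestricted items are open; otherwise any shared role suffices."""
    return not item.access_roles or bool(user.roles & item.access_roles)


def import_from_library(user, dataset, history):
    """Attach a library dataset to a history by reference after an access check."""
    if not can_access(user, dataset):
        raise PermissionError(f"{user.email} may not access {dataset.path}")
    history.datasets.append(dataset)  # same object, so no second copy is made


# Example setup with made-up users and paths.
private_role = Role("owner-private")
group_role = Role("project-group")
owner = User("owner@example.org", {private_role, group_role})
colleague = User("colleague@example.org", {group_role})

private_history = History("private run", access_roles={private_role})
library_reads = Dataset("/library/reads.fastq", access_roles={group_role})
```

In this sketch the colleague can import `library_reads` into a history of their own but cannot open `private_history`, which mirrors the split between a Group-secured Data Library and a history locked to a private Role.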
> As you know, pharmaceutical companies traditionally tend to set up everything locally if possible. With the recent data explosion, it is becoming more and more unrealistic to maintain internal copies of public tools and data. More importantly, I feel it could be a waste of resources and could introduce unnecessary data provenance problems. From the server maintenance standpoint, how much effort is needed to keep the framework up and running?

Setting up a local Galaxy install is fairly simple if you are familiar with installing the required packages (e.g., Python, PostgreSQL or MySQL, TORQUE or Sun Grid Engine, etc.), but it could seem difficult if you are not comfortable with these packages. See our documentation at http://getgalaxy.org for all of the information you'll need to get your local installation up and running.

> Do you encourage pharmaceutical companies to use your public server?

Yes, if they do not want to host their own local Galaxy instance.

> Have you thought about carving out a section of your server for private users for a fee?

We do not have a business model in place for this, so we are not currently doing this.

> I anticipate one or two people using the server for a period of time when we first get the data, and occasionally afterwards. I hope to avoid a local installation if possible.

It shouldn't be a problem for you to use our public server. Again, we'll work with you on the best way to secure your data if you would like to use it.

> Best regards,
> -Jian
_______________________________________________
galaxy-user mailing list
galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
Greg:

Thank you very much. This is very helpful. I will think about it, share it with others at Lilly, and contact you in a few days.

-Jian

Greg Von Kuster <ghv2@psu.edu>
09/01/2009 04:03 PM
To: Jian WJ Wang <WANG_JIAN_WJ@LILLY.COM>
cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] use of public server vs local installation
participants (2)
- Greg Von Kuster
- Jian WJ Wang