Iyad,

Yes, that's a different topology. We run on a shared cluster, so the Galaxy head nodes (app + db) sit next to the cluster, and the galaxy user has no special powers outside the head node, though the filesystem is completely cross-mounted.

Regards,
Curtis

-----Original Message-----
From: Kandalaft, Iyad [mailto:Iyad.Kandalaft@AGR.GC.CA]
Sent: Monday, June 02, 2014 2:01 PM
To: Curtis Hendrickson (Campus); John Chilton
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Uploading files to galaxy from a folder

Hi Curtis,

I recall coming across the external chown script you mentioned, but for some reason it wasn't working as expected, which is why I opted to insert my own function into upload.py. I think the documentation in universe_wsgi.ini didn't match up with how the script worked, so I didn't bother trying to find that functionality in the code.

I suppose you are suggesting that my codebase is somewhat proprietary to our environment, which is correct. On our cluster at AAFC, the "headnode", which differs from the galaxy server, can sync any file across the entire cluster (such as /etc/sudoers.d/galaxy). Hence, I can propagate sudo permissions from the headnode to the compute nodes. Alternatively, I opted to set the upload script to execute locally on the galaxy server, as you implied.

I will review the codebase related to the original chown script and see if it needs any tweaking (or maybe just better documentation).

Regards,

Iyad Kandalaft
Bioinformatics Application Developer
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
KW Neatby Bldg | éd. KW Neatby
960 Carling Ave| 960, avenue Carling
Ottawa, ON | Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel: Iyad.Kandalaft@agr.gc.ca
Telephone | Téléphone 613- 759-1228
Facsimile | Télécopieur 613-759-1701
Government of Canada | Gouvernement du Canada

-----Original Message-----
From: Curtis Hendrickson (Campus) [mailto:curtish@uab.edu]
Sent: Monday, June 02, 2014 2:27 PM
To: Kandalaft, Iyad; John Chilton
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Uploading files to galaxy from a folder

John,

For us, it's not just a question of deleting the file, but of having permission to do Python's copystat (see stack trace below). We've tried a variety of permission combinations, including sticky group permissions, various setfacl's, and moving the directory on or off the same filesystem as the dataset storage, but still haven't arrived at a happy solution.

I also looked into Iyad's earlier posting of a patch to upload.py, but in our configuration, upload.py runs on the compute nodes (not the galaxy head node), where the galaxy user does NOT have sudo privileges, so we really need a solution that does the chown further up the food chain - before the job is queued. I'm guessing it should go somewhere in lib/galaxy/tools/actions/upload_common.py, but I haven't had time to come to grips with this code yet - any suggestions would be welcome. That code already has the "sudo chown" for running jobs as their real user (at least on my somewhat dated codebase), so I'm thinking that same external_chown script could also be used for this purpose (chown before import).
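For readers following the copystat issue: when shutil.move cannot rename a file (for example across filesystems), it falls back to copy2, which copies the contents and then calls copystat. copystat calls os.utime with explicit timestamps, and that requires owning the destination file (or having elevated privileges); write permission alone is not enough. Below is a minimal sketch of a move that copies only the contents, assuming that losing the source's timestamps and mode is acceptable. The name move_contents_only is hypothetical and is not part of Galaxy or of the patch discussed in this thread.

```python
import os
import shutil


def move_contents_only(src, dst):
    """Move src to dst without copying timestamps or permission bits.

    Hypothetical helper for illustration; not Galaxy code.
    """
    try:
        # Cheap path: works when src and dst are on the same filesystem
        # and the caller may write to the destination directory.
        os.rename(src, dst)
    except OSError:
        # Fallback: copy the bytes only. Unlike shutil.copy2, copyfile
        # does not call copystat, so os.utime is never invoked on dst.
        shutil.copyfile(src, dst)
        os.unlink(src)
```

This only sidesteps the os.utime call; it does not change who owns the destination file, so the ownership question raised in the thread still stands.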
Regards,
Curtis

Traceback (most recent call last):
  File "/share/apps/galaxy/galaxy-rollouttest/tools/data_source/upload.py", line 401, in <module>
    __main__()
  File "/share/apps/galaxy/galaxy-rollouttest/tools/data_source/upload.py", line 390, in __main__
    add_file( dataset, registry, json_file, output_path )
  File "/share/apps/galaxy/galaxy-rollouttest/tools/data_source/upload.py", line 270, in add_file
    line_count, converted_path = sniff.convert_newlines( dataset.path, in_place=in_place )
  File "/share/apps/galaxy/galaxy-rollouttest/lib/galaxy/datatypes/sniff.py", line 106, in convert_newlines
    shutil.move( temp_name, fname )
  File "/share/apps/galaxy/python/2.6.6/lib/python2.6/shutil.py", line 260, in move
    copy2(src, real_dst)
  File "/share/apps/galaxy/python/2.6.6/lib/python2.6/shutil.py", line 96, in copy2
    copystat(src, dst)
  File "/share/apps/galaxy/python/2.6.6/lib/python2.6/shutil.py", line 66, in copystat
    os.utime(dst, (st.st_atime, st.st_mtime))
OSError: [Errno 1] Operation not permitted: '/scratch/importfs/galaxy/sunnie/1-5.txt'

-----Original Message-----
From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of Kandalaft, Iyad
Sent: Sunday, June 01, 2014 7:31 PM
To: John Chilton
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Uploading files to galaxy from a folder

Hi John

I will review the galaxy code more closely and implement it as you suggested. It's been a while since I've implemented this 'fix', so I will have to dig up the code that tries to delete the file after upload. Once it's simple and clean, I will do a pull request.

Thanks for your input.
Iyad Kandalaft
Bioinformatics Programmer
Microbial Biodiversity Bioinformatics
Science & Technology Branch
Agriculture & Agri-Food Canada
Iyad.Kandalaft@agr.gc.ca | (613) 759-1228

________________________________________
From: John Chilton [jmchilton@gmail.com]
Sent: June 1, 2014 7:53 PM
To: Kandalaft, Iyad
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Uploading files to galaxy from a folder

Hello Iyad,

Thanks for taking the time to describe your problem and share your solution - I am sure other institutions would like to harness the FTP upload option the way you are doing it.

Is the only problem that occurs when the ownership is incorrect that Galaxy cannot delete the files? If so, I feel that as a deployer I would prefer to have a sudo script that just deletes the files instead of changing the ownership. It is a more... focused sudo command. This strategy could also allow a slightly more general universe_wsgi.ini option - say, a few options such as:

    ftp_upload_delete_handling = __default__  # just assume ownership and delete
    ftp_upload_delete_handling = __none__     # don't delete files - user is responsible
    ftp_upload_delete_handling = sudo -E rm   # command prefix to delete file

I am not saying that I would say no if you cleaned up what you have and opened a pull request - I am just saying I would be more eager for this more general option. If you can confirm that the only reason the permissions need to change is to delete the files, and you likewise prefer this variant but cannot figure out how to modify upload.py, let me know and I can try to look into it (no promises that it is easy to implement).

-John

On Tue, May 27, 2014 at 10:28 AM, Kandalaft, Iyad <Iyad.Kandalaft@agr.gc.ca> wrote:
Hi Everyone
I'm throwing this out there for some feedback and recommendations.
Objective: Facilitate transferring large files (> 2GB) from an HPC cluster (and its associated fast tier storage) to galaxy for my clients. I enabled the FTP upload option in galaxy but it involves users learning to copy files over FTP.
So, I created a galaxy folder in each user's home directory on the HPC cluster that symbolically links to the FTP upload folder for galaxy. Hence, users can either use FTP to upload files (drag and drop in Windows) or simply copy files into this folder from an ssh session on the cluster. The problem with that strategy was that galaxy had to be the owner of the file (similar to the ProFTPd configuration that sets the UID and GID of uploaded files to galaxy's UID/GID). Otherwise, galaxy threw errors when it tried to delete the original file from the FTP upload folder. I could have added the galaxy user to the same group as all users, but this meant that users would have to ensure the correct permissions were set on files so that galaxy could read and delete them afterwards. The alternative involved modifying the upload.py tool to chown/chmod files that were being uploaded. upload.py now uses sudo to execute an external script that sets ownership to the galaxy user and corrects the permissions if required (see attachment for code modification). The galaxy user has sudo rights on this script, and the script restricts chown/chmod to the FTP folder path for security reasons.
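The attachment is not reproduced in this archive, so the following is only a minimal sketch of the path restriction described above, not the actual modification. It assumes the check is done in Python before sudo is invoked; the names is_inside and build_chown_command, the script path, and its argument order are all hypothetical.

```python
import os


def is_inside(base_dir, path):
    """Return True if path resolves to a location strictly under base_dir.

    realpath() collapses ".." components and follows symlinks, so a path
    such as <base>/../etc/passwd is rejected.
    """
    base = os.path.realpath(base_dir).rstrip(os.sep)
    target = os.path.realpath(path)
    return target.startswith(base + os.sep)


def build_chown_command(script, base_dir, path, user):
    """Build the argv for a sudo call to a (hypothetical) chown script.

    Refuses any path that falls outside the FTP upload folder.
    """
    if not is_inside(base_dir, path):
        raise ValueError("refusing to chown outside %s: %s" % (base_dir, path))
    # "sudo -n" fails instead of prompting if a password would be needed.
    return ["sudo", "-n", script, os.path.realpath(path), user]
```

The returned list could then be passed to subprocess.check_call. The external script should repeat the same check itself, since the script is the part that runs with elevated rights.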
I was planning to clean up the code and make it production ready by adding an option in universe_wsgi.ini for this "feature", but I thought I would check with the galaxy devs first. Am I taking the wrong approach? Is there a better alternative?
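One way such a universe_wsgi.ini option could be consumed is sketched below, using the three ftp_upload_delete_handling values proposed in John's reply above. This is an illustration only: the option does not exist in Galaxy as far as this thread shows, and the function name delete_uploaded_file is hypothetical.

```python
import os
import shlex
import subprocess


def delete_uploaded_file(path, handling="__default__"):
    """Delete an uploaded file according to a configured strategy.

    handling is one of:
      "__default__"  assume ownership and delete directly
      "__none__"     leave the file; the user is responsible
      anything else  treated as a command prefix, e.g. "sudo -E rm"
    Returns True if a delete was attempted, False otherwise.
    """
    if handling == "__none__":
        return False
    if handling == "__default__":
        os.remove(path)
        return True
    # Split the prefix into argv and append the path as a single argument,
    # so no shell is involved and spaces in the path are safe.
    subprocess.check_call(shlex.split(handling) + [path])
    return True
```

Passing the command as an argument list rather than a shell string avoids quoting problems with file names that users drop into the upload folder.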
As an alternative, I thought about locating the handler code for dataset.type == file and possibly making it support the setuid/setgid bits on folders. In that case, the FTP upload folder would have the setuid bit set, and galaxy could assume the role of the user to upload that file.
Your input is much appreciated.
Iyad Kandalaft
Bioinformatics Application Developer
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
KW Neatby Bldg | éd. KW Neatby
960 Carling Ave| 960, avenue Carling
Ottawa, ON | Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel: Iyad.Kandalaft@agr.gc.ca
Telephone | Téléphone 613- 759-1228
Facsimile | Télécopieur 613-759-1701
Government of Canada | Gouvernement du Canada
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/