I am trying to understand how the FTP and data library upload options work in Galaxy. When a non-binary file is uploaded through the FTP option, it goes through three move operations:

1. First, it is copied to a temporary namespace line by line, converting newlines.
2. Then the temporary file is moved back to the FTP directory under the same name [1].
3. Later, the newline-sanitized FTP file is moved to the datasets directory.

In Python, these move operations are carried out as copy and delete tasks. I don't see the same approach being taken with data libraries or the other file-system import/upload options. I looked at the library_common code, but I couldn't follow it [2]. I was wondering if someone could help me understand how file upload is implemented for the different upload mechanisms and datatypes.

Also, can the FTP upload option reduce the number of move operations? For example, can the original FTP file or the temporary file be copied or moved directly to the datasets directory? This would help support FTP-type uploads where the 'galaxy' user isn't the primary owner of the user's files (move operations perform chmod, which requires primary ownership).

-- Thanks, Shantanu

1. https://bitbucket.org/galaxy/galaxy-central/src/530fb4f8204f2106e11f419c381f...
2. https://bitbucket.org/galaxy/galaxy-central/src/530fb4f8204f/lib/galaxy/weba...
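For readers following the thread, the three steps described in the message above can be sketched roughly as follows. This is only an illustration of the described flow, not Galaxy's actual code: the function names `convert_newlines` and `ftp_upload_flow`, their parameters, and the choice of `latin-1` as a byte-transparent encoding are all assumptions made for this sketch.

```python
import os
import shutil
import tempfile


def convert_newlines(src_path, tmp_dir):
    """Copy src_path line by line into a temp file, normalizing line endings.

    Hypothetical helper: reading with newline=None enables universal
    newlines, so "\r\n" and "\r" both become "\n". latin-1 maps every
    byte to one character, so no other bytes are altered.
    """
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with open(src_path, "r", newline=None, encoding="latin-1") as src, \
            os.fdopen(fd, "w", newline="\n", encoding="latin-1") as out:
        for line in src:
            out.write(line)
    return tmp_path


def ftp_upload_flow(ftp_path, tmp_dir, datasets_dir):
    """Hypothetical walk-through of the three moves described in the thread."""
    # Step 1: write a newline-converted copy to a temporary location.
    tmp_path = convert_newlines(ftp_path, tmp_dir)
    # Step 2: move the converted copy back over the original FTP file.
    shutil.move(tmp_path, ftp_path)
    # Step 3: move the sanitized FTP file into the datasets directory.
    dest = os.path.join(datasets_dir, os.path.basename(ftp_path))
    shutil.move(ftp_path, dest)
    return dest
```

Each `shutil.move` call is a cheap rename only when source and destination are on the same filesystem; otherwise it becomes a copy followed by a delete, which is where the extra I/O comes from.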
On Jan 8, 2013, at 1:27 PM, Shantanu Pavgi (Campus) wrote:
Also, can the FTP upload option reduce the number of move operations? For example, can the original FTP file or the temporary file be copied or moved directly to the datasets directory? This would help support FTP-type uploads where the 'galaxy' user isn't the primary owner of the user's files (move operations perform chmod, which requires primary ownership).
Hi Shantanu,

The code in question is actually in the upload tool, tools/data_source/upload.py. In general, you should be able to minimize the number of copy and delete steps if you put new_file_path, file_path, and ftp_upload_dir in the same filesystem.

--nate
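The advice above relies on how Python's `shutil.move` behaves: it uses `os.rename` when the destination is on the same filesystem and falls back to copy-then-delete otherwise. A minimal sketch for checking whether the three configured directories share a filesystem might look like this. The helper name `same_filesystem` is an assumption made here, and comparing `st_dev` is a heuristic rather than a guarantee (bind mounts and some network filesystems can behave differently).

```python
import os


def same_filesystem(*paths):
    """Return True if every given path reports the same device ID (st_dev).

    Hypothetical helper: pass the directories configured as new_file_path,
    file_path, and ftp_upload_dir. All paths must exist.
    """
    devices = {os.stat(path).st_dev for path in paths}
    return len(devices) == 1


# Example call with placeholder paths (substitute your own settings):
# same_filesystem("/path/to/new_file_path",
#                 "/path/to/file_path",
#                 "/path/to/ftp_upload_dir")
```

If the check returns False, at least one move between those directories will be carried out as a full copy plus delete.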
On Jan 9, 2013, at 12:41 PM, Nate Coraor wrote:

The code in question is actually in the upload tool, tools/data_source/upload.py. In general, you should be able to minimize the number of copy and delete steps if you put new_file_path, file_path, and ftp_upload_dir in the same filesystem.

Thanks for the reply, Nate. I understand that both the data library and FTP upload methods call the upload tool, but I am not following how they call it. For example, FTP upload seems to do the newline conversion mentioned earlier, whereas data library upload seems to skip it (or do it in a different manner, i.e., not in place). Which files should I look at to follow the FTP and data library upload calls and understand such differences?

-- Shantanu
On Jan 14, 2013, at 11:45 PM, Shantanu Pavgi (Campus) wrote:
Thanks for the reply, Nate. I understand that both the data library and FTP upload methods call the upload tool, but I am not following how they call it. For example, FTP upload seems to do the newline conversion mentioned earlier, whereas data library upload seems to skip it (or do it in a different manner, i.e., not in place). Which files should I look at to follow the FTP and data library upload calls and understand such differences?
Data library uploads, if instructed not to copy data into Galaxy, will not modify the file on disk. The code on the library side begins at lib/galaxy/webapps/galaxy/controllers/library_common.py and then moves through lib/galaxy/tools/actions/upload_common.py, where the library upload becomes a job. From there it goes to the job running system, where tools/data_source/upload.py is executed.

--nate
participants (2)
- Nate Coraor
- Shantanu Pavgi (Campus)