Re: [galaxy-dev] is it possible to upload files to different local directories based on the galaxy upload.py script?
Hello,

I would love to be able to have all datasets, job output, etc. go to a different directory for each user. Some of our cluster users have access to much more storage than others, and it would be nice if they could take advantage of that storage through Galaxy, too.

I was thinking of inserting an extra level of directory structure below Galaxy's database directory so that, for example, instead of just database/files there would be database/user1/files, database/user2/files, and so on (and likewise for job_working_directory, pbs, etc.). That way we could make database/userX a symbolic link to a different filesystem with higher quotas when appropriate.

I was looking in the code for where the external_filename attribute you mention is handled, and I see a couple of key places where this could be hacked in -- JobExternalOutputMetadataWrapper.setup_external_metadata (in each place where I see it called, the username could be passed along) and JobWrapper.__init__, where it sets the job_working_directory (it has the job, so it has the username).

However, it seems that's too late -- I see Galaxy trying to create things in the normal database locations before any of these are called and the output paths can be adjusted. It would be nice if this could be coded into the Dataset class itself, but I don't know how to get the appropriate username there...

Any suggestions for how to hack something together to make this work?

Thanks a lot,
John

On Thu, 24 Jun 2010, Nate Coraor wrote:
amenda lee wrote:
Hi;
Is it possible to upload files to different local directories with the Galaxy upload.py script? It seems to upload all files to the same directory, based on the file path in the universe_wsgi.ini file.
If yes, what part of upload.py needs to be edited? Also, is it possible to keep the name of the uploaded file unchanged?
Hi Amenda,
Galaxy manages all of its data internally. If it were not done this way, I am not sure how you would define which directory each upload should go to.
The exception to this is that when using Galaxy Data Libraries, you can choose to leave files in their original locations when uploading them on the server side (via the "Upload a directory of files" and "Upload from filesystem paths" options).
That said, if you wanted to write some sort of custom method for placing files based on some rules at upload, it could be done. Dataset objects have an 'external_filename' attribute which can be set to any filesystem path. This value is stored in the database and will be automatically used by anything which needs access to the disk file.
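For example, something along these lines (an untested sketch -- external_filename is the real Dataset attribute, but where you hook this in and how you get hold of a SQLAlchemy session are up to you):

    # Sketch only: point a dataset at a file that lives outside Galaxy's
    # managed database/files directory.
    def place_externally(sa_session, dataset, path):
        dataset.external_filename = path   # stored in the database
        sa_session.add(dataset)
        sa_session.flush()                 # persist so later code sees the new path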
--nate
Thanks in advance.
Amenda Lee
_______________________________________________
galaxy-dev mailing list
galaxy-dev at lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
John Brunelle wrote:
Hello,
I would love to be able to have all datasets, job output, etc. go to a different directory for each user. Some of our cluster users have access to much more storage than others, and it would be nice if they could take advantage of that storage through Galaxy, too.
I was thinking of inserting an extra level of directory structure below Galaxy's database directory so that, for example, instead of just database/files there would be database/user1/files, database/user2/files, and so on (and likewise for job_working_directory, pbs, etc.). That way we could make database/userX a symbolic link to a different filesystem with higher quotas when appropriate.
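Concretely, I'm picturing setup code along these lines (just a sketch -- the subdirectory names mirror the stock layout, and the big-storage path handling is made up):

    import os

    def ensure_user_dirs(database_root, username, big_fs_root=None):
        # Create database/<username>/{files,job_working_directory,pbs},
        # optionally symlinking database/<username> to a filesystem with a
        # higher quota first.
        user_root = os.path.join(database_root, username)
        if big_fs_root and not os.path.lexists(user_root):
            target = os.path.join(big_fs_root, username)
            if not os.path.isdir(target):
                os.makedirs(target)
            os.symlink(target, user_root)   # database/userX -> /big/fs/userX
        for sub in ('files', 'job_working_directory', 'pbs'):
            subdir = os.path.join(user_root, sub)
            if not os.path.isdir(subdir):
                os.makedirs(subdir)
        return user_root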
I was looking in the code for where the external_filename attribute you mention is handled, and I see a couple of key places where this could be hacked in -- JobExternalOutputMetadataWrapper.setup_external_metadata (in each place where I see it called, the username could be passed along) and JobWrapper.__init__, where it sets the job_working_directory (it has the job, so it has the username).
However, it seems that's too late -- I see Galaxy trying to create things in the normal database locations before any of these are called and the output paths can be adjusted. It would be nice if this could be coded into the Dataset class itself, but I don't know how to get the appropriate username there...
Any suggestions for how to hack something together to make this work?
Hi John,

Most datasets are created in lib/galaxy/tools/actions/__init__.py. The exception is the upload tool, whose datasets are created in upload_common.py in the same directory. After creating the dataset instance (usually a HistoryDatasetAssociation), you'd need to set the filename with the instance's set_file_name() method.

--nate
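For instance, right after the new HDA has been flushed (so its dataset has an id), something roughly like this -- how you map a user to a directory (trans.user.email here) and the filename pattern are just placeholders for whatever scheme you settle on:

    import os

    # Sketch for upload_common.py / tools/actions/__init__.py: send the new
    # dataset's file to a per-user directory instead of database/files.
    user_dir = os.path.join('database', trans.user.email, 'files')
    if not os.path.isdir(user_dir):
        os.makedirs(user_dir)
    hda.set_file_name(os.path.join(user_dir, 'dataset_%d.dat' % hda.dataset.id))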
Thanks a lot,
John
On Thu, 24 Jun 2010, Nate Coraor wrote:
amenda lee wrote:
Hi;
Is it possible to upload files to different local directories with the Galaxy upload.py script? It seems to upload all files to the same directory, based on the file path in the universe_wsgi.ini file.
If yes, what part of upload.py needs to be edited? Also, is it possible to keep the name of the uploaded file unchanged?
Hi Amenda,
Galaxy manages all of its data internally. If it were not done this way, I am not sure how you would define which directory each upload should go to.
The exception to this is that when using Galaxy Data Libraries, you can choose to leave files in their original locations when uploading them on the server side (via the "Upload a directory of files" and "Upload from filesystem paths" options).
That said, if you wanted to write some sort of custom method for placing files based on some rules at upload, it could be done. Dataset objects have an 'external_filename' attribute which can be set to any filesystem path. This value is stored in the database and will be automatically used by anything which needs access to the disk file.
--nate
Thanks in advance.
Amenda Lee
participants (2)
- John Brunelle
- Nate Coraor