I am seeing some odd behavior when importing BAM files from the FTP staging directory: the imported file retains its original ownership instead of being changed to the galaxy system user account.
I should note that our site is not using a Galaxy-specific FTP server configuration. Instead, we have configured a directory namespace where cluster users can drop off files using scp or sftp, or even fetch them using wget. The galaxy system account can access these files, and hence it can import them into its datasets staging directory using the 'FTP' upload mechanism. We have been using this configuration for about a year now, with ~15,000 files deposited through it.
There have been a few instances (22) where file ownership didn't change to the galaxy system account after a file was imported into Galaxy. Normally, when a file is imported from the FTP directory to Galaxy's datasets staging directory, its inode entry and ownership are changed. In these twenty-two cases, which appear to be specific to BAM files, the inode entry and ownership didn't change, but the file name is in Galaxy's dataset_nnnnn.dat format and the file contents/size are legit. So it seems that for 'successful' import operations the file is copied to galaxy space (ownership and inode changed), and for 'unsuccessful' import operations the file is moved to galaxy space.
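To illustrate the copy-versus-move distinction I am describing, here is a minimal Python sketch. It is only a demonstration of general filesystem behavior, not Galaxy's actual upload code; the function name transfer_inodes and the file names are made up for the example, and it assumes both directories live on the same filesystem. A copy creates a new file (new inode, owned by the process doing the copy), while a move within one filesystem is a rename that keeps the original inode and therefore the original owner.

```python
import os
import shutil
import tempfile


def transfer_inodes(transfer):
    """Create a file in a staging dir, transfer it to a datasets dir,
    and return (inode_before, inode_after).

    `transfer` is a callable such as shutil.copy or shutil.move.
    Directory and file names here are hypothetical placeholders.
    """
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "ftp_staging")
        datasets = os.path.join(root, "datasets")
        os.mkdir(staging)
        os.mkdir(datasets)

        src = os.path.join(staging, "sample.bam")
        with open(src, "wb") as fh:
            fh.write(b"placeholder contents")
        before = os.stat(src).st_ino

        dst = os.path.join(datasets, "dataset_00001.dat")
        transfer(src, dst)
        after = os.stat(dst).st_ino
        return before, after


# Copy: the source still exists, so the destination must be a new inode
# (and it is owned by whichever user ran the copy).
copied = transfer_inodes(shutil.copy)

# Move on the same filesystem: a rename, so inode and owner are unchanged.
moved = transfer_inodes(shutil.move)

print("copy changed inode:", copied[0] != copied[1])
print("move changed inode:", moved[0] != moved[1])
```

If the staging and datasets directories were on different filesystems, a move would fall back to copy-and-delete, so the inode would change in that case as well.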
I was wondering if someone could help me understand how the upload tool operates, so that I can get a better sense of the underlying file ownership issues. I have seen this happen for a file that had exactly the same permissions/ACLs as a file which was imported successfully. So are there any other factors that might be preventing the file ownership/inode changes? Does Galaxy have different import mechanisms depending on file type?
We are using galaxy-dist revision 40f1816d6857 (will be updated soon!). This issue also occurred before this dist version was in place. Any pointers for debugging would be really helpful.
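In case it helps anyone reproduce the check, one way to list dataset files that are not owned by the galaxy account could look like the sketch below. This is only an illustration under my own assumptions: the function name files_not_owned_by is hypothetical, and the datasets path and the "galaxy" user name in the usage comment are placeholders that need to be adjusted for a given site.

```python
import os


def files_not_owned_by(root, expected_uid):
    """Walk `root` and return (path, owner_uid, inode) for every regular
    entry whose owner differs from `expected_uid`."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report on the entry itself, not a symlink target
            if st.st_uid != expected_uid:
                mismatched.append((path, st.st_uid, st.st_ino))
    return sorted(mismatched)


# Example usage (placeholder path and user name, adjust for your site):
#   import pwd
#   galaxy_uid = pwd.getpwnam("galaxy").pw_uid
#   for path, uid, inode in files_not_owned_by("/path/to/galaxy/datasets", galaxy_uid):
#       print(path, uid, inode)
```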
--
Shantanu