Hello,

Please forgive the length of this proposal as I try to explain my reasoning. Let me say first that I understand Galaxy is not meant to be everything to everyone, and that a requested feature may not suit everyone who uses Galaxy. That being said, I have a request that I think would make it more convenient to deal with users' datasets from a file-system perspective.

Compared to manual job submission for tools on a cluster, Galaxy has the obvious advantage of providing an interface to all the analysis tools, together with the history of the operations done on your data, in one place. However, I have found that putting all the output and datasets in one directory (the files/000/ directory) on the file-system causes a problem for users who specifically want to interact with their data *on the file-system*, and not just through the Web interface, for whatever complicated or diverse reasons.

Since Galaxy runs on a cluster of its own in our environment, and we do not allow users to connect to it remotely to submit manual jobs (and write the output to their separate home directories) as we do on our main cluster, it is essentially a black box beyond the Galaxy Web interface. That is essentially what we want, except for how users can interact with the output files.

The issue is that our users would like an easy means of copying their files off the Galaxy cluster to other servers from a command line (possibly even automated by scripts). Even if we allow an FTP share of the output directory for that purpose, the common [galaxy-dist]/database/files/000/ directory lumps the files of all users together in one directory and uses a sequential file-naming scheme (dataset_N++), which makes it hard to tell who owns each file.

Is there a way that the dataset output directory locations could be designed (or optionally configured) like the directory structure the FTP upload feature expects, where files are dropped into the subdirectory of the user who produced them? For example, database/files/ could contain subdirectories named according to each user's Galaxy account id (like [galaxy-dist]/database/files/jsmith, [galaxy-dist]/database/files/sparker, etc.). If the files were segregated by user, it would be much easier to keep track of which datasets belong to whom on the file-system. I could then set up a read-only FTP share of the files/ directory on the cluster, from which users could copy the files in their personal subdirectory directly to other systems, and perhaps batch download them, rather than having to rely solely on the Web interface.

I understand that Galaxy is currently designed so that the files are generically named (the "behind-the-scenes" handling of data is a black box), and that it is the database that keeps track of which files belong to whom and holds the metadata for more meaningful dataset/job names, etc. But a file-system hierarchy alternative would be welcome in a heavily command-line-oriented computational environment.
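
To make the idea concrete, here is a minimal sketch of the kind of per-user view I have in mind. It is only an illustration and not how Galaxy works today. The function name build_user_tree, the export directory, and the (username, relative path) records are all hypothetical; in practice the records would have to come from a query against the Galaxy database, which is not shown here. The sketch leaves the existing files where they are and only creates symbolic links under one subdirectory per user.

```python
from pathlib import Path


def build_user_tree(records, files_root, export_root):
    """Create export_root/<username>/<file name> symlinks to dataset files.

    records is an iterable of (username, relative_path) pairs. Where these
    pairs come from (for example a database query) is an assumption and is
    outside the scope of this sketch.
    """
    files_root = Path(files_root)
    export_root = Path(export_root)
    links = []
    for username, rel_path in records:
        source = files_root / rel_path
        if not source.is_file():
            continue  # skip records whose file is missing on disk
        user_dir = export_root / username
        user_dir.mkdir(parents=True, exist_ok=True)
        link = user_dir / source.name
        # Only create the link if nothing is there yet, so reruns are safe.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(source.resolve())
        links.append(link)
    return links
```

Called with records such as ("jsmith", "000/dataset_1.dat"), this would produce a link like export/jsmith/dataset_1.dat. Whether an FTP server will follow such links depends on its configuration, so this may not be sufficient on its own.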

Would setting up a more user-representative output directory hierarchy on the file-system like that be possible?

Best Regards,
Josh Nielsen