Hi Peter,

I managed to get the microbial data compressed, and it is available along with the .loc file here: http://www.bx.psu.edu/~dan/microbes/test_only/all/

I mentioned using Libraries instead of a tool primarily because the content the tool makes available is static. Currently the tool creates a new file on disk each time a user requests a dataset; with libraries, each user would have a pointer to the same file on disk. I had been thinking of arranging the libraries similarly to how they are in the tool, but perhaps with additional sub-categories, although as the number of genomes (greatly) increases I'm not entirely certain how the interface would scale. Maybe some UI enhancements to the libraries could make it more manageable. Libraries also have the added benefit of allowing some versioning of datasets.

There does seem to be some interest in this tool/data, so after the new year I will try to find some time to take another pass through it. It appears that you've looked through it quite a bit, so I'll definitely be using your recent efforts/notes to help with this (if you haven't gotten it all fixed and working by then ;) ).

Thanks,
Dan

On Dec 22, 2010, at 11:02 AM, Peter wrote:
On Wed, Dec 22, 2010 at 3:49 PM, Brad Chapman <chapmanb@50mail.com> wrote:
Peter;
I've started reading about Data Libraries on the wiki:
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFile...
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Tutorial/Data...
Are there any nice examples of tools/scripts which populate Galaxy Data Libraries automatically which you think it would be helpful to read?
You can use the API for this. Here's a script that builds data libraries for next gen sequencing runs:
https://github.com/chapmanb/bcbb/blob/master/nextgen/scripts/upload_to_galax...
It selects files of interest, organizes them into a local directory structure, and then uploads them to Galaxy. Folders are created via the API, and this all uses a thin wrapper:
https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/galaxy/api.py
Brad
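For readers following along, the "create folders, then upload into them" sequence Brad describes might look roughly like the sketch below. It is not taken from his script or wrapper. It only builds the request URL and JSON bodies (no network calls), and the endpoint path (/api/libraries/<id>/contents), the field names (create_type, folder_id, upload_option, filesystem_paths, file_type, dbkey) and all helper names are assumptions based on the library API as I understand it, so check them against your Galaxy version. The dbkey is assumed here to be set per uploaded file rather than per library.

```python
import json


def contents_url(galaxy_url, library_id, api_key):
    """URL to POST library contents to (assumed endpoint layout)."""
    return "%s/api/libraries/%s/contents?key=%s" % (
        galaxy_url.rstrip("/"), library_id, api_key)


def folder_payload(parent_folder_id, name, description=""):
    """Request body for creating a folder inside a library (assumed fields)."""
    return {
        "create_type": "folder",
        "folder_id": parent_folder_id,
        "name": name,
        "description": description,
    }


def file_payload(folder_id, server_paths, dbkey="?", file_type="auto"):
    """Request body for adding files already on the Galaxy server's disk.

    The genome build (dbkey) travels with each upload request, so it is
    attached to the files, not to the library as a whole (assumption).
    """
    return {
        "create_type": "file",
        "folder_id": folder_id,
        "upload_option": "upload_paths",
        "filesystem_paths": "\n".join(server_paths),
        "file_type": file_type,
        "dbkey": dbkey,
    }


if __name__ == "__main__":
    # All IDs, keys and paths below are made-up placeholders.
    print(contents_url("http://localhost:8080/", "LIBRARY_ID", "API_KEY"))
    print(json.dumps(folder_payload("ROOT_FOLDER_ID", "Escherichia"), indent=2))
    print(json.dumps(
        file_payload("FOLDER_ID", ["/data/microbes/example.fna"],
                     dbkey="hypothetical_build"),
        indent=2))
```

Each dictionary would be sent as the JSON body of a POST to the contents URL; the folder ID returned by the first request is what the file upload would then reference.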
That looks very handy Brad - thank you :)
What I'm not clear on yet is how to structure the libraries - in particular can I associate a genome with a library, or with each file in a library?
If I go with one library per bacteria/archaea, how well would Galaxy cope with 800+ libraries?
If I go with one *big* library for all the NCBI RefSeq bacteria/archaea, using a folder structure inside the library, how easy will it be for the user to find a particular genome?
[We'd probably want to extend this to other NCBI RefSeq genomes later, e.g. plants, fungi and some animals]
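To make the "one big library with folders" option concrete, here is a minimal sketch of one possible layout: group organisms by genus, under a first-letter folder, so that no single folder has to list all 800+ genomes. The function name, the two-level scheme and the example organism names are illustrative assumptions only, not anything Galaxy or the existing tool prescribes.

```python
from collections import defaultdict


def plan_folders(organisms):
    """Map 'Letter/Genus' folder paths to the organism names they hold."""
    plan = defaultdict(list)
    for name in organisms:
        if not name.strip():
            continue  # skip blank entries
        genus = name.split()[0]
        plan["%s/%s" % (genus[0].upper(), genus)].append(name)
    # Sort folders and their contents so the layout is reproducible.
    return {folder: sorted(names) for folder, names in sorted(plan.items())}


if __name__ == "__main__":
    example = [
        "Escherichia coli K-12",
        "Bacillus subtilis 168",
        "Escherichia coli O157:H7",
    ]
    for folder, names in plan_folders(example).items():
        print(folder, names)
```

Other groupings (by taxonomic kingdom first, for example) would fit the same shape; the point is only that the folder plan can be computed up front before any folders are created.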
I guess I'll have to experiment, but I imagine Dan has thought about this already and may have some advice.
Cheers,
Peter
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev