
On Wed, Dec 22, 2010 at 3:49 PM, Brad Chapman <chapmanb@50mail.com> wrote:
Peter;
I've started reading about Data Libraries on the wiki:
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFile...
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Tutorial/Data...
Are there any nice examples of tools/scripts which populate Galaxy Data Libraries automatically which you think it would be helpful to read?
You can use the API for this. Here's a script that builds data libraries for next-gen sequencing runs:
https://github.com/chapmanb/bcbb/blob/master/nextgen/scripts/upload_to_galax...
It selects files of interest, organizes them into a local directory structure, and then uploads them to Galaxy. Folders are created via the API, and this all uses a thin wrapper:
https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/galaxy/api.py
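
To illustrate the folder-creation step described above, here is a minimal sketch of how a local directory layout could be turned into an ordered list of folders to create, parents before children. It is not taken from the linked script or wrapper: the function names `folder_paths` and `plan_folders` and the example file paths are hypothetical, and the actual API call that creates each folder is deliberately left out.

```python
import posixpath


def folder_paths(file_paths):
    """Return every folder implied by the relative file paths, parents first."""
    folders = set()
    for path in file_paths:
        parent = posixpath.dirname(path)
        # Walk upwards so intermediate folders are included too.
        while parent:
            folders.add(parent)
            parent = posixpath.dirname(parent)
    # Sorting by depth ensures a parent is always listed before its children.
    return sorted(folders, key=lambda p: (p.count("/"), p))


def plan_folders(file_paths):
    """Return (parent_path, folder_name) pairs in a safe creation order.

    An empty parent_path stands for the root of the data library.
    """
    return [(posixpath.dirname(f), posixpath.basename(f))
            for f in folder_paths(file_paths)]


if __name__ == "__main__":
    # Hypothetical layout for two sequencing runs.
    files = [
        "run1/sample_a/reads.fastq",
        "run1/sample_b/reads.fastq",
        "run2/summary.pdf",
    ]
    for parent, name in plan_folders(files):
        # A real script would create the folder via the API at this point
        # and remember the returned folder id for use as a later parent.
        print("create %r under %r" % (name, parent or "<library root>"))
```

Planning the order up front keeps the upload loop simple: by the time a folder is created, the folder it belongs in already exists.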
Brad
That looks very handy Brad - thank you :)

What I'm not clear on yet is how to structure the libraries. In particular, can I associate a genome with a library, or with each file in a library?

If I go with one library per bacterium/archaeon, how well would Galaxy cope with 800+ libraries? If I go with one *big* library for all the NCBI RefSeq bacteria/archaea, using a folder structure inside the library, how easy will it be for the user to find a particular genome? [We'd probably want to extend this to other NCBI RefSeq genomes later, e.g. plants, fungi and some animals]

I guess I'll have to experiment, but I imagine Dan has thought about this already and may have some advice.

Cheers,
Peter