Re: [galaxy-dev] Setting up microbial_data.loc given a mirror of NCBI FTP site

22 Dec 2010

On Wed, Dec 22, 2010 at 3:49 PM, Brad Chapman <chapmanb@50mail.com> wrote:
> ...
> Peter;
> ...
>> I've started reading about Data Libraries on the wiki:
>> https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries
>> https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFile...
>> https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Tutorial/Data...
>> Are there any nice examples of tools/scripts which populate Galaxy
>> Data Libraries automatically which you think it would be helpful to read?
>
> You can use the API for this. Here's a script that builds data
> libraries for next gen sequencing runs:
> https://github.com/chapmanb/bcbb/blob/master/nextgen/scripts/upload_to_galax...
> It selects files of interest, organizes them into a local directory
> structure, and then uploads them to Galaxy. Folders are created via
> the API, and this all uses a thin wrapper:
> https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/galaxy/api.py
>
> Brad

That looks very handy, Brad - thank you :)
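
For anyone following the thread, here is a rough sketch of the kind of
requests involved in creating a library and a folder through the API, as
Brad describes. It only builds the URL and JSON body and sends nothing.
The endpoint paths (/api/libraries and /api/libraries/<id>/contents) and
the field names (folder_id, create_type) are my understanding of the
Galaxy library API and are not taken from Brad's wrapper, so check them
against your Galaxy version. The function names are made up for
illustration.

```python
import json
from urllib.parse import urlencode


def library_request(base_url, api_key, name, description=""):
    """Build (url, json_body) for creating a data library.

    Assumption: libraries are created by POSTing to /api/libraries
    with the API key passed as the 'key' query parameter.
    """
    query = urlencode({"key": api_key})
    url = f"{base_url.rstrip('/')}/api/libraries?{query}"
    body = json.dumps({"name": name, "description": description})
    return url, body


def folder_request(base_url, api_key, library_id, parent_folder_id, name):
    """Build (url, json_body) for creating a folder inside a library.

    Assumption: folders are created by POSTing to the library's
    'contents' collection with create_type set to 'folder'.
    """
    query = urlencode({"key": api_key})
    url = f"{base_url.rstrip('/')}/api/libraries/{library_id}/contents?{query}"
    body = json.dumps(
        {"folder_id": parent_folder_id, "create_type": "folder", "name": name}
    )
    return url, body
```

Each (url, body) pair would then be sent as an HTTP POST with a
Content-Type of application/json by whatever client you prefer.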

What I'm not clear on yet is how to structure the libraries - in particular
can I associate a genome with a library, or with each file in a library?

If I go with one library per bacterium/archaeon, how well would Galaxy
cope with 800+ libraries?

If I go with one *big* library for all the NCBI RefSeq bacteria/archaea,
using a folder structure inside the library, how easy will it be for the
user to find a particular genome?
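
To make the single-library option concrete, here is a minimal sketch of
one possible folder layout: one folder per genus, with each organism as
a subfolder. It assumes the mirrored directories are named
Genus_species_strain with underscores; the function name plan_folders
and the example names are hypothetical, and this is just one way to
group things, not something Galaxy requires.

```python
from collections import defaultdict


def plan_folders(organism_dirs):
    """Group organism directory names into one folder per genus.

    Assumption: each name starts with the genus followed by an
    underscore, e.g. 'Escherichia_coli_K_12_substr__MG1655'.
    Returns a dict mapping genus -> sorted list of organism names,
    with the genera themselves in alphabetical order.
    """
    plan = defaultdict(list)
    for name in organism_dirs:
        # Everything before the first underscore is treated as the genus.
        genus = name.split("_", 1)[0]
        plan[genus].append(name)
    return {genus: sorted(names) for genus, names in sorted(plan.items())}


if __name__ == "__main__":
    # Illustrative names only, not a real directory listing.
    example = [
        "Escherichia_coli_K_12_substr__MG1655",
        "Bacillus_subtilis_168",
        "Escherichia_coli_O157_H7_Sakai",
    ]
    for genus, organisms in plan_folders(example).items():
        print(genus)
        for organism in organisms:
            print("   ", organism)
```

One known weakness: names beginning with a qualifier such as
"Candidatus" would all land in a single folder of that name, so those
would need special handling.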

[We'd probably want to extend this to other NCBI RefSeq genomes
later, e.g. plants, fungi and some animals]

I guess I'll have to experiment, but I imagine Dan has thought about
this already and may have some advice.

Cheers,

Peter