
On Tue, Dec 21, 2010 at 1:44 PM, Peter <peter@maubp.freeserve.co.uk> wrote:
On Mon, Dec 20, 2010 at 9:52 PM, Peter <peter@maubp.freeserve.co.uk> wrote:
On Mon, Dec 20, 2010 at 9:42 PM, Daniel Blankenberg <dan@bx.psu.edu> wrote:
Hi Peter,
As Alex has pointed out, the scripts that were used to create this data are available under scripts/microbes/ and there is a README.txt file available there as well.
Thank you both for pointing that out - I hadn't found it yet. Adding a mention of this to the microbial_data.loc.sample file would have helped ;)
However, it has been some time since these scripts have been used and they have become stale. They would require some real amount of tweaking to get working properly again (there was some messy webpage scraping against the NCBI microbial genomes project page involved), but we don't have the resources or plan to do this now.
Oh. That's a shame - but a little tweaking I'm willing to attempt.
I immediately identified a small tweak required, ... I've made a branch here, so far just a few commits ... [I'm still testing things for now]: https://bitbucket.org/peterjc/galaxy-central/src/microbes
Hi Dan,

I have another question: why do harvest_bacteria.py etc. use project IDs (numbers) as the folder names, rather than the same names as the NCBI (species names with underscores plus, in recent months, a suffix of the uid)? If you had opted to match the NCBI tree, it would be easy to fetch all the GenBank files, all the GeneMark files, etc. using the provided tarballs:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.gbk.tar.gz
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.GeneMark.tar.gz
etc.

I've started reading about Data Libraries on the wiki:

https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFile...
https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Tutorial/Data...

Are there any nice examples of tools/scripts which populate Galaxy Data Libraries automatically which you think it would be helpful to read?

Peter
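
To illustrate the folder naming question above, here is a minimal Python sketch of how NCBI-style folder names might be handled. It is only a sketch under assumptions: the function names split_ncbi_folder and group_by_organism are hypothetical (they are not part of scripts/microbes/), the "_uid<digits>" suffix pattern is my reading of the current NCBI folder names, and the layout of one top-level folder per organism inside the tarballs is assumed rather than checked.

```python
import posixpath
import re
from collections import defaultdict

# Assumed naming scheme: "<Species_name>_uid<digits>".
# Older folders may have no uid suffix at all.
_UID_SUFFIX = re.compile(r"^(?P<name>.+?)_uid(?P<uid>\d+)$")


def split_ncbi_folder(folder):
    """Split an NCBI-style folder name into (organism_name, uid).

    uid is None when the folder has no "_uid<digits>" suffix.
    """
    match = _UID_SUFFIX.match(folder)
    if match:
        return match.group("name"), int(match.group("uid"))
    return folder, None


def group_by_organism(member_paths):
    """Group archive member paths by their top-level folder.

    Assumes each organism has its own top-level folder in the archive.
    Entries with no file part (e.g. bare directories) are skipped.
    """
    groups = defaultdict(list)
    for path in member_paths:
        # Normalise "./Folder/file" and "Folder/" style entries.
        folder, _, filename = posixpath.normpath(path).partition("/")
        if not filename:
            continue
        groups[folder].append(posixpath.basename(filename))
    return dict(groups)
```

The member paths could come from tarfile.open(...).getnames() on a downloaded all.gbk.tar.gz, so a harvesting script could keep the NCBI names on disk and still recover the numeric uid when it needs one.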