On Mon, Dec 20, 2010 at 9:42 PM, Daniel Blankenberg <dan@bx.psu.edu> wrote:
Hi Peter,
As Alex has pointed out, the scripts that were used to create this data are available under scripts/microbes/ and there is a README.txt file available there as well.
Thank you both for pointing that out - I hadn't found it yet. Adding a mention of this to the microbial_data.loc.sample file would have helped ;)
However, it has been some time since these scripts were last used and they have become stale. They would require a fair amount of tweaking to get working properly again (there was some messy webpage scraping against the NCBI microbial genomes project page involved), but we don't have the resources or plans to do this now.
Oh. That's a shame - but a little tweaking I'm willing to attempt.
At this point we are moving towards removing this tool from our main server (it has already been removed from tool_conf.xml.main), but would be more than willing to reincorporate a working version of the retrieval and parsing scripts. However, this tool predates Library functionality, which is much better suited for providing access to static precached datasets. I can look into assembling a compressed file which contains the data and location file currently used by the main server if you are interested, but it should be noted that the tool itself has developed some quirks, introduced by unknown changesets, that prevent it from working entirely properly (e.g. selecting multiple datasets at once).
I don't know much about the library functionality yet (pointers to docs welcome), but this could be useful to us so I'll try to make time to look at it. A copy of the current live microbial_data.loc file would be very helpful, along with a set of data files for one or maybe two organisms (e.g. NC_000913, E. coli K12, and NC_005213, Nanoarchaeum equitans, which is small but an interesting test case as it has a gene spanning the origin).
Thanks,
Peter