Re: [galaxy-dev] datacache & bowtie2 for mm9 ?

20 Sep 2013

Hello Curtis,

The datacache was originally pointed to the data staging area and is now
pointed to the data published area. The difference is that the published
area contains data and location (.loc) files that are in sync and have
completed final testing. It is your choice whether to use the
staged-only data: it depends on how risk-tolerant your project is and
whether you plan on testing. That said, I think it is almost certainly
fine, or our team wouldn't have staged it. A vanishingly small number of
datasets are pulled back once they make it to staging, and this is why
we were comfortable pointing datacache there in the first place (we were
unable to point to the published area at first, but wanted to make the
data available ASAP).

Going forward, these indexes are very easy to create: one command-line
execution, then one line added to the associated .loc file.
Instructions are here, see "Bowtie and Tophat":
http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup
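
As a rough illustration of those two steps, here is a minimal Python
sketch that prints the bowtie2-build command and a matching .loc entry.
The file paths, the helper names, and the four-column tab-separated
layout (unique id, dbkey, display name, index base path) are assumptions
for illustration only; please check the sample .loc file in your own
Galaxy instance and the wiki page above before relying on it.

```python
import shlex
from pathlib import Path


def bowtie2_build_command(fasta, index_base):
    """Return the argv list for: bowtie2-build <reference_in> <bt2_base>."""
    return ["bowtie2-build", str(fasta), str(index_base)]


def loc_line(build_id, dbkey, display_name, index_base):
    """Format one tab-separated .loc entry (assumed four-column layout)."""
    return "\t".join([build_id, dbkey, display_name, str(index_base)])


if __name__ == "__main__":
    # Hypothetical locations; substitute the paths used on your server.
    fasta = Path("/data/genomes/mm9/seq/mm9.fa")
    index_base = Path("/data/genomes/mm9/bowtie2_index/mm9")

    # Step 1: the single command to run on the command line.
    print(shlex.join(bowtie2_build_command(fasta, index_base)))

    # Step 2: the single line to append to the associated .loc file.
    print(loc_line("mm9", "mm9", "Mouse (mm9)", index_base))
```

The sketch only prints the command and the .loc line rather than running
anything, so the output can be reviewed before touching a live server.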

For one or a few genomes, this is not a problem. For hundreds of genomes
with variants, it can become tedious even with helper tools, and in our
case the processing interacted with disk that was undergoing changes (as
we have been working on system configuration most of the summer). Also,
now that the Data Manager is available, creating batch indexes for use
via rsync has become a lower priority. Even so, I would expect more
indexes to be fully published once the final configuration is in place,
as many are already staged or close to being staged (watch the yellow
banner on Main).

Hopefully this helps to explain the data, guides you toward an informed
decision, and aids with creating your own indexes as needed.

Thanks!
Jen
Galaxy team

On 9/18/13 1:04 PM, Curtis Hendrickson (Campus) wrote:
...
Folks,
First, I wanted to thank you for making the datacache available 
(http://wiki.galaxyproject.org/Admin/Data%20Integration; 
rsync://datacache.g2.bx.psu.edu). It's a great resource.
However, what is the best way to stay abreast of changes to what's in 
datacache, and understand how these indexes are computed?
We are currently upgrading to bowtie2, but I notice that the bowtie2 
indices for mm9, which used to be in
rsync://datacache.g2.bx.psu.edu/indexes/mm9/mm9*/bowtie2_index
have been removed, and only the hg19 genome has bowtie2 indices. Why 
only that one, and not the others?
Where are the scripts you use to make these indices, in case I want to 
create bowtie2 indices for other
So, how do I find out **why** they were removed? (Can I safely use the 
copy I have, or was there a problem with them?)
More generally, how do I understand the policies and logic behind the 
datacache indices, and be notified of changes, short of running my own 
periodic rsync/diff?
Finally, since I'm doing "reproducible research", is anything planned
for systematically versioning genome indices, so I can easily tell
what version of a system (i.e., what BWA version) was used to create the
index, and be sure that an index will not suddenly disappear?
Thanks,
Curtis
Research Associate/CTSA-Informatics Team
University of Alabama at Birmingham
-- 
Jennifer Hillman-Jackson
http://galaxyproject.org
