There have been no public data updates since the migration started (we
froze the data late last spring). There are some known issues, though, and
data that is either ready to be released or in the process of becoming
ready. We expect to be able to start working on this again in the very
near term.
To start this off, the mm9 bowtie2 indexes were restored this morning:
And these other finds are great, thanks Bjoern. I will add them to the
public card. A few errors that were corrected later last spring have
popped up again, but these are small and will also be fixed ASAP.
Adding in any missing .2bit files in general is on our internal to-do
list. Older genomes have other inconsistencies that will also be addressed.
The goal is to have the data filled in with complete indexes around the
end of Nov, then filled in with newly released genomes/more variants for
important model organisms by the end of the year. All of this depends on
various factors, but this is what we are shooting for.
Thanks, and if you notice anything else that is off, please feel free to
send it to the list or add it to the card. All input is welcome - we want this to be a
great resource for everyone - time to get back to making that happen now
that the migration is wrapping up!
On 11/8/13 2:00 AM, Bjoern Gruening wrote:
to chime into this discussion.
I found some inconsistency during my rsync endeavor and I'm curious if
there is any way to contribute to that service.
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3
ce6 /data/0/ref_genomes/ce6/ce6.2bit is missing from twobit.loc
ce6 has no .fa file under seq/ but in allfasta.loc there is a
reference to it ce6 Caenorhabditis elegans: ce6
TAIR9 and TAIR10 are not available via rsync
Bowtie2 indices are missing for ce6, xenTro3
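
Inconsistencies like the ones listed above (a file on disk with no .loc entry, or a .loc entry pointing at a file that is not there) could probably be caught with a small script run against a local rsync mirror. The following is only a minimal sketch under some assumptions: that the .loc files are tab-separated, that the first column is the genome key and the last column is the data path, and that the data is single-file (.2bit, .fa). The function names are hypothetical and not part of Galaxy. Index prefixes such as bowtie2 base names are not real files, so they would need a glob instead of a plain existence check.

```python
import os


def read_loc(path):
    """Return (key, file_path) pairs from a tab-separated .loc file.

    Assumption: first column is the genome key, last column is the data
    path. Blank lines and '#' comment lines are skipped.
    """
    entries = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                continue
            entries.append((fields[0], fields[-1]))
    return entries


def check_loc(loc_path, data_root, suffix):
    """Compare one .loc file against the files under data_root.

    Returns (dangling, unlisted):
      dangling - (key, path) entries whose path does not exist on disk
      unlisted - files ending in suffix that no .loc entry references
    """
    entries = read_loc(loc_path)
    listed = {os.path.normpath(p) for _, p in entries}
    dangling = sorted((k, p) for k, p in entries if not os.path.exists(p))
    unlisted = []
    for dirpath, _dirs, files in os.walk(data_root):
        for name in files:
            full = os.path.normpath(os.path.join(dirpath, name))
            if name.endswith(suffix) and full not in listed:
                unlisted.append(full)
    return dangling, sorted(unlisted)
```

Calling check_loc on a mirrored twobit.loc with suffix ".2bit" would, under these assumptions, report a case like ce6.2bit as "unlisted".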
> Hi Jennifer,
> Today I was trying to pull some bowtie2 indices from Galaxy rsync server for PhiX to
> run some tests and just got the ones for bowtie1… I'm wondering what's the state
> in regards to this past thread and what we can do to help in here.
> On 7 Mar 2013, at 20:01, Jennifer Jackson <jen(a)bx.psu.edu> wrote:
> > Hi Brad (and Roman),
> > The team has talked about this in detail. There are a few wrinkles with just
> > pulling in indexes - Dan is doing some work that could change this later on, but for now,
> > the rsync will continue to point to the same location as Main's genome data source.
> > This means that there are some limits on what we can do immediately. Setting up a
> > submission pipe is one of them - there just aren't the resources to do this right now, or a
> > common place distinct from Main to house the data. A few other ideas came up - we can chat
> > later, each had side issues.
> > But I saw your tweet and think that it is great that you are pulling
> > CloudBioLinux data from the rsync now, so let's get as much data in common as
> > possible, so you have data to work with near term.
> > I am in the process of adding bt2 indexes - some are published to Main/rsync
> > server already and some are not, but more will show up over the next week or so (along
> > with more genomes and other indexes). I'll take a look at what you have and pull/match
> > what I can. Genome sort order and variants are my concerns, both require special handling
> > in processing and .locs. If it takes longer to check, I am just going to create here if I
> > haven't already. The GATK-sort hg19 canonical is already on my list - it needed all
> > indexes, not just bt2. When the next distribution goes out, I'll list what is new on
> > the rsync in the News Brief.
> > For the Novoalign indexes, I'm not quite sure what to do about those yet. Or
> > for any indexes associated with tools or genomes not hosted on Main. Do you want to open a
> > card for those and any other cases that are similar? We can discuss a strategy from there,
> > maybe at IUC, if Greg/Dan thinks it is appropriate. Please add me so I can follow.
> > I'll be in touch as I go through the data. Thanks for your patience on
> > Jen
> > Galaxy team
> > On 2/21/13 12:43 PM, Brad Chapman wrote:
> >> Hi all;
> >> Is there a way for community members to contribute indexes to the rsync
> >> server? This resource is awesome and I'm working on migrating the
> >> CloudBioLinux retrieval scripts to use this instead of the custom S3
> >> buckets we'd set up previously:
> >> It's great to have this as a public shared resource and I'd like to
> >> be able to contribute back. From an initial pass, here are the things I'd
> >> like to do:
> >> - Include bowtie2 indexes for more genomes.
> >> - Include novoalign indexes for a number of commonly used genomes.
> >> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
> >> Broad has a nice version prepped so GATK will be happy with it, and
> >> you need to stick with this ordering if you're ever going to use a
> >> GATK tool on it. Right now there is a partial hg19canon (without the
> >> random/haplotype chromosomes) and the structure is a bit complex.
> >> What's the best way to contribute these? Right now I have a lot of the
> >> indexes on S3. For instance, the hg19 indexes are here:
> >> I'm happy to format these differently or upload somewhere that would
> >> make it easy to include. Thanks again for setting this up, I'm looking
> >> forward to working off a shared repository of data,
> >> Brad
> >> ___________________________________________________________
> >> Please keep all replies on the list by using "reply all"
> >> in your mail client. To manage your subscriptions to this
> >> and other Galaxy lists, please use the interface at:
> > --
> > Jennifer Hillman-Jackson
> > Galaxy Support and Training
> To search Galaxy mailing lists use the unified search at:
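
On the hg19 sort order point from Brad's original message: since GATK tools expect the contigs in a fixed order, one way to check a candidate genome before building indexes on it is to compare its contig order with that of a trusted reference. This is a minimal sketch, assuming both genomes have a samtools-style .fai index (tab-separated, contig name in the first column). The function names are hypothetical, and the order to compare against is whatever reference you trust; it is not hard-coded here.

```python
def contig_order(fai_path):
    """Return contig names, in file order, from a samtools-style .fai index."""
    names = []
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                # The contig name is the first tab-separated field.
                names.append(line.split("\t", 1)[0].strip())
    return names


def first_order_mismatch(reference, candidate):
    """Return (position, ref_name, cand_name) at the first difference.

    Returns None when both lists hold the same contigs in the same order.
    A missing contig on either side is reported as None in that slot.
    """
    for i in range(max(len(reference), len(candidate))):
        ref = reference[i] if i < len(reference) else None
        cand = candidate[i] if i < len(candidate) else None
        if ref != cand:
            return (i, ref, cand)
    return None
```

A None result from first_order_mismatch(contig_order(ref_fai), contig_order(new_fai)) means the two genomes list the same contigs in the same order; anything else points at the first place they diverge.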