Dear Galaxy,
I've searched the list archives for this issue and so far have found only one suggested solution: using a custom reference. That doesn't make sense in my situation, because the reference I mapped my data against came from Galaxy itself. I'm now trying to use GATK to find SNPs, but no matter what I try I can't get past this error. I've tried both Count Covariates and the Unified Genotyper, to no avail. The only problem reported appears to be: "Sequences are not currently available for the specified build."
Any help would be much appreciated. Thanks
Sincerely, Rich
Hi Rich,
Additional genomes will be specially sorted, indexed, and added to the GATK tool suite as it moves out of beta status. hg19 is short-listed for addition in the near term.
We do take requests to have genomes added to tools, and we consider these requests when ranking our priority lists. Which genome did you want to use?
One small warning about using a custom reference genome with this particular tool set: be sure to visit the GATK web site directly to understand its sorting criteria for genomes. These criteria can differ from how Galaxy, UCSC, and many existing tools sort, or instruct users to sort, genomes and data. In short, the genome must be sorted in the exact order in which it was originally released. Even this can be slightly confusing, especially when working with a non-human genome, as there are few examples. Still, the documentation can help, and the tools are easily tested: if the sorting is wrong, the tool will fail and tell you so.
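For anyone preparing a custom reference, here is a minimal sketch of how the sequence order of a FASTA file could be checked before running the tools. It is not part of Galaxy or GATK; the function names and the example order are hypothetical, and the expected order must be taken from the GATK documentation for your genome.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_contig_order(lines: Iterable[str]) -> List[str]:
    """Return sequence names in the order they appear in FASTA text."""
    names = []
    for line in lines:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def first_order_mismatch(
    expected: List[str], actual: List[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (position, expected_name, actual_name) for the first
    difference between the two orders, or None if they are identical.
    A missing entry on either side is reported as None."""
    for i in range(max(len(expected), len(actual))):
        e = expected[i] if i < len(expected) else None
        a = actual[i] if i < len(actual) else None
        if e != a:
            return (i, e, a)
    return None


# Illustrative data only: a tiny FASTA and an assumed "release order".
example_fasta = [">chr1 example", "ACGT", ">chr2", "GGCC", ">chrM", "TTAA"]
assumed_release_order = ["chrM", "chr1", "chr2"]  # assumption, not GATK's list

mismatch = first_order_mismatch(
    assumed_release_order, fasta_contig_order(example_fasta)
)
if mismatch is None:
    print("Sequence order matches the expected order.")
else:
    position, expected_name, actual_name = mismatch
    print(f"Order differs at position {position}: "
          f"expected {expected_name}, found {actual_name}")
```

To check a real file, pass an open file handle instead of the example list, for instance fasta_contig_order(open("my_genome.fa")), where my_genome.fa is a placeholder path. Reading only the header lines keeps the check quick even for large genomes.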
If others have requests for GATK native genomes, they are also welcome to reply. In general, key model organisms would be ranked highest in priority. We also try to get the largest genomes loaded natively first (for purely practical reasons).
Good question, thanks!
Jen
Galaxy team
On 6/5/12 8:01 AM, Richard Linchangco wrote the message shown at the top of this thread.
galaxy-user@lists.galaxyproject.org