Galaxy fetch sequences question
Dear Galaxy expert:

I have recently been using Galaxy and have been thrilled with its utility and ease. Thank you very much! I have one question: I am trying to fetch sequences (for the C. elegans ce4 genome) for intervals in a tab-delimited file (attached) that has been used successfully as an interval file for other types of Galaxy queries. However, when I try to fish out the sequences associated with these intervals (for MEME analysis), I get empty returns and the same warning (below). Where is it getting the 544 from? What do I need to do to make it read the file correctly? I have tried file conversion, getting rid of the extra columns, and changing the chromosomes to I, II, III, etc. (the canonical names in the worm field), and nothing seems to do the trick. Any suggestions would be helpful.

Thanks!
Valerie

Warning message:
empty format: fasta, database: ce4
Info: 2720 warnings, 1st is: Unable to fetch the sequence from '3525' to '544' for build 'ce4'. Skipped 2720 invalid lines, 1st is #1, "Chr1 3525 4069 281 21 3.25E-58"

------------------------------------------------
Valerie Reinke
Associate Professor
Dept Genetics
Yale University School of Medicine
203-785-5228
valerie.reinke@yale.edu
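On the "544": it equals the length of the first skipped interval (4069 - 3525 = 544), which suggests the warning prints the start coordinate followed by the interval length rather than the end coordinate. That is an inference from the numbers, not something confirmed from the Extract Genomic DNA source. A minimal Python sketch that reads the quoted line the same way (the function name is illustrative only):

```python
from typing import Tuple


def parse_interval(line: str) -> Tuple[str, int, int, int]:
    """Return (chrom, start, end, length) from a BED-like interval line.

    Only the first three columns are read; extra columns such as the
    scores in the attached file are ignored. split() with no argument
    accepts both tabs and spaces as separators.
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start, end, end - start


# First line reported as skipped in the warning message.
chrom, start, end, length = parse_interval("Chr1 3525 4069 281 21 3.25E-58")
print(chrom, start, end, length)  # Chr1 3525 4069 544
```

Note that the interval itself parses fine; the numbers are not the problem, which points back to the chromosome name "Chr1".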
(In reply to Valerie Reinke's message of 2/18/11 9:07 AM)

Hi Valerie,

To confirm, you are using "Fetch Sequences -> Extract Genomic DNA" with the "Locally cached" ce4 genome? If so, you may be able to solve this issue by correcting the chromosome names. They have to match exactly what is in the native genome. Specifically, for the error message you shared, changing "Chr1" to "chrI" will probably fix the issue.

To see what the chromosome names are for ce4, please see this history:
http://main.g2.bx.psu.edu/u/jen-bx-galaxy-edu/h/ce4-chrominfo

Since the genome came from UCSC, I was able to pull a special table from the Table Browser, called "chromInfo", that lists the names in an easy format.

The scientific notation may also be an issue with some tools, but it is unlikely to matter for this particular operation.

Please let us know if you need more help. Sharing a link to your history with the problem input/output identified would be helpful if we need to look at this in more detail (use "Options -> Share or Publish" and email back the link, noting the problem datasets).

Best,
Jen
Galaxy team
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org
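Jen's suggestion to rename "Chr1" to "chrI" can be applied to the whole interval file at once before uploading it again. The Python sketch below is a hypothetical helper, not a Galaxy tool: it rewrites the first (chromosome) column from "Chr1" or "I" style to UCSC "chrI" style and reports any name that is still not in an assumed set of ce4 names (chrI to chrV, chrX, chrM). That set, and the alias table, are assumptions; check them against the chromInfo history before relying on the output. All other columns, including the scientific-notation scores, are passed through unchanged.

```python
import sys
from typing import Iterable, List, Tuple

# Assumed UCSC ce4 chromosome names; confirm against the chromInfo table.
CE4_CHROMS = {"chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX", "chrM"}

# Assumed aliases: Arabic numerals and the WormBase mitochondrial name.
ALIASES = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V", "MTDNA": "M"}


def rename_chrom(name: str) -> str:
    """Map names like 'Chr1', 'chr1', 'I' or 'X' to UCSC style ('chrI', 'chrX')."""
    core = name[3:] if name.lower().startswith("chr") else name
    core = core.upper()
    return "chr" + ALIASES.get(core, core)


def fix_interval_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Rename column 1 of each data line; return (fixed_lines, unknown_names)."""
    fixed: List[str] = []
    unknown: List[str] = []
    for line in lines:
        # Leave blank lines and '#' header/comment lines untouched.
        if not line.strip() or line.startswith("#"):
            fixed.append(line)
            continue
        # Strip '\r' too, in case the file was saved with Windows line endings.
        fields = line.rstrip("\r\n").split("\t")
        fields[0] = rename_chrom(fields[0])
        if fields[0] not in CE4_CHROMS:
            unknown.append(fields[0])
        fixed.append("\t".join(fields) + "\n")
    return fixed, unknown


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are placeholders):
    #   python fix_chroms.py intervals.tab > intervals.fixed.tab
    with open(sys.argv[1]) as handle:
        fixed_lines, unknown_names = fix_interval_lines(handle)
    sys.stdout.writelines(fixed_lines)
    if unknown_names:
        print("Unrecognized chromosome names:", sorted(set(unknown_names)),
              file=sys.stderr)
```

Reporting unrecognized names instead of silently dropping those lines keeps the row count the same as the input, so any remaining mismatch shows up before the file goes back into Extract Genomic DNA.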