Dear Mark,

Thank you very much for your e-mail. This is exactly the kind of feedback we are looking for to make Galaxy better serve your needs. We will take a hard look at these problems. Below is a quick comment on gff:
(1) First of all, I try to load a gff file - that works fine. But the "edit attributes" won't correctly assign the columns to the data for doing interval stuff - I try to do an overlap, and it says something about startcol being undefined.
Currently we treat gff as a simple tab-delimited file, as this is the easiest way to deal with its 0-based vs. 1-based coordinate incompatibility with the bed format. To use a gff file in interval operations, first convert it with the GFF-to-BED converter (test.g2.bx.psu.edu -> Convert Formats).

Thanks,

anton

Anton Nekrutenko
Assistant Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
505 Wartik Building
PennState University
University Park, PA 16802
814 865-4752
814 863-6699 FAX
anton@bx.psu.edu
http://www.bx.psu.edu/~anton
http://g2.bx.psu.edu

On Apr 11, 2006, at 7:55 PM, Mark Bieda wrote:
Hello All,

I'm a member of the ENCODE TR group and have done a fair amount of programming. Galaxy seems a well-designed approach to analysis. But there are a number of problems, it seems, in my basic testing.

(1) First of all, I try to load a gff file - that works fine. But the "edit attributes" won't correctly assign the columns to the data for doing interval stuff - I try to do an overlap, and it says something about startcol being undefined.

(2) Generally, I strongly advise that you allow direct loading of gff data and recognition of this format. It's easy to do. (But this is lower priority, I understand).

(3) So I create my own interval files as a test - one file is a subset of the other, with a small difference in the sizes of some intervals (with strand information, FYI). I compute the overlap and the difference. The overlap is correct, the difference looks probably ok. I then do the union of the overlap and the difference. This should lead to my original data - but no, it doesn't.

(4) I mention the strand information because it seems that the difference eliminates this info (bizarrely) and the overlap keeps it.

(5) As a general comment, I would say that I am quite used to bioinformatics and programming. If I am having these sorts of problems, this will be very hard on more experimentally oriented biologists.
Ok, so I'm writing this because, like I said, Galaxy looks like a very well-thought-out approach to doing this stuff - I'm impressed with the overall project approach - but it doesn't seem to be working very well right now - or you are over my head.
Say hello to Ross Hardison for me, and I look forward to hearing from you -
Also, I've attached files for your testing.

Mark
Mark Bieda, Ph.D.
UC-Davis Genome Center
Postdoctoral Fellow
Farnham Lab

<largerset_hg17.txt>
<smallerset_hg17.txt>

_______________________________________________
Galaxy-user mailing list
Galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
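
Note on the gff/bed coordinate mismatch mentioned in the reply above: gff start and end positions are 1-based and both inclusive, while bed uses a 0-based start and an exclusive end. Converting a feature therefore means subtracting 1 from the start and leaving the end unchanged. The sketch below only illustrates that arithmetic; it is not Galaxy's GFF-to-BED converter. The function name gff_line_to_bed, the choice of six output columns, and the handling of a missing score are all assumptions made for the example.

```python
def gff_line_to_bed(line):
    """Convert one tab-delimited gff feature line to a 6-column bed line.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated gff columns")

    # gff columns: seqname, source, feature, start, end, score, strand, frame, ...
    seqname, _source, feature, start, end, score, strand = fields[:7]

    bed_start = int(start) - 1  # gff start is 1-based; bed start is 0-based
    bed_end = int(end)          # gff end is inclusive; bed end is exclusive, so the number is the same

    # Assumption: a missing gff score (".") becomes 0; other scores are passed through unchanged.
    bed_score = "0" if score == "." else score
    bed_strand = strand if strand in ("+", "-") else "."

    return "\t".join(
        [seqname, str(bed_start), str(bed_end), feature, bed_score, bed_strand]
    )


if __name__ == "__main__":
    # A 100-base feature covering positions 101..200 in gff coordinates.
    print(gff_line_to_bed("chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene1"))
```

Because the interval length is end - start in bed but end - start + 1 in gff, treating gff columns directly as bed coordinates would shift every interval start by one base, which is why a conversion step comes before interval operations.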