Hi Jill, I am pretty certain that I found out why mm7 is not extracting - the database is not fully set up for use with this tool (although the data is present). I'll add this to the list of items to adjust this upcoming month (plus find/fix any others like it - all would be older DBs).
And glad the tab file is now working. Whenever you really do have just a tabular file, using a plain text editor is best, along with the 'Convert spaces to tabs:' option on the 'Get Data -> Upload File' form. Excel is known to most bioinformatics folks as a tool whose "text" output it is wise to screen carefully, primarily because of inserted 'hidden' or whitespace characters (soft returns and such). Not Excel's fault, nor any other editor's - but what you did (cycle through a plain text editor) is one way to get clean data.
Now, that said -> never use that upload option on any file that would contain internal spaces, such as GFF/GTF or SAM. But for plain text tabular data, in particular strict BED, it can help clean up stray spaces or tabs that were introduced. Other tools in Text manipulation can also help for data already loaded (try cutting out the columns you want to use, maybe after converting all whitespace to tabs first).
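For anyone who wants to do the same cleanup outside of Galaxy, here is a minimal sketch of the "convert whitespace to tabs" idea. It is not Galaxy's implementation; the function name `normalize_to_tabs` is made up for this example, and it assumes a strict tabular file (such as 6-column BED) where no field contains an internal space.

```python
import re

def normalize_to_tabs(lines):
    """Collapse each run of spaces/tabs into a single tab, line by line.

    Assumption: no field legitimately contains a space, so this is only
    safe for strict tabular data (not GFF/GTF or SAM).
    """
    cleaned = []
    for line in lines:
        stripped = line.strip()  # drop leading/trailing whitespace and line endings
        if not stripped:
            continue             # skip blank lines
        cleaned.append(re.sub(r"[ \t]+", "\t", stripped))
    return cleaned

# One BED-like line with mixed spaces, a tab, and a Windows line ending
example = ["chr1  4558068 4561910\tregion_0  0 +\r\n"]
print(normalize_to_tabs(example))
```

Because every run of whitespace becomes a column break, running this on a file with free-text fields would split those fields apart, which is the reason for the warning above.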
Thanks and glad you have a working solution. I missed the details of the mm7 extract issue originally - sorry if that was confusing!
Jen
Galaxy team
On 10/31/13 6:46 AM, Kreiling, Jill wrote:
Thank you Jen. You mentioned it may be a formatting problem and you were able to successfully convert the coordinates to mm8. I tried that several times yesterday and they kept coming up in the unmapped file saying the region was deleted from the newer build. I opened the tab-delimited text file I created in Excel in Notepad++ and just resaved it without changing anything. When I uploaded the new file to Galaxy and lifted over to mm8 it worked fine. It still wouldn't pull out genomic sequences from mm7, but it will from the new file converted to mm8. Thank you for your help - it is very much appreciated!
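A file that looks identical before and after a resave often differs only in invisible characters. The thread does not say which characters were in the Excel export, so the following is only an illustrative sketch of how one might look for them; `find_hidden_characters` is a hypothetical helper, and the sample string is invented.

```python
def find_hidden_characters(text):
    """Return (line_number, column, codepoint) for every character that is
    not printable ASCII or a tab. Lines are split on "\n" only, so a stray
    carriage return from a Windows line ending is reported as well."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t" or " " <= ch <= "~":
                continue  # ordinary visible character, space, or tab
            hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

# Invented sample: a byte-order mark, a carriage return, and a non-breaking space
sample = "\ufeffchr1\t4558068\t4561910\r\nchr2\u00a0100\t200\n"
for hit in find_hidden_characters(sample):
    print(hit)
```

When checking a real file, open it with `newline=""` so that Python does not silently translate the line endings before the check runs.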
On Wed, Oct 30, 2013 at 11:45 PM, Jennifer Jackson <firstname.lastname@example.org> wrote:
Hello Jill,

This is strange. I just pasted the region you noted below into Galaxy (in the 'Get Data -> Upload File' tool), assigned it to mm7, and lifted to mm8 without any issues. I also checked the data behind the tool - all appears to be fine.

Result in mm8 coordinates:
chr1 4552557 4556399 region_0 0 +

Are you certain there is not a format problem with the data? This seems to be the only explanation for the problem. But after one more check, you can submit a bug report and note that this is the problem. Be sure to leave the input and all error outputs undeleted when you report the problem or we won't be able to offer the best feedback.

It is true that UCSC only produced liftOver files that go from mm7->mm6/8; from there you can go from mm8->mm7/9/10. This is just the data available. When lifting from data this old, be aware that a genome can change quite a bit in some regions over 3 new revisions. Still, lifting this way is certainly something you can try. If a much older genome is not in Galaxy, just do the lift at UCSC (the liftOver tool is under the top blue banner "Tools").

Hopefully the problem can be sorted out, but if not we can take a look,

Jen
Galaxy team

On 10/30/13 3:04 PM, Kreiling, Jill wrote:
Hello,

I have a set of coordinates for mm7 that I have been using to try to extract the genomic sequences. However, it doesn't recognize the chromosome name column. They are currently listed as chr1, chr2, ... chrX. This is the error I get each time I try to extract sequences:

Chromosome by name 'chr1' was not found for build 'mm7'. Skipped 1181 invalid lines, 1st is #1, "chr1 4558068 4561910 region_0 0 +"

However, if I change the build to mm10 it works fine - but the coordinates are not the same between builds. Also, mm7 can't be lifted over to mm9 or mm10. Does anyone know the proper format for chromosome names in mm7?

Thanks,
Jill

___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org <http://usegalaxy.org>. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
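The multi-step lift described in this thread (mm7 to mm6 or mm8, then mm8 to mm7, mm9, or mm10) amounts to finding a route through the available conversion files. A minimal sketch of that idea follows; the `CHAINS` table contains only the pairs mentioned in the thread and is not a complete list of what UCSC or Galaxy provide, and `lift_path` is a hypothetical helper name.

```python
from collections import deque

# Only the conversions mentioned in this thread; not a full UCSC listing.
CHAINS = {
    "mm7": ["mm6", "mm8"],
    "mm8": ["mm7", "mm9", "mm10"],
}

def lift_path(source, target, chains=CHAINS):
    """Breadth-first search for the shortest sequence of builds that
    connects source to target, or None if no route is known."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in chains.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(lift_path("mm7", "mm10"))
```

Each hop in the returned path is a separate liftOver run, and regions can drop out as unmapped at every step, so a shorter route is generally preferable when one exists.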
-- Jennifer Hillman-Jackson http://galaxyproject.org
--
Jill Kreiling, Ph.D.
Assistant Professor, Research
Department of Molecular Biology, Cell Biology and Biochemistry
Brown University
Providence, RI 02903