Questions regarding Circster visualization
Hi, I just noticed that Circos plots are now incorporated into Galaxy's visualization methods, which is awesome! However, I'm a bit lost as to what kinds of data I can load into Circos, and I cannot find much documentation, so I hope you guys can help out.
1. I tested it using a bigWig and a BED file. Both were loaded nicely in Circos, but I was surprised to see that the visualization of both files looked exactly the same, i.e. both file types seemed to be interpreted as histograms/coverage data. From the Circos plots I've seen in publications, I assumed that BED files should be visualized as straight lines indicating genome regions (rather than coverage). Am I doing anything wrong? Or, rather, how should I modify the BED file so that its content is simply interpreted as genomic regions?
2. In the Galaxy publication (www.biomedcentral.com/1471-2164/14/397), "line data" is mentioned for displaying connecting lines in the center of the circle. Could you give me an example line showing how this kind of data needs to be formatted?
It would be great to make much more use of Circster! Thanks a lot!
Best wishes, Friederike
1. I tested it using a bigWig and a BED file. Both were loaded nicely in Circos, but I was surprised to see that the visualization of both files looked exactly the same, i.e. both file types seemed to be interpreted as histograms/coverage data. From the Circos plots I've seen in publications, I assumed that BED files should be visualized as straight lines, indicating genome regions (rather than a coverage). Am I doing anything wrong? Or, rather, how should I modify the BED file so that its content is simply interpreted as genomic regions?
This is a limitation of the visualization, and it should be addressed. I've created a Trello card for this enhancement that you can follow here: https://trello.com/c/YIdx6QvV
2. In the Galaxy publication (www.biomedcentral.com/1471-2164/14/397), "line data" is mentioned for displaying connecting lines in the center of the circle - could you give me an example line of how this kind of data needs to be formatted?
The format is a 7-column tabular file with tab-separated values:
-- chrom1 start1 end1 chrom2 start2 end2 score --
Score isn't used right now, but it still needs to be there. Once you have this format, you'll need to convert the datatype from 'tabular' to 'chrint' in order to visualize it (click on the pencil icon --> Datatype). Also, I have a workflow up to convert Tophat fusion output data to chrint format here:
https://usegalaxy.org/u/jeremy/w/tophat-fusion-post-output-to-chrint
Sorry for the cryptic nature of everything right now. We'll get this info and more up on a wiki page eventually (you're welcome to start one in the meantime). Let us know if you have more questions.
Best, J.
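For anyone else trying this, here is a minimal sketch of how a file in the 7-column layout described above could be written with Python. The function name `write_chrint`, the example coordinates, and the placeholder score of 0 are all made up for illustration. Whether Circster tolerates a header line is not stated in this thread, so the sketch writes none.

```python
import csv
import io


def write_chrint(interactions, handle):
    """Write (chrom1, start1, end1, chrom2, start2, end2) tuples as
    tab-separated 7-column rows; the 7th column is a placeholder score."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for chrom1, start1, end1, chrom2, start2, end2 in interactions:
        # The score is reportedly unused at the moment, but the column
        # still has to be present; 0 is an arbitrary placeholder.
        writer.writerow([chrom1, start1, end1, chrom2, start2, end2, 0])


# Made-up example interactions, purely illustrative.
example = [
    ("chr1", 1000, 2000, "chr5", 30000, 31000),
    ("chr2", 500, 900, "chrX", 12000, 12500),
]

buffer = io.StringIO()
write_chrint(example, buffer)
chrint_text = buffer.getvalue()
print(chrint_text, end="")
```

After uploading such a file to Galaxy, the datatype would still need to be switched from 'tabular' to 'chrint' via the pencil icon, as described above.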
Dear Jeremy,
thanks for the reply! Indeed, there's another feature I don't fully understand: I have a bigWig file that contains reads of only one chromosome. I expected that Circster would display this one chromosome as one circle, but apparently Circster always draws a circle in which all possible chromosomes of a genome are displayed. I think the usability would greatly increase if Circster only displayed those chromosomes that are actually represented in the coverage file. Of course, I could zoom in, but if you're working with a chromosome that's very small in comparison (e.g. the Y chromosome), the circular representation is barely visible anymore, because the region covered by the Y chromosome is so tiny compared to the autosomes.
I hope this makes sense. Let me know if there's already a solution for that and I was just too blind to notice it!
Thanks a lot!
Best, Friederike
(In reply to Jeremy Goecks <jgoecks@email.gwu.edu>, 2014-02-11 17:49 GMT+01:00)
Indeed, there's another feature I don't fully understand: I have a bigWig file that contains reads of only one chromosome. I expected that Circster would display this one chromosome as one circle, but apparently Circster always draws a circle where all possible chromosomes of a genome are displayed. I think the usability would greatly increase if Circster only displayed those chromosomes that are actually represented in the coverage file. (Of course, I could zoom in, but if you're working with a chromosome that's very small in comparison (e.g. the Y chromosome) the circular representation is not really seen anymore as the region covered by the Y chromosome is so tiny compared to the autosomes).
Circster is really for genome-wide visualization, and the assumption is that you'll have data for many if not all chromosomes. If you have data for only a single chromosome, using Trackster (Galaxy's track browser) makes more sense; Trackster is also more developed and has more display options right now. Let's say, then, that what you're proposing is a very advanced feature that could be implemented down the road. Best, J.
participants (2)
- Friederike Dündar
- Jeremy Goecks