Genome Browser Histogram Visualization of Accepted Hits
Hello, I am trying to load the accepted hits from TopHat output into the UCSC Genome Browser to visualize sequencing hits in specific genes. However, the BAM files are displayed as tiles of individual reads and are too large to present through the browser. Is there a better file format to convert to that would allow a better visualization, such as a histogram?
By word of mouth, I have been told to convert BAMs to BEDs and load the BED files into the browser. However, Galaxy does not seem to have an option for this, and the often-used BEDtools appears to involve writing code, which is beyond my computing abilities.
Any tips or solutions on how to obtain histograms from sequencing data would be very welcome.
Thanks,
Trent Fowler
Please note this advice: http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format

You definitely do *not* want to convert BAMs to BEDs; that would be a step backwards. BAM files should display in the Genome Browser without difficulty unless they are trying to display pile-ups of thousands of reads at the same location. In that case, use the samtools functions to construct bigWig pile-up density graphs from your BAM files. BAM files can be displayed from a URL pointing to your file; you do not need to upload them to the Genome Browser. They are very efficient when used this way.

--Hiram

----- Original Message -----
From: "Trent Fowler" <Trent.Fowler@tufts.edu>
To: galaxy-user@bx.psu.edu
Sent: Tuesday, May 29, 2012 11:57:31 AM
Subject: [galaxy-user] Genome Browser Histogram Visualization of Accepted Hits
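Hiram's suggestion is to turn the reads into a pile-up density track instead of a BED of individual reads. The Python sketch below gives a rough idea of what such a track contains. It is not the samtools procedure he refers to. It assumes the alignments have already been reduced to `(chrom, start, end)` blocks in 0-based, half-open coordinates. It counts how many blocks cover each base by sweeping over start/end events, then emits bedGraph-style `chrom start end depth` rows for runs of constant non-zero depth. The function names are hypothetical, made up for this example.

```python
from collections import defaultdict


def coverage_bedgraph(blocks):
    """Turn aligned blocks (chrom, start, end), 0-based half-open, into
    bedGraph records (chrom, start, end, depth) for runs of depth > 0."""
    # Per-chromosome difference map: +1 where a block starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in blocks:
        if end > start:  # skip empty or malformed blocks
            events[chrom][start] += 1
            events[chrom][end] -= 1

    records = []
    for chrom in sorted(events):  # output sorted by chrom, then start
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0:
                last = records[-1] if records else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # Same depth continues across a net-zero event: extend the run.
                    records[-1] = (chrom, last[1], pos, depth)
                else:
                    records.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return records


def format_bedgraph(records):
    """Render records as tab-separated bedGraph lines."""
    return [f"{c}\t{s}\t{e}\t{d}" for c, s, e, d in records]


if __name__ == "__main__":
    reads = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 20, 25)]
    for line in format_bedgraph(coverage_bedgraph(reads)):
        print(line)
```

You can convert a sorted bedGraph file like this into a bigWig with UCSC's `bedGraphToBigWig` utility, which also needs a chrom.sizes file. The resulting bigWig can then be served from a URL. On real data, samtools or BEDtools would do the counting directly from the BAM, so a hand-written loop like this is only for illustration.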
Hi Trent,

Hiram's advice is very good for UCSC display (thanks, Hiram!). You most likely heard about converting BAM->SAM->Interval->BED->BigBED. But Hiram is exactly right: there would be a great loss of information. Notably, the full individual sequence data itself would be lost, since only the coordinates and variation remain. The data would also be in BED6, which means no splicing, just global coordinates. You won't want this for RNA-seq.

Since you are examining RNA-seq data, I also wanted to remind you of the option of using Trackster for visualization (top Galaxy menu -> "Visualization"). A local or custom reference genome can be used, both histogram and full read display are possible, and deeply sequenced areas can be "unpacked" in stages using the visualization controls (really nice). Other features are fairly recent enhancements, such as drag-and-drop composite track creation, creating new datasets from selections within Trackster (to feed further analysis), and strand coloring (see: http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_03_12#Galaxy_Track_Browser_.28G... http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_05_11#Galaxy_Track_Browser_.28G...). More (so far undocumented!) features are added all the time, so it is good to experiment. A new paper covering the details is in the works, and there will be tutorials, screencasts, etc. later on. For now, during the active development cycle, just using it is the best approach, and it is pretty intuitive.

We're interested in what you like or wish were added (this goes for everyone reading this!). No promises, but it is good to get feedback! If there is a particular UCSC track (or another source mapped to the same reference genome) that you wish to compare your data against, it can be imported into Galaxy and added to your Trackster visualization session. There are no known limits on data size or depth, only the general Galaxy dataset file size limit of 50 GB (which applies to all datasets), although I suppose an extreme corner case could be devised or uncovered.
The point is that a routine RNA-seq experiment shouldn't present a problem. Special tuning to improve speed and handle depth has been a big area of recent development activity. If you do run into issues with Trackster, feel free to ask questions. I'll probably ask you to send a share link to the visualization on the public main Galaxy instance, and I'll bring Jeremy into the discussion if the solution is not clear.

Please let us know if you need more help,
Jen

On 5/29/12 11:57 AM, Fowler, Trent wrote:
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
-- Jennifer Jackson http://galaxyproject.org
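Jen says that a BED6 record keeps only global coordinates and therefore loses splicing. The Python sketch below makes that point concrete. It is a hypothetical helper, not part of Galaxy, TopHat, or samtools. It assumes you already have a read's 0-based leftmost position and its CIGAR string. BAM stores positions 0-based, while the SAM text format is 1-based, so subtract 1 if you start from SAM. The sketch walks the CIGAR operations to list the reference blocks the read's aligned bases actually cover. It then compares those blocks with the single start/end span a BED6 line would keep.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def cigar_to_blocks(pos, cigar):
    """Return the reference intervals (0-based, half-open) covered by the
    aligned bases of one read, given its 0-based leftmost position."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    blocks = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            # Aligned bases: extend the current block if contiguous, else start one.
            if blocks and blocks[-1][1] == ref:
                blocks[-1] = (blocks[-1][0], ref + length)
            else:
                blocks.append((ref, ref + length))
            ref += length
        elif op in "DN":
            # Deletions and introns consume reference without aligned bases.
            ref += length
        # I, S, H and P do not consume reference, so they are skipped.
    return blocks


def bed6_span(blocks):
    """BED6 keeps only the outermost start and end, so introns are filled in."""
    if not blocks:
        raise ValueError("read has no aligned blocks")
    return (blocks[0][0], blocks[-1][1])


if __name__ == "__main__":
    blocks = cigar_to_blocks(100, "50M1000N50M")
    aligned = sum(end - start for start, end in blocks)
    start, end = bed6_span(blocks)
    print("blocks:", blocks, "aligned bases:", aligned)
    print("BED6 span:", (start, end), "length:", end - start)
```

Consider a read spliced as `50M1000N50M`. Its two blocks cover 100 bases, but the BED6 span covers 1,100 bases and marks the whole intron as covered. In this sketch, deletions (`D`) split blocks just as introns (`N`) do; other tools may treat deletions differently.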
Thanks, Jen. I have not had good luck with the Visualization feature. With BAM files, it has worked once to produce a single histogram. Most of the time, though, it seems stuck preparing data for up to 3 hours without producing anything. Also, once a graph is made, how can one save it as an image file (PDF, etc.)?

Best,
Trent

From: Jennifer Jackson [mailto:jen@bx.psu.edu]
Sent: Wednesday, May 30, 2012 3:17 PM
To: Fowler, Trent
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] Genome Browser Histogram Visualization of Accepted Hits
participants (3)
- Fowler, Trent
- Hiram Clawson
- Jennifer Jackson