MACS output files from Galaxy
I ran MACS on my ChIP-seq dataset and found various output files:

1. Under the HTML report there are two files: a negative peaks .xls file and a peaks.xls file. The peaks.xls file is the same as the peaks interval file in the output list on the right, except that the positions differ by one bp. For example, if the peak coordinates under the HTML report are 99 to 120, then in the peaks interval file they are 100 to 121. Which one should be followed?
2. What is the meaning of the negative peaks interval file?
3. I used control and treated samples to run MACS. There are two wig files, ctrl.wig and treatment.wig. Do these two files belong to the control and treated samples? If so, where are the corresponding bed files?

If someone can direct me to a description of the output we get in Galaxy when using MACS, that would be helpful.

Thanks
Hi Peter,

I will try to give you my basic understanding of how MACS works. First, though, I recommend reading the MACS README <http://liulab.dfci.harvard.edu/MACS/README.html>, where you will find detailed explanations of how MACS works and what the different output files are.

1. It depends on what you want to do with the MACS analysis. Usually, for any downstream analysis, you will need the interval file (bed). It is just a tab-separated tabular file, so it can be opened with MS Excel anyway. I personally use Excel to quickly look at the different FDR values. Regarding the slight shift between the positions in BED and XLS: this is normal and, I believe, due to the XLS formatting. Here is the README's explanation: "Coordinates in XLS is 1-based which is different with BED format."

2. See the MACS paper for a precise answer, but basically the negative peaks are used by MACS to calculate the FDR.

3. The wig files contain both the positions of your intervals (your reads) and a "score". For this reason I like to use them for visualizing my data, although they can also be used as input for downstream tools such as CEAS (gene annotation). Unlike the wig files, the bed file does not include a "score", only the precise locations of your peaks (not reads). Because these peaks are detected by comparing the reads in your treatment against the control, there is only one file (a peak corresponds to a significant enrichment of reads at a locus that is not present in the control sample).

I hope this helps. I'm sure a lot of people on this list can give you more (and more accurate) details if you need them.

Best,
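To make the coordinate difference concrete, here is a minimal Python sketch. It is not taken from MACS or Galaxy; the function names and the example numbers are made up for illustration. It assumes the usual conventions: the XLS coordinates are 1-based with an inclusive end (as the README quote says), and BED-style coordinates are 0-based with an exclusive end. Under those assumptions only the start value changes between the two representations.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open (BED-style)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive 1-based end is numerically
    # equal to the exclusive 0-based end.
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based, half-open interval back to 1-based, inclusive."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# Hypothetical peak, given in 1-based inclusive coordinates.
xls_start, xls_end = 100, 121
bed_start, bed_end = one_based_to_bed(xls_start, xls_end)
print(bed_start, bed_end)  # 99 121

# Both notations cover the same number of bases.
assert xls_end - xls_start + 1 == bed_end - bed_start
```

Both pairs of numbers describe the same bases on the chromosome, so either file can be used as long as the tool reading it knows which convention the file follows.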
Hi Peter,

Florian's answers are very good. I am not sure this will add much, but perhaps a little, for the Galaxy output datasets parts of the questions.

The latest Using Galaxy paper, protocol 3, includes all of the "optional" output that MACS in Galaxy will produce (in addition to the linked files from the HTML report). Apart from the primary BED file and HTML output, there are four files, paired by tags/control: two interval and two wig.

The coordinate system used by each file specification can vary, as you observed and as already noted. See the documentation links for exactly how these files are formatted. But regardless of the file coordinate system, a proper browser that interprets the datatype correctly will display the start/stop correctly, which is where the output datasets in Galaxy can be useful. That is, whether the start in the file is 1-based or 0-based, the actual start base will be visualized as the same base. To see this, load the output into the UCSC Browser or Trackster in Galaxy, scroll into one of the regions, and compare with the files (both the datasets in Galaxy and those downloaded through the links).

Full documentation for core MACS output is in the MACS documentation (link given by Florian, also linked from the MACS tool page).

Documentation/examples for the Galaxy output files are in our paper:
http://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012 (scroll to protocol 3)
http://onlinelibrary.wiley.com/doi/10.1002/0471250953.bi1005s38/full#bi1005-... (see step #6)

More help for datatypes:
http://wiki.g2.bx.psu.edu/Learn/Datatypes (bed, interval, wig are all covered with links to more resources)

Florian mostly covered these, but I'll also address them to be clear:

On 9/11/12 9:45 AM, peter scot wrote:
I ran MACS on my chipseq dataset and found various files:
1. Under the HTML report there are two files: a negative peaks .xls file and a peaks.xls file. The peaks.xls file is the same as the peaks interval file in the output list on the right, except that the positions differ by one bp. For example, if the peak coordinates under the HTML report are 99 to 120, then in the peaks interval file they are 100 to 121. Which one should be followed?
This is related to the different coordinate systems. See the file specifications.
2. What is the meaning of the negative peaks interval file?
This is a type of control data: basically, the inputs are flipped to produce it. It may not be needed or useful for further downstream analysis. The advice to read the MACS documentation to fully understand it is good.
3. I used control and treated samples to run MACS. There are two wig files, ctrl.wig and treatment.wig. Do these two files belong to the control and treated samples? If so, where are the corresponding bed files?
These show the data density (pileup) in a graphical format. There are no corresponding bed files, although you can visualize the wig files against the other bed and/or interval peak data to see how density was interpreted when calling peaks.

Hopefully this helps!

Jen
Galaxy team
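As a rough illustration of what the wig files discussed in this thread hold, the sketch below reads wiggle text and lists each position with its value. It assumes the variableStep layout from the UCSC WIG specification (a declaration line with chrom= and an optional span=, followed by "position value" lines); whether a particular MACS run writes exactly this layout is an assumption to check against your own files. The function name and the sample data are made up.

```python
def parse_variable_step(lines):
    """Yield (chrom, start, span, value) tuples from variableStep WIG lines."""
    chrom, span = None, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # Declaration line, e.g. "variableStep chrom=chr1 span=10".
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        if chrom is None:
            raise ValueError("data line found before a variableStep declaration")
        position, value = line.split()
        yield chrom, int(position), span, float(value)


# Made-up example data, not real MACS output.
example = """track type=wiggle_0 name="made-up example"
variableStep chrom=chr1 span=10
101 3
111 7
121 2
"""

records = list(parse_variable_step(example.splitlines()))
best = max(records, key=lambda r: r[3])
print(best)  # ('chr1', 111, 10, 7.0)
```

Each record is a density value at a position, not a called peak, which is why the wig files have no bed counterpart of their own.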
-- Jennifer Jackson http://galaxyproject.org
Participants (3): Duclot, Florian; Jennifer Jackson; peter scot