Hi Jennifer,

Thank you for this great post!  

I know this is likely a tremendously redundant request, but is there some way to move your response up in the search list?

Always grateful for your extreme patience with us all,

David
FHCRC

On Dec 2, 2013, at 9:22 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:

Hello Catheryn,

Yes, all of this can be done. Once you have an annotation source identified (or sources!), the rest is part of the core functionality of Galaxy.

One of the outputs from MACS is a BED file with the peaks. BED format is similar to interval format and can be used with the tools in the group "Operate on Genomic Intervals". Or, if kept as BED, it can be used with tools in the group "BEDTools" (such as 'Intersect multiple sorted BED files'). If you need help understanding these datatypes, this wiki explains them - see the last bullet for links:
http://wiki.galaxyproject.org/Support#Dataset_special_cases

The idea is to obtain annotation data also in BED/interval format, then perform the comparisons. Where there is overlap (or no overlap, in the case of intergenic), the annotation can be assigned. I am not sure what genome you are working with, but if it is available from UCSC or another common public site, this can be fairly straightforward. One point is very important: the annotation must be extracted from the same, exact base reference genome that you mapped against. The name in Galaxy will be the same exact name as the source in nearly all cases. Please ask if you have a question about this.
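
To make the overlap idea concrete, here is a minimal Python sketch of what the interval tools do conceptually. It is not how Galaxy or BEDTools implement it, and the peak and feature names are made up for illustration. It assumes BED-style coordinates (0-based start, end not included) and that both files come from the same reference genome.

```python
from collections import defaultdict


def annotate_peaks(peaks, features):
    """Assign feature names to peaks by coordinate overlap.

    peaks, features: lists of (chrom, start, end, name) tuples in BED
    coordinates (0-based start, end exclusive). Peaks that overlap
    nothing are labeled "intergenic".
    """
    # Group features by chromosome so each peak is only compared
    # against features on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    annotated = []
    for chrom, start, end, name in peaks:
        hits = sorted(
            f_name
            for f_start, f_end, f_name in by_chrom.get(chrom, [])
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if start < f_end and f_start < end
        )
        annotated.append((name, hits or ["intergenic"]))
    return annotated


# Hypothetical example data, not real annotation.
peaks = [
    ("chr1", 100, 200, "peak_1"),
    ("chr1", 5000, 5100, "peak_2"),
    ("chr2", 150, 250, "peak_3"),
]
features = [
    ("chr1", 150, 400, "geneA_exon1"),
    ("chr1", 180, 190, "geneA_promoter"),
    ("chr2", 250, 300, "geneB_intron1"),
]

result = annotate_peaks(peaks, features)
for peak_name, labels in result:
    print(peak_name, ",".join(labels))
```

Note that peak_3 ends exactly where geneB_intron1 starts, so under BED coordinates the two only touch and the peak is reported as intergenic.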

At UCSC, the Table browser contains all the annotation tracks found in the Browser itself, and you will most likely want to use those from the "Gene and Gene Prediction" group, although there are likely others in the ENCODE group that are also of interest. The description for each track is at UCSC, including methods, often very detailed. When extracting the data (using the tool "Get Data -> UCSC Main table browser"), options to subset the BED output regions by exons or introns or predicted promoter regions, etc. are available. 

BioMart can be another great source of annotation, especially for genomes in Ensembl annotation builds. The tool would be "Get Data -> BioMart Central server". The same basic extraction concepts apply, although the form is organized differently, and the help there will guide you. The important parts are the chromosome, start, and end. The best tip I can offer when working with BioMart data is to avoid HTML content, which is often found in the longer descriptions. If you get an import error about HTML content, this isn't a huge problem. Just try again, eliminating suspected fields. The field(s) with the HTML can usually be identified quickly with a few test imports.
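
If you happen to have a tab-separated export on your own computer, a quick check like the one below can point to the suspect fields before you retry the import. This is only a sketch: the column names and sample rows are invented, and it assumes the HTML shows up as ordinary tags inside field values, which may not cover every case.

```python
import csv
import io
import re

# Matches something that looks like an opening or closing HTML tag.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def columns_with_html(tsv_text):
    """Return the sorted names of columns whose values contain HTML-like tags."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = set()
    for row in reader:
        for column, value in row.items():
            # Skip missing values and any overflow fields (stored as lists).
            if isinstance(value, str) and HTML_TAG.search(value):
                flagged.add(column)
    return sorted(flagged)


# Hypothetical export with one HTML-bearing description field.
sample = (
    "Chromosome\tStart\tEnd\tDescription\n"
    "1\t150\t400\tkinase <a href='x'>source</a>\n"
    "2\t900\t1200\ttransporter\n"
)

flagged = columns_with_html(sample)
print(flagged)
```

Any column this reports is a good candidate to leave out of the next test import.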

There are other sources in this "Get Data" tool group and many other external annotation projects that have data (from these you can simply download/upload or directly load via a URL). You can start with a larger file with all of the details, compare with just coordinates, then go back and pick up the details with a final join. Some examples of how to do these types of operations are in our ChIP-seq example and in our paper from last year, links here:
https://usegalaxy.org/u/james/p/exercise-chip-seq
https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
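
The "larger file, compare with just coordinates, final join" approach mentioned above can be sketched in Python as follows. In Galaxy you would do this with the interval and join tools rather than code. The rows, identifiers, and column layout here are hypothetical and only illustrate the three steps.

```python
# Step 0: a "full" annotation table with extra detail columns.
# Columns: feature_id, chrom, start, end, gene_name, description.
# All rows are made up for illustration.
full_annotation = [
    ("f1", "chr1", 150, 400, "GeneA", "hypothetical kinase"),
    ("f2", "chr2", 900, 1200, "GeneB", "hypothetical transporter"),
]

# Step 1: cut the table down to coordinates plus an identifier.
coords_only = [
    (chrom, start, end, fid)
    for fid, chrom, start, end, *_ in full_annotation
]

# Peaks in BED coordinates (0-based start, end exclusive).
peaks = [
    ("chr1", 100, 200, "peak_1"),
    ("chr2", 1000, 1050, "peak_2"),
]

# Step 2: compare on coordinates only, keeping (peak, feature_id) pairs.
matches = [
    (p_name, fid)
    for p_chrom, p_start, p_end, p_name in peaks
    for chrom, start, end, fid in coords_only
    if p_chrom == chrom and p_start < end and start < p_end
]

# Step 3: join back to the full table on the identifier to recover details.
details = {row[0]: row for row in full_annotation}
joined = [
    (p_name, details[fid][4], details[fid][5])
    for p_name, fid in matches
]

for row in joined:
    print("\t".join(row))
```

Keeping a unique identifier column through the coordinate comparison is what makes the final join possible, so it is worth checking that the identifier you pick is unique in the annotation source.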
 
Please note that the public Main server at usegalaxy.org will be unavailable during US East coast business hours tomorrow as stated on the current banner:
"TACC will be performing storage system updates on Tuesday, December 3 from 9 AM to 6 PM EST (UTC -0500). During this time, Galaxy will be unavailable."

Hopefully this helps!

Jen
Galaxy team

On 12/2/13 6:35 PM, Wooi Lim wrote:
Dear Galaxy,
 
I am analysing ChIP-Seq data from Illumina using the Galaxy web server. I mapped the reads with Bowtie and did the peak calling with MACS.
The next thing I want to do is annotate the peaks with genomic regions (e.g. promoter, intergenic, intron) and gene names.
 
I am not sure whether this can be achieved through Galaxy and, if so, how it can be done. Thank you.
 
Catheryn


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

-- 
Jennifer Hillman-Jackson
http://galaxyproject.org