Hello,

To summarize, you want to find existing genes that:

1 - have overlap with your ChIP-seq dataset
2 - have overlap within 5000 bp upstream of known TSS intervals

The basic steps are:

a - obtain intervals for TSS
b - obtain intervals for ChIP-seq peaks
c - obtain intervals for existing genes (transcripts)
d - answer both #1 & #2 above by comparing a + b, then the result + c, using tools from the group "Operate on Genomic Intervals" plus other data manipulation tools as needed

For a, this was the prior question/reply.

For b, please see:
http://main.g2.bx.psu.edu/u/james/p/exercise-chip-seq
http://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012 -> Prot 3

For c, this is in the UCSC mailing list post, but also in several Protocols of the Using Galaxy paper.

For d, see Prot 1 in the Using Galaxy paper for how to identify common regions to address question #1. Prot 4 walks through all Genomic Interval tools, plus the tools themselves have example graphics.

Hopefully this helps,

Jen
Galaxy team

On 7/23/12 7:33 AM, shamsher jagat wrote:
I want to have list of genes from UCSC browser or known genes.
Thanks
Kanwar
On Fri, Jul 20, 2012 at 8:00 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hello Kanwar,
On 7/20/12 3:31 PM, shamsher jagat wrote:
I am interested in getting regions flanking TSS. I am using Galaxy and have downloaded TSS sites using the steps in this post: https://lists.soe.ucsc.edu/pipermail/genome/2011-June/026175.html
Now what I would like to do is to get 5000 bp upstream and downstream using the flank tool in Galaxy, but I realize it only gave me the option for gene start or whole gene.
The "Region:" options are:
1 - around start - meaning interval start coordinate
2 - around end - meaning interval end coordinate
3 - whole gene - meaning entire intervals
Pick option #1.
Is it possible to extract 5000 bp upstream and downstream regions around the TSS start site?
The "Location of the flanking region/s:" options are:
4 - Upstream
5 - Downstream
6 - Both
Pick option #6 with "Length of the flanking region(s):" set to 5000.
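
To make the arithmetic behind those settings concrete, here is a minimal Python sketch of flanking an interval's start coordinate by 5000 bp on both sides. It is not the Galaxy tool's code: the `Interval` type and `flanks_around_start` function are hypothetical names, coordinates are assumed to be 0-based half-open (BED style), and the strand handling (upstream lies at higher coordinates on the minus strand) is an assumption to check against the tool's own help text.

```python
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed BED-style coordinates)
    end: int     # exclusive
    strand: str  # "+" or "-"


def flanks_around_start(iv: Interval, length: int = 5000) -> Tuple[Interval, Interval]:
    """Return (upstream, downstream) flanks of the interval's start coordinate."""
    anchor = iv.start
    # Clamp at 0 so a flank never runs off the beginning of the chromosome.
    left = Interval(iv.chrom, max(0, anchor - length), anchor, iv.strand)
    right = Interval(iv.chrom, anchor, anchor + length, iv.strand)
    # Assumption: on the minus strand, "upstream" is the higher-coordinate side.
    if iv.strand == "-":
        return right, left
    return left, right


# Hypothetical 1 bp TSS interval, for illustration only.
tss = Interval("chr1", 10000, 10001, "+")
upstream, downstream = flanks_around_start(tss)
print(upstream)
print(downstream)
```

The chromosome end is not checked here because that needs chromosome lengths, which this sketch does not load.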
Once I have that, I want to find non-overlapping genes in my regions from the ChIP-seq data.
Do you want to identify/label known genes or discover novel genes? This part of your question is not clear. Could you explain the end goal in more detail? It is likely that some form of the tool "Operate on Genomic Intervals -> Merge" will do what you want, but it is difficult to recommend the correct option.
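
If the goal turns out to be labeling known genes that overlap your regions, the comparison underneath is a plain interval-overlap test. The Python sketch below shows only that idea; it is not how the Galaxy interval tools are implemented. The function names and example records are hypothetical, intervals are assumed to be `(chrom, start, end, name)` tuples in 0-based half-open coordinates, and the nested loop is fine for small inputs but too slow for genome-scale data.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, name) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_overlapping(genes, regions):
    """Return the genes that overlap at least one region (simple O(n*m) scan)."""
    return [g for g in genes if any(overlaps(g, r) for r in regions)]


# Made-up records, for illustration only.
genes = [
    ("chr1", 100, 500, "geneA"),
    ("chr1", 1000, 2000, "geneB"),
    ("chr2", 100, 500, "geneC"),
]
peaks = [
    ("chr1", 450, 600, "peak1"),
    ("chr2", 600, 700, "peak2"),
]

hits = genes_overlapping(genes, peaks)
print([g[3] for g in hits])
```

Genes that do not overlap any region are simply the ones left out of `hits`.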
Going forward, sending questions to a single public list, as Brooke also suggests, is best. It is generally better not to start a thread by posting the same email to two or more lists at the same time.
Thanks! jen Galaxy team
Thanks
Kanwar
_____________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org