Finding conserved non-coding regions
Hi all,
I have a set of 1500 sequences obtained from my ChIP-seq experiment. I would like to know how many of them fall within conserved non-coding regions. My definition of a conserved non-coding region is as follows:
1. A region of at least 200 bp with 80% conservation between human, mouse and chicken
2. A region of at least 200 bp with 70% conservation between human, mouse, chicken and fish
Could someone suggest how to use Galaxy for this purpose?
Thanks in advance.
Bony
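The two criteria in the question can be made concrete with a small sketch. It is not a Galaxy tool. It assumes that "conservation" means the fraction of alignment columns in which every species shows the same non-gap base, and that the aligned sequences are already available as equal-length strings with "-" for gaps. The function name `conserved_windows` and its parameters are hypothetical.

```python
def conserved_windows(aligned, window=200, min_identity=0.8):
    """Return half-open (start, end) column ranges of `window` columns
    whose all-species identity is at least `min_identity`.

    Assumption: identity = share of columns where every sequence has the
    same non-gap base. Other definitions of conservation are possible.
    """
    seqs = [s.upper() for s in aligned]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")

    # 1 where all species agree on a non-gap base in that column, else 0
    match = [
        1 if col[0] != "-" and all(c == col[0] for c in col) else 0
        for col in zip(*seqs)
    ]

    hits = []
    if length < window:
        return hits

    # Slide the window one column at a time, updating the count in place
    running = sum(match[:window])
    for start in range(length - window + 1):
        if start > 0:
            running += match[start + window - 1] - match[start - 1]
        if running / window >= min_identity:
            hits.append((start, start + window))
    return hits
```

Under these assumptions, criterion 1 would correspond to `conserved_windows([human, mouse, chicken], 200, 0.8)` and criterion 2 to `conserved_windows([human, mouse, chicken, fish], 200, 0.7)`; a region qualifies if the returned list is non-empty.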
Hello Bony,

First, you will need to map your data to a reference genome (either one native to Galaxy or one that you load into your history). Use the tools under "NGS Toolbox", or map the data using your own tools and upload the results.

To link in MAF conservation data, use the tools under "Fetch Alignments". The source of the MAF data is UCSC's Conservation tracks ("Conservation" and "Chain/Net"; please see the UCSC Genome Browser at http://genome.ucsc.edu/ for a full description of how the data are created). You can also use MAF data from other sources, if these are available to you. One important note: to obtain the correct native MAF data in Galaxy, be sure to set the "database" attribute.

The complete analysis plan would likely involve uploading the coding regions from a gene/transcript track (UCSC's "Known Gene" or "RefSeq Genes" are good choices) to use as a negative filter against the MAF alignments, restricting the data to only those regions that are non-coding in the primary species. Then compare the regions in the resulting MAF against the mapping coordinates of your ChIP-seq data. The MAF filtering tools in "Fetch Alignments" can help with your final analysis regarding the length and conservation score parameters.

Our screencasts and other tutorials may also help for basic orientation:
http://main.g2.bx.psu.edu/u/aun1/p/screencasts (see #6 for MAF analysis)
Shared Data -> Published Pages (complete tutorial set)

Hopefully this is helpful, but please let us know if you need help with the tools once you get started.

Best,
Jen
Galaxy team

On 5/13/11 9:09 AM, De Kumar, Bony wrote:
Hi all,
I have a set of 1500 sequences obtained from my ChIP-seq experiment. I would like to know how many of them fall within conserved non-coding regions. My definition of a conserved non-coding region is as follows:
1. A region of at least 200 bp with 80% conservation between human, mouse and chicken
2. A region of at least 200 bp with 70% conservation between human, mouse, chicken and fish
Could someone suggest how to use Galaxy for this purpose?
Thanks in advance.
Bony
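The interval logic behind the analysis plan in the reply (remove coding regions from the conserved regions, then compare against the ChIP-seq coordinates) can be sketched outside Galaxy as follows. This is only an illustration under stated assumptions: intervals are half-open `(chrom, start, end)` tuples on the same reference genome, conserved regions have already been extracted from the MAF data, and fragments that fall below 200 bp after the coding parts are removed are discarded. All function names are hypothetical and are not Galaxy tools.

```python
def subtract(region, blockers):
    """Return the parts of `region` not covered by any interval in `blockers`."""
    chrom, start, end = region
    pieces = []
    cursor = start
    for b_chrom, b_start, b_end in sorted(blockers):
        # Skip blockers on other chromosomes or outside the remaining span
        if b_chrom != chrom or b_end <= cursor or b_start >= end:
            continue
        if b_start > cursor:
            pieces.append((chrom, cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < end:
        pieces.append((chrom, cursor, end))
    return pieces


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_peaks_in_cnc(peaks, conserved, coding, min_len=200):
    """Count peaks overlapping a conserved, non-coding region of >= min_len bp.

    Assumption: the length cutoff is applied after coding sequence is removed.
    """
    noncoding = [
        piece
        for region in conserved
        for piece in subtract(region, coding)
        if piece[2] - piece[1] >= min_len
    ]
    return sum(any(overlaps(peak, r) for r in noncoding) for peak in peaks)
```

For example, with `conserved = [("chr1", 1000, 1500)]` and `coding = [("chr1", 1200, 1250)]`, a peak at `("chr1", 1100, 1150)` is counted, while a peak at `("chr1", 1210, 1240)` lies entirely within coding sequence and is not.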
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org