Genes from Regions.
Hello,

I'm not sure if this is the place to ask this, but if so, here goes. I have a list of genomic regions (from CNV gains and losses), each given as chromosome, start, and stop (e.g. chr7 68000000 71000000) for a given genome build (hg18), and I want to add the genes (ideally HUGO gene symbols or RefSeq IDs) that reside within each region, per line.

So I want to input something like this:

Sample       Chromosome Region            Event    Length
JC 507 CD19  chr10:11,997,707-12,330,274  CN Gain  332568
JC 507 CD19  chr10:47,563,503-48,085,608  CN Loss  522106
JC 507 CD19  chr10:69,510,584-69,951,738  CN Gain  441155

And get an output similar to this:

Sample       Chromosome Region            Event    Length  Gene Symbols
JC 507 CD19  chr10:11,997,707-12,330,274  CN Gain  332568  CDC123, DHTKD1, NUDT5, SEC61A2, UPF2
JC 507 CD19  chr10:47,563,503-48,085,608  CN Loss  522106  AGAP9, ANXA8, ANXA8L1, CTSL1P2, FAM25B, FAM25C, FAM25G, GDF10, GDF2, LOC642826, RBP3, ZNF488
JC 507 CD19  chr10:69,510,584-69,951,738  CN Gain  441155  ATOH7, DNA2, HNRNPH3, MYPN, PBLD, RUFY2, SLC25A16

Is this possible?

Shawn Anderson
Application Scientist - Laboratory for Advanced Genome Analysis
Vancouver Prostate Centre - Vancouver General Hospital
2660 Oak Street
Vancouver BC V6H 3Z6
P: 604-875-4111 ext. 63436
F: 604-875-5654
sanderson@prostatecentre.com
www.LAGAPC.ca <http://www.microarray.prostatecentre.com/>
Hello Shawn,

You can do this in three steps:

1 - Format your existing file, set the type as interval, and assign columns ("Edit attributes").

Start by changing this:
chr10:11,997,707-12,330,274
to this, separated by tabs:
chr10 11997707 12330274
Add in strand if possible:
chr10 11997707 12330274 +

2 - Obtain a mapped transcript file that includes gene identifiers.

a) One choice is UCSC's "Known Genes" track. From your working history, use the tool "Get Data -> UCSC main". Select the genome (hg18) and the track "UCSC Genes", with output = "selected fields from primary and related tables", and merge in identifiers from tables such as "hg18.kgXref". The track "RefSeq Genes" is another option (RefSeq accession is "name" and gene identifier is "name2"). Send the query to Galaxy, set the type as interval, and assign columns.

b) Another choice would normally be Ensembl Genes from "Get Data -> Biomart", but only hg19 is available there.

3 - Merge the files based on overlap.

The tool you will most likely want to use is "Operate on Genomic Intervals -> Join", although you may want to explore others.

Help: http://wiki.g2.bx.psu.edu/Learn/Interval%20Operations
Also see the screencasts at http://usegalaxy.org, starting with quickies #3 & #5.

Hopefully this helps to get you started!

Thanks,
jen

On 7/21/11 4:37 PM, Shawn Anderson wrote:
Hello,
I'm not sure if this is the place to ask this, but if so, here goes. I have a list of genomic regions (from CNV gains and losses), each given as chromosome, start, and stop (e.g. chr7 68000000 71000000) for a given genome build (hg18), and I want to add the genes (ideally HUGO gene symbols or RefSeq IDs) that reside within each region, per line.
So I want to input something like this:
Sample       Chromosome Region            Event    Length
JC 507 CD19  chr10:11,997,707-12,330,274  CN Gain  332568
JC 507 CD19  chr10:47,563,503-48,085,608  CN Loss  522106
JC 507 CD19  chr10:69,510,584-69,951,738  CN Gain  441155
And get an output similar to this:
Sample       Chromosome Region            Event    Length  Gene Symbols
JC 507 CD19  chr10:11,997,707-12,330,274  CN Gain  332568  CDC123, DHTKD1, NUDT5, SEC61A2, UPF2
JC 507 CD19  chr10:47,563,503-48,085,608  CN Loss  522106  AGAP9, ANXA8, ANXA8L1, CTSL1P2, FAM25B, FAM25C, FAM25G, GDF10, GDF2, LOC642826, RBP3, ZNF488
JC 507 CD19  chr10:69,510,584-69,951,738  CN Gain  441155  ATOH7, DNA2, HNRNPH3, MYPN, PBLD, RUFY2, SLC25A16
Is this possible?
Shawn Anderson
Application Scientist - Laboratory for Advanced Genome Analysis
Vancouver Prostate Centre - Vancouver General Hospital
2660 Oak Street
Vancouver BC V6H 3Z6
P: 604-875-4111 ext. 63436
F: 604-875-5654
sanderson@prostatecentre.com
www.LAGAPC.ca <http://www.microarray.prostatecentre.com/>
--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
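
For anyone who would rather script steps 1 and 3 outside Galaxy, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the Galaxy tools do internally: the function names (parse_region, to_interval_line, genes_in_region) are hypothetical, the gene table is assumed to be already loaded as (chrom, start, end, symbol) tuples, and the toy genes below are made-up placeholders rather than real hg18 annotations. Coordinates are compared exactly as written, so any 0-based versus 1-based difference between the two files is ignored and may matter at the exact boundaries.

```python
import re
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end")

# Matches strings such as "chr10:11,997,707-12,330,274".
REGION_RE = re.compile(r"^(chr[^:\s]+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Split 'chrN:start-end' (commas allowed in numbers) into an Interval."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.groups()
    return Interval(chrom, int(start.replace(",", "")), int(end.replace(",", "")))


def to_interval_line(text):
    """Step 1: produce the tab-separated chrom/start/end form."""
    region = parse_region(text)
    return f"{region.chrom}\t{region.start}\t{region.end}"


def genes_in_region(region, genes):
    """Step 3: sorted, unique symbols of genes overlapping the region.

    Assumption: both intervals are treated as closed, so sharing a single
    position counts as an overlap.
    """
    hits = {
        symbol
        for chrom, start, end, symbol in genes
        if chrom == region.chrom and start <= region.end and end >= region.start
    }
    return sorted(hits)


# Toy gene table with placeholder names and coordinates (NOT real annotations).
toy_genes = [
    ("chr10", 12000000, 12050000, "GENE_A"),  # fully inside the region
    ("chr10", 12300000, 12400000, "GENE_B"),  # partially overlapping
    ("chr10", 50000000, 50010000, "GENE_C"),  # same chromosome, no overlap
    ("chr7", 12000000, 12050000, "GENE_D"),   # different chromosome
]

region = parse_region("chr10:11,997,707-12,330,274")
print(to_interval_line("chr10:11,997,707-12,330,274"))
print(", ".join(genes_in_region(region, toy_genes)))
```

With a real gene table, the list passed to genes_in_region would come from the file obtained in step 2 instead of toy_genes. The linear scan is fine for a few hundred regions; for large inputs an interval index or the Galaxy "Join" tool would be the better choice.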
participants (2)
- Jennifer Jackson
- Shawn Anderson