Gerome:

This question is more applicable to our galaxy-user list, so I am redirecting it there. See the reply below.

What you are trying to do is, in fact, quite easy to perform with Galaxy:

1. Extend gene coordinates in either direction with "Operate on Genomic Intervals -> Get Flanks".
2. Join the newly extended genes with the SNP dataset using "Operate on Genomic Intervals -> Join".
3. Group by gene name and concatenate on SNP name using "Join, Subtract and Group -> Group".

Please see this history for an example:
http://main.g2.bx.psu.edu/history/imp?id=354aae8cd0d5e1c5
If you click on the "rerun" link within a history item (see attached image), you will be able to see the parameters I've used.

NOTE: There are two kinds of join in Galaxy: relational join ("Join, Subtract and Group -> Join") and interval join ("Operate on Genomic Intervals -> Join"). Here we use the interval join.

If you have any trouble, I will make a galactic quickie (short movie) for you.

Thanks,

anton
galaxy team

On Sep 16, 2009, at 8:57 AM, Gerome Breen wrote:
Hi, I am puzzled about how to generate something - can you help? I want to generate a file with HapMap or other sets of SNPs mapped to UCSC genes. I want to have one line per gene, with the min start -20kb and the max end +20kb, followed by a tab-separated list of the HapMap SNPs in each gene. I have tried to do this but failed - the one line per gene is complicated by the fact that aggregating on a column seems to discard the chr information, and a join within two coordinates doesn't seem to be available and in any case would be very computationally intensive. Any solutions would be appreciated.
The file format I am aiming at:
chr start end gene_name snp1 snp2 snp3 etc.
Best
Gerome.
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://galaxyproject.org
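
For readers who want to see the logic of the three steps in the reply outside of Galaxy, here is a minimal Python sketch. It is not what the Galaxy tools run internally. The function names, the tuple layouts, and the toy data are all made up for illustration, and it assumes one interval per gene (if a gene has several transcripts, you would first collapse them to the min start and max end), 0-based half-open coordinates, and a single position per SNP. Genes with no SNPs inside the extended window are dropped, as with an inner join.

```python
from collections import defaultdict

FLANK = 20_000  # 20 kb padding on each side, as asked for in the question


def extend(genes, flank=FLANK):
    """Step 1: widen each (chrom, start, end, name) interval by `flank` bp."""
    return [(chrom, max(0, start - flank), end + flank, name)
            for chrom, start, end, name in genes]


def interval_join(genes, snps):
    """Step 2: pair each gene with every SNP whose position falls inside it."""
    snps_by_chrom = defaultdict(list)
    for chrom, pos, snp_id in snps:
        snps_by_chrom[chrom].append((pos, snp_id))
    for chrom, start, end, name in genes:
        for pos, snp_id in snps_by_chrom[chrom]:
            if start <= pos < end:  # half-open interval: end is excluded
                yield chrom, start, end, name, snp_id


def group_by_gene(joined):
    """Step 3: one tab-separated row per gene, SNP names concatenated."""
    grouped = {}
    for chrom, start, end, name, snp_id in joined:
        grouped.setdefault((chrom, start, end, name), []).append(snp_id)
    return ["\t".join([chrom, str(start), str(end), name, *snp_ids])
            for (chrom, start, end, name), snp_ids in grouped.items()]


# Hypothetical toy data: (chrom, start, end, gene_name) and (chrom, pos, snp_id)
genes = [("chr1", 50_000, 60_000, "geneA"), ("chr2", 10_000, 30_000, "geneB")]
snps = [("chr1", 35_000, "rs1"), ("chr1", 79_999, "rs2"),
        ("chr1", 80_000, "rs3"), ("chr2", 5_000, "rs4")]

rows = group_by_gene(interval_join(extend(genes), snps))
for row in rows:
    print(row)
```

Each printed row has the layout requested in the question: chr, start, end, gene_name, then the SNP names. Keying the grouping on the full (chrom, start, end, name) tuple is what keeps the chr column from being lost during aggregation. The nested loop is fine for a toy example; on full HapMap data you would want sorted intervals or an interval index instead.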