Guru,

Thanks for the handy tip on getting rid of duplicates. The joined file now contains 130 items with no duplicates. I guess there is a mismatch between what SNP130 considers a missense mutation and what UCSC Genes considers to be coding sequence.

Paul

Guruprasad Ananda wrote:
Hi Paul,

  
* The SNP file contains 149 regions but when joined to the Codons there are 311 items in the output. I was expecting one joined record per SNP.
* The joined file contains many duplicate SNPs and missing SNPs
    

Your gene list might contain several overlapping genes/reading frames, so when you fetch codons the same positions can be present multiple times. As a result, a given SNP might join with multiple codons from overlapping genes/reading frames. To avoid this, you can remove duplicate codons using the "Statistics > Count" tool (with the chr, start and end columns selected). Please note that this tool returns tabular output, so you'll need to click the pencil icon next to the output dataset, change the datatype to 'interval', and set the chr, start and end columns to 2, 3 and 4 respectively.
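
For illustration only, here is a minimal Python sketch of why overlapping transcripts inflate the join and how de-duplicating on (chr, start, end) collapses it. This is not how the Galaxy tools are implemented. The function names and the second transcript ID ("hypothetical_tx") are made up for the example, and the coordinates are assumed to be half-open, BED-style intervals.

```python
def dedupe_intervals(rows):
    """Keep the first row seen for each (chrom, start, end) triple."""
    seen = set()
    unique = []
    for row in rows:
        key = (row[0], int(row[1]), int(row[2]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def join_overlapping(snps, codons):
    """Pair each SNP with every codon on the same chromosome that overlaps it."""
    joined = []
    for snp in snps:
        s_chrom, s_start, s_end = snp[0], int(snp[1]), int(snp[2])
        for codon in codons:
            # Half-open intervals overlap when each starts before the other ends.
            if codon[0] == s_chrom and int(codon[1]) < s_end and s_start < int(codon[2]):
                joined.append(snp + codon)
    return joined


snps = [["chr21", "15481364", "15481365", "rs7278737"]]
codons = [
    ["chr21", "15481364", "15481367", "uc002yjm.2", "0", "-"],
    # Hypothetical second transcript covering the same codon position.
    ["chr21", "15481364", "15481367", "hypothetical_tx", "0", "-"],
]

raw_join = join_overlapping(snps, codons)
deduped_join = join_overlapping(snps, dedupe_intervals(codons))
print(len(raw_join), len(deduped_join))
```

Note that de-duplicating on coordinates alone keeps only one transcript per codon position, so the transcript name and strand of the discarded rows are lost.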

Hope this answers some of your questions.
Thanks for using Galaxy,
Guru.



On Jun 1, 2010, at 4:34 AM, Paul Webster wrote:

  
Hi,

I'm trying to investigate conservation in SNPs using Galaxy, but I'm running into a few issues, so I'm probably not doing this the best way.

Here is what I did in Galaxy:
(1) Get some high heterozygosity missense SNPs from UCSC for chr21
(2) Get all Genes from UCSC for chr21
(3) Split the genes into codons using the "Gene BED to Codon BED expander"
(4) Join the SNPs(1) to the Codons(3) using {Operate on genomic intervals}->Join
(5) Create a multiple alignment for the codons which had SNPs using {Fetch Alignments}->{Extract MAF blocks}

Some problems I found were:
* The SNP file contains 149 regions but when joined to the Codons there are 311 items in the output. I was expecting one joined record per SNP.
* The joined file contains many duplicate SNPs and missing SNPs
* MAF blocks are all in the same orientation, but about half the codons should be in the reverse direction
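
To make the orientation problem concrete, here is a small Python sketch of the post-processing I have in mind. It assumes the extracted alignment sequence is reported on the plus strand, so codons from minus-strand genes need to be reverse-complemented before they read as codons. The orient_codon helper is hypothetical and not part of Galaxy.

```python
# Translation table for complementing DNA bases; other characters
# (for example alignment gaps "-") pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def orient_codon(seq, strand):
    """Return seq as read on the given strand, assuming seq is plus-strand."""
    if strand == "-":
        return seq.translate(COMPLEMENT)[::-1]
    return seq


# A minus-strand codon whose plus-strand sequence is GTC reads as GAC.
print(orient_codon("GTC", "-"))
# A plus-strand codon is returned unchanged.
print(orient_codon("TTG", "+"))
```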

Can anyone offer advice?

Thanks,
Paul


******************************************************************
sample output
******************************************************************
(1) SNPs (149 records)
chr21    15436474    15436475    rs3859679    missense    TAT,TTT,    Y,F,
chr21    15481364    15481365    rs7278737    missense    GAC,GAA,    D,E,
chr21    15516947    15516948    rs2822432    missense    GAA,AAA,    E,K,

(2) Genes (901 records)
chr21    9690070    9690100    uc002zkg.1    0    +    9690070    9690070    0    1    30,    0,
chr21    9711934    9769223    uc011abu.1    0    +    9711934    9711934    0    10    104,31,70,82,29,73,71,164,195,379,    0,34186,36895,40899,43769,43889,49915,54029,55562,56910,
chr21    9907192    9908487    uc010gqn.1    0    -    9907192    9907192    0    2    982,210,    0,1085,

(3) Codons (327,371 records)
chr21    9908330    9908333    uc002zka.1    0    -
chr21    9908333    9908336    uc002zka.1    0    -
chr21    9908336    9908339    uc002zka.1    0    -

(4) Join (311 records)
chr21    15481364    15481365    rs7278737    missense    GAC,GAA,    D,E,    chr21    15481364    15481367    uc002yjm.2    0    -    GAC
chr21    15516947    15516948    rs2822432    missense    GAA,AAA,    E,K,    chr21    15516945    15516948    uc002yjm.2    0    -    GAA
chr21    15596771    15596772    rs409782    missense    TTG,GTG,    L,V,    chr21    15596771    15596774    uc002yjn.3    0    +    TTG


_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user