Hi,

I'm trying to investigate conservation in SNPs using Galaxy, but I'm running into a few "issues", so I'm probably not doing this the best way.

Here is what I did in Galaxy:

(1) Get some high-heterozygosity missense SNPs from UCSC for chr21
(2) Get all Genes from UCSC for chr21
(3) Split the genes into codons using the "Gene BED to Codon BED expander"
(4) Join the SNPs (1) to the Codons (3) using {Operate on genomic intervals}->Join
(5) Create a multiple alignment for the codons which had SNPs using {Fetch Alignments}->{Extract MAF blocks}

Some problems I found were:

* The SNP file contains 149 regions, but when joined to the Codons there are 311 items in the output. I was expecting one joined record per SNP.
* The joined file contains many duplicate SNPs, and some SNPs are missing
* The MAF blocks are all in the same orientation, but about half the codons should be in the reverse direction

Can anyone offer advice?

Thanks,
Paul

******************************************************************
sample output
******************************************************************

(1) SNPs (149 records)
chr21 15436474 15436475 rs3859679 missense TAT,TTT, Y,F,
chr21 15481364 15481365 rs7278737 missense GAC,GAA, D,E,
chr21 15516947 15516948 rs2822432 missense GAA,AAA, E,K,

(2) Genes (901 records)
chr21 9690070 9690100 uc002zkg.1 0 + 9690070 9690070 0 1 30, 0,
chr21 9711934 9769223 uc011abu.1 0 + 9711934 9711934 0 10 104,31,70,82,29,73,71,164,195,379, 0,34186,36895,40899,43769,43889,49915,54029,55562,56910,
chr21 9907192 9908487 uc010gqn.1 0 - 9907192 9907192 0 2 982,210, 0,1085,

(3) Codons (327,371 records)
chr21 9908330 9908333 uc002zka.1 0 -
chr21 9908333 9908336 uc002zka.1 0 -
chr21 9908336 9908339 uc002zka.1 0 -

(4) Join (311 records)
chr21 15481364 15481365 rs7278737 missense GAC,GAA, D,E, chr21 15481364 15481367 uc002yjm.2 0 - GAC
chr21 15516947 15516948 rs2822432 missense GAA,AAA, E,K, chr21 15516945 15516948 uc002yjm.2 0 - GAA
chr21 15596771 15596772 rs409782 missense TTG,GTG, L,V, chr21 15596771 15596774 uc002yjn.3 0 + TTG
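
For anyone who wants to check the join in step (4) outside Galaxy, here is a minimal Python sketch of an interval join that counts how many codon records each SNP hits. It is not how Galaxy's Join tool is implemented; it only assumes BED-style 0-based, half-open coordinates. The function names (`overlaps`, `join_intervals`, `reverse_complement`) are made up for illustration, and one codon row is a hypothetical second transcript added to show one possible way a single SNP could end up on more than one joined record. SNPs with zero hits are one possible reading of the "missing" SNPs. The last helper shows how a minus-strand codon could be reverse-complemented, which may be relevant to the orientation question.

```python
# SNP rows taken from the sample output: (chrom, start, end, name)
snps = [
    ("chr21", 15436474, 15436475, "rs3859679"),
    ("chr21", 15481364, 15481365, "rs7278737"),
    ("chr21", 15516947, 15516948, "rs2822432"),
]

# Codon rows: (chrom, start, end, transcript, strand)
codons = [
    ("chr21", 15481364, 15481367, "uc002yjm.2", "-"),
    ("chr21", 15516945, 15516948, "uc002yjm.2", "-"),
    # HYPOTHETICAL row, not from the real data: a second transcript
    # sharing the same codon, to illustrate duplicated SNPs in a join.
    ("chr21", 15481364, 15481367, "hypothetical_tx.1", "-"),
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def join_intervals(snps, codons):
    """Map each SNP name to the list of codon rows it overlaps."""
    hits = {name: [] for _, _, _, name in snps}
    for s_chrom, s_start, s_end, name in snps:
        for codon in codons:
            c_chrom, c_start, c_end = codon[0], codon[1], codon[2]
            if s_chrom == c_chrom and overlaps(s_start, s_end, c_start, c_end):
                hits[name].append(codon)
    return hits


def reverse_complement(seq):
    """Reverse-complement a DNA string (A<->T, C<->G)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


hits = join_intervals(snps, codons)
joined_rows = sum(len(rows) for rows in hits.values())
missing = [name for name, rows in hits.items() if not rows]
duplicated = [name for name, rows in hits.items() if len(rows) > 1]

print("joined rows:", joined_rows)
print("SNPs with no codon:", missing)
print("SNPs joined more than once:", duplicated)
print("GAC reverse-complemented:", reverse_complement("GAC"))
```

Grouping the real joined file by SNP name in the same way would show whether the extra records come from several transcripts overlapping the same position, and which of the 149 SNPs have no overlapping codon at all.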