Help to identify variants with clinical/phenotype associations
Hi all, I have a dataset with potential pathological variants and I'd like to combine it with a dataset of variants with known clinical associations, to identify those responsible for the phenotype. I would greatly appreciate any suggestions. -- *J. Luis Santomé Collazo*
Hi Luis,

There are a few options:

The tool "Phenotype Association -> SIFT" accepts an input file of variant locations/alleles and retrieves annotations, including OMIM Disease associations.

Alternatively, you could label your variants with rs identifiers (perhaps you already have these), or just use genomic coordinates, to intersect with the GWAS Catalog dataset. The general path would be to obtain the most recent dbSNP and GWAS Catalog tracks from the UCSC Table Browser: use "Get Data -> UCSC Main", set the genome to hg19, then look under the group "Variation and Repeats", where dbSNP 135 and GWAS are both listed as tables under the dbSNP track. Get both; this will require two queries. You can then join on common keys (such as rs numbers) or on overlapping genomic coordinates, depending on the starting data format and how you choose to do the intersect.

For tool help, the first protocol in the "Using Galaxy" paper and its supplement walks through how to extract data from the UCSC Table Browser and join data by various methods. The protocol's goal is different from yours, but the methods will be similar to what you will be doing. The second protocol has even more examples for importing and formatting datasets, if you want to manipulate datasets to customize or alter the datatype.

http://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012

I am assuming that you are using a human, hg19 dataset; if you are using another genome, SIFT will not be possible. Still, UCSC may have analogous tracks to select from, depending on the genome. Or you could try BioMart, or one of the other sources under "Get Data", if these have your data. You can also always directly import (upload or FTP) a reference dataset of known SNP phenotypes from any source, mapped to your target genome, and use Galaxy's tools to perform the intersection and file manipulations.

Hopefully this helps to get you started!

Best,
Jen
Galaxy team

On 11/13/12 1:09 AM, Luis Santomé wrote:
Hi all,
I have a dataset with potential pathological variants and I'd like to combine it with a dataset of variants with known clinical associations, to identify those responsible for the phenotype.
I would greatly appreciate any suggestions.
-- *J. Luis Santomé Collazo*
-- Jennifer Jackson http://galaxyproject.org
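
For readers who want to see what the two join strategies in the reply amount to (matching on rs numbers versus overlapping genomic coordinates), here is a minimal Python sketch outside of Galaxy. Everything in it is an assumption made for illustration: the records, rs identifiers, and trait names are placeholder values, the function names `join_by_rsid` and `intersect_by_coords` are hypothetical, and the tuple layout simply mimics a BED-like export (chrom, 0-based start, end, name). It is not how Galaxy's own join or intersect tools are implemented.

```python
from collections import defaultdict

# Hypothetical toy records; real inputs would be tabular files, e.g. exported
# from the UCSC Table Browser (BED-like: chrom, 0-based start, end, name).
my_variants = [
    ("chr1", 100, 101, "rs0000001"),
    ("chr1", 500, 501, "rs0000002"),
    ("chr2", 250, 251, "novel_1"),  # variant without an rs identifier
]
known_associations = [
    ("chr1", 100, 101, "rs0000001", "trait_A"),
    ("chr2", 240, 260, "rs0000003", "trait_B"),
]


def join_by_rsid(variants, associations):
    """Pair each variant with every trait recorded for the same rs number."""
    traits = defaultdict(list)
    for _chrom, _start, _end, rsid, trait in associations:
        traits[rsid].append(trait)
    return [(v, t) for v in variants for t in traits.get(v[3], [])]


def intersect_by_coords(variants, associations):
    """Pair each variant with every association whose interval overlaps it."""
    by_chrom = defaultdict(list)
    for chrom, start, end, rsid, trait in associations:
        by_chrom[chrom].append((start, end, rsid, trait))
    hits = []
    for v in variants:
        chrom, start, end, _name = v
        for a_start, a_end, rsid, trait in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < a_end and a_start < end:
                hits.append((v, rsid, trait))
    return hits


rsid_hits = join_by_rsid(my_variants, known_associations)
coord_hits = intersect_by_coords(my_variants, known_associations)
```

In this toy data the key-based join only finds the variant that already carries a matching rs number, while the coordinate intersect also picks up the unlabeled variant on chr2. That is the practical reason to choose between the two methods based on how the starting data is labeled.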