Re: [galaxy-user] How to extract geneID from pileup file?

29 Aug 2013

Hello,

*First option* is the tool "SnpEff Variant effect and annotation". This 
would require setting up a cloud instance and adding the appropriate 
annotation to the tool for use with the genome you are working with. See 
the tool shed for more about SnpEff, or the Main/Test server if you want 
to try it out. Note that it is not set up for very many genomes there, and 
quotas on Test are small, as it is not intended for intensive use.
http://wiki.galaxyproject.org/Cloud

*The second option* is to do this in a more step-by-step method, 
something like:

1 - start with a pileup file (not vcf, so use Generate pileup, or use 
Mpileup without 'Genotype Likelihood Computation:')

2 - use 'Filter Pileup' and for 'Convert coordinates to intervals?:' 
choose "yes"

3 - now that the data is in interval format, it can be compared with any 
other interval (bed, etc.) dataset that is mapped to the same genome to 
determine overlap using the tools in the 'Operate on Genomic Intervals' 
tool group.

Obtain gene (actually transcript) annotation bed files ('bed' is a 
stricter form of 'interval' format) from sources under "Get Data". Good 
choices are UCSC and Biomart for many genomes, in particular because you 
can select out reference bed files that contain specific regions of 
transcripts: UTR, Exons, Introns, user-specified regions upstream or 
downstream, etc., but other sources may be appropriate depending on your 
genome and needs. As long as the reference annotation you are using is 
mapped to the same exact genome, then this will work. Once you have a 
process, save it in a workflow for future use.
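
To make the coordinate handling in steps 2 and 3 concrete, here is a minimal 
sketch in Python of what the conversion and overlap amount to. It is not what 
the Galaxy tools run internally. The function names and the toy pileup/bed 
rows are made up for illustration, and it assumes a six-column samtools 
pileup (chrom, 1-based position, reference base, coverage, bases, qualities) 
and a bed file with the transcript name in column 4.

```python
from collections import defaultdict

def pileup_to_intervals(pileup_lines):
    """Turn pileup rows into 0-based, half-open intervals plus coverage."""
    for line in pileup_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, coverage = fields[0], int(fields[1]), int(fields[3])
        # Pileup positions are 1-based; interval/bed starts are 0-based.
        yield chrom, pos - 1, pos, coverage

def load_bed(bed_lines):
    """Group bed features (start, end, name) by chromosome."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else "."
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), name))
    return by_chrom

def annotate(pileup_lines, bed_lines):
    """Return (chrom, 1-based position, coverage, feature name) per overlap."""
    features = load_bed(bed_lines)
    hits = []
    for chrom, start, end, coverage in pileup_to_intervals(pileup_lines):
        for f_start, f_end, name in features.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if start < f_end and f_start < end:
                hits.append((chrom, end, coverage, name))
    return hits

# Toy data, invented for this example only.
pileup = [
    "chr1\t101\tA\t12\t............\tIIIIIIIIIIII",
    "chr1\t500\tG\t8\t........\tIIIIIIII",
    "chr2\t50\tT\t20\t" + "." * 20 + "\t" + "I" * 20,
]
bed = [
    "chr1\t100\t200\ttranscriptA",
    "chr2\t0\t50\ttranscriptB",
]
result = annotate(pileup, bed)
for row in result:
    print(*row, sep="\t")
```

The linear scan over features is fine for a small set of targeted genes. For 
whole-genome annotation the tools in 'Operate on Genomic Intervals' are the 
better choice.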

*Another great (NEW!) option* includes some tools that are still in beta 
status on the Test server. You can run them there on very small datasets to 
see if you like them, then decide if moving to a cloud and setting them up 
there is something you want to do. Called "Naive Variant Detector" and 
"Variant Annotator", these run on VCF files, and will produce statistics 
somewhat similar to (but with more detail and a different underlying 
algorithm than) "Filter Pileup". The result here is not in interval 
format - it is VCF, but it could be converted (use tools in Text 
Manipulation to create a start/stop) or proceed to SnpEff as is.
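
For reference, the start/stop arithmetic for a VCF-to-interval conversion 
looks roughly like the Python sketch below. This is only an illustration of 
the coordinate math, not the Text Manipulation tools themselves. The function 
name and the example records are hypothetical, and it assumes standard 
tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...), with POS being 
1-based.

```python
def vcf_to_intervals(vcf_lines):
    """Yield (chrom, start, end) as 0-based, half-open intervals."""
    for line in vcf_lines:
        # Skip blank lines and the '##' meta / '#CHROM' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based
        # The interval spans the reference allele, so deletions cover
        # more than one base.
        yield chrom, start, start + len(ref)

# Example records, invented for this sketch only.
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\t.\tA\tG\t.\t.\t.",
    "chr1\t300\t.\tACT\tA\t.\t.\t.",
]
intervals = list(vcf_to_intervals(vcf))
for chrom, start, end in intervals:
    print(chrom, start, end, sep="\t")
```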

You had another earlier question about this same analysis - I will 
include some other advice in that reply, next.

Best,

Jen
Galaxy team

On 8/28/13 10:49 PM, Yan He wrote:
...
Dear galaxy-users,
I am working on a project to identify and genotype SNPs in targeted 
genes. I did some analysis using Galaxy. First, mapping to the genome 
with Bowtie. Second, identify SNPs using MPileup in SAMtools. When I 
got the pileup file, the SNP information is in which chromosome and 
what position. I would like to focus on the SNPs within genes. How 
could I extract the SNP information for each gene (SNP position, 
coverage)? Is there a tool in Galaxy to fulfill this? Any help is 
highly appreciated!
Best wishes,
Yan
-- 
Jennifer Hillman-Jackson
http://galaxyproject.org
