Analyzing Targeted Resequencing data with Galaxy
Hi! I am having problems with my sequencing results, but I am a newbie at this, so I suspect something is wrong with my analysis. So far I have tried Galaxy and CLC Workbench, but with CLC I could only align to individual chromosomes, not to the whole genome (maybe there is a way, but I had not found it by the time the trial ended).
I used the SureSelect capture kit and did single-end sequencing on an Illumina instrument. The lab sent me FASTQ files in Illumina 1.5 format; my samples were indexed, so I got a series of files, one per index.
What would be the standard workflow for this kind of data? Which tools and settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping targeted resequencing data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped where the coverage is below a certain threshold?
I know, I know, that's a lot of things, but I am very lost. Any help is appreciated.
L
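Illumina 1.5 FASTQ files store base qualities as Phred score + 64, whereas most Galaxy tools expect Sanger encoding (Phred + 33, the `fastqsanger` datatype). Galaxy's FASTQ Groomer tool performs that conversion; the following is only a minimal Python sketch of what grooming plus simple 3' quality trimming do, assuming plain four-line FASTQ records. The function names and the thresholds `min_q=20` and `min_len=20` are illustrative assumptions, and adapter clipping is not covered.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def phred64_to_phred33(qual: str) -> str:
    """Shift each quality character from the +64 to the +33 ASCII offset.
    In Illumina 1.5 files 'B' (Q2) flags unreliable read ends; it becomes '#'."""
    return "".join(chr(ord(c) - 31) for c in qual)


def trim_3prime(seq: str, qual33: str, min_q: int) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(qual33)
    while end > 0 and ord(qual33[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual33[:end]


def groom_and_trim(lines: Iterable[str], min_q: int = 20,
                   min_len: int = 20) -> Iterator[str]:
    """Convert to Sanger encoding, trim the 3' end, drop reads shorter than min_len."""
    for header, seq, qual in read_fastq(lines):
        q33 = phred64_to_phred33(qual)
        seq, q33 = trim_3prime(seq, q33, min_q)
        if len(seq) >= min_len:
            yield f"{header}\n{seq}\n+\n{q33}\n"
```

For example, a record whose quality line is `hhhhhhBB` (six Q40 bases followed by two Q2 bases) comes out with the last two bases removed and its qualities rewritten as `IIIIII`.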
Dear all, I’m analysing SNP data for the first time. I tried the few software packages I found in the literature, but they can only handle small datasets. I am currently trying the “genetics” package in R, but the Geno function handles only one marker at a time. Since I have to analyse 200 samples with 50K markers, is there any way to tell R to analyse each SNP in turn? Thank you very much for the help. Laura
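Looping over markers is straightforward once the genotypes sit in one samples × SNPs table. The sketch below is in Python rather than R and does not use the `genetics` package; it assumes a hypothetical 0/1/2 coding (copies of allele B, `None` for a missing call) and computes, for each SNP in turn, the allele frequency, the missing rate and a Hardy-Weinberg chi-square statistic, similar in spirit to plink's `--freq` and `--hardy` summaries. For rare SNPs with small expected counts, an exact HWE test is more reliable than the chi-square approximation.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 copies of allele "B"; None = missing call


def snp_summary(genotypes: List[Genotype]) -> Dict[str, float]:
    """Allele-B frequency, missing rate and HWE chi-square for one SNP."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing = 1.0 - n / len(genotypes) if genotypes else 1.0
    if n == 0:
        return {"freq_b": float("nan"), "missing": missing, "hwe_chi2": float("nan")}
    observed = (called.count(0), called.count(1), called.count(2))
    p_b = (observed[1] + 2 * observed[2]) / (2 * n)
    p_a = 1.0 - p_b
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = (n * p_a * p_a, 2 * n * p_a * p_b, n * p_b * p_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"freq_b": p_b, "missing": missing, "hwe_chi2": chi2}


def summarise_all(matrix: List[List[Genotype]]) -> List[Dict[str, float]]:
    """matrix[i][j] is sample i at SNP j; analyse the SNPs one after the other."""
    n_snps = len(matrix[0]) if matrix else 0
    return [snp_summary([row[j] for row in matrix]) for j in range(n_snps)]
```

With 200 samples and 50,000 markers, `summarise_all` simply runs `snp_summary` 50,000 times, once per column.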
Laura, What kind of data do you have, and what would you like to achieve? Some Galaxy wrappers for plink (http://pngu.mgh.harvard.edu/~purcell/plink/), available in the rgenetics tools, may be useful for some kinds of analysis if you have linkage pedigree genotype and map files.
On Tue, Apr 5, 2011 at 5:19 AM, Laura Iacolina <liacolina@uniss.it> wrote:
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Ross Lazarus MBBS MPH Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory; 181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850 Head, Medical Bioinformatics, BakerIDI; PO Box 6492, St Kilda Rd Central; Melbourne, VIC 8008, Australia; Tel: +61 385321444
On Tue, Apr 5, 2011 at 5:19 AM, Laura Iacolina <liacolina@uniss.it> wrote:
Considering I have to analyse 200 samples with 50K markers is there any way to tell R to analyse each SNP one after the other?
From: Ross [mailto:ross.lazarus@gmail.com]
There are some Galaxy wrappers for plink (http://pngu.mgh.harvard.edu/~purcell/plink/) that may be useful for some kinds of analysis available in the rgenetics tools if you have linkage pedigree genotype and map files.
I would also advise using plink for this. Calculating SNP marker statistics [1] is one of the things it was designed to do. The main problem is getting the data into a format that plink supports: either linkage format (one line per individual) or transposed pedigree format (one line per marker). These formats are described in the plink documentation [2].
[1] http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq
[2] http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
-- David Eccles (gringer)
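A minimal sketch of the transposed format David describes: one `.tfam` line per individual (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) and one `.tped` line per marker (chromosome, SNP ID, genetic distance, base-pair position, then two allele columns per individual). It assumes unrelated samples with unknown sex (0) and missing phenotype (-9), genotypes given as allele pairs such as ("A", "G") with `None` written as "0 0", and a genetic distance of 0 because it is unknown. `tfam_lines`, `tped_line` and `write_transposed` are hypothetical helpers, not part of plink.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # e.g. ("A", "G"); None = missing call
Marker = Tuple[str, str, int, Sequence[Genotype]]  # chrom, SNP id, bp, genotypes


def tfam_lines(sample_ids: Sequence[str]) -> list:
    """One line per individual: FID IID PaternalID MaternalID Sex Phenotype."""
    return [f"{sid} {sid} 0 0 0 -9" for sid in sample_ids]


def tped_line(chrom: str, snp_id: str, bp: int, genotypes: Sequence[Genotype]) -> str:
    """One line per marker; genotypes must be in the same order as the .tfam file."""
    alleles = []
    for g in genotypes:
        alleles.extend(g if g is not None else ("0", "0"))
    return " ".join([chrom, snp_id, "0", str(bp), *alleles])


def write_transposed(prefix: str, sample_ids: Sequence[str],
                     markers: Sequence[Marker]) -> None:
    """Write prefix.tfam and prefix.tped."""
    with open(prefix + ".tfam", "w") as fam:
        fam.write("\n".join(tfam_lines(sample_ids)) + "\n")
    with open(prefix + ".tped", "w") as tped:
        for chrom, snp_id, bp, genos in markers:
            tped.write(tped_line(chrom, snp_id, bp, genos) + "\n")
```

The resulting files could then be loaded with something like `plink --tfile mydata --freq`, where `mydata` is the prefix passed to `write_transposed`.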
Laura: SNP identification and analysis is a very complex subject, and without knowing what you are trying to do it is very difficult to point you in the right direction. A good place to start might be the supplement to last year's report from the 1000 Genomes Consortium (Nature 467(7319): 1061-1073). Some of the steps can be performed in Galaxy; others are still in development. Thanks! anton galaxy team
On Apr 5, 2011, at 5:19 AM, Laura Iacolina wrote:
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
Lali: In your case the workflow for capture resequencing should look like this:
1. QC the data (groom the FASTQ files and plot the quality distribution).
2. Map the reads (use BWA).
3. Generate and filter a pileup.
4. Intersect the pileup with the coordinates of the SureSelect baits.
However, before you dive in, please learn the basic Galaxy functionality by looking at http://usegalaxy.org/galaxy101 and watching *all* Illumina-related Galaxy quickies (the black boxes on the Galaxy front page). Next, take a look at http://usegalaxy.org/heteroplasmy. Note that we are working on bringing "industrial-strength" diploid genotyping functionality to Galaxy in the next two to three months; it will include more sophisticated genotypers, recalibration and realignment tools, and novel visualization approaches. Thanks for using Galaxy. anton galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
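Steps 3 and 4 of the workflow above, together with Lali's questions about a coverage report and a depth threshold, can be sketched in plain Python. This assumes a single-sample pileup whose first four tab-separated columns are chromosome, 1-based position, reference base and read depth (the layout `samtools mpileup` writes; the older consensus pileup keeps depth in a different column, so adjust the index), and a BED file of bait coordinates, which is 0-based and half-open. It filters pileup positions rather than individual reads, and the function names and the 10x threshold are illustrative assumptions.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Targets = Dict[str, List[Tuple[int, int]]]


def load_bed(lines: Iterable[str]) -> Targets:
    """Read BED intervals (0-based, half-open) into sorted, merged lists per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlapping or adjacent bait: extend it
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def in_targets(targets: Targets, chrom: str, pos1: int) -> bool:
    """True if the 1-based position pos1 lies inside a target interval."""
    ivs = targets.get(chrom, [])
    pos0 = pos1 - 1
    i = bisect.bisect_right(ivs, (pos0, float("inf"))) - 1  # last start <= pos0
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def filter_pileup(pileup_lines: Iterable[str], targets: Targets,
                  min_depth: int = 10) -> Iterator[str]:
    """Keep on-target pileup lines whose depth (4th column) is at least min_depth."""
    for line in pileup_lines:
        cols = line.rstrip("\n").split("\t")
        chrom, pos, depth = cols[0], int(cols[1]), int(cols[3])
        if depth >= min_depth and in_targets(targets, chrom, pos):
            yield line


def coverage_report(pileup_lines: Iterable[str], targets: Targets,
                    min_depth: int = 10) -> float:
    """Fraction of target bases covered at >= min_depth; absent positions count as 0x."""
    total = sum(e - s for ivs in targets.values() for s, e in ivs)
    covered = sum(1 for _ in filter_pileup(pileup_lines, targets, min_depth))
    return covered / total if total else 0.0
```

Merging overlapping baits first keeps each target base counted once, so the coverage fraction is not inflated when SureSelect baits tile the same region.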
participants (5)
- Anton Nekrutenko
- Eccles, David
- Lali
- Laura Iacolina
- Ross