You can do all the quality filtering with Galaxy, but it may involve various manipulations of the data. If I am not mistaken, the "metagenomics" workflow may help you out a little. It's designed for 454 data but should give you an idea of how to go about things. There is a video tutorial on the site for this workflow.

A good place for you to start, however, may be here: http://edwards.sdsu.edu/prinseq_beta/# Prinseq is easy to use, will give you a full breakdown of your raw data, and enables you to filter by quality, length, etc. FastQC is a preliminary analysis tool specialized for Illumina data: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

I myself was not very impressed with CLC at all. It lacks some rudimentary yet critical functions. I am currently working on population amplicon data, so I couldn't really help you much with the latest mapping-to-reference advances, but I found the Lasergene SeqMan mapping and de novo assembler much better than the CLC assembler.

Good luck
Jack

On 11
Message: 6
Date: Tue, 5 Apr 2011 08:44:06 +0200
From: Lali <laurafe@gmail.com>
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Analyzing Targeted Resequencing data with Galaxy
Message-ID: <BANLkTin1ShWLQQ46+mFFBcxS-dO1GJuw_A@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi! I am having problems with my sequencing results, but I am a newbie at this, so I suspect there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but I had not found it by the time the trial ended).
I used a SureSelect capture kit and did single-end sequencing on an Illumina instrument. The files the lab sent me are FASTQ Illumina 1.5 files. My samples were indexed, and I got a series of files, each representing one index.
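For context on the file format: Illumina 1.5 FASTQ files encode quality scores with an ASCII offset of 64, while most downstream tools expect the Sanger offset of 33. The following is only a minimal sketch of that re-encoding, assuming plain four-line FASTQ records; the function names are made up for illustration and this is not how any particular Galaxy tool is implemented.

```python
def illumina15_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # Phred score under the Illumina 1.5 offset
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))  # same score, Sanger offset
    return "".join(converted)


def convert_fastq(lines):
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded.

    Assumes strict four-line records: header, sequence, '+', qualities.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield illumina15_to_sanger(line) if i % 4 == 3 else line


if __name__ == "__main__":
    record = ["@read1\n", "ACGT\n", "+\n", "hhBB\n"]
    print("\n".join(convert_fastq(record)))
```

In Illumina 1.5 files the character `B` (score 2) marks a low-quality read segment rather than a calibrated error rate, so it is usually worth trimming those tails after conversion.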
What would be the standard workflow for this kind of data? Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
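To make the two preparation steps concrete, here is a rough sketch of what adapter clipping and 3' quality trimming do to a single read. It assumes exact adapter matching, Sanger (Phred+33) qualities, and hypothetical function names and default thresholds; real clipping tools also tolerate mismatches, and the adapter sequence depends on the library kit.

```python
def clip_adapter(seq, qual, adapter, min_overlap=5):
    """Cut the read at the adapter, including a partial adapter at the 3' end."""
    pos = seq.find(adapter)
    if pos == -1:
        # No full adapter: look for an adapter prefix hanging off the read end.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual  # adapter not found, read unchanged
    return seq[:pos], qual[:pos]


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose quality is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def prepare_read(seq, qual, adapter, min_q=20, min_len=20):
    """Clip, trim, then discard reads that end up too short (returns None)."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim_3prime(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None
```

Clipping before trimming keeps the adapter match intact; the length filter at the end removes reads too short to map uniquely.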
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
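As an illustration of what a coverage report and a coverage threshold amount to, the sketch below computes per-base depth over target regions from read alignment intervals and keeps only reads that overlap sufficiently covered targets. The input shapes (half-open `(start, end)` tuples on one chromosome, `(name, start, end)` targets) and all function names are assumptions made for this example, not the interface of any Galaxy tool.

```python
def per_base_depth(reads, region_start, region_end):
    """Depth at each base of [region_start, region_end) from (start, end) reads."""
    if region_end <= region_start:
        raise ValueError("region must be non-empty")
    diff = [0] * (region_end - region_start + 1)
    for start, end in reads:
        s, e = max(start, region_start), min(end, region_end)
        if s < e:  # read overlaps the region
            diff[s - region_start] += 1
            diff[e - region_start] -= 1
    depth, running = [], 0
    for d in diff[:-1]:  # running sum turns start/end marks into depth
        running += d
        depth.append(running)
    return depth


def coverage_report(reads, targets, min_depth=10):
    """Mean depth and fraction of bases at or above min_depth, per target."""
    report = []
    for name, start, end in targets:
        depth = per_base_depth(reads, start, end)
        report.append({
            "target": name,
            "mean_depth": sum(depth) / len(depth),
            "fraction_covered": sum(d >= min_depth for d in depth) / len(depth),
        })
    return report


def filter_reads_by_coverage(reads, targets, min_mean_depth):
    """Keep reads overlapping a target whose mean depth meets the threshold."""
    passing = []
    for _, start, end in targets:
        depth = per_base_depth(reads, start, end)
        if sum(depth) / len(depth) >= min_mean_depth:
            passing.append((start, end))
    return [r for r in reads
            if any(r[0] < e and s < r[1] for s, e in passing)]
```

Filtering per target rather than per base keeps whole reads together; a per-base cutoff would split reads that span a drop in coverage.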
I know, I know, a lot of things, but I am very lost. Any help is appreciated.
L