Aaron,

Please do check for contaminants. I could probably write a book about our experience with service providers and QC :(. The FastQC suite is a good place to start (a Galaxy wrapper is also available for it). Even for 454 data, which has neither fixed base positions nor fixed read lengths, it is quite informative (k-mer overrepresentation and such).

In addition, check for contaminating sequences (e.g. E. coli or Mycoplasma sequences, which are not expected when sequencing human cells, but experience says you had better check).

If I remember correctly, the MIRA documentation also has some information on filtering prior to assembly.

Please keep us posted on your progress.

@Peter: I hope you manage to catch a flight and join the conference. A pity I won’t be there, but it looks very promising.

Alex

From: Aaron Jex [mailto:ajex@unimelb.edu.au]
Sent: Tuesday, 24 May 2011 8:52
To: Bossers, Alex
Subject: RE: [galaxy-user] (no subject)

Hi Alex,

Thanks for the email. I think I will have to take a closer look at the MIRA documentation. I know that it definitely makes use of the quality data to some extent, but I hadn’t considered whether it ignores low-quality data or not (perhaps there’s a threshold setting I could use; I’ll check that). I’m not too worried about adaptor sequence at the moment, as these “should” be trimmed by our sequencing service, and I clip the ends of the reads anyway when I extract the qual and fasta files from the original sff files.

Best regards,
Aaron

Aaron Jex, BSc, PhD

Senior Research Officer,

Department of Veterinary Science,

The University of Melbourne,

250 Princes Highway,

Werribee, Victoria,

3030

tel: +61 3 9731 2294

From: Bossers, Alex [mailto:Alex.Bossers@wur.nl]
Sent: Tuesday, 24 May 2011 4:44 PM
To: Aaron Jex; galaxy-user@bx.psu.edu
Subject: RE: [galaxy-user] (no subject)

Aaron,

As far as I remember, MIRA already takes low- and high-quality bases into account, doesn’t it? So there is no need to filter on quality there, right?

The only filtering needed is for contaminating sequences (including adapters and such). You should check the MIRA website to be sure, though.

I have used the high-quality segments as in the metagenomics example, but you do indeed lose the exact quality information. The retained bases are, however, already above the chosen threshold (default: above 20 on the Sanger quality score scale).
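To put the threshold of 20 in context: a Phred (Sanger) quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q20 means roughly a 1% chance that the base is wrong. The following is a minimal sketch of that conversion, not code from Galaxy or MIRA. The function names are made up for illustration, and the ASCII offset of 33 applies only to Sanger-encoded FASTQ strings, not to the space-separated integers in a 454 .qual file.

```python
def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_sanger_quality(qual_string: str) -> list:
    """Decode a Sanger-encoded (ASCII offset 33) quality string into Phred scores."""
    return [ord(ch) - 33 for ch in qual_string]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```
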

Alex

From: galaxy-user-bounces@lists.bx.psu.edu [mailto:galaxy-user-bounces@lists.bx.psu.edu] On Behalf Of Aaron Jex
Sent: Tuesday, 24 May 2011 1:40
To: galaxy-user@bx.psu.edu
Subject: [galaxy-user] (no subject)

Hi,

I can’t seem to find an answer to this on your wiki site and it’s not in the tutorial. I would like to filter my 454 reads for high-quality regions, rename the resulting sequence fragments AND relink the new reads (fragments) to the original quality data, so that I can take these filtered reads and assemble them using MIRA. Is there a way to do this with Galaxy? Basically, all I want to do is take the new read fragments I get from converting the tabular file to the fasta file, as shown in your metagenomics tutorial, and generate a corresponding qual file for these ‘new’ reads.
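Outside Galaxy, the operation described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Galaxy tool's implementation: the function names, the `_1`, `_2` renaming scheme, the `min_len` parameter, and the choice of `>=` for the threshold comparison are all hypothetical. It takes one read with its per-base quality scores, cuts it into maximal runs of bases at or above the threshold, and emits each fragment with a new name and the matching slice of the original quality values, so that FASTA and QUAL records stay in sync.

```python
from typing import Iterator, List, Sequence, Tuple

Segment = Tuple[str, str, List[int]]  # (new name, bases, quality scores)


def high_quality_segments(read_id: str, seq: str, quals: Sequence[int],
                          min_q: int = 20, min_len: int = 50) -> Iterator[Segment]:
    """Yield renamed fragments whose bases all have quality >= min_q."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ for " + read_id)
    start = None
    count = 0
    # A sentinel score below the threshold closes a run that reaches the read end.
    for pos, q in enumerate(list(quals) + [min_q - 1]):
        if q >= min_q:
            if start is None:
                start = pos  # a new high-quality run begins here
        elif start is not None:
            if pos - start >= min_len:
                count += 1
                yield (f"{read_id}_{count}", seq[start:pos], list(quals[start:pos]))
            start = None


def fasta_record(name: str, seq: str) -> str:
    return f">{name}\n{seq}\n"


def qual_record(name: str, quals: Sequence[int]) -> str:
    return f">{name}\n{' '.join(str(q) for q in quals)}\n"


if __name__ == "__main__":
    # Toy read; real input would come from parsed .fasta and .qual files.
    for name, bases, scores in high_quality_segments(
            "read1", "ACGTACGTAC", [30, 30, 10, 25, 25, 25, 5, 40, 40, 40],
            min_q=20, min_len=2):
        print(fasta_record(name, bases), end="")
        print(qual_record(name, scores), end="")
```

Writing the `fasta_record` and `qual_record` outputs to two separate files in the same order gives a fasta/qual pair with identical read names, which is the general shape of input that assemblers taking separate sequence and quality files expect.
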

Best regards,

Aaron
