Re: [galaxy-dev] NGS impressions (long)

7 Oct 2009

Hello Anton,

Thank you for your answer.

Anton Nekrutenko wrote, On 10/07/2009 11:11 AM:
...
> All user input will be required to go through fastq groomer:
> http://bitbucket.org/galaxy/galaxy-central/src/tip/tools/next_gen_conversion... 
Looks very useful. I'm looking forward to pulling it...
...
> the issues with bowtie and bwa indices will be fixed, speaking of 
> complexities of bowtie/bwa program = you kinda need to know what you're 
> doing and Galaxy will never be able to help with that,
I most definitely agree - you always need to know what you're doing.
But the BWA and Bowtie tools, as they currently appear in Galaxy, let you use them without realizing that you don't know what you're doing.

I'll just give one example which is critical in our lab: "unique mappers".
These are short reads that map only once to the reference genome (within an acceptable alignment-score/mismatch range).
If you search for "uniquely mapped" in Google Scholar, you'll see many papers that make use of them.

Here's an example of a unique mapper: AACACCTTTGGGTGGTATGACTGGTTTCCACATGCAAACTGAAGATCGAA
It maps perfectly, without mismatches, to exactly one location in the human genome (hg18).
It also aligns to other locations, but only with many more mismatches.
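To make the definition concrete, here is a minimal Python sketch. It assumes each read's candidate alignments have already been summarized as a list of mismatch counts; the function name `is_unique_mapper` and the cutoff `max_mismatches=2` are hypothetical choices for illustration, not values taken from BWA, Bowtie or any paper.

```python
def is_unique_mapper(mismatch_counts, max_mismatches=2):
    """Return True if exactly one candidate alignment is within the cutoff.

    mismatch_counts: number of mismatches at each candidate location of one read.
    max_mismatches: hypothetical "acceptable" cutoff; pick one suited to your data.
    """
    acceptable = [m for m in mismatch_counts if m <= max_mismatches]
    return len(acceptable) == 1


# A perfect hit plus two poor hits (made-up counts), like the example read: unique.
print(is_unique_mapper([0, 7, 9]))  # True
# Two acceptable hits: not unique, even though one of them is the "best".
print(is_unique_mapper([0, 1]))     # False
```

A stricter variant could instead compare the best and second-best alignment scores; either way, the decision needs to see all acceptable hits, not just the best one.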

There's no ambiguity about the above sequence, and I'm sure BWA and Bowtie would both report that single location.

My issue is with non-unique mappers.
BWA and Bowtie, with the "common" parameters, return what they consider the "best" match, but only one result.
This gives the false impression that the sequence mapped once (i.e., uniquely), unless you know a priori that the default parameters
choose just one location and ignore the others.

The naive way to find unique mappers is to count how many times each read appears in a mapping result file; with BWA and Bowtie and their default parameters, this will not work.
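The naive counting approach, and why it breaks, can be sketched in Python. This assumes the mapper's output is SAM text (read name in column 1, FLAG in column 2, FLAG bit 0x4 meaning "unmapped"); the function name and the three toy records are hypothetical. The same counting code gives a different answer depending on whether the aligner reported every hit or only the best one.

```python
from collections import Counter


def unique_mappers(sam_lines):
    """Return the set of read names that appear in exactly one mapped SAM record."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # FLAG bit 0x4: the read is unmapped
        counts[qname] += 1
    return {name for name, n in counts.items() if n == 1}


# Toy records (hypothetical reads and coordinates). read2 really has two hits;
# FLAG 256 (0x100) marks the second one as a secondary alignment.
all_hits = [
    "read1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr2\t500\t255\t50M\t*\t0\t0\t*\t*",
    "read2\t256\tchr7\t900\t255\t50M\t*\t0\t0\t*\t*",
]
# What a best-hit-only run would emit: read2's second location is gone.
best_only = all_hits[:2]

print(sorted(unique_mappers(all_hits)))   # ['read1']
print(sorted(unique_mappers(best_only)))  # ['read1', 'read2']
```

With best-hit-only output, read2 is silently counted as unique. Bowtie's reporting options (such as `-k`, `-a` and `-m`) change how many alignments per read are emitted; check the manual for the version you run before relying on any count.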

This is getting philosophical, and I'll try to give it a rest...
...
> but blat is 
> definitely NOT the way to map reads.
Not for mapping millions of reads at once, true (nor for paired-end reads).
But for mapping one read, or a handful, to quickly see where they map, I think it does a very good job.

I've used it many times as a control: after mapping, intersecting, annotating and filtering,
you end up with a modest number of sequences that you think are important. Load them into the UCSC Genome Browser and see how they look and where they map.
If BLAT's results are consistent with your other program's results, you're OK.
We also sometimes run an exhaustive search program over the whole genome, but that takes a relatively long time.
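A consistency check like this can be scripted once both sets of coordinates are in hand. The Python sketch below assumes the results were already parsed into (chromosome, start) pairs; the function name, the toy coordinates and the 5-base `tolerance` are all hypothetical, and parsing BLAT's actual output format is left out.

```python
def find_disagreements(pipeline_hits, blat_hits, tolerance=5):
    """Return sorted names of reads whose pipeline location has no matching BLAT hit.

    pipeline_hits: dict of read name -> (chrom, start) from the main mapping pipeline.
    blat_hits: dict of read name -> list of (chrom, start) BLAT hits.
    A BLAT hit matches if it is on the same chromosome and its start lies within
    `tolerance` bases of the pipeline start (an arbitrary, assumed cutoff).
    """
    disagreements = []
    for read, (chrom, start) in pipeline_hits.items():
        hits = blat_hits.get(read, [])  # a read BLAT could not place counts as a disagreement
        if not any(c == chrom and abs(s - start) <= tolerance for c, s in hits):
            disagreements.append(read)
    return sorted(disagreements)


pipeline = {"readA": ("chr1", 1000), "readB": ("chr5", 200)}
blat = {"readA": [("chr1", 1001), ("chr9", 50)], "readB": [("chr3", 200)]}
print(find_disagreements(pipeline, blat))  # ['readB']
```

Reads that come back from this check are the ones worth a closer look in the genome browser.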

regards,
  -gordon.


Assaf Gordon