David, in my experience with Illumina sequencing, the reads at the start of a file appear to have a much higher sequencing error rate.

On Nov 9, 2011, at 4:52 AM, David Matthews wrote:

Hi,

This may be a bit dumb or missing the point, but isn't just selecting the first 5 million reads essentially random? I mean, you don't know where the reads map or what they come from, and the sequencer didn't collect them in a manner influenced by the nature of the sample, did it?

Best Wishes,
David.