On Wed, Oct 19, 2011 at 2:31 PM, Daniel Blankenberg <dan(a)bx.psu.edu> wrote:
Sorry for the delay. I did try the patch out shortly after you contributed
it, but it caused the functional tests to fail. I was able to fix the issue and
allow the existing tests to start passing, but I've been bogged down lately
and haven't been able to perform a more thorough review of the code. If you
could provide tests with files (e.g. for the tools affected) that test the
new functionality, that would be a great help.
The use of partition removes Python compatibility for <2.5, although this is
I guess you could use split instead, with a special case for there being no space.
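
A minimal sketch of that split-based fallback, for illustration only. The
function name split_id_line is hypothetical and is not taken from the patch;
it assumes the identifier ends at the first space, as discussed here.

```python
def split_id_line(line):
    """Split a title line into (identifier, description) without str.partition.

    Hypothetical helper: str.split(" ", 1) is used instead of str.partition,
    which only exists in Python 2.5 and later.
    """
    parts = line.split(" ", 1)  # split at the first space only
    if len(parts) == 1:
        # Special case: no space in the line, so there is no description.
        return parts[0], ""
    return parts[0], parts[1]


print(split_id_line("Read A"))  # ('Read', 'A')
print(split_id_line("Read"))    # ('Read', '')
```

On a line with no space, split returns a one-element list, so the explicit
length check is what stands in for the empty strings that partition would
have returned.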
Also, I'm not entirely sold on having the "Identifier line" being parsed as
"identifier" + <space> + "description" instead of a single
That is the normal convention, just like with FASTA.
This would mean that identifiers could not themselves contain spaces,
but "There is no standardization for identifiers" (so they could technically
have spaces?). Two different reads could be identified as "Read A" and
"Read B", but they would then no longer be uniquely identifiable, as each would
be identified as "Read". If this added functionality were introduced as
optional behavior (e.g. a user needs to click a checkbox on the tools to
apply the id line splitting), these concerns can be mitigated.
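
The collision described above can be sketched as follows. This is only an
illustration: find_id_collisions and its split_ids flag are hypothetical names
standing in for the proposed optional behavior, not code from the patch.

```python
from collections import defaultdict


def find_id_collisions(titles, split_ids=True):
    """Return {identifier: [titles]} for identifiers shared by several titles.

    Hypothetical helper. With split_ids=True the identifier is the text before
    the first space; with split_ids=False the whole title line is used.
    """
    seen = defaultdict(list)
    for title in titles:
        if split_ids:
            key = title.split(" ", 1)[0]
        else:
            key = title
        seen[key].append(title)
    return dict((key, group) for key, group in seen.items() if len(group) > 1)


titles = ["Read A", "Read B", "Other"]
print(find_id_collisions(titles))                   # {'Read': ['Read A', 'Read B']}
print(find_id_collisions(titles, split_ids=False))  # {}
```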
That is expected, "@Read A" and "@Read B" have the same identifier,
Peter, Florent, anyone else: I'd be very interested to hear your thoughts on
the above, particularly with respect to known real-world data. For now, let's
discount SRA data from this discussion.
See also the new Illumina 1.8 naming convention, where they dropped
the /1 and /2 suffixes and hid the read number in the description. It should
be tested, but I think
Florent's patch will work here (while the current Galaxy behaviour won't).
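
A rough sketch of why the split matters for the two naming styles. The read
names below are illustrative examples of each convention, not data from this
thread, and split_title and mate_ids_match are hypothetical helpers. It assumes
the goal is to match the two mates of a pair by identifier.

```python
def split_title(title):
    """Hypothetical helper: (identifier, description), split at the first space."""
    parts = title.split(" ", 1)
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]


def mate_ids_match(pair):
    """True if both mates share an identifier once the description is split off."""
    return split_title(pair[0])[0] == split_title(pair[1])[0]


# Illustrative names only. Old style: read number as a /1 or /2 suffix.
old_pair = ("HWUSI-EAS100R:6:73:941:1973#0/1",
            "HWUSI-EAS100R:6:73:941:1973#0/2")
# Illumina 1.8 style: read number at the start of the description.
new_pair = ("EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
            "EAS139:136:FC706VJ:2:2104:15343:197393 2:Y:18:ATCACG")

print(mate_ids_match(new_pair))    # True
print(mate_ids_match(old_pair))    # False, the /1 and /2 are part of the identifier
print(new_pair[0] == new_pair[1])  # False, the whole title lines differ
```

With the 1.8-style names, the mates only compare equal once the description
is split off; treating the whole line as the identifier keeps them distinct.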