On Wednesday, October 31, 2012, Edward Hills wrote:
Thanks Peter.

My next question: I have found that VCF files don't get split properly, because the header is not included in the second file, as is usually required by tools (such as vcf-subset). I have read the code and am happy to implement this functionality, but am not too sure where it would best be done.

I see a class, Text(data), which every datatype seems to be passed to. Would it be best to implement a VCF class that is used when the datatype is VCF?

Cheers,
Ed

VCF is, I assume, defined as a subclass of Text, so it inherits the simple, naive splitting implemented for text files (which doesn't know about headers).
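To illustrate the problem, here is a minimal Python sketch of line-based splitting that ignores headers. `naive_split` is a hypothetical helper written for this example, not Galaxy's actual Text splitting code. It only shows why the header ends up in the first part alone.

```python
def naive_split(lines, lines_per_part):
    """Split lines into consecutive parts of at most lines_per_part lines.

    Knows nothing about headers, so only the first part gets them.
    """
    return [lines[i:i + lines_per_part] for i in range(0, len(lines), lines_per_part)]


vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t200\t.\tC\tT\t50\tPASS\t.\n",
]

parts = naive_split(vcf, 2)
# The second part holds only data lines, so tools such as vcf-subset would reject it.
print(parts[1])
```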

Have a look at the SAM splitting code (under lib/galaxy/datatypes/*.py) for an example where header-aware splitting was done. You'll probably need to implement something similar.
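A rough Python sketch of the header-aware idea follows. It assumes the VCF header consists of the leading lines starting with `#`, and that every output part should start with a copy of that header followed by a slice of the data records. `split_vcf_lines` is a hypothetical helper. It is not the SAM splitting code or any Galaxy API, and hooking it into a VCF datatype's split method is not shown.

```python
def split_vcf_lines(lines, records_per_part):
    """Split VCF lines into parts, repeating the header at the top of each part.

    Assumes header lines are the leading lines that start with '#'.
    """
    header = []
    records = []
    for line in lines:
        # Collect '#' lines as header only until the first data record appears.
        if line.startswith("#") and not records:
            header.append(line)
        else:
            records.append(line)

    # A header-only file still yields one (header-only) part.
    if not records:
        return [list(header)]

    return [
        header + records[i:i + records_per_part]
        for i in range(0, len(records), records_per_part)
    ]


vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t200\t.\tC\tT\t50\tPASS\t.\n",
    "1\t300\t.\tG\tA\t50\tPASS\t.\n",
]

parts = split_vcf_lines(vcf, 2)
for part in parts:
    print("".join(part))
```

Each part is then a valid stand-alone VCF; the last part may hold fewer records than the others.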

Peter