Hi Peter, thanks again.
By the looks of it, this has already been implemented in lib/galaxy/datatypes/tabular.py under class Vcf. Despite this, it is always the Text class in data.py that is loaded, not the proper Vcf one. Can you point me to where the type is chosen?
Cheers, Ed
On Wed, Oct 31, 2012 at 9:46 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Wednesday, October 31, 2012, Edward Hills wrote:
Thanks Peter.
My next question: I have found that VCF files don't get split properly, as the header is not included in the second file, which tools such as vcf-subset usually require. I have read the code and am happy to implement this functionality, but I am not too sure where this would best be done.
I see a class Text ( data ), which every datatype appears to be sent to. Would it be best to implement a VCF class that is called when the datatype is VCF?
Cheers, Ed
VCF is, I assume, defined as a subclass of Text, so it inherits the naive splitting implemented for text files (which doesn't know about headers).
Have a look at the SAM splitting code (under lib/galaxy/datatypes/*.py) as an example where header-aware splitting was done. You'll probably need to implement something similar.
Peter
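For illustration, the header-aware splitting idea discussed above could be sketched roughly as follows. This is a standalone sketch, not Galaxy's actual splitting code: the function name split_with_header and its lines_per_part parameter are hypothetical, and it assumes that header lines are the leading lines starting with '#', as in VCF.

```python
def split_with_header(lines, lines_per_part):
    """Split lines into parts, repeating the leading '#' header in each part.

    Hypothetical helper for illustration only; it is not part of Galaxy.
    Assumes the header is the unbroken run of '#' lines at the top of the file.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")

    header = []
    body = []
    for line in lines:
        # Only '#' lines seen before the first data line count as header.
        if line.startswith("#") and not body:
            header.append(line)
        else:
            body.append(line)

    parts = []
    for start in range(0, len(body), lines_per_part):
        # Every part begins with a full copy of the header.
        parts.append(header + body[start:start + lines_per_part])
    return parts


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\n",
        "1\t100\n",
        "1\t200\n",
        "2\t300\n",
    ]
    for number, part in enumerate(split_with_header(sample, 2), start=1):
        print("part", number, "has", len(part), "lines")
```

Each resulting part can then be written to its own file, so that tools which require the header find it in every piece. In Galaxy itself the equivalent logic would presumably live in the datatype class, along the lines of the SAM example mentioned above.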