Let's pretend for a second that I'm rather lazy (oh...wait), and I have ZERO interest in writing datatype parsers to sniff and validate whether or not a specific file is a specific datatype. I'm a sysadmin and bioinformatician, and I've worked with dozens of libraries that exist to parse file formats, and they all die in flames when I feed them bad data.
Would it be possible to somehow define requirements (i.e. external library dependencies) for datatypes?
I don't want to take on the burden of code I write saying "yes, I've sniffed+validated this and it is absolutely a genbank file". That's a lot of responsibility, especially if people have malformed genbank files and their tools fail as a result.
I would like to do this with Biopython and turf the validation to another library that exists to parse GenBank files and that will raise an exception if they're invalid.
def sniff(self, filename):
    # Defer to Biopython's GenBank parser; any parse error means "not GenBank".
    from Bio import SeqIO
    try:
        self.records = list(SeqIO.parse(filename, "genbank"))
        return True
    except Exception:
        self.records = None
        return False

def validate(self, dataset):
    from Bio import SeqIO
    errors = list()
    try:
        self.records = list(SeqIO.parse(dataset.file_name, "genbank"))
    except Exception as e:
        # Report whatever the upstream parser complained about.
        errors.append(e)
    return errors

def set_meta(self, dataset, **kwd):
    # records may be unset if sniff() never ran on this instance.
    if getattr(self, "records", None) is not None:
        dataset.metadata.number_of_sequences = len(self.records)
So much easier! And I can shift the burden of validation and sniffing upstream, rather than any failures being my fault and requiring maintenance of a complex sniffer.
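To make the delegation idea concrete without requiring Biopython, here is a minimal sketch with the parser passed in as a plain callable. Everything in it is hypothetical and for illustration only: `DelegatingDatatype`, `parse_or_none`, and `toy_parse` are not part of Galaxy or Biopython, and `validate` takes a path rather than a dataset object. One assumption is worth flagging: the sketch treats "parsed cleanly but produced zero records" as a failed sniff, because a lenient upstream parser may simply yield nothing for a file that is not in the expected format (this may well apply to `SeqIO.parse` with "genbank" too, but check that against your Biopython version).

```python
def parse_or_none(parse, path):
    """Return the parsed records, or None if the upstream parser rejects the file."""
    try:
        records = list(parse(path))
    except Exception:
        return None
    # Assumption: an empty result means "not this datatype", not "valid but empty".
    return records or None


class DelegatingDatatype(object):
    """Hypothetical datatype that hands sniffing and validation to a parser."""

    def __init__(self, parse):
        self.parse = parse      # callable: path -> iterable of records
        self.records = None

    def sniff(self, filename):
        self.records = parse_or_none(self.parse, filename)
        return self.records is not None

    def validate(self, filename):
        # Collect the upstream parser's exception instead of raising it.
        try:
            list(self.parse(filename))
        except Exception as e:
            return [e]
        return []


def toy_parse(path):
    """Stand-in for an upstream parser: one record per LOCUS line,
    ValueError on any line starting with BAD."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("BAD"):
                raise ValueError("malformed record: %r" % line)
            if line.startswith("LOCUS"):
                yield line.split()[1]
```

Swapping `toy_parse` for something like `lambda path: SeqIO.parse(path, "genbank")` would give roughly the behaviour of the methods above, with the extra empty-result check in `sniff`.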
Cheers, Eric
--
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
esr@tamu.edu
rasche.eric@yandex.ru