-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Let's pretend for a second that I'm rather lazy (oh...wait), and I have
ZERO interest in writing datatype parsers to sniff and validate whether
or not a specific file is a specific datatype. I'm a sysadmin and
bioinformatician, and I've worked with dozens of libraries that exist to
parse file formats, and they all die in flames when I feed them bad data.
Would it be possible to somehow define requirements for datatypes?
I don't want to take on the burden of code I write saying "yes, I've
sniffed+validated this and it is absolutely a genbank file". That's a
lot of responsibility, especially if people have malformed genbank files
and their tools fail as a result.
I would like to do this with Biopython and turf the validation to
another library that exists to parse genbank files and that will raise
an exception if they're invalid.
> def sniff(self, filename):
>     # Let Biopython decide: any parse failure means "not genbank".
>     from Bio import SeqIO
>     try:
>         self.records = list(SeqIO.parse(filename, "genbank"))
>         return True
>     except Exception:
>         self.records = None
>         return False
>
> def validate(self, dataset):
>     # Report the parser's exception, if any, as the validation error.
>     from Bio import SeqIO
>     errors = list()
>     try:
>         self.records = list(SeqIO.parse(dataset.file_name, "genbank"))
>     except Exception as e:
>         errors.append(e)
>     return errors
>
> def set_meta(self, dataset, **kwd):
>     # Relies on sniff() or validate() having populated self.records.
>     if self.records is not None:
>         dataset.metadata.number_of_sequences = len(self.records)
So much easier! And I can shift the burden of validation and sniffing
upstream, rather than any failures being my fault and requiring
maintenance of a complex sniffer.
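
To make the delegation idea concrete without tying it to one library, here
is a rough sketch. The names `delegate_sniff` and `parse` are hypothetical
and not part of Galaxy or Biopython. It assumes the upstream parser is any
callable that takes a filename and returns an iterable of records. It also
treats "no records found" as a failure, on the assumption that some parsers
may return nothing for an unrelated file instead of raising.

```python
def delegate_sniff(parse, filename):
    """Hand sniffing off to an upstream parser (hypothetical helper).

    `parse` is any callable taking a filename and returning an iterable
    of records. Returns a tuple (ok, records, error).
    """
    try:
        # Force full parsing so that lazy parsers raise here, not later.
        records = list(parse(filename))
    except Exception as e:
        # The upstream library rejected the file; pass its error along.
        return False, None, e
    if not records:
        # Assumption: an empty result means the file is not this datatype.
        return False, None, ValueError("parser returned no records")
    return True, records, None


# With Biopython installed, the parser argument could be, for example:
#   from Bio import SeqIO
#   ok, records, error = delegate_sniff(
#       lambda f: SeqIO.parse(f, "genbank"), "example.gb")
```

The same tuple can back both sniffing (use `ok`) and validation (report
`error`), so the file only needs to be parsed once.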
Cheers,
Eric
- --
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
esr(a)tamu.edu
rasche.eric(a)yandex.ru
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAEBAgAGBQJTyBmyAAoJEMqDXdrsMcpVQa0P/jj0edAKM6QsodhRWHglR92W
tej1tJjtPgtJ15wsFzq6wVfhbL5J39ytsWjjtk//jhVNXh4FEE/OFZe6Nx9uTFKP
ybazyTrLSCrxsST+w+Rx8Q9vfzShr87vjP+fC1k5i2EZOgogPOcQml1ouOHHjC6z
pArrwPOvL3ZxWJG7oEcZjUjrPD8+ffhfQ/x096YYIMw7Hg74d50ARwtawJRoslZD
JnYWa+aUOcsvC3QMrLKkDm4qBaTHa5x7x7P07Lcx7X65iMPDcuMZNtImiLztNscF
QwbbdJdcs8oeSRRnmKgAllRAKf4dMeiyaSI+muVzNlpvLlSMZBNawD0bO1OXmIQH
vAaV0eU+rYmDJSGo330o+RydvlDJENTXOkDt0TxmvfYAPtg2TlJCiWUdL7V1LqqF
n8J5Z7Cu/sqRGSr5ww6KY27QHq6TU1WZDsVZiyEWJeKg3HGzp0MUmzMdr7iSZawK
gnZxv6qg3+FlSqA30niyAuxEq588vS8uEFjjOfhnNLsUM7FAuFANF5z9bPOhG2qM
Xjc3/NY7NsERd9nsIwfRuz0DWni8upvZ39vfeRZ3OAW9NwjRzqXrQiQp08XHa934
z4EBnpcWc9rNSV/3APF/imecBTOoiKtZfzIfILLtOPGE407Bmd8cE8hWyW7ipvrT
QU6DIimj3eoMn+elXDfX
=M+s5
-----END PGP SIGNATURE-----