On Thu, Jul 17, 2014 at 8:20 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:
On 07/17/2014 02:11 PM, Peter Cock wrote:
You could do something like that, and we already have Biopython packages in the ToolShed which can be listed as dependencies :)
If my module depends on the Biopython package from the ToolShed, will that be accessible within a datatype? Would it be as simple as "from Bio import X"? Most of what I've seen of dependencies (and please forgive my lack of knowledge about them) consists of env.sh being sourced with paths to binaries before the tool runs.
I don't know - this may well be a gap in the ToolShed framework, since thus far most of the datatypes defined have been self-contained. I have asked something similar before, in the context of defining automatic file format conversion (like the way Galaxy can turn FASTA into tabular for input parameters expecting tabular), where there could be a binary dependency.
However, some things like GenBank are tricky - in order to tolerate NCBI dumps, the Biopython parser will ignore any free text before the first LOCUS line. A confusing side effect is that most text files are then treated as a GenBank file with zero records. But if it comes back with some records, it is probably OK :)
Interesting, very good to know.
Basically, Biopython deliberately does not offer file format detection, simply because it is a can of worms.
Zen of Python - explicit is better than implicit.
We want you to tell us which format you want to try parsing it as.
Yes! Exactly! Which is why it's perfectly fine here:
SeqIO.parse(dataset.file_name, "genbank")
All I want to know is whether or not this parses as a GenBank file (and has 1 or more records). Biopython may not do automatic format detection (yuck, agreed), but since I already know I'm looking for a GenBank file, simply being able to parse it or not is "good enough".
With those provisos, you should be OK :)

Peter
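For anyone reading this thread later, the check discussed above (try to parse as GenBank and require at least one record) might be sketched roughly as follows. This is only an illustration under assumptions: the function name looks_like_genbank and its parse argument are hypothetical and not part of Galaxy or Biopython; the only real call relied on is Bio.SeqIO.parse(handle_or_path, "genbank"). Treating a ValueError from the parser as "not GenBank" is also an assumption about how malformed input surfaces.

```python
def looks_like_genbank(path, parse=None):
    """Return True if `path` yields at least one record when parsed as GenBank.

    `parse` is a hypothetical hook that defaults to Bio.SeqIO.parse; it exists
    only so the logic can be tried without Biopython installed.
    """
    if parse is None:
        # Imported lazily; requires Biopython to be importable at call time.
        from Bio import SeqIO
        parse = SeqIO.parse
    try:
        # The parser skips free text before the first LOCUS line, so an
        # arbitrary text file usually gives zero records rather than an error.
        # any() stops as soon as the first record has been produced.
        return any(True for _ in parse(path, "genbank"))
    except ValueError:
        # Assumption: a malformed record is reported as a ValueError.
        return False
```

Inside a datatype this could be called as looks_like_genbank(dataset.file_name). Requiring at least one record is what guards against the "zero records" side effect mentioned earlier in the thread.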