Very cool, I'll check it out! The addition of the JSON files is indeed very new and was likely unfinished with respect to the base splitter.

-Dannon

On Feb 16, 2012, at 1:24 PM, Peter Cock wrote:
On Thu, Feb 16, 2012 at 4:28 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi Dan,
I think I need a little more advice: what is the role of the script scripts/extract_dataset_part.py, and of the JSON files that are created when splitting FASTQ files in lib/galaxy/datatypes/sequence.py and then used by that class's process_split_file method?
Why is there no JSON file created by the base data class in lib/galaxy/datatypes/data.py and no method process_split_file?
Is the JSON thing part of a partial and unfinished rewrite of the splitter code?
On the assumption that not all splitters bother with the JSON, I am trying a little hack to scripts/extract_dataset_part.py to abort silently if there is no JSON file: https://bitbucket.org/peterjc/galaxy-central/changeset/ebe94a2c25c3
This seems to be working with my current attempt at a FASTA splitter (not checked in yet, only partly implemented and tested).
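For readers following along, the "abort silently if there is no JSON file" idea can be sketched roughly as below. This is not the code from the changeset linked above: the function name load_split_info and the way the path is passed in are assumptions made purely for illustration, and the real scripts/extract_dataset_part.py may be organised quite differently.

```python
import json
import os


def load_split_info(json_path):
    """Return the parsed split description, or None when no JSON file exists.

    Hypothetical helper, not Galaxy's actual API. Returning None lets the
    caller exit quietly for splitters that never write a JSON file.
    """
    if not os.path.isfile(json_path):
        return None
    with open(json_path) as handle:
        return json.load(handle)
```

A caller would check for None and return without error, and only go on to use the description (for example by handing it to the datatype's process_split_file method) when a file was actually found.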
I've checked in my FASTA splitting, which now seems to be working OK with my BLAST tests. So far this only does splitting into chunks of the requested number of sequences, rather than the option to split the whole file into a given number of pieces. https://bitbucket.org/peterjc/galaxy-central/changeset/416c961c0da9
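As a rough illustration of splitting into chunks of a requested number of sequences, here is a minimal standalone sketch. It is not the code from the changeset above and does not use any Galaxy classes; the name split_fasta and its arguments are assumptions for the example. It treats every line starting with ">" as the start of a new record and ignores anything before the first header.

```python
import io


def split_fasta(handle, chunk_size):
    """Yield lists of FASTA record strings, at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []   # completed records for the chunk being built
    record = []  # lines of the record currently being read
    for line in handle:
        if line.startswith(">"):
            if record:
                # A new header closes the previous record.
                chunk.append("".join(record))
                record = []
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
            record.append(line)
        elif record:
            # Sequence line belonging to the current record.
            record.append(line)
    if record:
        chunk.append("".join(record))
    if chunk:
        # The last chunk may hold fewer than chunk_size records.
        yield chunk


example = io.StringIO(">a\nAC\nGT\n>b\nTT\n>c\nGG\n")
for number, records in enumerate(split_fasta(example, 2)):
    print(number, len(records))
```

Splitting the whole file into a given number of pieces, the option not yet covered, would additionally need the total record count up front (or a first pass over the file) to work out the chunk size.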
I also need to look at merging multiple BLAST XML outputs, but this is looking promising.
Peter