I am a pragmatist: I have no problem with just splitting the inputs and
skipping the metadata files. I would simply convert the error into a
log.info() and warn that the tool cannot use metadata files. If the
underlying tool needs an index, I think it can recreate it instead. One
can imagine a more intricate solution that recreates metadata files as
needed, but I think that would be a lot of work.
Does that make sense?
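A minimal sketch of that change to the check in multi.py, assuming (as the two paths in the quoted message suggest) that get_input_dataset_fnames() returns the primary dataset file first and any metadata files after it. primary_input_file is a hypothetical helper, not an existing Galaxy function; it logs the extra files instead of raising:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def primary_input_file(input_name, input_files):
    """Hypothetical helper: return the primary file of a split input.

    Assumes input_files[0] is the dataset itself and any later entries are
    metadata files (e.g. a BAM index). Extra files are logged, not fatal.
    """
    if not input_files:
        raise ValueError("Input '%s' has no files" % input_name)
    primary, metadata_files = input_files[0], input_files[1:]
    if metadata_files:
        # Previously log.error() + raise; now only an informational message.
        log.info(
            "Input '%s' has %d metadata file(s) that will not be split; "
            "the tool cannot use them: %s",
            input_name, len(metadata_files), ", ".join(metadata_files),
        )
    return primary


if __name__ == "__main__":
    files = [
        "/home/ralonso/galaxy/database/files/000/dataset_171.dat",
        "/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat",
    ]
    print(primary_input_file("input", files))
```

In the loop this would stand in for the `len(this_input_files) > 1` branch; input_datasets.append(input.dataset) would still run afterwards, so the split itself is unchanged.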
Regarding BB PR 175, there were some recent discussions about that
approach; I would check out
http://dev.list.galaxyproject.org/Parallelism-using-metadata-td4666763.html.
-John
On Thu, Apr 23, 2015 at 11:55 AM, Roberto Alonso CIPF <ralonso@cipf.es> wrote:
> Hello,
> I am trying to write some code to make it possible to parallelize some
> tasks. Right now I am stuck on splitting a BAM file into several parts;
> for this I created this simple tool:
>
> <parallelism method="multi" split_size="3" split_mode="number_of_parts"
>     merge_outputs="output" split_inputs="input"></parallelism>
>
> <command>
>     java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I $input -o $output 2> /dev/null;
> </command>
> <inputs>
>     <param format="bam" name="input" type="data" label="bam"/>
> </inputs>
> <outputs>
>     <data format="vcf" name="output" />
> </outputs>
>
> But I have one problem: when I execute the tool, it goes through this part
> of the code (I am working on the dev branch):
>
> $galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:
>
> for input in parent_job.input_datasets:
>     if input.name in split_inputs:
>         this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
>         if len(this_input_files) > 1:
>             log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
>             log.error(log_error)
>             raise Exception(log_error)
>         input_datasets.append(input.dataset)
>
> So it raises the exception because this_input_files has two entries,
> specifically:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'].
> I guess that:
> dataset_171.dat is the BAM file;
> metadata_13.dat is the BAI (index) file.
>
> So Galaxy can't move on, and I don't know what the best solution would be.
> Maybe change the if so that it only checks non-metadata files? I think I
> should use both files to create the BAM sub-files, but that would happen
> inside the Bam class, in the binary.py file.
> Could you please guide me before I mess things up?
>
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralonso@cipf.es
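For reference, split_mode="number_of_parts" with split_size="3" in the quoted tool asks for each split input to be divided into three parts. How Galaxy actually divides a BAM is up to the datatype's split method and is not shown in this thread; the sketch below only illustrates one plausible reading: contiguous chunks of records whose sizes differ by at most one. split_into_parts is a hypothetical name, and it leaves out format-specific work such as writing a valid BAM header into each part.

```python
def split_into_parts(items, number_of_parts):
    """Divide items into number_of_parts contiguous chunks.

    Chunk sizes differ by at most one; the first len(items) % number_of_parts
    chunks get one extra item. With fewer items than parts, trailing chunks
    are empty (a real splitter would probably skip those).
    """
    if number_of_parts < 1:
        raise ValueError("number_of_parts must be at least 1")
    base, extra = divmod(len(items), number_of_parts)
    parts = []
    start = 0
    for i in range(number_of_parts):
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    # Ten records split three ways, as split_size="3" would request.
    print(split_into_parts(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```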