question about splitting bams

23 Apr 2015

Hello,
I am trying to write some code that makes it possible to parallelize some
tasks. Right now I am dealing with the problem of splitting a BAM file into
several parts, and for this I created this simple tool:

  <parallelism method="multi" split_size="3" split_mode="number_of_parts"
               merge_outputs="output" split_inputs="input"></parallelism>

  <command>
    java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
      -T UnifiedGenotyper
      -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa
      -I $input
      -o $output 2> /dev/null;
  </command>
  <inputs>
    <param format="bam" name="input" type="data" label="bam"/>
  </inputs>
  <outputs>
    <data format="vcf" name="output" />
  </outputs>
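For reference, with split_mode="number_of_parts" and split_size="3" the input is meant to be divided into three parts of roughly equal size. The sketch below only illustrates that arithmetic under the assumption that the input can be treated as a sequence of n_items records (e.g. BAM alignments); part_bounds is a hypothetical helper, not Galaxy's splitter code.

```python
def part_bounds(n_items, n_parts):
    """Hypothetical helper: return (start, end) half-open index ranges that
    split n_items records into n_parts contiguous chunks whose sizes differ
    by at most one. Not taken from Galaxy's splitter implementation."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    base, extra = divmod(n_items, n_parts)
    bounds = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts each receive one additional record.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


print(part_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

If there are fewer records than parts, the trailing parts come out empty, which a real splitter would presumably need to handle.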

However, I have a problem: when I execute the tool, it goes through this
part of the code (I am working on the dev branch):

*$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*

    for input in parent_job.input_datasets:
        if input.name in split_inputs:
            this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
            if len(this_input_files) > 1:
                log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
                log.error(log_error)
                raise Exception(log_error)
            input_datasets.append(input.dataset)

So it raises the exception because this_input_files contains two files,
namely:
['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
 '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'].
I guess that:
*dataset_171.dat* is the BAM file.
*metadata_13.dat* is the BAI (index) file.
So Galaxy can't proceed, and I don't know what the best solution would be.
Should I change the *if* to check only non-metadata files? I think I should
use both files to create the BAM sub-files, but that logic would belong
inside the Bam class in *binary.py*.
Could you please give me some guidance before I mess things up?
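If the splitting were done inside the Bam datatype, one option that makes use of the index is to split by genomic region instead of by record count. The sketch below only plans the regions: it assumes the reference names and lengths have already been read from the BAM header (the contig list in the example is made up) and cuts the concatenated reference into n_parts spans of near-equal length, so it also works for a single chromosome such as chr19. plan_regions and to_samtools_region are hypothetical helpers, not part of Galaxy's binary.py; actually extracting each part would still require a BAM library or samtools view together with the index.

```python
def plan_regions(contigs, n_parts):
    """Hypothetical helper: split the concatenated reference into n_parts
    spans of near-equal length. `contigs` is a list of (name, length).
    Returns one list of (name, start, end) regions per part, using
    0-based half-open coordinates."""
    total = sum(length for _name, length in contigs)
    # Cut points in the global (concatenated) coordinate space.
    cuts = [total * i // n_parts for i in range(n_parts + 1)]
    parts = [[] for _ in range(n_parts)]
    offset = 0  # global start coordinate of the current contig
    for name, length in contigs:
        c_start, c_end = offset, offset + length
        for p in range(n_parts):
            lo = max(c_start, cuts[p])
            hi = min(c_end, cuts[p + 1])
            if lo < hi:
                parts[p].append((name, lo - offset, hi - offset))
        offset = c_end
    return parts


def to_samtools_region(region):
    """Convert a 0-based half-open region to 1-based inclusive 'name:start-end'."""
    name, start, end = region
    return "%s:%d-%d" % (name, start + 1, end)


parts = plan_regions([("a", 50), ("b", 40)], 2)
for i, regions in enumerate(parts):
    print(i, [to_samtools_region(r) for r in regions])
# 0 ['a:1-45']
# 1 ['a:46-50', 'b:1-40']
```

Region queries return reads that overlap a boundary in both neighbouring parts, so each job would normally restrict its calls to its own interval (for GATK, e.g. with -L) to avoid duplicate variants at the cut points.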

Thanks so much
-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es
