Regarding my previous mail, I found this pull request thread:

http://www.bytebucket.org/galaxy/galaxy-central/pull-request/175/parameter-based-bam-file-parallelization/diff

Is it still alive? Could it be the best way to do the BAM parallelization?

Thanks!

Best regards

On 23 April 2015 at 17:55, Roberto Alonso CIPF <ralonso@cipf.es> wrote:

Hello,
I am trying to write some code to make it possible to parallelize some tasks. Right now I am stuck on splitting a BAM file into several parts; for this I created this simple tool:

<parallelism method="multi" split_size="3" split_mode="number_of_parts" merge_outputs="output" split_inputs="input" ></parallelism>

<command>
java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I $input -o $output 2> /dev/null;

</command>
<inputs>
<param format="bam" name="input" type="data" label="bam"/>
</inputs>
<outputs>
<data format="vcf" name="output" />
</outputs>

But I have one problem: when I execute the tool, it goes through this part of the code (I am working on the dev branch):

$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:

for input in parent_job.input_datasets:
    if input.name in split_inputs:
        this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
        if len(this_input_files) > 1:
            log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
            log.error(log_error)
            raise Exception(log_error)
        input_datasets.append(input.dataset)

So it raises the exception because this_input_files contains two entries, namely ['/home/ralonso/galaxy/database/files/000/dataset_171.dat', '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat']. I guess that:
dataset_171.dat: It is the bam file.
metadata_13.dat: It is the bai file.
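
One way to confirm that guess, sketched under a few assumptions: a BAI index starts with the magic bytes "BAI\1", while a BAM file is gzip-compatible (BGZF) and its decompressed stream starts with "BAM\1". The helper name guess_bam_or_bai is hypothetical and is not part of Galaxy.

```python
import gzip


def guess_bam_or_bai(path):
    """Return 'bam', 'bai', or 'unknown' based on the file's magic bytes.

    Hypothetical helper for illustration only; it is not a Galaxy API.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    # A BAI index is uncompressed and starts with the bytes "BAI\1".
    if head == b"BAI\x01":
        return "bai"
    # A BAM file is BGZF (gzip-compatible); check the gzip magic first,
    # then look for "BAM\1" at the start of the decompressed stream.
    if head[:2] == b"\x1f\x8b":
        with gzip.open(path, "rb") as fh:
            if fh.read(4) == b"BAM\x01":
                return "bam"
    return "unknown"
```

Calling it on dataset_171.dat and metadata_13.dat should tell whether the two entries really are the BAM file and its index.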

So Galaxy can't move on, and I don't know what the best solution would be. Maybe change the if to count only non-metadata files? I think I should use both files to create the BAM sub-files, but that would happen inside the Bam class, in the binary.py file.
Could you please guide me before I mess things up?
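
To make the "count only non-metadata files" idea concrete, here is a minimal sketch, not a patch against Galaxy. It assumes metadata files can be recognized by a _metadata_files directory component in their path, as in the two paths above; the helper name non_metadata_files is hypothetical.

```python
import os


def non_metadata_files(paths, metadata_dirname="_metadata_files"):
    """Return the paths that have no directory component named metadata_dirname.

    Assumption: metadata files (such as the BAI index) live under a
    '_metadata_files' directory, as in the example paths from the job.
    """
    kept = []
    for p in paths:
        components = os.path.normpath(p).split(os.sep)
        if metadata_dirname not in components:
            kept.append(p)
    return kept


this_input_files = [
    "/home/ralonso/galaxy/database/files/000/dataset_171.dat",
    "/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat",
]
primary = non_metadata_files(this_input_files)
```

With something like this, the check in multi.py could compare len(primary) instead of len(this_input_files), so the index file would no longer trigger the exception. Whether that is the right place for the change is exactly what I am unsure about.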

Thanks so much
--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es
