Regarding my previous mail, I found this thread: http://www.bytebucket.org/galaxy/galaxy-central/pull-request/175/parameter-b... Is it still alive? Would it maybe be the best choice for doing the BAM parallelization?

Thanks! Best regards

On 23 April 2015 at 17:55, Roberto Alonso CIPF <ralonso@cipf.es> wrote:
Hello, I am trying to write some code to make it possible to parallelize some tasks. Right now I am working on the problem of splitting a BAM file into several parts, and for this I created this simple tool:
<parallelism method="multi" split_size="3" split_mode="number_of_parts" merge_outputs="output" split_inputs="input" ></parallelism>
<command>
    java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I $input -o $output 2> /dev/null;
</command>
<inputs>
    <param format="bam" name="input" type="data" label="bam"/>
</inputs>
<outputs>
    <data format="vcf" name="output" />
</outputs>
But I have one problem: when I execute the tool, it goes through this part of the code (I am working on the dev branch):
*$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*
    for input in parent_job.input_datasets:
        if input.name in split_inputs:
            this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
            if len(this_input_files) > 1:
                log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
                log.error(log_error)
                raise Exception(log_error)
            input_datasets.append(input.dataset)
So it raises the exception because this_input_files contains two entries, namely: ['/home/ralonso/galaxy/database/files/000/dataset_171.dat', '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat']. I guess that:
*dataset_171.dat* is the BAM file.
*metadata_13.dat* is the BAI file.
So Galaxy can't move on, and I don't know what the best solution would be. Maybe change the *if* to check only non-metadata files? I think I should use both files to create the BAM sub-files, but that would be done inside the Bam class, in the *binary.py* file. Could you please guide me before I mess things up?
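To make the "check only non-metadata files" idea concrete, here is a rough standalone sketch in Python. It is not Galaxy code: the helper names non_metadata_files and check_splittable are hypothetical, and it assumes that metadata files can be recognised by living under a _metadata_files directory, as in the two paths shown above. Galaxy may well have a more reliable way to tell them apart.

```python
from pathlib import PurePosixPath

# Assumption: metadata files (such as the BAI index) sit under a directory
# with this name, as in the paths reported by get_input_dataset_fnames above.
METADATA_DIR = "_metadata_files"


def non_metadata_files(paths):
    """Return only the paths that are not located under a metadata directory."""
    return [p for p in paths if METADATA_DIR not in PurePosixPath(p).parts]


def check_splittable(name, paths):
    """Hypothetical variant of the check in multi.py that ignores metadata files.

    Raises if the input still consists of more than one data file,
    otherwise returns the list of data files.
    """
    data_files = non_metadata_files(paths)
    if len(data_files) > 1:
        raise Exception(
            "The input '%s' is composed of multiple files - splitting is not allowed" % name
        )
    return data_files
```

With the two paths from my case, check_splittable would keep only dataset_171.dat and would not raise. The BAI file would still need to be handled somewhere when the sub-files are created.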
Thanks so much

--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3 (junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es