Hello,

I have been reading the related threads and have some questions that you can perhaps clarify. In the thread you mentioned the "ability to write tools that split up a single input into a collection." I think this is aimed at workflows, but in any case, could we use it to split BAMs?

Another comment is the following:

"These common pipelines where you split up a BAM files, run a bunch of
steps, and then merge the results will be executable in the near
future (though 15.03 won't have workflow editor support for it - I
will try to get to this by the following release - and you can
manually build up workflows to do this - "

I was trying to write something that does exactly this, and I guess someone is already working on it. Do you think it is worth continuing, or should I switch to something else? Do you know the roadmap for this feature?

Thanks a lot,

Roberto

On 23 April 2015 at 20:09, John Chilton <jmchilton@gmail.com> wrote:

I am a pragmatist - I have no problem just splitting the inputs and
skipping the metadata files. I would just convert the error into a
log.info() and warn that the tool cannot use metadata files. If the
underlying tool needs an index it can recreate it instead, I think. One
can imagine a more intricate solution that would recreate metadata
files as needed - but that would be a lot of work I think.
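
A minimal sketch of that change, not tested against Galaxy itself. The
helper name select_splittable_inputs and the get_input_fnames callable are
hypothetical stand-ins (the latter for job_wrapper.get_input_dataset_fnames),
and it assumes only the primary dataset is handed on to the splitter:

```python
import logging
from types import SimpleNamespace

log = logging.getLogger(__name__)


def select_splittable_inputs(input_datasets, split_inputs, get_input_fnames):
    """Collect the datasets to split, tolerating extra metadata files.

    Hypothetical helper: get_input_fnames stands in for
    job_wrapper.get_input_dataset_fnames and returns a list of paths.
    """
    selected = []
    for inp in input_datasets:
        if inp.name not in split_inputs:
            continue
        fnames = get_input_fnames(inp.dataset)
        if len(fnames) > 1:
            # Previously log.error() plus raise; now only an informational note.
            log.info(
                "The input '%s' is composed of multiple files - the split "
                "parts will not carry metadata files (e.g. a BAM index)",
                inp.name,
            )
        selected.append(inp.dataset)
    return selected


# Stub objects mimicking the BAM + index situation from the quoted message.
bam = SimpleNamespace(name="input", dataset="dataset_171")
fnames = {"dataset_171": ["dataset_171.dat", "metadata_13.dat"]}
selected = select_splittable_inputs([bam], ["input"], fnames.get)
```

With this, the BAM input is still selected for splitting even though an
index file sits next to it; a tool that needs the index would have to
rebuild it for each part.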

Does that make sense?

About BB PR 175 there were some recent discussions about that approach
- I would check out
http://dev.list.galaxyproject.org/Parallelism-using-metadata-td4666763.html.

-John

On Thu, Apr 23, 2015 at 11:55 AM, Roberto Alonso CIPF <ralonso@cipf.es> wrote:
> Hello,
> I am trying to write some code to make it possible to parallelize some
> tasks. Right now I am working on splitting a BAM into several parts; for
> this I created this simple tool:
>
> <parallelism method="multi" split_size="3" split_mode="number_of_parts"
> merge_outputs="output" split_inputs="input" ></parallelism>
>
> <command>
> java -jar
> /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T
> UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I
> $input -o $output 2> /dev/null;
>
> </command>
> <inputs>
> <param format="bam" name="input" type="data" label="bam"/>
> </inputs>
> <outputs>
> <data format="vcf" name="output" />
> </outputs>
>
> But I have one problem: when I execute the tool it goes through this part
> of the code (I am working on the dev branch):
>
> $galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:
>
> for input in parent_job.input_datasets:
>     if input.name in split_inputs:
>         this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
>         if len(this_input_files) > 1:
>             log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
>             log.error(log_error)
>             raise Exception(log_error)
>         input_datasets.append(input.dataset)
>
> So it raises the exception because this_input_files contains two entries, namely:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'],
> I guess that:
> dataset_171.dat: It is the bam file.
> metadata_13.dat: It is the bai file.
>
> So Galaxy can't move on, and I don't know what the best solution would be.
> Maybe change the if to check only non-metadata files? I think I should use
> both files to create the BAM sub-files, but that would be inside the Bam
> class, in the binary.py file.
> Could you please guide me before I mess things up?
>
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralonso@cipf.es
>

