New subject: Preffered way of running a tool on multiple input files

12 Feb 2013

      Hi,
I'm looking for a preferred way of running Bowtie (or any other tool) on
multiple input files and run statistics on the Bowtie output afterwards.

The input is a directory of files fastq1..fastq100
The bowtie output should be bed1...bed100
The statistics tool should run on bed1...bed100 and return xls1..xls100
Then I will write a tool which will get xls1..xls100 and merge them to one
final output.

I searched for a smiliar cases, and I couldn't figure anyone which had this
problem before.
Can't use the parallelism tag, because what will be the input for each
tool? it should be a fastq file not a directory of fastq files.
Neither I would like to run each fastq file in a different workflow -
creating a mess.

I thought only on two solutions:
1. Implement new datatypes: bed_dir & fastq_dir and implements new tool
wrappers which will get a folder instead of a file.
2. merge the input files before sending to bowtie, and use parallelism tag
to make them be splitted & merged again on each tool.

Does anyone has any better suggestion?

Thanks,
Hagai

Preffered way of running a tool on multiple input files

Hagai Cohen

Joachim Jacob |VIB|

Hagai Cohen

Joachim Jacob |VIB|

Hagai Cohen

Hagai Cohen

John Chilton

tags

participants (3)