Hi,
I'm looking for the preferred way to run Bowtie (or any other tool) on multiple input files and then run statistics on the Bowtie output.
The input is a directory of files fastq1..fastq100.
The Bowtie output should be bed1..bed100.
The statistics tool should run on bed1..bed100 and return xls1..xls100.
Then I will write a tool which takes xls1..xls100 and merges them into one final output.
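
To make the intended data flow concrete, here is a rough Python sketch of the per-file pairing and the final merge. It is only an illustration: the function names (plan_jobs, merge_tables) are made up, it assumes the inputs are literally named fastq1..fastq100 and that the xls files are tab-separated text, and it does not run Bowtie or the statistics tool at all.

```python
import csv
from pathlib import Path


def plan_jobs(fastq_dir, out_dir):
    """Pair every fastqN input with its bedN and xlsN outputs (hypothetical naming)."""
    jobs = []
    for fastq in sorted(Path(fastq_dir).glob("fastq*")):
        suffix = fastq.name[len("fastq"):]  # "fastq7" -> "7"
        jobs.append({
            "fastq": fastq,
            "bed": Path(out_dir) / f"bed{suffix}",
            "xls": Path(out_dir) / f"xls{suffix}",
        })
    return jobs


def merge_tables(xls_paths, merged_path):
    """Concatenate tab-separated tables, prefixing each row with its source file name."""
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in xls_paths:
            with open(path, newline="") as handle:
                for row in csv.reader(handle, delimiter="\t"):
                    writer.writerow([Path(path).name, *row])
```

Each entry returned by plan_jobs is one independent fastq -> bed -> xls chain, and only merge_tables needs to see all of the outputs at once.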
I searched for similar cases, but I couldn't find anyone who has had this problem before.
I can't use the parallelism tag, because what would the input for each tool be? It should be a single fastq file, not a directory of fastq files.
Nor would I like to run each fastq file in a separate workflow, since that would create a mess.
I could think of only two solutions:
1. Implement new datatypes, bed_dir and fastq_dir, and implement new tool wrappers which take a folder instead of a file.
2. Merge the input files before sending them to Bowtie, and use the parallelism tag so that they are split and merged again in each tool.
Does anyone have a better suggestion?
Thanks,
Hagai