Hello again,

First of all, thanks for your help; it has been very useful. What I have done so far is copy this method into the class Sequence:

    def get_split_commands_sequential(is_compressed, input_name, output_name, start_sequence, sequence_count):
        """
        Does a brain-dead sequential scan & extract of certain sequences
        >>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
        ['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
        >>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
        ['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
        """
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        # TODO: verify that tail can handle 64-bit numbers
        if is_compressed:
            cmd = 'zcat "%s" | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (input_name, start_line+1, line_count)
        else:
            cmd = 'tail -n +%s "%s" 2> /dev/null | head -%s' % (start_line+1, input_name, line_count)
        cmd += ' > "%s"' % output_name
        return [cmd]
    get_split_commands_sequential = staticmethod(get_split_commands_sequential)

This is something that you suggested.
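To make the line arithmetic in that method concrete, here is a minimal sketch of how a file might be divided into parts and how each part maps to the tail/head arguments. The helpers plan_parts and line_window are hypothetical names of mine, not Galaxy functions, and the sketch assumes every FASTQ record is exactly 4 lines (no wrapped sequence lines), which is the same assumption the method makes.

```python
def plan_parts(total_sequences, number_of_parts):
    """Return (start_sequence, sequence_count) pairs covering all sequences.

    Hypothetical helper, not part of Galaxy: earlier parts receive one extra
    sequence when the total does not divide evenly.
    """
    base, extra = divmod(total_sequences, number_of_parts)
    parts = []
    start = 0
    for index in range(number_of_parts):
        count = base + (1 if index < extra else 0)
        if count:  # skip empty parts when there are fewer sequences than parts
            parts.append((start, count))
        start += count
    return parts


def line_window(start_sequence, sequence_count):
    """Translate a sequence window into (first_line, line_count)."""
    # Assumes 4 lines per FASTQ record; tail -n +K is 1-based, hence the +1.
    return start_sequence * 4 + 1, sequence_count * 4


if __name__ == "__main__":
    for start, count in plan_parts(total_sequences=10, number_of_parts=3):
        first_line, line_count = line_window(start, count)
        print('tail -n +%d "input.fastq" | head -%d' % (first_line, line_count))
```

With start_sequence=10 and sequence_count=10 this gives a first line of 41 and a line count of 40, which agrees with the second doctest in the method.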
When I run the tool with this configuration:

    <tool id="bwa_mio" name="map with bwa">
      <description>map with bwa</description>
      <parallelism method="basic" split_size="3" split_mode="number_of_parts"></parallelism>
      <command> bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input > $output 2>/dev/null</command>
      <inputs>
        <param format="fastqsanger" name="input" type="data" label="fastq"/>
      </inputs>
      <outputs>
        <data format="sam" name="output" />
      </outputs>
      <help> bwa </help>
    </tool>

everything finishes OK, but when I check the resulting SAM file, I see that the alignment lines contain the path of the file, e.g. in example_split.sam:

    /home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446 4 * 0 0 * * 0 0 TCTGGGTGAGGGAGTGGGGAGTGGGTTTTTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT ############################################################################ AS:i:0 XS:i:0

Do you know what may be going on? If I don't split the file, everything works correctly.

Best regards

On 13 February 2015 at 13:39, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Fri, Feb 13, 2015 at 11:38 AM, Nicola Soranzo <nsoranzo@tiscali.it> wrote:
On 13.02.2015 03:17, Peter Cock wrote:
Hi Roberto,
It looks like this is a known issue with FASTQ splitting,
https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism
I originally broke it during a refactor, but it looks like the discussion died about what that method was meant to do (e.g. FQTOC = FASTQ table of contents?):
https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4...
I'm away from the office so can't try this, but probably all that is needed is to copy and paste the old method get_split_commands_sequential and the old method get_split_commands_with_toc (removed from the base Sequence class in the above commit) into the base Fastq class instead.
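A minimal sketch of what that placement might look like, using simplified stand-in classes rather than Galaxy's real datatype classes. Only get_split_commands_sequential is shown, with the body quoted at the top of this message; get_split_commands_with_toc would be moved the same way, but its body is not reproduced in this thread. The @staticmethod decorator is assumed to be equivalent to the explicit staticmethod(...) assignment.

```python
class Sequence(object):
    """Stand-in for the base sequence datatype (not Galaxy's real class)."""


class Fastq(Sequence):
    """Stand-in for the base FASTQ datatype that would own the split helpers."""

    @staticmethod
    def get_split_commands_sequential(is_compressed, input_name, output_name,
                                      start_sequence, sequence_count):
        # Each FASTQ record is assumed to span exactly 4 lines.
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        if is_compressed:
            cmd = 'zcat "%s" | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (
                input_name, start_line + 1, line_count)
        else:
            cmd = 'tail -n +%s "%s" 2> /dev/null | head -%s' % (
                start_line + 1, input_name, line_count)
        cmd += ' > "%s"' % output_name
        return [cmd]
```

In this sketch the helper lives only on Fastq, so the base Sequence stand-in does not expose it.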
Nicola - did you fix this locally after noticing the problem last year?
No, sorry, we disabled Galaxy parallelism because it was using too many cluster nodes.
Nicola
I had similar comments from some of the cluster users after getting it working here - but on balance a well-used cluster helps justify future investment in maintaining it.
Sorry about not following up on this - I think I might have assumed you would take care of it. Unfortunately I won't be able to test the obvious fix for at least another week...
Peter
--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es