Re: [galaxy-dev] problems splitting

25 Feb 2015


Hello again :),

I have found the problem. The code that merges the files is this:
galaxy/datatypes/tabular.py:484:            cmd = 'egrep -v "^@" %s >> %s' % ( ' '.join(split_files[1:]), output_file )
When egrep is given more than one file, it prefixes each output line with
the file name, so the file name ends up in the SAM file. Adding the "h"
option is enough, so the line becomes:
galaxy/datatypes/tabular.py:484:            cmd = 'egrep -hv "^@" %s >> %s' % ( ' '.join(split_files[1:]), output_file )
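This also explains why splitting into 2 parts worked: split_files[1:] then holds a single file, and egrep does not print file names for a single input. For illustration only, the same merge can be sketched in pure Python, which avoids the file-name prefix altogether. The function name merge_sam_files is hypothetical (it is not part of Galaxy), and the sketch assumes the first part is copied whole, header included, as the ">>" append in the command above suggests.

```python
import shutil


def merge_sam_files(split_files, output_file):
    """Concatenate SAM parts, keeping the header of the first part only.

    Hypothetical helper for illustration; not Galaxy's actual merge code.
    """
    with open(output_file, "w") as out:
        # The first part is copied as-is, header lines included.
        with open(split_files[0]) as first:
            shutil.copyfileobj(first, out)
        # Later parts contribute alignment records only.
        # SAM header lines are the ones starting with "@".
        for path in split_files[1:]:
            with open(path) as part:
                for line in part:
                    if not line.startswith("@"):
                        out.write(line)
```

Because the lines are written directly, nothing can prepend the file name to a record, whatever the number of parts.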

Thanks all for your help, best regards


On 25 February 2015 at 12:31, Roberto Alonso CIPF <ralonso@cipf.es> wrote:
...
Ok, I think I understand the line:
beginning merge: bwa mem
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
It refers to the original command, so everything is fine with this line.
The other problem still remains.
Regards, and sorry for the confusion
On 25 February 2015 at 11:40, Roberto Alonso CIPF <ralonso@cipf.es> wrote:
...
Hello again,
There is something I consider important: when I check the log, I see
this output:
galaxy.jobs.runners.tasks DEBUG 2015-02-25 11:33:30,989 execution
finished - beginning merge: bwa mem
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
I think the merge should be done with samtools. I don't know how this is
implemented in Galaxy, but I didn't indicate the path to samtools anywhere.
Could the problem be related to this?
Thanks a lot,
Regards
On 25 February 2015 at 11:13, Roberto Alonso CIPF <ralonso@cipf.es>
wrote:
...
Hello,
I just changed to the CDATA format, but the problem still remains. When
I split into 2 parts there is no problem, but when I split into 3, the
problem described before appears. Here is the link to the SAM/BAM file:
 https://dl.dropboxusercontent.com/u/1669701/ejemplo_split.bam
Best regards
On 24 February 2015 at 17:49, Peter Cock <p.j.a.cock@googlemail.com>
wrote:
...
On Tue, Feb 24, 2015 at 4:43 PM, Roberto Alonso CIPF <ralonso@cipf.es>
wrote:
...
Hello again,
first of all thanks for your help, it has been very useful.
What I have done so far is copy this method to the Sequence class:
    def get_split_commands_sequential(is_compressed, input_name, output_name, start_sequence, sequence_count):
        ...
        return [cmd]
    get_split_commands_sequential = staticmethod(get_split_commands_sequential)
This is something that you suggested.
Good.
...
When I run the tool with this configuration:
<tool id="bwa_mio" name="map with bwa">
  <description>map with bwa</description>
  <parallelism method="basic" split_size="3" split_mode="number_of_parts"></parallelism>
  <command>
      bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input > $output 2>/dev/null
  </command>
  <inputs>
    <param format="fastqsanger" name="input" type="data" label="fastq"/>
  </inputs>
  <outputs>
    <data format="sam" name="output" />
  </outputs>
  <help>
  bwa
  </help>
</tool>
One minor improvement would be to escape the ">" as "&gt;" in
your XML, or use the CDATA approach documented here:
https://wiki.galaxyproject.org/Tools/BestPractices
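As a small check of the two options just mentioned, the sketch below parses an escaped and a CDATA version of a command element with Python's standard xml.etree.ElementTree and compares the resulting text. The reference name ref.fa is a placeholder chosen for the example, not the path used in the tool above.

```python
import xml.etree.ElementTree as ET

# Variant 1: ">" escaped as "&gt;" (ref.fa is a placeholder file name).
escaped = "<command>bwa mem ref.fa $input &gt; $output 2&gt;/dev/null</command>"

# Variant 2: the same command wrapped in a CDATA section, no escaping needed.
cdata = "<command><![CDATA[bwa mem ref.fa $input > $output 2>/dev/null]]></command>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both variants should yield the same command string after parsing.
print(escaped_text == cdata_text)
print(cdata_text)
```

Either form gives the parser the same command; CDATA is usually easier to read when the command has several redirections.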
...
Everything finishes OK, but when I check the SAM file, I see that the
alignments contain the path of the file, i.e.
example_split.sam:
/home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446
...
4 * 0 0 * * 0 0
TCTGGGTGAGGGAGTGGGGAGTGGGTTTTTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT
...
############################################################################
...
AS:i:0 XS:i:0
Do you know what may be going on?
If I don't split the file, everything works correctly.
This sounds to me like there may be a problem with SAM merging?
Could you share the entire example_split.sam file (e.g. as a gist
on GitHub, or via dropbox)?
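A quick way to confirm this kind of corruption without reading the whole file by eye is to scan the read names. The sketch below is only a heuristic under stated assumptions: the function name find_prefixed_records is hypothetical, and it flags a record when its first field contains a "/" and has a ":" after the last "/", which is the "path:readname" shape shown in the example above.

```python
def find_prefixed_records(sam_path):
    """Return (line_number, qname) pairs whose read name looks like 'path:name'.

    Hypothetical helper; the path-detection rule is a heuristic assumption.
    """
    suspicious = []
    with open(sam_path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@"):
                continue  # SAM header line, not an alignment record
            # QNAME is the first tab-separated field of an alignment record.
            qname = line.rstrip("\n").split("\t", 1)[0]
            # Flag names that contain a path separator followed later by ":".
            if "/" in qname and ":" in qname.split("/")[-1]:
                suspicious.append((number, qname))
    return suspicious
```

An empty result does not prove the file is valid SAM; it only means no record matched this particular pattern.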
Peter
--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es