Think about memory when you have large files... Bob Harris wrote:
I suspect an additional improvement would be seen by keeping fasta_seq as a list of strings, using fasta_seq.append(line), and then concatenating them together with "".join when it's time to output.
Mind you, I haven't tested that though.
Bob H
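[A minimal sketch, not from the original thread, contrasting the three approaches discussed: the original `"%s%s" %` rebuild (quadratic, since the whole string is copied each iteration), Rasmus's `+=` patch (which CPython can often resize in place), and Bob's list-append plus `"".join` suggestion (a single final concatenation). Function names and the synthetic input are illustrative only.]

```python
import timeit

def build_format(lines):
    # Original fasta_to_tabular.py style: rebuilds the whole
    # string every iteration, so total work grows quadratically.
    seq = ''
    for line in lines:
        seq = "%s%s" % (seq, line)
    return seq

def build_concat(lines):
    # Rasmus's patch: += appends to the existing string, which
    # CPython can often do without copying the prefix.
    seq = ''
    for line in lines:
        seq += line
    return seq

def build_join(lines):
    # Bob's suggestion: accumulate pieces in a list and join once
    # at output time; total work is linear in the sequence length.
    parts = []
    for line in lines:
        parts.append(line)
    return "".join(parts)

if __name__ == "__main__":
    # Synthetic "sequence" split across many short lines.
    lines = ["ACGT" * 15] * 2000

    # All three strategies produce the same string.
    assert build_format(lines) == build_concat(lines) == build_join(lines)

    for fn in (build_format, build_concat, build_join):
        t = timeit.timeit(lambda: fn(lines), number=5)
        print("%s: %.3fs" % (fn.__name__, t))
```

Note that the in-place `+=` speedup is a CPython implementation detail; `"".join` is the approach whose linear behavior is guaranteed across Python implementations.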
On Jul 18, 2009, at 3:10 PM, Rasmus Ory Nielsen wrote:
Hi Galaxy Team,
I've found that fasta_to_tabular.py is very slow with big sequences, e.g. ~4 minutes for a single 5MB sequence.
The patch below makes the running time go from minutes to seconds for such a sequence. Mind you, this is my first line of python, so there may be a smarter way.
Best regards, Rasmus Ory Nielsen
--- fasta_to_tabular.py.orig	2009-07-18 16:25:50.896487000 +0200
+++ fasta_to_tabular.py	2009-07-18 17:22:49.544611000 +0200
@@ -34,7 +34,7 @@
             fasta_seq = ''
     else:
         if line:
-            fasta_seq = "%s%s" % ( fasta_seq, line )
+            fasta_seq += line
 if fasta_seq:
     out.write( "%s\t%s\n" %( fasta_title[ 1:keep_first ], fasta_seq ) )

_______________________________________________
galaxy-user mailing list
galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user