Think about memory when you have large files... Bob Harris wrote:
I suspect an additional improvement would be seen by keeping fasta_seq as a list of strings, using fasta_seq.append(line), and then concatenating them together with "".join when it's time to output.
Mind you, I haven't tested that though.
Bob H
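[A minimal sketch, not from the original thread, contrasting the three approaches discussed: the original `"%s%s" %` rebuild (quadratic, since the whole string is copied each iteration), Rasmus's `+=` patch (which CPython can often resize in place), and Bob's list-append plus `"".join` suggestion (a single final concatenation). Function names and the synthetic input are illustrative only.]

```python
import timeit

def build_format(lines):
    # Original fasta_to_tabular.py style: rebuilds the whole
    # string every iteration, so total work grows quadratically.
    seq = ''
    for line in lines:
        seq = "%s%s" % (seq, line)
    return seq

def build_concat(lines):
    # Rasmus's patch: += appends to the existing string, which
    # CPython can often do without copying the prefix.
    seq = ''
    for line in lines:
        seq += line
    return seq

def build_join(lines):
    # Bob's suggestion: accumulate pieces in a list and join once
    # at output time; total work is linear in the sequence length.
    parts = []
    for line in lines:
        parts.append(line)
    return "".join(parts)

if __name__ == "__main__":
    # Synthetic "sequence" split across many short lines.
    lines = ["ACGT" * 15] * 2000

    # All three strategies produce the same string.
    assert build_format(lines) == build_concat(lines) == build_join(lines)

    for fn in (build_format, build_concat, build_join):
        t = timeit.timeit(lambda: fn(lines), number=5)
        print("%s: %.3fs" % (fn.__name__, t))
```

Note that the in-place `+=` speedup is a CPython implementation detail; `"".join` is the approach whose linear behavior is guaranteed across Python implementations.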
On Jul 18, 2009, at 3:10 PM, Rasmus Ory Nielsen wrote:
Hi Galaxy Team,
I've found that fasta_to_tabular.py is very slow with big sequences, e.g. ~4 minutes for a single 5MB sequence.
The patch below makes the running time go from minutes to seconds for such a sequence. Mind you, this is my first line of python, so there may be a smarter way.
Best regards, Rasmus Ory Nielsen
--- fasta_to_tabular.py.orig	2009-07-18 16:25:50.896487000 +0200
+++ fasta_to_tabular.py	2009-07-18 17:22:49.544611000 +0200
@@ -34,7 +34,7 @@
             fasta_seq = ''
     else:
         if line:
-            fasta_seq = "%s%s" % ( fasta_seq, line )
+            fasta_seq += line
 if fasta_seq:
     out.write( "%s\t%s\n" %( fasta_title[ 1:keep_first ], fasta_seq ) )

_______________________________________________
galaxy-user mailing list
galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user