Hello Rasmus,

The fix for this should be pushed out to our public repo shortly, and available on our main site as well. I've opened the following ticket in bitbucket so you can "follow" it if you want.

http://bitbucket.org/galaxy/galaxy-central/issue/112/fix-fasta_to_tabularpy-...

Greg Von Kuster
Galaxy Development Team

Rasmus Ory Nielsen wrote:
Hi Greg,
I was in the middle of writing a mail with a message very similar to what Brad Chapman just sent. Therefore I will just send my time comparisons to back up my initial mail.
At the moment it is not impossible to convert a few large sequences, but you need to have a lot of time.
Below are two tests I just ran. Each test converts a single sequence and compares the original with the patched version (+= approach) of fasta_to_tabular.py.
Thanks.
Best regards, Rasmus Ory Nielsen
------------------------------------------------------------
[roni@galaxy]$ ls -lh test.fa
-rw-rw-r-- 1 roni roni 5.9M 2009-07-20 15:24 test.fa
[roni@galaxy]$ time ./fasta_to_tabular.py test.fa test.tab 0

real 0m0.214s
user 0m0.139s
sys 0m0.024s
[roni@galaxy]$ time ./fasta_to_tabular.py.orig test.fa test.tab.orig 0

real 2m37.114s
user 1m53.467s
sys 0m43.531s
And with a bigger file:
[roni@galaxy]$ ls -lh test2.fa
-rw-rw-r-- 1 roni roni 12M 2009-07-20 15:33 test2.fa
[roni@galaxy]$ time ./fasta_to_tabular.py test2.fa test2.tab 0

real 0m0.413s
user 0m0.264s
sys 0m0.050s
[roni@galaxy]$ time ./fasta_to_tabular.py.orig test2.fa test2.tab.orig 0

real 13m30.621s
user 9m18.316s
sys 4m12.081s
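
The "real" times above hint at how each version scales with input size. A rough sketch of that arithmetic follows. It assumes the rounded file sizes printed by ls (5.9M and 12M) are close enough to use as input sizes, and the helper names to_seconds and scaling_exponent are made up for this illustration. With only two data points per version, this is an estimate and not a benchmark.

```python
import math


def to_seconds(minutes, seconds):
    # Convert a "XmY.ZZZs" reading from `time` into plain seconds.
    return minutes * 60 + seconds


def scaling_exponent(t_small, t_large, n_small, n_large):
    # If time grows like n**k, then k = log(time ratio) / log(size ratio).
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# File sizes as reported by `ls -lh` (rounded, so only approximate).
size_small_mb, size_large_mb = 5.9, 12.0

# "real" times copied from the two test runs.
orig_small = to_seconds(2, 37.114)
orig_large = to_seconds(13, 30.621)
patched_small = to_seconds(0, 0.214)
patched_large = to_seconds(0, 0.413)

orig_exp = scaling_exponent(orig_small, orig_large, size_small_mb, size_large_mb)
patched_exp = scaling_exponent(patched_small, patched_large, size_small_mb, size_large_mb)

print("original version: time grows roughly like size ** %.1f" % orig_exp)
print("patched version:  time grows roughly like size ** %.1f" % patched_exp)
```

The exponent comes out a little above 2 for the original script and a little below 1 for the patched one. That is what you would expect if the original re-copies the whole accumulated sequence for every input line, while the patched version does work roughly proportional to the file size.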
________________________________________
From: Greg Von Kuster [ghv2@psu.edu]
Sent: 20 July 2009 14:44
To: Bob Harris
Cc: galaxy-user@bx.psu.edu; Rasmus Ory Nielsen
Subject: Re: [galaxy-user] fasta_to_tabular.py slowness
Think about memory when you have large files...
Bob Harris wrote:
I suspect an additional improvement would be seen by keeping fasta_seq as a list of strings, using fasta_seq.append(line), and then concatenating them together with "".join when it's time to output.
Mind you, I haven't tested that though.
Bob H
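
For reference, the list-and-join idea described above might look something like the sketch below. This is not the actual fasta_to_tabular.py code: the function name fasta_to_rows and the sample input are invented for the illustration, and the keep_first title handling of the real script is left out.

```python
def fasta_to_rows(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines.

    Sequence lines are collected in a list and joined once per record,
    instead of growing one string line by line.
    """
    title = None
    parts = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if title is not None:
                # A new record starts, so emit the previous one.
                yield title, ''.join(parts)
            title = line[1:]
            parts = []
        else:
            # Sequence lines seen before any title are collected but never emitted.
            parts.append(line)
    if title is not None:
        yield title, ''.join(parts)


# Made-up sample input, just to show the call.
example = ['>seq1 demo', 'ACGT', 'TTGA', '', '>seq2', 'GG']
rows = list(fasta_to_rows(example))
for title, seq in rows:
    print('%s\t%s' % (title, seq))
```

On the memory point: while "".join builds the final string, the list still holds every piece, so peak memory for one record is roughly twice the sequence length. For very large sequences it may be better to write each line to the output as it is read and not hold the sequence at all.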
On Jul 18, 2009, at 3:10 PM, Rasmus Ory Nielsen wrote:
Hi Galaxy Team,
I've found that fasta_to_tabular.py is very slow with big sequences, e.g. ~4 minutes for a single 5MB sequence.
The patch below makes the running time go from minutes to seconds for such a sequence. Mind you, this is my first line of python, so there may be a smarter way.
Best regards, Rasmus Ory Nielsen
--- fasta_to_tabular.py.orig 2009-07-18 16:25:50.896487000 +0200
+++ fasta_to_tabular.py 2009-07-18 17:22:49.544611000 +0200
@@ -34,7 +34,7 @@
             fasta_seq = ''
         else:
             if line:
-                fasta_seq = "%s%s" % ( fasta_seq, line )
+                fasta_seq += line

     if fasta_seq:
         out.write( "%s\t%s\n" %( fasta_title[ 1:keep_first ], fasta_seq ) )

_______________________________________________
galaxy-user mailing list
galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user