17 Oct 2013, 6:47 a.m.
Hi,

my users are starting to take comparability seriously and have begun downsampling. But the random_lines_two_pass.py tool is very slow with large input files, e.g. downsampling a BED file from 40 million reads to 33 million:
https://bitbucket.org/galaxy/galaxy-dist/src/2469c53051ea/tools/filters/rand...

I don't understand the rationale behind deleting the positions from the array; in most programming languages, deletion from an array is slow. Benchmarking the two random sampling methods was too difficult for me, so I removed the get_random_by_subtraction method, and my users are happy. Did anybody actually benchmark this?

thank you very much,
ido
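
To illustrate what I mean, here is a minimal sketch (with made-up function names, not the actual tool's code) contrasting deletion from a list with a partial Fisher-Yates shuffle:

    import random

    def sample_positions_by_deletion(n, k):
        # Pick k distinct positions out of n by deleting from a list.
        # Each `del` shifts every later element, so this is O(n*k) overall.
        positions = list(range(n))
        chosen = []
        for _ in range(k):
            i = random.randrange(len(positions))
            chosen.append(positions[i])
            del positions[i]  # O(len(positions)) per deletion
        return chosen

    def sample_positions_by_partial_shuffle(n, k):
        # Pick k distinct positions with a partial Fisher-Yates shuffle.
        # Each pick is a single O(1) swap, so this is O(n + k) overall.
        positions = list(range(n))
        for j in range(k):
            i = random.randrange(j, n)
            positions[j], positions[i] = positions[i], positions[j]
        return positions[:k]

At 40 million lines the difference is enormous: the deletion loop does on the order of n*k element shifts, while the shuffle touches each element a constant number of times. The standard library's random.sample(range(n), k) would also do the job in one call.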