I'm currently loading a large library of BAM files (>10000) into the
shared Data Libraries of our local Galaxy installation, but I'm finding
this process to be very slow.
I'm doing a path paste, and not actually copying the files. I have disabled
local running of 'upload1', so that it will run on the cluster, and set
'set_metadata_externally' to true.
It looks like the job handlers are calling 'samtools index' directly.
Looking through the code, that seems to happen in galaxy/datatypes/binary
in Bam.dataset_content_needs_grooming, where it calls 'samtools index' and
What would be the most efficient way to start changing the code so that
this process can be done by an external script, at a deferred time out on