Assaf Gordon wrote:
Greg Von Kuster wrote, On 01/08/2010 01:02 PM:
A dataset's set_meta() is done as part of the job, so if you are not running jobs on a cluster, set_meta() will be run locally as well, which is certainly chewing up CPU on your server.
I don't mind it running locally, I have several CPUs to spare. The problem is that it seems to be running in a thread inside the main Galaxy process, which slows all of Galaxy. If there's a way to have set_meta() called externally with the local runner (as another local job, in a different process), this would also solve the issue (I think).
Even if using the local runner, set_metadata_externally will cause the metadata code to run in a separate process, which (python-wise) would be a huge help for performance.
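To illustrate the point about separate processes: a CPU-bound scan running in a thread competes with every other thread in the same Python interpreter, while the same scan in a child process does not. The following is only a minimal sketch of that idea, not Galaxy's actual metadata code; the line-counting script and the function name count_lines_in_child_process are hypothetical stand-ins for an expensive set_meta() call.

```python
import subprocess
import sys

# Hypothetical stand-in for an expensive set_meta() scan:
# count the lines of the file given as the first argument.
CHILD_SCRIPT = """
import sys
count = 0
with open(sys.argv[1], "rb") as handle:
    for _ in handle:
        count += 1
print(count)
"""


def count_lines_in_child_process(path):
    """Run the scan in a separate Python interpreter and return its result.

    The parent only waits on the child, so the parent's other threads are
    not slowed down by the CPU-bound loop.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return int(result.stdout.strip())
```

Calling count_lines_in_child_process("reads.fasta") from a web-serving thread would block only that thread while the child does the work on another CPU.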
If running externally, set_meta() will run on the cluster when the user does anything in the "Edit Attributes" page that calls set_meta(), including "Auto-detect".
This is interesting, but how does Galaxy know to submit an "Edit Attributes" job to the cluster? Does it do "qsub" with the default runner? I'm asking because even when/if I switch to using the cluster, the default runner will still be local, and only some specific jobs will have an "sge://" runner. How would Galaxy then know to submit a job to the SGE cluster?
It gets a tool id, '__SET_METADATA__', and is submitted through the regular job runner. I just tested and you can set it in universe_wsgi.ini as you would any other job runner override.
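A rough sketch of how a per-tool override like this could be resolved, using Python's standard configparser. The section name [galaxy:tool_runners], the default_cluster_job_runner option and the sge:/// URL are assumptions made for illustration and should be checked against your own universe_wsgi.ini; runner_for is a hypothetical helper, not a Galaxy function.

```python
import configparser

# Hypothetical config fragment. The section and option names are
# assumptions for illustration, not copied from a real universe_wsgi.ini.
EXAMPLE_INI = """
[app:main]
set_metadata_externally = True
default_cluster_job_runner = local:///

[galaxy:tool_runners]
__SET_METADATA__ = sge:///
"""


def runner_for(tool_id, ini_text=EXAMPLE_INI):
    """Return the runner URL for tool_id, falling back to the default runner."""
    parser = configparser.ConfigParser()
    # configparser lowercases option names by default; keep them as written
    # so that '__SET_METADATA__' can be looked up unchanged.
    parser.optionxform = str
    parser.read_string(ini_text)
    default = parser.get("app:main", "default_cluster_job_runner")
    return parser.get("galaxy:tool_runners", tool_id, fallback=default)
```

With this fragment, runner_for("__SET_METADATA__") resolves to the SGE runner while any tool without an override keeps the local default, which is the situation described in the question above.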
As soon as I get a chance, I'll look at enhancing set_meta() to check if "set_metadata_externally" is True for those data types that take significant processing, and if jobs are running locally, metadata will be set differently.
I'll be more than happy to beta-test this feature. Let me know if I can assist.
This is already implemented since auto-detect is run as a job. --nate
Thanks for all your help! -gordon
On Jan 8, 2010, at 12:25 PM, Assaf Gordon wrote:
It is set to "False", but my galaxy runs jobs locally, not on a cluster... (at least, not directly through the SGE Runner).
Does this work with the local runner too (i.e. starting a new process to set the metadata)? Also, does the "external" method work when the user changes the type in the "Edit Attributes" page?
Greg Von Kuster wrote, On 01/08/2010 10:54 AM:
Hello Assaf,
Is your instance configured to set metadata externally ( on your cluster nodes )? If not, in your universe_wsgi.ini file, add the following to the [app:main] section:
set_metadata_externally = True
On Jan 6, 2010, at 5:13 PM, Assaf Gordon wrote:
Hello all,
Continuing the search for slowness in my local Galaxy server (see http://lists.bx.psu.edu/pipermail/galaxy-dev/2009-December/001549.html ),
The datatypes/sequence.py file is also scanning and parsing entire files when creating a new FASTA/FASTQ file. It's nice and fun and informative for small files, but with a 2.7GB FASTA file - the python process stays at 100% CPU for a long long time, causing everything else to be very slow.
The offending code is at sequence.py, method "set_meta", lines 30-39.
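For readers without the source at hand, the kind of full-file scan being described looks roughly like the sketch below. This is not the actual code from sequence.py; the function name and the max_bytes guard are hypothetical, added only to show why the cost grows with file size and one way a scan could be skipped for very large files.

```python
import os


def count_fasta_sequences(path, max_bytes=None):
    """Count FASTA records by scanning every line for a '>' header.

    If max_bytes is given and the file is larger than that, return None
    instead of scanning (a hypothetical guard, not Galaxy behaviour).
    """
    if max_bytes is not None and os.path.getsize(path) > max_bytes:
        return None
    count = 0
    with open(path, "rb") as handle:
        for line in handle:  # reads the whole file once, line by line
            if line.startswith(b">"):
                count += 1
    return count
```

The loop touches every byte of the file, so a multi-gigabyte FASTA keeps one CPU busy for the full duration of the scan.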
I think Illumina expects 25x coverage of the human genome in a single run by the end of the year. This will roughly translate to 8 FASTQ files of more than 8GB each => FASTA files of 4GB each... Galaxy will not be able to just casually scan these files.
-gordon
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev Greg Von Kuster Galaxy Development Team greg@bx.psu.edu