Assaf Gordon wrote:
Greg Von Kuster wrote, On 01/08/2010 01:02 PM:
> A dataset's set_meta() is done as part of the job, so if you are not
> running jobs on a cluster, set_meta() will be run locally as well, which
> is certainly chewing up cpu on your server.
I don't mind it running locally; I have several CPUs to spare. The problem is that
it seems to be running in a thread inside the main Galaxy process, which slows all of [...]
If there's a way to have set_meta be called externally with the local runner (as another
local job, in a separate process), this would also solve the issue (I think).
Even if using the local runner, set_metadata_externally will cause the
metadata code to run in a separate process, which (python-wise) would be
a huge help for performance.
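Why a separate process helps, Python-wise: in CPython, a thread doing pure-Python parsing holds the global interpreter lock (GIL) for most of its run, so it slows every other thread in the same process, while a child process has its own interpreter. The sketch below assumes a CPU-bound metadata pass; `CHILD_SCRIPT` and `set_meta_externally` are hypothetical names standing in for a datatype's set_meta(), not Galaxy's actual external-metadata code.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script: counts FASTA header lines in the file given as argv[1].
CHILD_SCRIPT = """
import sys
count = 0
with open(sys.argv[1]) as handle:
    for line in handle:
        if line.startswith(">"):
            count += 1
print(count)
"""


def set_meta_externally(path):
    """Run the (hypothetical) metadata pass in a separate Python process.

    The parent only waits for the child to exit; the parsing happens in a
    different interpreter, so it does not compete for the parent's GIL.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    # Small demo on a two-record FASTA file.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">seq1\nACGT\n>seq2\nGGCC\n")
    print(set_meta_externally(tmp.name))
    os.remove(tmp.name)
```

A subprocess is used rather than `multiprocessing` so the child does not need to re-import the caller's module.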
> If running externally,
> set_meta() will run on the cluster when the user does anything in the
> "Edit Attributes" page that calls set_meta(), including [...]
This is interesting, but how does Galaxy know to submit an "Edit Attributes"
job to the cluster? Does it do "qsub" with the default runner?
I'm asking because even when/if I switch to using the cluster, the default runner will
still be local, and only some specific jobs will have an "sge://" runner. How, then,
would Galaxy know to submit a job to the SGE cluster?
The metadata job gets the tool id '__SET_METADATA__' and is submitted through the
regular job runner. I just tested it, and you can set a runner for it in
universe_wsgi.ini as you would any other per-tool job runner override.
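A minimal sketch of such an override, assuming the per-tool `[galaxy:tool_runners]` section and the `start_job_runners` / `default_cluster_job_runner` options found in universe_wsgi.ini files of this period, and an SGE runner URL of the form `sge:///` (default cell, default queue). The exact URL syntax for a specific queue is an assumption to check against your own universe_wsgi.ini.sample:

```ini
[app:main]
# Run set_meta() as a separate job instead of inside the Galaxy process.
set_metadata_externally = True
# Start the SGE runner in addition to the always-on local runner (assumed option).
start_job_runners = sge
# Tools without an override below keep running locally.
default_cluster_job_runner = local:///

[galaxy:tool_runners]
# Send metadata-setting jobs (tool id __SET_METADATA__) to SGE.
__SET_METADATA__ = sge:///
```

With this layout only the metadata jobs (and any other tools listed in `[galaxy:tool_runners]`) leave the local runner.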
> As soon as I get a chance, I'll look at enhancing set_meta() to check if
> "set_metadata_externally" is True for those data types that take
> significant processing, and if jobs are running locally, metadata will
> be set differently.
I'll be more than happy to beta-test this feature. Let me know if I can assist.
This is already implemented since auto-detect is run as a job.
Thanks for all your help!
> On Jan 8, 2010, at 12:25 PM, Assaf Gordon wrote:
>> It is set to "False", but my Galaxy runs jobs locally, not on a cluster
>> (at least, not directly through the SGE runner).
>> Does this work with local-runner too (i.e. starting a new process to
>> set the metadata) ?
>> Also, does the "external" method work when the user changes the type
>> in the "Edit Attributes" page?
>> Greg Von Kuster wrote, On 01/08/2010 10:54 AM:
>>> Hello Assaf,
>>> Is your instance configured to set metadata externally (on your cluster
>>> nodes)? If not, add the following to the [app:main] section of your
>>> universe_wsgi.ini file:
>>> set_metadata_externally = True
>>> On Jan 6, 2010, at 5:13 PM, Assaf Gordon wrote:
>>>> Hello all,
>>>> Continuing the search for slowness in my local Galaxy server (see
>>>> The datatypes/sequence.py file is also scanning and parsing entire
>>>> files when creating a new FASTA/FASTQ file.
>>>> It's nice and fun and informative for small files, but with a 2.7GB
>>>> FASTA file, the Python process stays at 100% CPU for a long, long
>>>> time, causing everything else to be very slow.
>>>> The offending code is at sequence.py, method "set_meta", lines
>>>> I think Illumina expects 25x coverage of the human genome in a single
>>>> run by the end of the year; this roughly translates to 8 FASTQ
>>>> files of more than 8GB each => FASTA files of 4GB each. Galaxy will
>>>> not be able to just casually scan these files.
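One way to keep the metadata step cheap for such files, sketched under the assumption that it is acceptable to leave sequence counts unset for very large inputs: check the file size first and only parse the whole file below a threshold. `MAX_FULL_SCAN_BYTES`, `fasta_metadata`, and the result keys are hypothetical names for illustration, not Galaxy's sequence.py API.

```python
import os
import tempfile

# Hypothetical cut-off: files larger than this are not parsed in full.
MAX_FULL_SCAN_BYTES = 100 * 1024 * 1024  # 100 MiB


def fasta_metadata(path, max_full_scan_bytes=MAX_FULL_SCAN_BYTES):
    """Return {"sequences": int or None, "bases": int or None} for a FASTA file.

    Files up to the threshold are scanned line by line; for larger files the
    counts are left as None instead of reading gigabytes of data.
    """
    if os.path.getsize(path) > max_full_scan_bytes:
        return {"sequences": None, "bases": None}
    sequences = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                sequences += 1  # header line starts a new record
            else:
                bases += len(line.strip())  # residue characters on this line
    return {"sequences": sequences, "bases": bases}


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">a\nACGT\nAC\n>b\nGG\n")
    print(fasta_metadata(tmp.name))
    print(fasta_metadata(tmp.name, max_full_scan_bytes=1))
    os.remove(tmp.name)
```

The size check is a single `stat` call, so the cost for a multi-gigabyte file is constant rather than proportional to its length.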
>>>> galaxy-dev mailing list
>>>> galaxy-dev(a)lists.bx.psu.edu
>>> Greg Von Kuster
>>> Galaxy Development Team
>>> greg(a)bx.psu.edu