Hello Nate, Greg,
Thanks for your help. With external set_meta, everything works much better.
The galaxy process is down to 23% CPU, which seems to come from something in
SQLAlchemy, but that's for another time.
One tiny issue:
If a user goes to "Edit Attributes" and changes the file type directly (not with
Auto-Detect), set_meta is still executed inside the galaxy process as a thread.
This shouldn't happen often, but it does happen in two cases:
1. When users click it by mistake (it does happen)
2. When there's a need to switch between txt/tabular/interval files.
Thanks again,
-gordon
Nate Coraor wrote, On 01/08/2010 01:39 PM:
Assaf Gordon wrote:
> Greg Von Kuster wrote, On 01/08/2010 01:02 PM:
>> A dataset's set_meta() is done as part of the job, so if you are not
>> running jobs on a cluster, set_meta() will be run locally as well, which
>> is certainly chewing up cpu on your server.
>
> I don't mind it running locally, I have several CPUs to spare - the
> problem is that it seems to be running in a thread inside the main
> galaxy process - which slows all of galaxy.
> If there's a way to have set_meta be called externally with local
> runner (as another local job - with a different process) - this would
> also solve the issue (I think).
Even if using the local runner, set_metadata_externally will cause the
metadata code to run in a separate process, which (python-wise) would be
a huge help for performance.
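
A minimal sketch of why a separate process helps, assuming CPython: the global interpreter lock lets only one thread run Python bytecode at a time, so a CPU-bound scan in a thread competes with request handling, while a child process gets its own interpreter. This is not Galaxy's implementation. The names `CHILD_CODE` and `count_sequences_externally` are hypothetical, and counting FASTA header lines merely stands in for an expensive set_meta().

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for an expensive set_meta(): count FASTA records.
# The code runs in a child interpreter, so it never holds the parent's GIL.
CHILD_CODE = """
import sys
count = 0
with open(sys.argv[1]) as handle:
    for line in handle:
        if line.startswith('>'):
            count += 1
print(count)
"""


def count_sequences_externally(path):
    """Scan the file in a separate Python process and return the record count."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Small demo file; a real dataset would be gigabytes in size.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as fh:
        fh.write(">seq1\nACGT\n>seq2\nGGCC\n")
        demo_path = fh.name
    try:
        print(count_sequences_externally(demo_path))
    finally:
        os.unlink(demo_path)
```

The parent only waits on the child here; in a server, the wait itself could happen in a worker thread without slowing other requests, since waiting on a child process does not hold the interpreter lock.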
>> If running externally,
>> set_meta() will run on the cluster when the user does anything in the
>> "Edit Attributes" page that calls set_meta(), including "Auto-detect".
>
> This is interesting, but how does galaxy know to submit an "Edit
> Attributes" job to the cluster? Does it do "qsub" with the default
> runner?
> I'm asking because even when/if I switch to use the cluster, the
> default runner will still be local, and only some specific jobs will
> have an "sge://" runner. How would galaxy then know to submit a job to
> the SGE cluster?
It gets a tool id, '__SET_METADATA__', and is submitted through the
regular job runner. I just tested and you can set it in
universe_wsgi.ini as you would any other job runner override.
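
A rough sketch of how such an override might be resolved, to make the lookup concrete. The assumptions here: the `[galaxy:tool_runners]` section name, the `default_cluster_job_runner` option, and the `sge:///` URL are illustrative and should be checked against your own universe_wsgi.ini, and `runner_for` is a hypothetical helper, not Galaxy's actual code.

```python
import configparser

# Assumed layout: per-tool overrides map a tool id to a runner URL, and any
# tool without an override falls back to the default runner.
SAMPLE_INI = """
[app:main]
set_metadata_externally = True
default_cluster_job_runner = local:///

[galaxy:tool_runners]
__SET_METADATA__ = sge:///
"""


def runner_for(tool_id, ini_text=SAMPLE_INI):
    """Return the runner URL for a tool id, or the default if none is set."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    default = parser.get(
        "app:main", "default_cluster_job_runner", fallback="local:///"
    )
    return parser.get("galaxy:tool_runners", tool_id, fallback=default)


if __name__ == "__main__":
    print(runner_for("__SET_METADATA__"))  # overridden tool id
    print(runner_for("some_other_tool"))   # hypothetical id, uses the default
```

With a layout like this, the default runner can stay local while only the metadata job goes to the cluster.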
>> As soon as I get a chance, I'll look at enhancing set_meta() to check if
>> "set_metadata_externally" is True for those data types that take
>> significant processing, and if jobs are running locally, metadata will
>> be set differently.
>
> I'll be more than happy to beta-test this feature. Let me know if I
> can assist.
This is already implemented since auto-detect is run as a job.
--nate
>
>
> Thanks for all your help!
> -gordon
>
>
>>
>> On Jan 8, 2010, at 12:25 PM, Assaf Gordon wrote:
>>
>>> It is set to "False", but my galaxy runs jobs locally, not on a
>>> cluster...
>>> (at least, not directly through the SGE Runner).
>>>
>>> Does this work with local-runner too (i.e. starting a new process to
>>> set the metadata)?
>>> Also, does the "external" method work when the user changes the type
>>> in the "Edit Attributes" page?
>>>
>>>
>>>
>>> Greg Von Kuster wrote, On 01/08/2010 10:54 AM:
>>>> Hello Assaf,
>>>>
>>>> Is your instance configured to set metadata externally (on your
>>>> cluster nodes)? If not, in your universe_wsgi.ini file, add the
>>>> following to the [app:main] section:
>>>>
>>>> set_metadata_externally = True
>>>>
>>>>
>>>> On Jan 6, 2010, at 5:13 PM, Assaf Gordon wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Continuing the search for slowness in my local Galaxy server (see
>>>>> http://lists.bx.psu.edu/pipermail/galaxy-dev/2009-December/001549.html ),
>>>>>
>>>>> The datatypes/sequence.py file also scans and parses entire files
>>>>> when creating a new FASTA/FASTQ file.
>>>>> That is nice and informative for small files, but with a 2.7GB
>>>>> FASTA file the python process stays at 100% CPU for a long
>>>>> time, making everything else very slow.
>>>>>
>>>>> The offending code is in sequence.py, method "set_meta",
>>>>> lines 30-39.
>>>>>
>>>>> I think Illumina expects 25x coverage of the human genome in a single
>>>>> run by the end of the year - this will roughly translate to 8 FASTQ
>>>>> files of more than 8GB each => FASTA files of 4GB each... Galaxy will
>>>>> not be able to just casually scan these files.
>>>>>
>>>>> -gordon
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> galaxy-dev mailing list
>>>>> galaxy-dev(a)lists.bx.psu.edu
>>>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>> Greg Von Kuster
>>>> Galaxy Development Team
>>>> greg(a)bx.psu.edu