Further to my email below: I have a data library that contains ~117 GB of NGS data, uploaded via file system path. This library was always slow to open (about 10 s), but it now takes several minutes, or fails to open at all.

Thanks for any help on this. I am still experiencing a memory leak that I can't pinpoint; it only goes away when I restart the server. At the moment my debugging level is set to INFO. Is there anything I can change in the universe file to try to trace this?

Thanks!
Shaun

----- Forwarded message from swebb1@staffmail.ed.ac.uk -----
Date: Wed, 09 Mar 2011 10:15:13 +0000
From: SHAUN WEBB <swebb1@staffmail.ed.ac.uk>
Subject: Galaxy process
To: galaxy dev <galaxy-dev@bx.psu.edu>

Hi,

Since making the last update I have found some new warnings in my paster.log; it also seems as though the Galaxy process starts to accumulate memory and eventually hangs (35% of 64 GB of memory). I've posted the entries below. If anyone could help me understand what is going on, that would be great.

Thanks,
Shaun Webb

paste.httpserver.ThreadPool INFO 2011-03-09 09:49:49,962 No idle tasks, and only 0 busy tasks; adding 5 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:49:58,754 No idle tasks, and only 4 busy tasks; adding 1 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:51:47,301 Culling 6 extra workers (5 idle workers present)
paste.httpserver.ThreadPool INFO 2011-03-09 09:55:17,163 No idle tasks, and only 0 busy tasks; adding 5 more workers
129.215.14.72 - - [09/Mar/2011:09:48:40 +0100] "GET /history HTTP/1.1" 500 - "http://bifx-core.bio.ed.ac.uk:8080/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
paste.httpserver.ThreadPool INFO 2011-03-09 10:13:16,956 Culling 5 extra workers (7 idle workers present)
212.183.140.59 - - [09/Mar/2011:10:13:17 +0100] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.102 Safari/534.13"
paste.httpserver.ThreadPool INFO 2011-03-09 10:14:35,715 No idle tasks, and only 2 busy tasks; adding 3 more workers
paste.httpserver.ThreadPool WARNING 2011-03-09 10:15:15,104 Thread 140283224094464 hung (working on task for 3096 seconds)
----------------------------------------
Exception happened during processing of request from ('212.183.140.59', 10871)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in process_request
    lambda: self.process_request_in_thread(request, client_address))
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 617, in add_task
    self.kill_hung_threads()
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 778, in kill_hung_threads
    self.kill_worker(worker.thread_id)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 705, in kill_worker
    killthread.async_raise(thread_id, SystemExit)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/util/killthread.py", line 22, in async_raise
    raise ValueError("invalid thread id")
ValueError: invalid thread id
----------------------------------------

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

----- End forwarded message -----
Hi Shaun,

Security checks are performed on every active (undeleted) dataset within a data library as its contents are rendered upon opening. For the current user, every dataset is checked to determine whether the user can access it, and if so, checks are made to see if the user has permission to perform operations on the dataset in the add/modify/manage permissions areas. These checks incur database hits for each dataset, so if your data library includes many datasets (several hundred or more), it will take a bit of time to render upon opening. The size of the dataset files is not an issue here, only the number of datasets within the data library.

A solution is to split up the data library. Changeset 5200:ed7b6180b925 added the ability to move data library items within a library or between libraries, providing a way to split up a large library. This changeset should make it to the distribution within the next few weeks, or you can pull it from our development repo.

Thanks Shaun,
Greg Von Kuster

On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:
Further to my email below.
I have a data library that contains ~117 GB of NGS data, uploaded via file system path. This library was always slow to open (about 10 s), but it now takes several minutes, or fails to open at all.
Thanks for any help on this. I am still experiencing a memory leak that I can't pinpoint; it only goes away when I restart the server. At the moment my debugging level is set to INFO. Is there anything I can change in the universe file to try to trace this?
Thanks! Shaun
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
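Greg's description implies a per-dataset permission loop when a library is rendered. A minimal sketch of that cost model, with hypothetical names and an in-memory stand-in for what is a database query per dataset in Galaxy itself (this is not Galaxy's actual code):

```python
# Hypothetical sketch of the per-dataset security loop Greg describes: every
# active dataset costs at least one access check, and in Galaxy each check is
# a database hit rather than a cheap set lookup.

def can_access(user_roles, dataset):
    # Stand-in for a permission query; one of these runs per dataset.
    return bool(user_roles & dataset["access_roles"])

def render_library(library, user_roles):
    visible = []
    for dataset in library["datasets"]:
        if dataset["deleted"]:
            continue  # only active (undeleted) datasets are checked
        if can_access(user_roles, dataset):
            # The add/modify/manage checks would each cost further queries.
            visible.append(dataset["name"])
    return visible

library = {"datasets": [
    {"name": "reads_1.fastq", "deleted": False, "access_roles": {"lab"}},
    {"name": "reads_2.fastq", "deleted": False, "access_roles": {"admin"}},
    {"name": "old_run.fastq", "deleted": True,  "access_roles": {"lab"}},
]}
print(render_library(library, {"lab"}))  # ['reads_1.fastq']
```

With a database round-trip per dataset, render time grows linearly with the number of datasets, which matches Greg's point that file size is irrelevant but dataset count is not.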
Thanks Greg. There are about 100 datasets in the library. Can you tell me how I would pull a single changeset? I have only done batch updates from the distribution repository before.

I'm just wondering why it has become so much slower since the latest update (I previously updated last December). I'm also wondering why the Galaxy process is now taking up 25% of the memory on a 64 GB machine.

Thanks,
Shaun

Quoting Greg Von Kuster <greg@bx.psu.edu>:
Hi Shaun,
Security checks are performed on every active (undeleted) dataset within a data library as its contents are rendered upon opening. For the current user, every dataset is checked to determine whether the user can access it, and if so, checks are made to see if the user has permission to perform operations on the dataset in the add/modify/manage permissions areas. These checks incur database hits for each dataset, so if your data library includes many datasets (several hundred or more), it will take a bit of time to render upon opening. The size of the dataset files is not an issue here, only the number of datasets within the data library.
A solution to this is to split up the data library. Change set 5200:ed7b6180b925 added the ability to move data library items within a library or between libraries, providing a way to split up a large library. This change set should make it to the distribution within the next few weeks, or you can pull it from our development repo.
Thanks Shaun,
Greg Von Kuster
Hi Shaun,

On Mar 10, 2011, at 10:05 AM, SHAUN WEBB wrote:
Thanks Greg, there would be about 100 datasets in the library.
If there are only about 100 datasets, then this is likely not the cause of the time taken to render the library. Whatever is going on in your environment that is slowing Galaxy down generally may also be causing this.
Can you tell me how I would pull a single changeset, I have only done batch updates using the distribution depository before.
You really can't pull a single changeset; you just have to pull from the development repo in a batch update, like you've done with the stable distribution.
I'm just wondering why it has become so much slower since the latest update (I previously updated last December). Also wondering why the Galaxy process is now taking up 25% of the memory on a 64GB machine.
Yeah, we really need to find the cause of this, although I have no idea what it could be...
Thanks Shaun
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
Greg; [large data library performance]
Security checks are performed on every active (undeleted) dataset within a data library as its contents are rendered upon opening. For the current user, every dataset is checked to determine whether the user can access it, and if so, checks are made to see if the user has permission to perform operations on the dataset in the add/modify/manage permissions areas. These checks incur database hits for each dataset, so if your data library includes many datasets (several hundred or more), it will take a bit of time to render upon opening. The size of the dataset files is not an issue here, only the number of datasets within the data library.
We are running into this limit quite a bit in practice as our data libraries grow. Splitting them does provide a quick workaround. What would you think about loading folder data on demand via Ajax? Our data is stored in folders and sub-folders within the library, so this would let us scale up library items without having to arbitrarily split the data libraries.

I took a quick look at the code with this in mind and it looked, well, hard. But that could be totally due to my ignorance of the implementation. What do you think?

Brad
Hi Brad,

I agree that implementing an Ajax approach to rendering libraries may be a good solution, but it also may be difficult. We'll analyze this a bit more and see if it will be reasonable.

We made a decision when we implemented libraries that we would provide fine-grained security at the dataset level. The trade-off, obviously, is that it takes time to check every dataset. Another approach would be to provide less fine-grained security, at the folder level rather than the dataset level. If this is done, all datasets within a folder would be required to have the same security. What are your thoughts on this approach?

Thanks Brad,
Greg

On Mar 10, 2011, at 10:07 AM, Brad Chapman wrote:
Greg;
[large data library performance]
Security checks are performed on every active (undeleted) dataset within a data library as its contents are rendered upon opening. For the current user, every dataset is checked to determine whether the user can access it, and if so, checks are made to see if the user has permission to perform operations on the dataset in the add/modify/manage permissions areas. These checks incur database hits for each dataset, so if your data library includes many datasets (several hundred or more), it will take a bit of time to render upon opening. The size of the dataset files is not an issue here, only the number of datasets within the data library.
We are running into this limit quite a bit in practice as our data libraries grow. Splitting them does provide a quick workaround. What would you think about loading folder data on demand via Ajax? Our data is stored in folders and sub-folders within the library, so this would let us scale up library items without having to arbitrarily split the data libraries.
I took a quick look at the code with this in mind and it looked, well, hard. But that could be totally due to my ignorance of the implementation. What do you think?
Brad

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu
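Greg's folder-level alternative would amortize the cost: one access check per folder instead of one per dataset, so rendering no longer scales with dataset count. A hedged sketch under the same illustrative data model as above (names are hypothetical, not Galaxy's API):

```python
# Hedged sketch of folder-level security: a single access check covers every
# dataset in the folder, at the price of all datasets in the folder sharing
# the same permissions. Data shapes here are illustrative only.

def render_folder(folder, user_roles):
    # One check for the whole folder, not one per dataset.
    if not (user_roles & folder["access_roles"]):
        return []
    return [d["name"] for d in folder["datasets"] if not d["deleted"]]

folder = {
    "access_roles": {"lab"},
    "datasets": [
        {"name": "sample_1.bam", "deleted": False},
        {"name": "sample_2.bam", "deleted": False},
    ],
}
print(render_folder(folder, {"lab"}))    # ['sample_1.bam', 'sample_2.bam']
print(render_folder(folder, {"guest"}))  # []
```

The trade-off Greg names is visible in the sketch: a user either sees every active dataset in the folder or none of them.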
Greg;
I agree that implementing an ajax approach to rendering libraries may be a good solution, but also may be difficult. We'll analyze this a bit more and see if it will be reasonable.
Great -- thanks for taking a look.
We made a decision when we implemented libraries that we would provide fine-grained security at the dataset level. The trade-off, obviously, is that it takes time to check every dataset. Another approach would be to provide less fine-grained security, at the folder level rather than the dataset level. If this is done, all datasets within a folder would be required to have the same security. What are your thoughts on this approach?
This sounds like a very reasonable trade-off. We go one step further and use data-library-level security, so everything within a library has the same permissions, but I can definitely see how the ability to control this at the folder level would be useful.

Brad
Brad Chapman wrote:
We made a decision when we implemented libraries that we would provide fine-grained security at the dataset level. The trade-off, obviously, is that it takes time to check every dataset. Another approach would be to provide less fine-grained security, at the folder level rather than the dataset level. If this is done, all datasets within a folder would be required to have the same security. What are your thoughts on this approach?
This sounds like a very reasonable trade-off. We go one step further and use data-library-level security, so everything within a library has the same permissions, but I can definitely see how the ability to control this at the folder level would be useful.
My suggestion would be a solution somewhere in the middle. The ability to have per-dataset permissions is something that I think we should retain, but we could change our current policy of checking the permissions of the entire library at every load. Instead, it could work like this:

1. Check permissions on the library.
2. Check permissions on the first-level contents of the library.
3. When a folder is expanded to show its contents, check the permissions of that folder's contents via AJAX.

The reason we didn't do this originally was to prevent folders from showing up if a user didn't have permission to access any of the datasets in that folder. But this can be worked around by setting access permission on the folder itself.

This is probably a fair amount of work, though, since it means not loading subfolder contents at page load, because their security would not be checked until later.

--nate
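Nate's three-step middle ground keeps per-dataset permissions but defers most of the checking. An illustrative sketch of the control flow (all names hypothetical; the real AJAX request and database queries are elided):

```python
# Sketch of the lazy scheme: check the library and its first-level items at
# page load (steps 1 and 2), and run each subfolder's per-dataset checks only
# when that folder is expanded, i.e. on the AJAX request (step 3).

def load_library_page(library, user_roles):
    # Steps 1 and 2: the library itself, then only its first-level contents.
    if not (user_roles & library["access_roles"]):
        return None
    return [item["name"] for item in library["top_level"]
            if user_roles & item["access_roles"]]

def expand_folder(folder, user_roles):
    # Step 3: runs only when the user actually opens this folder.
    return [d["name"] for d in folder["contents"]
            if user_roles & d["access_roles"]]

folder = {"name": "run_42", "access_roles": {"lab"},
          "contents": [{"name": "lane1.fastq", "access_roles": {"lab"}},
                       {"name": "lane2.fastq", "access_roles": {"admin"}}]}
library = {"access_roles": {"lab"}, "top_level": [folder]}

print(load_library_page(library, {"lab"}))  # ['run_42']
print(expand_folder(folder, {"lab"}))       # ['lane1.fastq']
```

Page-load cost becomes proportional to the number of first-level items rather than the total dataset count, which is why this scales to arbitrarily large libraries as long as data is nested in folders.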
Nate; [folder level security to speed up data library loading]
My suggestion would be a solution somewhere in the middle. The ability to have per-dataset permissions is something that I think we should retain, but we could change our current policy of checking the permissions of the entire library at every load. Instead, it could work like this:
1. Check permissions on the library. 2. Check permissions on the first level contents of the library. 3. When a folder is expanded to show its contents, check the permissions of that folder's contents via AJAX.
The reason we didn't do this originally was to prevent folders from showing up if a user didn't have permission to access any of the datasets in that folder. But this can be worked around by setting access permission on the folder itself.
This is probably a fair amount of work, though, since it means not loading subfolder contents at page load since we are not checking their security until later.
Agreed, the AJAX loading would be ideal and would allow you to maintain full permissions. It would also allow scaling up to arbitrarily large data libraries, as long as they are nested within folders. It did look like a pretty big project when I dug into the code, but any modifications that allow larger libraries would definitely be appreciated.

Thanks,
Brad
Hello,

I was hoping you could point me to where to go to fix this problem. I uploaded a Word document into a data library and now I can't access the whole library due to a Unicode error. I realize that I probably needed to add a datatype for Word, or try using a binary datatype instead; maybe even convert it to a PDF and use that datatype. Any ideas how to delete the Word document so I can see the data library again?

Thanks in advance,
Victor

Error traceback:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128)

Module weberror.evalexception.middleware:364 in respond
    app_iter = self.application(environ, detect_start_response)
Module paste.debug.prints:98 in __call__
    environ, self.app)
Module paste.wsgilib:539 in intercept_output
    app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__
    return self.application(environ, start_response)
Module paste.httpexceptions:632 in __call__
    return self.application(environ, start_response)
Module galaxy.web.framework.base:145 in __call__
    body = method( trans, **kwargs )
Module galaxy.web.controllers.library_common:133 in browse_library
    status=status )
Module galaxy.web.framework:645 in fill_template
    return self.fill_template_mako( filename, **kwargs )
Module galaxy.web.framework:656 in fill_template_mako
    return template.render( **data )
Module mako.template:133 in render
    return runtime._render(self, self.callable_, args, data)
Module mako.runtime:364 in _render
    _render_context(template, callable_, context, *args, **_kwargs_for_callable(callable_, data))
Module mako.runtime:381 in _render_context
    _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
Module mako.runtime:414 in _exec_template
    callable_(context, *args, **kwargs)
Module _base_mako:40 in render_body
    __M_writer(unicode(next.body()))
Module mako.runtime:255 in <lambda>
    return lambda *args, **kwargs:callable_(self.context, *args, **kwargs)
Module _library_common_browse_library_mako:101 in render_body
    __M_writer(unicode(render_content()))
Module _library_common_browse_library_mako:96 in render_content
    return render_render_content(context)
Module _library_common_browse_library_mako:282 in render_render_content
    __M_writer(unicode(self.render_folder( 'library', library.root_folder, 0, created_ldda_ids, library, hidden_folder_ids, tracked_datasets, show_deleted=show_deleted, parent=None, row_counter=row_counter, root_folder=True )))
Module mako.runtime:255 in <lambda>
    return lambda *args, **kwargs:callable_(self.context, *args, **kwargs)
Module _library_common_browse_library_mako:872 in render_render_folder
    __M_writer(unicode(render_dataset( cntrller, ldda, library_dataset, selected, library, folder, pad, my_row, row_counter, tracked_datasets, show_deleted=show_deleted )))
Module _library_common_browse_library_mako:660 in render_dataset
    return render_render_dataset(context,cntrller,ldda,library_dataset,selected,library,folder,pad,parent,row_counter,tracked_datasets,show_deleted)
Module _library_common_browse_library_mako:536 in render_render_dataset
    __M_writer(unicode(ldda.name))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128)
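The failing frame is `unicode(ldda.name)`: on Python 2, a byte string containing 0xe2 (the first byte of many multi-byte UTF-8 punctuation marks, such as a curly quote) gets implicitly decoded as ASCII and fails. A Python 3 rendering of the same failure, and the explicit decode that avoids it (the sample filename is made up for illustration):

```python
# 0xe2 begins UTF-8 sequences such as the right single quote U+2019, so a
# dataset name containing one blows up an implicit ASCII decode.
raw = "data\u2019s.doc".encode("utf-8")  # hypothetical filename as raw bytes

try:
    raw.decode("ascii")  # what the implicit ASCII decode attempts
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe2 in position 4: ...

print(raw.decode("utf-8"))  # decoding with the real encoding succeeds
```

This is why one bad dataset name breaks rendering of the whole library page: the template decodes every name while building the page, so the first non-ASCII byte aborts the entire render.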
SHAUN WEBB wrote:
Further to my email below.
I have a data library that contains ~117 GB of NGS data, uploaded via file system path. This library was always slow to open (about 10 s), but it now takes several minutes, or fails to open at all.
Thanks for any help on this. I am still experiencing a memory leak that I can't pinpoint; it only goes away when I restart the server. At the moment my debugging level is set to INFO. Is there anything I can change in the universe file to try to trace this?
Two things would help here. First, are there other users doing things in Galaxy at this time? It would help to be able to determine exactly what is triggering this. Second, if you set 'use_heartbeat = True' in universe_wsgi.ini, Galaxy will dump the call stack every 30 seconds to the file 'heartbeat.log' (and 'heartbeat.log.nonsleeping'). This should reveal where the thread(s) are hung.

--nate
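The heartbeat Nate describes amounts to periodically walking every thread's current frame and formatting its stack, so a hung thread shows up repeatedly at the same location in the log. A standalone sketch of that core idea (this is not Galaxy's heartbeat code; a real heartbeat would call this from a timer thread every N seconds and append to heartbeat.log):

```python
# Minimal sketch of a stack-dumping heartbeat: sys._current_frames() maps each
# live thread id to its current frame, and traceback.format_stack renders it.
import sys
import traceback

def dump_stacks():
    entries = []
    for thread_id, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        entries.append("Thread %d:\n%s" % (thread_id, stack))
    return entries

# Dump once; a heartbeat would repeat this on a schedule.
for entry in dump_stacks():
    print(entry.splitlines()[0])
```

Reading successive dumps side by side, a thread stuck on the same frame for minutes (like the 3096-second task in the log above) stands out immediately.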
participants (5)
- Brad Chapman
- Greg Von Kuster
- Nate Coraor
- SHAUN WEBB
- Victor Ruotti