Hi all,
I am having trouble right now with my own personal account on my production server. Grid refreshes are taking a huge amount of time (e.g. when viewing ‘saved histories’ or even generating the dataset list for a single history). My account is very full of data (1TB); could it be this?
There are no obvious messages in the logs, though, so I am a bit stumped as to why. I do not have the same trouble when impersonating other users with fairly full accounts. Perhaps it is a database issue (I do not know how to ‘clean up’ the database or indeed Galaxy user accounts). Any thoughts?
Thanks, Richard
Richard J Poole PhD Wellcome Trust Fellow Department of Cell and Developmental Biology University College London 21 University Street, London WC1E 6DE Office (518 Rockefeller): +44 20 7679 6577 (int. 46577) Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
Hi Richard,
I am relatively new to Galaxy, so if you get a different response from one of the core team, ignore this.
One thing I would check is the underlying database. What do you have set for "database_connection" in your galaxy.ini file?
If you are using the default SQLite, this could well be the issue, as that is stored in a single file on disk.
Whichever database you have, make sure it has enough resources to handle what will now be a large size.
Christian
________________________________
From: galaxy-dev [galaxy-dev-bounces@lists.galaxyproject.org] on behalf of Poole, Richard [r.poole@ucl.ac.uk]
Sent: Wednesday, July 08, 2015 9:04 PM
To: galaxy-dev@lists.galaxyproject.org
Subject: [galaxy-dev] Slow repsonses viewing histories
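Christian's check of "database_connection" can also be scripted. The following is a minimal sketch, not an official Galaxy tool: it assumes the setting lives in an `[app:main]` section of the ini file and that an unset or commented-out value means the SQLite default. The helper name `database_backend` and the example connection string are made up for illustration.

```python
import configparser


def database_backend(ini_text, section="app:main"):
    """Return the backend implied by database_connection, or 'sqlite' if unset."""
    # interpolation=None so '%' characters in passwords or paths are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    url = parser.get(section, "database_connection", fallback="")
    if not url:
        # Assumption: with no connection string, Galaxy uses its default SQLite file
        return "sqlite"
    # A SQLAlchemy URL looks like dialect[+driver]://user:password@host/dbname
    scheme = url.split("://", 1)[0]
    return scheme.split("+", 1)[0].lower()


example = """
[app:main]
database_connection = postgresql://user:secret@localhost:5432/galaxy_prod
"""
print(database_backend(example))
```

If this reports `sqlite` on a production server with a large account, moving to an external database is the first thing to try.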
Hi, Richard
How many histories are on your account? How many datasets (roughly)?
Are you using an Admin account to view the histories, and does the slowdown still occur for regular users with large amounts of data?
One of the exposed attributes of datasets (for admins - not other users generally) is the file_name. I've noticed that retrieving this attribute from the file system can be slow.
Christian also provides good advice.
On Thu, Jul 9, 2015 at 4:12 AM, Christian Brenninkmeijer < christian.brenninkmeijer@manchester.ac.uk> wrote:
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
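Carl's remark that retrieving file_name from the file system can be slow can be checked outside Galaxy by timing file lookups directly on the storage. This is only a rough sketch: `time_stat_calls` is a hypothetical helper, not part of Galaxy, and the paths passed to it would be a sample of dataset files on the storage in question.

```python
import os
import sys
import time


def time_stat_calls(paths):
    """Return a list of (path, seconds, exists), one os.stat call per path."""
    results = []
    for path in paths:
        start = time.perf_counter()
        try:
            os.stat(path)
            exists = True
        except OSError:
            # Purged or missing files are reported rather than treated as fatal
            exists = False
        results.append((path, time.perf_counter() - start, exists))
    return results


if __name__ == "__main__":
    # Usage: python time_stat.py /path/to/dataset_1.dat /path/to/dataset_2.dat
    for path, seconds, exists in time_stat_calls(sys.argv[1:]):
        print(f"{seconds * 1000:8.2f} ms  exists={exists}  {path}")
```

Lookups that take tens of milliseconds each would add up quickly on a history with many datasets; consistently fast lookups would point away from the file system.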
Hi Christian and Carl,
Thanks both for the replies.
To answer your questions in reverse order: I have about XX histories in my account, each with an average of about XX datasets. Total data in my account is about 1TB.
It is indeed an admin account, and other users with close to 1TB of data do not have a similar slowdown, although their data is spread over far fewer histories. Is there a way to prevent the file_name attribute being requested for admin accounts, so I can see if this speeds things back up again?
Although the Galaxy server is running on my iMac, the data is stored externally on a large directly attached NAS. I think I first noticed this slowdown after deleting and purging a bunch of older histories to free space on the NAS. I have tried running some of the cleanup_datasets scripts, but they are returning errors and not running right now (I can give you the error messages if necessary).
The slowdown is actually getting worse now: it is even slow to display tool pages, and I often get this error when it is really slow:
Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /history/list (http://iworm.anat.ucl.ac.uk:8080/history/list).
Reason: Error reading from remote server
I am running through an Apache proxy, so perhaps the Apache settings need tweaking too? (I forget right now where I set these up!)
As for the database itself, I am running PostgreSQL 9.3 and I tweaked the settings in my universe_wsgi.ini as per the instructions on https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Adv...
So my settings are:
# -- Database

# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You
# may use a SQLAlchemy connection string to specify an external database
# instead. This string takes many options which are explained in detail in the
# config file documentation.
database_connection = postgresql://*******:*******@localhost:5432/galaxy_prod

# If the server logs errors about not having enough database pool connections,
# you will want to increase these values, or consider running more Galaxy
# processes.
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20

# If using MySQL and the server logs the error "MySQL server has gone away",
# you will want to set this to some positive value (7200 should work).
#database_engine_option_pool_recycle = -1

# If large database query results are causing memory or response time issues in
# the Galaxy process, leave the result on the server instead. This option is
# only available for PostgreSQL and is highly recommended.
database_engine_option_server_side_cursors = True

# Create only one connection to the database per thread, to reduce the
# connection overhead. Recommended when not using SQLite:
database_engine_option_strategy = threadlocal

# Log all database transactions, can be useful for debugging and performance
# profiling. Logging is done via Python's 'logging' module under the qualname
# 'galaxy.model.orm.logging_connection_proxy'
database_query_profiling_proxy = False

# -- Files and directories
Let me know if you think these settings are appropriate or need further tweaks.
Thanks again for your responses so far,
Richard
On 13 Jul 2015, at 16:31, Carl Eberhard <carlfeberhard@gmail.com> wrote:
Hi Richard,
By any chance, are you running Galaxy 15.05 or later? 15.05 includes new metadata for bam files that can cause UI performance problems with certain types of bam files. This can be limited with the new `max_metadata_value_size` in galaxy.ini (on usegalaxy.org we've set it to 1000000).
I've also created a pull request to make this limiting the default: https://github.com/galaxyproject/galaxy/pull/466
However, if you are using an older version of Galaxy, this issue is not related to the problem you're experiencing.
--nate
On Wed, Jul 15, 2015 at 2:16 PM, Poole, Richard r.poole@ucl.ac.uk wrote:
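The idea behind a cutoff such as `max_metadata_value_size` can be illustrated as follows. This is not Galaxy's implementation; it is a hypothetical sketch that assumes metadata is a dict of JSON-serialisable values, measures each value by the length of its JSON text, and replaces oversized values with None. The function name `limit_metadata` and the sample metadata are invented for the example.

```python
import json


def limit_metadata(metadata, max_size):
    """Return a copy of metadata with oversized values replaced by None."""
    limited = {}
    for key, value in metadata.items():
        size = len(json.dumps(value))  # serialised size in characters
        # max_size <= 0 is treated as "no limit" (an assumption of this sketch)
        limited[key] = None if 0 < max_size < size else value
    return limited


# A small value next to a large one, standing in for bulky BAM header metadata
metadata = {"columns": 12, "read_groups": ["rg%d" % i for i in range(1000)]}
print(limit_metadata(metadata, max_size=1000))
```

The small value survives and the large one is dropped, which keeps whatever the interface has to load per dataset bounded.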
Hi Nate,
I am indeed using a version later than 15.05, so I will try this fix the next time I can restart the server and let you know.
Richard
On 15 Jul 2015, at 19:32, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Richard,
Unfortunately, you will need to reset metadata for any problematic datasets once you have updated to the latest version of 15.05 and set a cutoff value. You can find the datasets with the following SQL query in your database:
select hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata)
from history_dataset_association hda
join history h on hda.history_id=h.id
join galaxy_user u on h.user_id=u.id
where length(hda.metadata) > 1048576
order by length(hda.metadata) desc;
And you can reset metadata by clicking on the dataset's pencil icon in your history and clicking "auto-detect".
--nate
On Wed, Jul 15, 2015 at 3:16 PM, Poole, Richard r.poole@ucl.ac.uk wrote:
Hi Nate,
I am indeed using later than 15.05………so I will try this fix next time I can restart the server and let you know.
Richard
On 15 Jul 2015, at 19:32, Nate Coraor nate@bx.psu.edu wrote:
Hi Richard,
By any chance, are you running Galaxy 15.05 or later? 15.05 includes new metadata for bam files that can cause UI performance problems with certain types of bam files. This can be limited with the new `max_metadata_value_size` in galaxy.ini (on usegalaxy.org we've set it to 1000000).
I've also created a pull request to make this limiting the default: https://github.com/galaxyproject/galaxy/pull/466
However, if you are using an older version of Galaxy, this issue is not related to the problem you're experiencing.
--nate
On Wed, Jul 15, 2015 at 2:16 PM, Poole, Richard r.poole@ucl.ac.uk wrote:
Hi Christian and Carl,
Thanks both for the replies.
To answer your questions in reverse order. I have about XX histories in my account each with an average of about XX datasets. Total data in my account is about 1TB.
It is indeed an admin account and other users with close to 1TB of data do not have a similar slow down. Although their data is spread over far fewer histories. Is there a way then to prevent the file_name attribute being requested for admin accounts so I can see if this speeds things back up again?
Although the Galaxy server is running on my iMac the data is stored external on a large directly attached NAS. I think I first noticed this slow down after deleting and purging a bunch of older histories to free space on the NAS. I have tried running some of the cleanup_datasets scripts but they are actually returning errors and not running right now (can give you the error messages if necessary).
The slowdown is actually getting worse now and it is even slow to display tool pages, as well as often getting this error if it is really slow: Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request *GET /history/list http://iworm.anat.ucl.ac.uk:8080/history/list*.
Reason: *Error reading from remote server* I am running through an apache proxy - perhaps the apache settings need tweaking too? (I forget right now where I set these up!).
As for the database itself, I am running PostgreSQL 9.3 and I tweaked the settings in my universe_wsgi.ini as per the instructions on https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Adv...
So my settings are:
# -- Database
# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You # may use a SQLAlchemy connection string to specify an external database # instead. This string takes many options which are explained in detail in the # config file documentation. database_connection = postgresql://*******:*******@localhost:5432/galaxy_prod
# If the server logs errors about not having enough database pool connections, # you will want to increase these values, or consider running more Galaxy # processes. database_engine_option_pool_size = 10 database_engine_option_max_overflow = 20
# If using MySQL and the server logs the error "MySQL server has gone away", # you will want to set this to some positive value (7200 should work). #database_engine_option_pool_recycle = -1
# If large database query results are causing memory or response time issues in # the Galaxy process, leave the result on the server instead. This option is # only available for PostgreSQL and is highly recommended. database_engine_option_server_side_cursors = True
# Create only one connection to the database per thread, to reduce the # connection overhead. Recommended when not using SQLite: database_engine_option_strategy = threadlocal
# Log all database transactions, can be useful for debugging and performance # profiling. Logging is done via Python's 'logging' module under the qualname # 'galaxy.model.orm.logging_connection_proxy' database_query_profiling_proxy = False
# -- Files and directories
Let me know if you think these settings are appropriate or need further tweaks.
Thanks again for your responses so far,
Richard
On 13 Jul 2015, at 16:31, Carl Eberhard carlfeberhard@gmail.com wrote:
Hi, Richard
How many histories are on your account? How many datasets (roughly)?
Are you using an Admin account to view the histories and does the slow down still occur for regular users with large amounts of data?
One of the exposed attributes of datasets (for admins - not other users generally) is the file_name. I've noticed that retrieving this attribute from the file system can be slow.
Christian also provides good advice.
On Thu, Jul 9, 2015 at 4:12 AM, Christian Brenninkmeijer < christian.brenninkmeijer@manchester.ac.uk> wrote:
Hi Richard,
I am relatively new to galaxy so if you get a different response from one of the core team ignore this.
One thing I would check is the underlying database. What do you have set for "database_connection" in your galaxy.ini file.
Especially if you are using the default sqlite this could be the issue. As that is store in a single file on disk.
Whichever database you have make sure it has enough resources to handle what will now be a large size.
Christian
*From:* galaxy-dev [galaxy-dev-bounces@lists.galaxyproject.org] on behalf of Poole, Richard [r.poole@ucl.ac.uk] *Sent:* Wednesday, July 08, 2015 9:04 PM *To:* galaxy-dev@lists.galaxyproject.org *Subject:* [galaxy-dev] Slow repsonses viewing histories
Hi all,
I am having trouble right now with my own personal account on my production server. Grid refreshes are taking a huge amount of time (e.g. when viewing ‘saved histories’ or even generating the dataset list for a single history). My account is very full of data (1TB), could it be this?
There are no obvious messages in the logs though so I am a bit stumped as to why.I do not have the same trouble when impersonating other users with fairly full accounts. Perhaps a database issue (I do not know how to ‘cleanup’ the database or indeed Galaxy user accounts). Any thoughts?
Thanks, Richard
*Richard J Poole PhD* Wellcome Trust Fellow Department of Cell and Developmental Biology University College London 21 University Street, London WC1E 6DE Office (518 Rockefeller): +44 20 7679 6577 (int. 46577) Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Nate,
OK, I can try this, but I am far from being an SQL database expert. Will that query list only the ‘problematic’ datasets? I have a lot of histories with BAM files ;)
Does the metadata need to be reset before this fix works? I ask because my history is now so slow as to be unusable. Or does setting the cutoff make Galaxy usable again but wipe out the metadata (which then needs resetting)?
I am still using universe_wsgi.ini, not galaxy.ini, and I don’t see ‘max_metadata_value_size’ in any section of my universe_wsgi.ini file, I guess because I’m not on the latest update. I am actually not sure what version I am on: to fix an earlier issue with shared datasets (a fix that wasn’t yet on Mercurial), I created, on the advice of a few folks, a branch specific to the patch using:
hg branch fix_history_sharing
curl https://github.com/galaxyproject/galaxy/commit/62772bc86e2504982f207a982542c... -p1
Also, not being much of a Mercurial expert, I am a little unsure how to switch back to the stable branch correctly and pull the latest updates. Could you advise (sorry to be a noob)?
Rich
On 15 Jul 2015, at 20:22, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Richard,
Unfortunately, you will need to reset metadata for any problematic datasets once you have updated to the latest version of 15.05 and set a cutoff value. You can find the datasets with the following SQL query in your database:
select hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata)
from history_dataset_association hda
join history h on hda.history_id = h.id
join galaxy_user u on h.user_id = u.id
where length(hda.metadata) > 1048576
order by length(hda.metadata) desc;
And you can reset metadata by clicking on the dataset's pencil icon in your history and clicking "auto-detect".
--nate
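The query above can be tried out safely before running it against a production database. What follows is only a minimal sketch, assuming an in-memory SQLite database with toy versions of the three tables (the real Galaxy tables have many more columns, and Richard's server runs PostgreSQL). The helper name `find_large_metadata` and the sample rows are hypothetical.

```python
import sqlite3

# Toy stand-ins for the three Galaxy tables named in the query; only the
# columns the query touches are modelled here (an assumption for this sketch).
SCHEMA = """
create table galaxy_user (id integer primary key, email text);
create table history (id integer primary key, name text, user_id integer);
create table history_dataset_association (
    id integer primary key, history_id integer, hid integer,
    name text, metadata blob
);
"""

# Same joins and filter as the query in the thread, with the cutoff as a parameter.
QUERY = """
select hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata)
from history_dataset_association hda
join history h on hda.history_id = h.id
join galaxy_user u on h.user_id = u.id
where length(hda.metadata) > ?
order by length(hda.metadata) desc
"""


def find_large_metadata(conn, cutoff=1048576):
    """Return rows whose metadata column is larger than `cutoff` bytes."""
    return conn.execute(QUERY, (cutoff,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("insert into galaxy_user values (1, 'user@example.org')")
conn.execute("insert into history values (1, 'RNA-seq run', 1)")
# One small and one oversized metadata blob (hypothetical sample data).
conn.execute(
    "insert into history_dataset_association values (1, 1, 1, 'small.bam', ?)",
    (b"x" * 100,),
)
conn.execute(
    "insert into history_dataset_association values (2, 1, 2, 'large.bam', ?)",
    (b"x" * 2_000_000,),
)

rows = find_large_metadata(conn)
for row in rows:
    print(row)
```

Only the dataset whose metadata exceeds the 1048576-byte threshold is returned, so the query lists candidates for a metadata reset rather than every BAM file.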
On Wed, Jul 15, 2015 at 3:16 PM, Poole, Richard <r.poole@ucl.ac.uk> wrote:
Hi Nate,
I am indeed using a version later than 15.05, so I will try this fix next time I can restart the server and let you know.
Richard
On 15 Jul 2015, at 19:32, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Richard,
By any chance, are you running Galaxy 15.05 or later? 15.05 includes new metadata for BAM files that can cause UI performance problems with certain types of BAM files. This can be limited with the new `max_metadata_value_size` option in galaxy.ini (on usegalaxy.org we've set it to 1000000).
I've also created a pull request to make this limiting the default: https://github.com/galaxyproject/galaxy/pull/466
However, if you are using an older version of Galaxy, this issue is not related to the problem you're experiencing.
--nate
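To make the idea of a metadata size cutoff concrete, here is a rough sketch of what such a limit does in principle. It is not Galaxy's implementation: the function name `apply_size_cutoff`, the use of JSON length as the size measure, and the sample metadata keys are all assumptions made for illustration.

```python
import json


def apply_size_cutoff(metadata, max_size):
    """Return a copy of `metadata` with oversized values replaced by None.

    Size is measured as the length of the JSON-serialised value; this is an
    assumption made for the sketch, not Galaxy's own measure.
    """
    trimmed = {}
    for key, value in metadata.items():
        size = len(json.dumps(value))
        trimmed[key] = value if size <= max_size else None
    return trimmed


# Hypothetical BAM-like metadata: one tiny value and one very long list.
example = {
    "bam_version": "1.4",
    "reference_names": ["chr%d" % i for i in range(200000)],
}
trimmed = apply_size_cutoff(example, max_size=1000000)
# Show which values survived the cutoff.
print({key: value is not None for key, value in trimmed.items()})
```

A cutoff of this kind keeps small values intact and drops only the oversized ones, which matches the point made later in the thread that metadata on affected datasets has to be reset once the limit is in place.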
On Wed, Jul 15, 2015 at 2:16 PM, Poole, Richard <r.poole@ucl.ac.uk> wrote:
Hi Christian and Carl,
Thanks both for the replies.
To answer your questions in reverse order. I have about XX histories in my account each with an average of about XX datasets. Total data in my account is about 1TB.
It is indeed an admin account, and other users with close to 1TB of data do not have a similar slowdown, although their data is spread over far fewer histories. Is there a way to prevent the file_name attribute being requested for admin accounts so I can see if this speeds things back up again?
Although the Galaxy server is running on my iMac, the data is stored externally on a large directly attached NAS. I think I first noticed this slowdown after deleting and purging a bunch of older histories to free space on the NAS. I have tried running some of the cleanup_datasets scripts, but they are returning errors and not running right now (I can give you the error messages if necessary).
The slowdown is actually getting worse now, and it is even slow to display tool pages. When it is really slow I often get this error:
Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /history/list.
Reason: Error reading from remote server
I am running through an Apache proxy, so perhaps the Apache settings need tweaking too? (I forget right now where I set these up!)
As for the database itself, I am running PostgreSQL 9.3 and I tweaked the settings in my universe_wsgi.ini as per the instructions on https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Adv...
So my settings are:
# -- Database

# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You
# may use a SQLAlchemy connection string to specify an external database
# instead. This string takes many options which are explained in detail in the
# config file documentation.
database_connection = postgresql://*******:*******@localhost:5432/galaxy_prod

# If the server logs errors about not having enough database pool connections,
# you will want to increase these values, or consider running more Galaxy
# processes.
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20

# If using MySQL and the server logs the error "MySQL server has gone away",
# you will want to set this to some positive value (7200 should work).
#database_engine_option_pool_recycle = -1

# If large database query results are causing memory or response time issues in
# the Galaxy process, leave the result on the server instead. This option is
# only available for PostgreSQL and is highly recommended.
database_engine_option_server_side_cursors = True

# Create only one connection to the database per thread, to reduce the
# connection overhead. Recommended when not using SQLite:
database_engine_option_strategy = threadlocal

# Log all database transactions, can be useful for debugging and performance
# profiling. Logging is done via Python's 'logging' module under the qualname
# 'galaxy.model.orm.logging_connection_proxy'
database_query_profiling_proxy = False

# -- Files and directories
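When comparing settings like these across servers, it can help to print only the database options that are actually active (that is, not commented out). The sketch below does this with Python's standard `configparser`. It assumes the options live in an `[app:main]` section, as in a Paste Deploy style ini file; the sample text, the placeholder credentials and the helper name `database_options` are hypothetical.

```python
import configparser

# Sample mirroring the settings quoted above; credentials are placeholders.
SAMPLE_INI = """
[app:main]
database_connection = postgresql://user:password@localhost:5432/galaxy_prod
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20
#database_engine_option_pool_recycle = -1
database_engine_option_server_side_cursors = True
database_engine_option_strategy = threadlocal
"""


def database_options(ini_text, section="app:main"):
    """Return every active option in `section` whose name starts with 'database_'."""
    # interpolation=None so that '%' characters in a connection string are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {k: v for k, v in parser.items(section) if k.startswith("database_")}


options = database_options(SAMPLE_INI)
for name, value in sorted(options.items()):
    print(name, "=", value)
```

To inspect a real file, the same function could be given the file's contents, for example `database_options(open("universe_wsgi.ini").read())`.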
Let me know if you think these settings are appropriate or need further tweaks.
Thanks again for your responses so far,
Richard
Just a little update: I executed the SQL query and luckily it returned only 20 BAM files, the most recently generated ones of course. It now makes sense that this could well be the cause of the UI problem, as my account became more and more unresponsive the more of these 20 BAM files I generated over the last week.
Rich
Hi Nate,
I have managed to update to the latest version of the default branch, but I do not see ‘max_metadata_value_size’ in my universe_wsgi.ini (this file was not changed during the update). It is the same if I use hg pull && hg update stable or hg pull && hg update release_15.05.
Does this mean the change is not yet on Mercurial? If so, how can I pull this specific change (using curl?) to create my own branch specific to the patch?
Thanks, Richard
Hi Nate,
Problem solved (with one small exception) with much help from Marius. I followed his instructions to apply the three patches via mercurial:
You can get the diff of a pull request by adding .diff, like so:
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/416.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/466.d...
These should be the 3 pull requests for the issue you referenced. So you should be able to do
curl https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d... | patch -p1
and then the other 2. It's probably a good idea to do another branch for testing this, e.g.
hg branch fix_metadata
hg commit -m "branch to test metadatafix"
I had to manually resolve one of the patches in lib/galaxy/tools/actions/metadata
I also had to manually resolve that one as the patch for that file didn’t apply properly.
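The "add .diff" tip can be written down as a tiny helper. This is only a sketch: the function name `pr_diff_url` is hypothetical, and it builds the `github.com` form of the pull request address rather than the `patch-diff.githubusercontent.com` host that appears in the links quoted above, on the assumption that GitHub serves the diff from either.

```python
def pr_diff_url(owner, repo, number):
    """Build the URL of a pull request's diff by appending '.diff'."""
    return "https://github.com/%s/%s/pull/%d.diff" % (owner, repo, number)


# The three pull request numbers mentioned in the thread.
urls = [pr_diff_url("galaxyproject", "galaxy", n) for n in (345, 416, 466)]
for url in urls:
    print(url)
```

Each resulting address can then be fetched and piped to `patch -p1` from the top of the Galaxy checkout, as described in the instructions above.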
I was then able to reset the metadata for all the BAMs listed by the SQL query and, hey presto, the history display is back to normal speed :)
The one issue I still have is that 9 of the offending BAMs are actually in a history that I deleted permanently by accident during the UI slow-response. So they are still listed in the SQL search. This doesn’t seem to be adversely affecting the speed but obviously I can’t reset their metadata now. Is this a problem?
Thanks for the help, Rich
On 16 Jul 2015, at 11:45, Poole, Richard <r.poole@ucl.ac.ukmailto:r.poole@ucl.ac.uk> wrote:
Hi Nate,
I have managed to update to the latest version of the default branch but I do not see the ‘max_metadate_value_size’ in my universe_wsgi.ini (this file was not changed during the update). Same thing if I use hg pull && hg update stable or hg pull && hg update release_15.05.
Does this mean this change is not yet on mercurial and if so how can I pull this specific change (using curl?) to generate my own branch specific for the patch?
Thanks, Richard
On 15 Jul 2015, at 20:59, Richard Poole <ucgarjp@live.ucl.ac.ukmailto:ucgarjp@live.ucl.ac.uk> wrote:
Just a little update - executed the SQL query and luckily only returned 20 BAM files - the most recently generated ones of course. Now makes sense this could well be the cause of the UI problem as my account became more and more unresponsive the more of these 20 BAM files I generated over the last week………..
Rich
On 15 Jul 2015, at 20:38, Richard Poole <ucgarjp@live.ucl.ac.uk> wrote:
Hi Nate,
Ok - I can try this but I’m not an SQL database expert - far from it in fact. That query will just list ‘problematic’ datasets? I have a lot of histories with BAM files ;)
Resetting the metadata needs to be done before this fix works? I ask because my history is so slow now as to be unusable…….or setting the cutoff allows Galaxy to be usable but it wipes out the metadata (which then needs resetting).
I am still using universe_wsgi.ini, not galaxy.ini, and I don’t see ‘max_metadata_value_size’ in any section of my universe_wsgi.ini file - I guess because I’m not on the latest update. I am actually not sure what version I am on: to fix an earlier issue with shared datasets (that wasn’t yet on mercurial), on the advice of a few folks, I generated a branch specific to the patch using:
hg branch fix_history_sharing
curl https://github.com/galaxyproject/galaxy/commit/62772bc86e2504982f207a982542cbcc3faf0c65.diff | patch -p1
Also, not being a huge Mercurial expert, I am a little unsure now how to switch back to the stable branch correctly and pull the latest updates. Could you advise (sorry to be a noob)?
Rich
On 15 Jul 2015, at 20:22, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Richard,
Unfortunately, you will need to reset metadata for any problematic datasets once you have updated to the latest version of 15.05 and set a cutoff value. You can find the datasets with the following SQL query in your database:
select hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata)
from history_dataset_association hda
join history h on hda.history_id=h.id
join galaxy_user u on h.user_id=u.id
where length(hda.metadata) > 1048576
order by length(hda.metadata) desc;
And you can reset metadata by clicking on the dataset's pencil icon in your history and clicking "auto-detect".
--nate
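The query above can also be run from a script. What follows is a minimal sketch, not part of Nate's instructions: the function names are hypothetical, and the demo uses an in-memory SQLite database with a cut-down, made-up schema purely so the example is self-contained. Against a real PostgreSQL Galaxy database the same SQL would go through a PostgreSQL driver, and the placeholder style would differ.

```python
import sqlite3

# Nate's query, with the size cutoff turned into a parameter.
# "?" is SQLite's placeholder; psycopg2 for PostgreSQL would use "%s" instead.
OVERSIZED_METADATA_SQL = """
SELECT hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata)
FROM history_dataset_association hda
JOIN history h ON hda.history_id = h.id
JOIN galaxy_user u ON h.user_id = u.id
WHERE length(hda.metadata) > ?
ORDER BY length(hda.metadata) DESC
"""


def find_oversized(conn, cutoff=1048576):
    """Return rows for datasets whose stored metadata exceeds `cutoff` bytes."""
    cur = conn.cursor()
    cur.execute(OVERSIZED_METADATA_SQL, (cutoff,))
    return cur.fetchall()


def demo_connection():
    """Build a throwaway in-memory database (simplified, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY, history_id INTEGER, hid INTEGER,
            name TEXT, metadata BLOB);
    """)
    conn.execute("INSERT INTO galaxy_user VALUES (1, 'user@example.org')")
    conn.execute("INSERT INTO history VALUES (1, 1, 'demo history')")
    insert = "INSERT INTO history_dataset_association VALUES (?, 1, ?, ?, ?)"
    conn.execute(insert, (1, 1, "small.bam", b"x" * 100))
    conn.execute(insert, (2, 2, "big.bam", b"x" * 2000000))
    return conn


if __name__ == "__main__":
    for row in find_oversized(demo_connection()):
        print(row)
```

Each returned row carries the history name and the dataset's position in it (hid), which is enough to find the dataset in the UI and reset its metadata as described above.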
On Wed, Jul 15, 2015 at 3:16 PM, Poole, Richard <r.poole@ucl.ac.uk> wrote:
Hi Nate,
I am indeed using later than 15.05………so I will try this fix next time I can restart the server and let you know.
Richard
On 15 Jul 2015, at 19:32, Nate Coraor <nate@bx.psu.edu> wrote:
Hi Richard,
By any chance, are you running Galaxy 15.05 or later? 15.05 includes new metadata for bam files that can cause UI performance problems with certain types of bam files. This can be limited with the new `max_metadata_value_size` in galaxy.ini (on usegalaxy.org we've set it to 1000000).
I've also created a pull request to make this limiting the default: https://github.com/galaxyproject/galaxy/pull/466
However, if you are using an older version of Galaxy, this issue is not related to the problem you're experiencing.
--nate
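To see whether a given config file actually sets this option, something like the following can help. It is a rough sketch under stated assumptions: the function name is hypothetical, and the `[app:main]` section name is assumed to be where Galaxy's application options live in galaxy.ini or universe_wsgi.ini. Commented-out entries count as not set.

```python
import configparser


def metadata_cutoff(ini_text, section="app:main"):
    """Return max_metadata_value_size as an int, or None when it is not set."""
    # interpolation=None leaves values such as %(here)s untouched;
    # strict=False tolerates duplicate keys in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_option(section, "max_metadata_value_size"):
        return None
    return parser.getint(section, "max_metadata_value_size")


# Made-up sample config; only the option name and value come from the thread.
SAMPLE = """
[app:main]
database_connection = postgresql://user:secret@localhost:5432/galaxy_prod
max_metadata_value_size = 1000000
"""

if __name__ == "__main__":
    print(metadata_cutoff(SAMPLE))
```

For a real installation the text would come from the file itself, for example `metadata_cutoff(open("universe_wsgi.ini").read())`.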
On Wed, Jul 15, 2015 at 2:16 PM, Poole, Richard <r.poole@ucl.ac.uk> wrote:
Hi Christian and Carl,
Thanks both for the replies.
To answer your questions in reverse order. I have about XX histories in my account each with an average of about XX datasets. Total data in my account is about 1TB.
It is indeed an admin account, and other users with close to 1TB of data do not have a similar slowdown, although their data is spread over far fewer histories. Is there a way then to prevent the file_name attribute being requested for admin accounts so I can see if this speeds things back up again?
Although the Galaxy server is running on my iMac, the data is stored externally on a large directly attached NAS. I think I first noticed this slowdown after deleting and purging a bunch of older histories to free space on the NAS. I have tried running some of the cleanup_datasets scripts but they are actually returning errors and not running right now (can give you the error messages if necessary).
The slowdown is actually getting worse now and it is even slow to display tool pages, as well as often getting this error if it is really slow:
Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /history/list.
Reason: Error reading from remote server
I am running through an apache proxy - perhaps the apache settings need tweaking too? (I forget right now where I set these up!).
As for the database itself, I am running PostgreSQL 9.3 and I tweaked the settings in my universe_wsgi.ini as per the instructions on https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Adv...
So my settings are:
# -- Database

# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You
# may use a SQLAlchemy connection string to specify an external database
# instead. This string takes many options which are explained in detail in the
# config file documentation.
database_connection = postgresql://*******:*******@localhost:5432/galaxy_prod

# If the server logs errors about not having enough database pool connections,
# you will want to increase these values, or consider running more Galaxy
# processes.
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20

# If using MySQL and the server logs the error "MySQL server has gone away",
# you will want to set this to some positive value (7200 should work).
#database_engine_option_pool_recycle = -1

# If large database query results are causing memory or response time issues in
# the Galaxy process, leave the result on the server instead. This option is
# only available for PostgreSQL and is highly recommended.
database_engine_option_server_side_cursors = True

# Create only one connection to the database per thread, to reduce the
# connection overhead. Recommended when not using SQLite:
database_engine_option_strategy = threadlocal

# Log all database transactions, can be useful for debugging and performance
# profiling. Logging is done via Python's 'logging' module under the qualname
# 'galaxy.model.orm.logging_connection_proxy'
database_query_profiling_proxy = False

# -- Files and directories
Let me know if you think these settings are appropriate or need further tweaks.
Thanks again for your responses so far,
Richard
On 13 Jul 2015, at 16:31, Carl Eberhard <carlfeberhard@gmail.com> wrote:
Hi, Richard
How many histories are on your account? How many datasets (roughly)?
Are you using an Admin account to view the histories and does the slow down still occur for regular users with large amounts of data?
One of the exposed attributes of datasets (for admins - not other users generally) is the file_name. I've noticed that retrieving this attribute from the file system can be slow.
Christian also provides good advice.
On Thu, Jul 9, 2015 at 4:12 AM, Christian Brenninkmeijer <christian.brenninkmeijer@manchester.ac.uk> wrote:
Hi Richard,
I am relatively new to galaxy so if you get a different response from one of the core team ignore this.
One thing I would check is the underlying database. What do you have set for "database_connection" in your galaxy.ini file?
Especially if you are using the default sqlite this could be the issue, as that is stored in a single file on disk.
Whichever database you have make sure it has enough resources to handle what will now be a large size.
Christian
________________________________
From: galaxy-dev [galaxy-dev-bounces@lists.galaxyproject.org] on behalf of Poole, Richard [r.poole@ucl.ac.uk]
Sent: Wednesday, July 08, 2015 9:04 PM
To: galaxy-dev@lists.galaxyproject.org
Subject: [galaxy-dev] Slow repsonses viewing histories
Hi all,
I am having trouble right now with my own personal account on my production server. Grid refreshes are taking a huge amount of time (e.g. when viewing ‘saved histories’ or even generating the dataset list for a single history). My account is very full of data (1TB), could it be this?
There are no obvious messages in the logs though so I am a bit stumped as to why. I do not have the same trouble when impersonating other users with fairly full accounts. Perhaps a database issue (I do not know how to ‘cleanup’ the database or indeed Galaxy user accounts). Any thoughts?
Thanks, Richard
Richard J Poole PhD Wellcome Trust Fellow Department of Cell and Developmental Biology University College London 21 University Street, London WC1E 6DE Office (518 Rockefeller): +44 20 7679 6577 (int. 46577) Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Guys,
I started pulling in the patches named below. After applying patch 345 (curl ... | patch -p1), all new job submissions and finishing jobs fail with
/tsl/services/galaxy/dist/galaxy-dist/lib/galaxy/__init__.py:63: UserWarning: Module simplejson was already imported from /tsl/services/galaxy/dist/lib/python2.6/site-packages/simplejson-3.6.5-py2.6-linux-x86_64.egg/simplejson/__init__.pyc, but /tsl/services/galaxy/dist/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg is being added to sys.path
  self.check_version_conflict()
Any ideas?
Thanks Christian
On 16/07/15 19:36, Poole, Richard wrote:
Hi Nate,
Problem solved (with one small exception) with much help from Marius. I followed his instructions to apply the three patches via mercurial:
You can get the diff of a pull request by adding .diff, like so:
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/416.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/466.d...
These should be the 3 pull requests for the issue you referenced. So you should be able to do
curl https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d... | patch -p1
and then the other 2. It's probably a good idea to do another branch for testing this, e.g.
hg branch fix_metadata
hg commit -m "branch to test metadatafix"
I had to manually resolve one of the patches in lib/galaxy/tools/actions/metadata
I also had to manually resolve that one as the patch for that file didn’t apply properly.
I was then able to reset all metadata for the BAMs listed following the SQL search and hey presto history display back to normal speed :)
The one issue I still have is that 9 of the offending BAMs are actually in a history that I deleted permanently by accident during the UI slow-response. So they are still listed in the SQL search. This doesn’t seem to be adversely affecting the speed but obviously I can’t reset their metadata now. Is this a problem?
Thanks for the help, Rich
Hi Christian,
I'd suggest creating a virtualenv and starting Galaxy from that.
--nate
On Wed, Aug 12, 2015 at 8:24 AM, Christian Schudoma (TSL) <Christian.Schudoma@sainsbury-laboratory.ac.uk> wrote:
Hi Guys,
I started pulling in the patches named below. After applying patch 345 (curl ... | patch -p1), all new job submissions and finishing jobs fail with
/tsl/services/galaxy/dist/galaxy-dist/lib/galaxy/__init__.py:63: UserWarning: Module simplejson was already imported from /tsl/services/galaxy/dist/lib/python2.6/site-packages/simplejson-3.6.5-py2.6-linux-x86_64.egg/simplejson/__init__.pyc, but /tsl/services/galaxy/dist/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg is being added to sys.path
  self.check_version_conflict()
Any ideas?
Thanks Christian
On 16/07/15 19:36, Poole, Richard wrote:
Hi Nate,
Problem solved (with one small exception) with much help from Marius. I followed his instructions to apply the three patches via mercurial:
You can get the diff of a pull request by adding .diff, like so:
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/416.d...
https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/466.d...
These should be the 3 pull requests for the issue you referenced. So you should be able to do curl https://patch-diff.githubusercontent.com/raw/galaxyproject/galaxy/pull/345.d... |patch -p1
and then the other 2. It's probably a good idea to do another branch for testing this, e.g. hg branch fix_metadata hg commit -m "branch to test metadatafix"
I had to manually resolve one of the patches in lib/galaxy/tools/actions/metadata
I also had to manually resolve that one as the patch for that file didn’t apply properly.
I was then able to reset all metadata for the BAMs listed following the SQL search and hey presto history display back to normal speed :)
The one issue I still have is that 9 of the offending BAMs are actually in a history that I deleted permanently by accident during the UI slow-response. So they are still listed in the SQL search. This doesn’t seem to be adversely affecting the speed but obviously I can’t reset their metadata now. Is this a problem?
Thanks for the help, Rich
On 16 Jul 2015, at 11:45, Poole, Richard r.poole@ucl.ac.uk wrote:
Hi Nate,
I have managed to update to the latest version of the default branch but I do not see the ‘max_metadate_value_size’ in my universe_wsgi.ini (this file was not changed during the update). Same thing if I use hg pull && hg update stable or hg pull && hg update release_15.05.
Does this mean this change is not yet on mercurial and if so how can I pull this specific change (using curl?) to generate my own branch specific for the patch?
Thanks, Richard
On 15 Jul 2015, at 20:59, Richard Poole ucgarjp@live.ucl.ac.uk wrote:
Just a little update - executed the SQL query and luckily only returned 20 BAM files - the most recently generated ones of course. Now makes sense this could well be the cause of the UI problem as my account became more and more unresponsive the more of these 20 BAM files I generated over the last week………..
Rich
On 15 Jul 2015, at 20:38, Richard Poole ucgarjp@live.ucl.ac.uk wrote:
Hi Nate,
Ok - I can try this but I’m not an SQL database expert - far from it in fact. That query will just list ‘problematic’ datasets? I have a lot of histories with BAM files ;)
Resetting the metadata needs to be done before this fix works? I ask because my history is so slow now as to be unusable…….or setting the cutoff allows Galaxy to be usable but it wipes out the metadata (which then needs resetting).
I am still using universe_wsgi.ini not galaxy.ini and I don’t see the 'max_metadata_value_size’ in any section of my universe_wsgi_ini file - I guess because I’m not on latest update. I am actually not sure what version I am on as to fix an earlier issue with shared datasets (that wasn’t yet on mercurial) on the advice of a few folks I generated a branch specific for the patch using:
hg branch fix_history_sharing
curl https://github.com/galaxyproject/galaxy/commit/62772bc86e2504982f207a982542c... -p1
Also not being a huge mecurial expert I am a little unsure now how to switch back to stable branch correctly and pull the latest updates. Could you advise (sorry to be a noob)?
Rich
On 15 Jul 2015, at 20:22, Nate Coraor nate@bx.psu.edu wrote:
Hi Richard,
Unfortunately, you will need to reset metadata for any problematic datasets once you have updated to the latest version of 15.05 and set a cutoff value. You can find the datasets with the following SQL query in your database:
select hda.id, u.email, h.name, hda.hid, hda.name, length(hda.metadata) from history_dataset_association hda join history h on hda.history_id=h.id join galaxy_user u on h.user_id=u.id where length(hda.metadata) > 1048576 order by length(hda.metadata) desc;
And you can reset metadata by clicking on the dataset's pencil icon in your history and clicking "auto-detect".
--nate
On Wed, Jul 15, 2015 at 3:16 PM, Poole, Richard r.poole@ucl.ac.uk wrote:
Hi Nate,
I am indeed using later than 15.05………so I will try this fix next time I can restart the server and let you know.
Richard
On 15 Jul 2015, at 19:32, Nate Coraor nate@bx.psu.edu wrote:
Hi Richard,
By any chance, are you running Galaxy 15.05 or later? 15.05 includes new metadata for bam files that can cause UI performance problems with certain types of bam files. This can be limited with the new `max_metadata_value_size` in galaxy.ini (on usegalaxy.org we've set it to 1000000).
I've also created a pull request to make this limiting the default: https://github.com/galaxyproject/galaxy/pull/466
However, if you are using an older version of Galaxy, this issue is not related to the problem you're experiencing.
--nate
On Wed, Jul 15, 2015 at 2:16 PM, Poole, Richard r.poole@ucl.ac.uk wrote:
Hi Christian and Carl,
Thanks both for the replies.
To answer your questions in reverse order. I have about XX histories in my account each with an average of about XX datasets. Total data in my account is about 1TB.
It is indeed an admin account and other users with close to 1TB of data do not have a similar slow down. Although their data is spread over far fewer histories. Is there a way then to prevent the file_name attribute being requested for admin accounts so I can see if this speeds things back up again?
Although the Galaxy server is running on my iMac the data is stored external on a large directly attached NAS. I think I first noticed this slow down after deleting and purging a bunch of older histories to free space on the NAS. I have tried running some of the cleanup_datasets scripts but they are actually returning errors and not running right now (can give you the error messages if necessary).
The slowdown is actually getting worse now and it is even slow to display tool pages, as well as often getting this error if it is really slow: Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request *GET /history/list http://iworm.anat.ucl.ac.uk:8080/history/list*.
Reason: *Error reading from remote server* I am running through an apache proxy - perhaps the apache settings need tweaking too? (I forget right now where I set these up!).
As for the database itself, I am running PostgreSQL 9.3 and I tweaked the settings in my universe_wsgi.ini as per the instructions on https://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Adv...
So my settings are:
# -- Database
# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You # may use a SQLAlchemy connection string to specify an external database # instead. This string takes many options which are explained in detail in the # config file documentation. database_connection = postgresql://*******:*******@localhost:5432/galaxy_prod
# If the server logs errors about not having enough database pool connections, # you will want to increase these values, or consider running more Galaxy # processes. database_engine_option_pool_size = 10 database_engine_option_max_overflow = 20
# If using MySQL and the server logs the error "MySQL server has gone away", # you will want to set this to some positive value (7200 should work). #database_engine_option_pool_recycle = -1
# If large database query results are causing memory or response time issues in # the Galaxy process, leave the result on the server instead. This option is # only available for PostgreSQL and is highly recommended. database_engine_option_server_side_cursors = True
# Create only one connection to the database per thread, to reduce the # connection overhead. Recommended when not using SQLite: database_engine_option_strategy = threadlocal
# Log all database transactions, can be useful for debugging and performance # profiling. Logging is done via Python's 'logging' module under the qualname # 'galaxy.model.orm.logging_connection_proxy' database_query_profiling_proxy = False
# -- Files and directories
Let me know if you think these settings are appropriate or need further tweaks.
Thanks again for your responses so far,
Richard
On 13 Jul 2015, at 16:31, Carl Eberhard carlfeberhard@gmail.com wrote:
Hi, Richard
How many histories are on your account? How many datasets (roughly)?
Are you using an Admin account to view the histories and does the slow down still occur for regular users with large amounts of data?
One of the exposed attributes of datasets (for admins - not other users generally) is the file_name. I've noticed that retrieving this attribute from the file system can be slow.
Christian also provides good advice.
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Nate, all
I would like to add my own version of the problem to this.
My (inherited) Galaxy instance (which should be running 15.05+; how do I find out?) shows symptoms similar to Richard's. In my case it only affects a single user, and we can no longer open "Saved Histories" at all. Galaxy fails with:
Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /history/list.
Reason: Error reading from remote server
Because of a problem with the psql client I cannot identify the offending datasets (the user has lots of BAM files in various histories), and since he cannot switch histories, we cannot even update the metadata for his datasets. Is it possible to do that via the API?
Any insight/help would be greatly appreciated!
Cheers Christian
-- Dr. Christian Schudoma Bioinformatics Support Officer
Bioinformatics Group The Sainsbury Laboratory Norwich Research Park Norwich NR4 7UH United Kingdom
+44 (0) 1603 450 601 christian.schudoma@tsl.ac.uk
Hey Christian,
I used <galaxy_dir>/scripts/db_shell.py and something like https://gist.github.com/dannon/e71b7aa9546fcecf6e9e to reset problematic BAM metadata. Basically, it selects all the large metadata and completely strips out the new overly large optional attributes, leaving required attributes alone. Once this is done, you can redetect metadata if you'd like (with the limits we'd talked about set) and it should all work fine.
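The idea described above (drop oversized optional metadata, keep the required attributes) can be sketched in plain Python. This is only an illustration on an ordinary dictionary, not the linked gist and not Galaxy's model API: the function name, the attribute names, and the 1024-byte threshold are all hypothetical placeholders.

```python
import json


def strip_large_optional_metadata(metadata, required_keys, max_bytes):
    """Return a copy of `metadata` without oversized optional entries."""
    cleaned = {}
    for key, value in metadata.items():
        # Measure each value by the size of its JSON serialization.
        size = len(json.dumps(value).encode("utf-8"))
        # Required attributes are always kept; optional ones only if small.
        if key in required_keys or size <= max_bytes:
            cleaned[key] = value
    return cleaned


# Illustrative metadata for one dataset; the keys are placeholders.
metadata = {
    "bam_index": "index_0001",
    "sort_order": "coordinate",
    "reference_names": ["chr%d" % i for i in range(10000)],
}

cleaned = strip_large_optional_metadata(
    metadata, required_keys={"bam_index"}, max_bytes=1024
)
print(sorted(cleaned))
```

In a real instance the equivalent change would be made through the database shell against the stored metadata, so a backup of the database beforehand would be prudent.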
Let me know if you need more info, hopefully this helps!
-Dannon
Yay, thank you so much Dannon! That was the jackpot! My user was also quite happy.
Remind me to buy you a beer at the next GCC ;)
Cheers Christian