Using MySQL instead of SQLite
Hello,

I would like to use MySQL instead of SQLite to store my data. I couldn't find a HOWTO or any guidelines for this on the Galaxy web site. I only found some lines that might need to be changed/enabled in the universe_wsgi.ini file:

#database_file = database/universe.sqlite
database_connection = mysql:///galaxy
#database_engine_option_echo = true
#database_engine_option_echo_pool = true
#database_engine_option_pool_size = 10
#database_engine_option_max_overflow = 20

Could you point me to some documentation, or briefly describe what I need to do in order to switch to MySQL?

Are there any plans to support other DBMSs (Oracle, for instance)?

thanks,
Erick
Erick,

To use a database other than the SQLite database that comes with the Galaxy distribution, all that is needed is to change the config setting, providing the URL that points to your MySQL database. See the MySQL documentation for the connection URL, as the URL differs depending on whether your database is installed locally or not.

The config setting is the "database_connection" setting, and could look something like this:

database_connection = mysql:///greg_test?unix_socket=/var/run/mysqld/mysqld.sock

Greg Von Kuster
Galaxy Development Team
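(For reference: the unix_socket form above only applies when MySQL runs on the same host as Galaxy. A MySQL server on another machine is reached with a TCP-style SQLAlchemy URL instead. The user, password, host, and database name below are placeholders, not values from this thread:)

```ini
# Hypothetical remote-MySQL connection URL; substitute your own credentials.
database_connection = mysql://galaxy_user:secret@dbhost.example.org:3306/galaxy
```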
_______________________________________________
galaxy-user mailing list
galaxy-user@bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
Greg,

I managed to set my connection string so that we can use a remote MySQL server. Thanks.

W.r.t. dataset purging: I used the scripts to clean deleted libraries, folders, datasets, userless histories... I've seen that one must specify the span of time in days. What about data that was added mistakenly, for instance today, and that we want to delete immediately? I tried launching the script with "-d 0", but the data is still there... Am I missing something?

I am no longer using the SQLite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the database directory?

Have the purge_*.sh scripts been tested with MySQL?

Last question (already asked before): are there any plans to support Oracle?

thanks,
Erick
Hi Erick,

Erick Antezana wrote:
Greg,
I managed to set my connection string so that we can use a remote MySQL server. Thanks.
w.r.t. dataset purging: I used the scripts to clean deleted libraries, folders, datasets, userless histories... I've seen that one must specify the span of time in days. What about data that was added mistakenly, for instance today, and that we want to delete immediately? I tried launching the script with "-d 0", but the data is still there... Am I missing something?
No, I don't think so. It's possible that your system clock is off from your database time. Is your database storing time as local time? The cleanup script uses the update_time for the objects being deleted.
I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
Have the purge_*.sh scripts been tested with MySQL?
Yes
last question (already asked before): are there any plans to support Oracle?
Not sure why it wouldn't already be supported, although we don't use it here. Just needs a different URL - sqlalchemy supports Oracle.
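(Since SQLAlchemy selects its database dialect from the URL scheme, an Oracle setup would in principle only change the connection string. The line below is a sketch of SQLAlchemy's documented Oracle URL form; the credentials and SID are placeholders, and this is untested with Galaxy:)

```ini
# Hypothetical Oracle connection URL (untested with Galaxy); SQLAlchemy
# picks the Oracle dialect from the "oracle://" scheme.
database_connection = oracle://galaxy_user:secret@dbhost.example.org:1521/ORCL
```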
Greg,

please see in-line:

2009/7/23 Greg Von Kuster <ghv2@psu.edu>
Hi Erick,
Erick Antezana wrote:
Greg,
I managed to set my connection string so that we can use a remote MySQL server. Thanks.
w.r.t. dataset purging: I used the scripts to clean deleted libraries, folders, datasets, userless histories... I've seen that one must specify the span of time in days. What about data that was added mistakenly, for instance today, and that we want to delete immediately? I tried launching the script with "-d 0", but the data is still there... Am I missing something?
No, I don't think so. It's possible that your system clock is off from your database time.
both servers (mysql and the one where galaxy is running) have the same time.
Is your database storing time as local time?
how can I see that?
The cleanup script uses the update_time for the objects being deleted.
In which file can I find the SQL command that actually deletes and purges the data?
I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
no, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) )
Have the purge_*.sh scripts been tested with MySQL?
Yes
last question (already asked before): are there any plans to support Oracle?
Not sure why it wouldn't already be supported, although we don't use it here. Just needs a different URL - sqlalchemy supports Oracle.
good to know that, I will try to find some time to test it and let you know. cheers, Erick
Erick Antezana wrote:
Greg,
please see in-line:
2009/7/23 Greg Von Kuster <ghv2@psu.edu <mailto:ghv2@psu.edu>>
Hi Erick,
Erick Antezana wrote:
Greg,
I managed to set my connection string so that we can use a remote MySQL server. Thanks.
w.r.t. dataset purging: I used the scripts to clean deleted libraries, folders, datasets, userless histories... I've seen that one must specify the span of time in days. What about data that was added mistakenly, for instance today, and that we want to delete immediately? I tried launching the script with "-d 0", but the data is still there... Am I missing something?
No, I don't think so. It's possible that your system clock is off from your database time.
both servers (mysql and the one where galaxy is running) have the same time.
Is your database storing time as local time?
how can I see that?
We use Python's datetime.utcnow() method to set the update_time on table rows in the database, but that may differ from your system's local time (not sure).
The cleanup script uses the update_time for the objects being deleted.
In which file can I find the SQL command that actually deletes and purges the data?
~/scripts/cleanup_datasets/cleanup_datasets.py, the various methods that delete and purge histories, libraries and datasets. Raw sql commands are not here because we're using sqlalchemy ( in order to transparently support multiple databases ).
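(The "-d <days>" cutoff those methods apply can be sketched in plain Python. This is an illustration, not Galaxy's actual code; the point is that update_time is written with datetime.utcnow(), so the comparison must also be in UTC, otherwise a "-d 0" run can appear to do nothing:)

```python
from datetime import datetime, timedelta

def cutoff(days):
    """UTC timestamp before which an object is old enough to clean up."""
    return datetime.utcnow() - timedelta(days=days)

def is_eligible(update_time, days):
    """True if an object's update_time falls before the cutoff.

    update_time must be a UTC timestamp (Galaxy stores it via
    datetime.utcnow()); comparing a local-time value here is exactly
    the kind of mismatch that can make "-d 0" seem to miss fresh rows.
    """
    return update_time < cutoff(days)
```

With "-d 0" the cutoff is "now", so anything updated even seconds ago qualifies, provided both timestamps really are UTC.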
I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
no, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) )
Have the purge_*.sh scripts been tested with MySQL?
Yes
last question (already asked before): are there any plans to support Oracle?
Not sure why it wouldn't already be supported, although we don't use it here. Just needs a different URL - sqlalchemy supports Oracle.
good to know that, I will try to find some time to test it and let you know.
Thanks Erick!
cheers, Erick
Hi Erick, Greg et alia,

I've set up Galaxy with a MySQL DB too, but I cannot get rid of old stuff. According to the wiki, running the script with -1, -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran with -1 it told me:

--------
# 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted
# Deleted 0 histories.
Elapsed time: 0.21
--------

That was a bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup, and voilà:

--------
# 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged
1 4 5 6 7 8 9 10 11 12 13 14 15 16
# Purged 14 histories.
Elapsed time: 1.17
--------

Running with -3, -4 and -5 all gave me 0 purged data sets or folders, but I know there must be stuff associated with user accounts older than 1 day that should be purged... The -6 option does not seem to work at all, as I got this error: "cleanup_datasets.py: error: no such option: -6". Am I missing something?

In addition:

1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know which files to delete?

2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?

Cheers,
Pi
-------------------------------------------------------------
Biomolecular Mass Spectrometry and Proteomics
Utrecht University

Visiting address:
H.R. Kruyt building room O607
Padualaan 8
3584 CH Utrecht
The Netherlands

Mail address:
P.O. box 80.082
3508 TB Utrecht
The Netherlands

phone: +31 (0)6-143 66 783
email: pieter.neerincx@gmail.com
skype: pieter.online
------------------------------------------------------------
Hi Pi,

The wiki for deleting datasets is out of date, and I will be updating it shortly.

There is a collection of shell scripts in the scripts/cleanup_datasets directory. In order to delete no-longer-needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):

delete_userless_histories.sh
purge_histories.sh
purge_datasets.sh

I will send a message after the wiki has been updated.
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete?
This is an excellent feature for us to add to the script.
2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?
There are several database tables which Galaxy expects to exist (for job reporting, etc.) and whose entries should not be deleted. Datasets are an example of this: when a Dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.

Thanks for using Galaxy,
Dan
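(The pattern Dan describes, flipping a flag rather than deleting the row, can be shown with a toy table. The schema below is purely illustrative, not Galaxy's actual dataset table:)

```python
import sqlite3

# Toy illustration of the soft-delete ("purged flag") pattern: purging
# flips a flag; the row, and anything that references it such as job
# records, survives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, file_name TEXT, "
    "purged INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO dataset (file_name) VALUES ('dataset_1.dat')")
conn.execute("INSERT INTO dataset (file_name) VALUES ('dataset_2.dat')")

# "Purge" dataset 1: mark it, never DELETE it.
conn.execute("UPDATE dataset SET purged = 1 WHERE id = 1")

# Both rows still exist; only the flag distinguishes them.
rows = conn.execute("SELECT id, purged FROM dataset ORDER BY id").fetchall()
```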
Hi Dan,

On 29•Jul•2009, at 4:50 PM, Daniel Blankenberg wrote:
Hi Pi,
The wiki for deleting datasets is out of date, and I will be updating it shortly.
There is a collection of shell scripts included in the scripts/cleanup_datasets directory. In order to delete no longer needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):
delete_userless_histories.sh purge_histories.sh purge_datasets.sh
I will send a message after the wiki has been updated.
Ok, thanks for the info. I'll stick to the scripts above for the time being.
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete?
This is an excellent feature for us to add to the script.
That would be very useful. I uploaded a 6 GB file and modified it many times during some debugging. I already ran the python script without -r, so I'm stuck with many copies of this now-worthless test data...
2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?
There are several database tables which Galaxy expects to exist (for job reporting, etc.) and whose entries should not be deleted. Datasets are an example of this: when a Dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.
Ok, I guess I'll have to live with that, and I can imagine you might want to keep certain metadata/logs even when the job itself is marked EOL. In general, though, I prefer user interface elements that do what they claim to do; hence I expect a delete button/script to delete something. But in this case it might save the day: I'll try to unpurge everything manually and re-run the script with -r...

Thanks,
Pi
Thanks for using Galaxy,
Dan
Hi Erick, Greg et alia,
I've setup Galaxy with a MySQL DB too, but I cannot get rid off old stuff. According to the wiki, running the script with ... -1 or -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran with -1 it told me:
-------- # 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted # Deleted 0 histories.
Elapsed time: 0.21 --------
That was I bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup and viola:
-------- # 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged 1 4 5 6 7 8 9 10 11 12 13 14
<..cut a lot of white space..>
15 16 # Purged 14 histories.
Elapsed time: 1.17 --------
Running with -3, -4 and -5 all gave me 0 in either purged data sets or folders, but I know there must be stuff associated with user accounts older than 1 day that should be purged... The -6 option does not seem to work at all as I got this error: "cleanup_datasets.py: error: no such option: -6". Do I miss something?
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete? 2. If I understand correctly, I should be able to remove associated data sets -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-Ok-I-accept-I- know-what-I'm-doing-Delete outdated stuff :) ?
Cheers,
Pi
On 23•Jul•2009, at 5:17 PM, Erick Antezana wrote:
Greg,
please see in-line:
2009/7/23 Greg Von Kuster <ghv2@psu.edu> Hi Erick,
Erick Antezana wrote: Greg,
I manage to set my connection string so that we could use a remote mysql server. Thanks.
w.r.t. the datasets purging, I used the scripts to clean deleted libraries, folders, datasets, userless history ... I've seen that one must speficy the span of time in days. What about the data that was added mistakenly for instance today and that we want to immediately delete it? I tried to launch the script with "-d 0" but the data is still there... Am I missing something?
No, I don't think so. It's possible that your system clock is off from your database time.
both servers (mysql and the one where galaxy is running) have the same time.
Is your database storing time as local time?
how can I see that?
The cleanup script uses the update_time for the objects being deleted.
In which file can I find the SQL command that actually deletes and purges the data?
I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
no, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) )
Have the purge_*.sh scripts been tested with MySQL?
Yes
last question (already asked before): are there any plans to support Oracle?
Not sure why it wouldn't already be supported, although we don't use it here. It just needs a different URL - SQLAlchemy supports Oracle.
good to know that, I will try to find some time to test it and let you know.
cheers, Erick
thanks, Erick
2009/7/22 Greg Von Kuster <ghv2@psu.edu>
Erick,
To use a different database than the SQLite one that comes with the Galaxy distribution, all that is needed is to change the config setting, providing the URL that points to your MySQL database. See the MySQL documentation for the connection URL, as the URL differs depending upon whether your database is installed locally or not.
The config setting is the "database_connection" setting, and could look something like this:
database_connection = mysql:///greg_test?unix_socket=/var/run/mysqld/mysqld.sock
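For a MySQL server running on another host, a TCP-based URL can be used instead of the socket form. A sketch (the user, password, host, port and database name here are placeholders, not values from this thread; the pool options are the ones commented out in the stock universe_wsgi.ini):

```ini
# universe_wsgi.ini -- example values only; substitute your own
# credentials, host, port and database name.
database_connection = mysql://galaxy_user:secret@dbhost.example.org:3306/galaxy
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20
```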
Greg Von Kuster
Galaxy Development Team
Erick Antezana wrote:
Hello,
I would like to use MySQL instead of SQLite to store my data. I couldn't find a HOWTO or any guidelines on the Galaxy web site on how to do it. I only found some lines that might need to be changed/enabled in the universe_wsgi.ini file:
#database_file = database/universe.sqlite
database_connection = mysql:///galaxy
#database_engine_option_echo = true
#database_engine_option_echo_pool = true
#database_engine_option_pool_size = 10
#database_engine_option_max_overflow = 20
Could you point me to some doc or briefly describe what I need to do in order to go for MySQL?
Are there any plans to support other DBMS's (like Oracle for instance)?
thanks, Erick
_______________________________________________ galaxy-user mailing list galaxy-user@bx.psu.edu http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
------------------------------------------------------------- Biomolecular Mass Spectrometry and Proteomics Utrecht University
Visiting address: H.R. Kruyt building room O607 Padualaan 8 3584 CH Utrecht The Netherlands
Mail address: P.O. box 80.082 3508 TB Utrecht The Netherlands
phone: +31 (0)6-143 66 783 email: pieter.neerincx@gmail.com skype: pieter.online ------------------------------------------------------------
Hi Dan,
I have been facing the same problem as Pieter (as I reported some time ago) while trying to purge several NGS data files I was playing with... At that time I had used the scripts directly with different options (-d 1, -d 0, etc.) and the deleted files were still there. I have just tried once again, executing them in the order you have indicated, but unfortunately the files are still there...
I am using MySQL to store my data. I see the same behaviour when using the default db (sqlite).
cheers, Erick
2009/7/29 Daniel Blankenberg <dan@bx.psu.edu>
Hi Pi,
The wiki for deleting datasets is out of date, and I will be updating it shortly.
There is a collection of shell scripts included in the scripts/cleanup_datasets directory. In order to delete no longer needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):
delete_userless_histories.sh purge_histories.sh purge_datasets.sh
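As a sketch, the order above might look like this from the Galaxy root directory (the scripts/cleanup_datasets path comes from this thread; whether the wrappers take extra arguments is an assumption, so check the scripts themselves):

```shell
# Run from the Galaxy root directory, in this order.
# Each wrapper drives scripts/cleanup_datasets/cleanup_datasets.py.
sh scripts/cleanup_datasets/delete_userless_histories.sh
sh scripts/cleanup_datasets/purge_histories.sh
sh scripts/cleanup_datasets/purge_datasets.sh
```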
I will send a message after the wiki has been updated.
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete?
This is an excellent feature for us to add to the script.
2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-Ok-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?
There are several database tables which Galaxy expects to exist (for job reporting, etc.) and which should not have entries deleted. Datasets are an example of this: when a Dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.
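For example, purged entries remain visible in the database (a sketch; the table and column names below are inferred from this thread's mentions of the dataset table, the purged flag and update_time, so verify them against your schema):

```sql
-- Rows stay in the table after purging; only the flags change.
-- Table and column names are assumptions based on this thread.
SELECT id, deleted, purged, update_time
FROM dataset
WHERE purged = 1
ORDER BY update_time DESC;
```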
Thanks for using Galaxy,
Dan
Hi Erick, Greg et alia,
I've set up Galaxy with a MySQL DB too, but I cannot get rid of old stuff. According to the wiki, running the script with -1, -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran it with -1 it told me:
-------- # 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted # Deleted 0 histories.
Elapsed time: 0.21 --------
That was a bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup and voilà:
-------- # 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged 1 4 5 6 7 8 9 10 11 12 13 14
<..cut a lot of white space..>
15 16 # Purged 14 histories.
Elapsed time: 1.17 --------
Hi Dan and Erick,
Here's a follow-up on my attempts to free up some disk space. I deleted all histories of all users (in a test installation :)). All I have left now is a few datasets in libraries. More than 10 days later I ran the scripts. At first this didn't help. Apparently the script does not check whether it is being executed with proper permissions to clean, because in the logs I found numerous entries like this one:
# Error, file has already been removed: [Errno 13] Permission denied: 'database/files'
At the end of the log it still claims:
# Freed disk space: 609064379
But it did not clean a single byte. Re-running the script as root does not help, because the script did modify the database... So I manually modified the database by setting the purged column to 0 and the update_time to something at least 10 days ago. Then re-running the script did free up some disk space :).
When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually kept only a single copy of these files in my libraries... Maybe if I delete all libraries as well I'll be able to get rid of the binary trash.
It appears to me that Galaxy cannot clean datasets unless the history or library they were once assigned to is also deleted. Is this correct? If so, this is quite a pain, because it is only natural for users to experiment, producing several failures before arriving at the datasets they want to keep. As long as they keep the histories / libraries with the good data, all the intermediate failures associated with those histories / libraries will claim disk space too :(...
Cheers,
Pi
On 30•Jul•2009, at 1:33 PM, Erick Antezana wrote:
Hi Dan,
I have been facing the same problem as Pieter (as I reported some time ago) while trying to purge several NGS data files with which I was playing ... at that time I had used directly the scripts with different options (-d 1, -d 0, etc) and the deleted files were still there, I have just tried once again executing them in the order you have indicated but unfortunately the files are still there...
I am using mysql to store my data. I have the same behaviour while using the default db (sqlite).
cheers, Erick
Hi Pi,
An updated wiki on this topic is available at http://bitbucket.org/galaxy/galaxy-central/wiki/PurgeHistoriesAndDatasets .
Apparently the script does not check whether it is being executed with proper permissions to clean,
Executing the scripts using the -f flag will cause the script to attempt to re-purge datasets that are already marked as purged (changing the time delay would also be required as the table entries would have been marked as updated at the failed attempt.)
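A sketch of such a forced re-run (the -d, -r and -f flags are the ones discussed in this thread; the exact invocation, including whether the ini file is passed as an argument and which numbered option the wrapper uses, is an assumption — check the purge_*.sh wrappers for the canonical command line):

```shell
# Hypothetical invocation; verify against scripts/cleanup_datasets/*.sh.
# -d 0 lowers the age threshold (the failed attempt updated update_time),
# -r removes files from disk, -f forces re-processing of purged entries.
python scripts/cleanup_datasets/cleanup_datasets.py universe_wsgi.ini -d 0 -r -f -2
```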
It appears to me that Galaxy can not clean datasets unless the history or library it was once assigned to is also deleted. Is this correct?
Using the -6 flag / delete_datasets.sh script added in changeset 2551:5b405a43c406 will allow a base dataset to be marked as deleted without requiring a history/library/folder to be purged. This script could take considerable time, depending on the number of datasets in Galaxy.
When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified, because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually only kept a single copy of these files in my libraries..
When history items are shared, or library items are imported (to/from a history or between libraries), the base dataset (file) is shared, preventing duplication of file data on disk. Uploading a file several times creates a different base dataset (file) each time; each of the instances associated with each individually uploaded file will need to be deleted before a specific file will be purged from disk.
Thanks for using Galaxy,
Dan
Hi Dan,

Thanks! I'll update right away :)

Cheers,
Pi

On 10•Aug•2009, at 6:20 PM, Daniel Blankenberg wrote:
Hi Pi,
An updated wiki on this topic is available at http://bitbucket.org/galaxy/galaxy-central/wiki/PurgeHistoriesAndDatasets .
Apparently the script does not check whether it is being executed with proper permissions to clean,
Executing the scripts using the -f flag will cause the script to attempt to re-purge datasets that are already marked as purged. (Changing the time delay would also be required, as the table entries would have been marked as updated at the failed attempt.)
It appears to me that Galaxy can not clean datasets unless the history or library it was once assigned to is also deleted. Is this correct?
Using the -6 flag / delete_datasets.sh script added in changeset 2551:5b405a43c406 will allow a base dataset to be marked as deleted without requiring a history/library/folder to be purged. This script could take considerable time, depending on the number of datasets in Galaxy.
When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified, because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually only kept a single copy of these files in my libraries..
When history items are shared or library items are imported (to/from a history or between libraries), the base dataset (file) is shared, preventing duplication of file data on disk. Uploading a file several times creates a different base dataset (file) each time; each of the instances associated with each individually uploaded file will need to be deleted before a specific file will be purged from disk.
Thanks for using Galaxy,
Dan
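Dan's sharing rule above can be illustrated with a toy model (this is not Galaxy code; the class and attribute names are made up for illustration): a base dataset on disk is only eligible for purging once every instance that points at it has been deleted.

```python
# Toy model of shared base datasets: a file is purged from disk only
# once all history/library instances referencing it are deleted.
class BaseDataset:
    def __init__(self, name):
        self.name = name
        self.instances = set()

    def can_purge(self):
        # Eligible only when every referencing instance is deleted.
        return all(inst.deleted for inst in self.instances)

class Instance:
    def __init__(self, base):
        self.base = base
        self.deleted = False
        base.instances.add(self)

f = BaseDataset("reads.fastq")
in_history = Instance(f)   # shared into a history
in_library = Instance(f)   # imported into a library

in_history.deleted = True
print(f.can_purge())       # False: the library copy is still alive
in_library.deleted = True
print(f.can_purge())       # True: no live references remain
```

Uploading the same file twice, by contrast, creates two independent base datasets, each with its own instances, which is why duplicates can linger on disk.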
Hi Dan and Erick,
Here's a follow up on my attempts to free up some disk space. I deleted all histories of all users (in a test installation :)). All I have left now is a few datasets in libraries. More than 10 days later I ran the scripts. At first this didn't help. Apparently the script does not check whether it is being executed with proper permissions to clean, because in the logs I found numerous entries like this one:
# Error, file has already been removed: [Errno 13] Permission denied: 'database/files'
At the end of the log it still claims:
# Freed disk space: 609064379
But it did not clean a single byte. Re-running the script as root does not help, because the script did modify the database... So I manually modified the database by setting the purged column to 0 and the update_time to something at least 10 days ago. Then re-running the script did free up some disk space :).
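The manual fix described above amounts to clearing the purged flag and backdating update_time past the -d cutoff so the script will retry. A self-contained sketch of that operation, demonstrated against a throwaway in-memory SQLite table (the table and column names are assumptions modeled on this thread, not Galaxy's actual schema, so verify against your own database before running anything like this):

```python
import sqlite3
from datetime import datetime, timedelta

# Throwaway stand-in for the dataset table; names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dataset (id INTEGER, purged INTEGER, update_time TEXT)")
db.execute("INSERT INTO dataset VALUES (1, 1, ?)",
           (datetime(2009, 8, 11).isoformat(),))

# Clear the purged flag and backdate update_time by 11 days so a
# subsequent cleanup run with -d 10 considers the row eligible again.
backdated = (datetime(2009, 8, 11) - timedelta(days=11)).isoformat()
db.execute("UPDATE dataset SET purged = 0, update_time = ? WHERE purged = 1",
           (backdated,))

purged, update_time = db.execute(
    "SELECT purged, update_time FROM dataset WHERE id = 1").fetchone()
print(purged, update_time)  # 0 2009-07-31T00:00:00
```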
When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified, because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually only kept a single copy of these files in my libraries... Maybe if I delete all libraries as well I'll be able to get rid of the binary trash.
It appears to me that Galaxy cannot clean datasets unless the history or library it was once assigned to is also deleted. Is this correct? If so, this is quite a pain, because it is only natural for users to experiment, producing several failures before arriving at the datasets they want to keep. As long as they keep the histories / libraries with the good data, all the intermediate failures associated with those histories / libraries will claim disk space too :(...
Cheers,
Pi
On 30•Jul•2009, at 1:33 PM, Erick Antezana wrote:
Hi Dan,
I have been facing the same problem as Pieter (as I reported some time ago) while trying to purge several NGS data files I was playing with. At that time I had run the scripts directly with different options (-d 1, -d 0, etc.) and the deleted files were still there. I have just tried once again, executing them in the order you indicated, but unfortunately the files are still there...
I am using mysql to store my data. I have the same behaviour while using the default db (sqlite).
cheers, Erick
2009/7/29 Daniel Blankenberg <dan@bx.psu.edu>
Hi Pi,
The wiki for deleting datasets is out of date, and I will be updating it shortly.
There is a collection of shell scripts included in the scripts/cleanup_datasets directory. In order to delete no longer needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):
delete_userless_histories.sh
purge_histories.sh
purge_datasets.sh
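For a periodic cleanup, the three steps could be chained in that order from a small wrapper (a sketch only: the Galaxy path is a placeholder, and the scripts' own defaults govern the age cutoff, so check them before scheduling this from cron):

```
#!/bin/sh
# Hypothetical nightly cleanup wrapper; /path/to/galaxy is a placeholder.
# Runs the three scripts in the order listed above, stopping on failure.
cd /path/to/galaxy || exit 1
sh scripts/cleanup_datasets/delete_userless_histories.sh &&
sh scripts/cleanup_datasets/purge_histories.sh &&
sh scripts/cleanup_datasets/purge_datasets.sh
```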
I will send a message after the wiki has been updated.
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete?
This is an excellent feature for us to add to the script.
2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?
There are several database tables which Galaxy expects to exist (for job reporting, etc.) and which should not have entries deleted. Datasets are an example of this: when a Dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.
Thanks for using Galaxy,
Dan
Hi Erick, Greg et alia,
I've set up Galaxy with a MySQL DB too, but I cannot get rid of old stuff. According to the wiki, running the script with ... -1 or -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran with -1 it told me:
-------- # 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted # Deleted 0 histories.
Elapsed time: 0.21 --------
That was a bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup and voilà:
-------- # 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged 1 4 5 6 7 8 9 10 11 12 13 14
<..cut a lot of white space..>
15 16 # Purged 14 histories.
Elapsed time: 1.17 --------
Running with -3, -4 and -5 all gave me 0 purged data sets or folders, but I know there must be stuff associated with user accounts older than 1 day that should be purged... The -6 option does not seem to work at all, as I got this error: "cleanup_datasets.py: error: no such option: -6". Am I missing something?
In addition:
1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know which files to delete?
2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?
Cheers,
Pi
On 23•Jul•2009, at 5:17 PM, Erick Antezana wrote:
Greg,
please see in-line:
2009/7/23 Greg Von Kuster <ghv2@psu.edu>
Hi Erick,
Erick Antezana wrote:
Greg,
I managed to set my connection string so that we could use a remote mysql server. Thanks.
w.r.t. the dataset purging, I used the scripts to clean deleted libraries, folders, datasets, userless histories... I've seen that one must specify the span of time in days. What about data that was added by mistake, for instance today, and that we want to delete immediately? I tried to launch the script with "-d 0" but the data is still there... Am I missing something?
No, I don't think so. It's possible that your system clock is off from your database time.
both servers (mysql and the one where galaxy is running) have the same time.
Is your database storing time as local time?
how can I see that?
The cleanup script uses the update_time for the objects being deleted.
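The age check Greg describes presumably boils down to comparing each object's update_time against a cutoff derived from the -d value. A minimal sketch of that logic (an assumption about how the script behaves, not Galaxy's actual code), which also shows why a database clock running ahead of the local clock can make "-d 0" appear to do nothing:

```python
from datetime import datetime, timedelta

def is_old_enough(update_time, days, now=None):
    """Return True if update_time is at least `days` days before `now`.
    Sketch of the cutoff logic the cleanup script presumably applies."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return update_time <= cutoff

now = datetime(2009, 7, 23, 12, 0, 0)
# With -d 0 the cutoff is `now`: a row stamped one minute in the future
# (e.g. by a fast database clock) is not yet eligible.
print(is_old_enough(datetime(2009, 7, 23, 12, 1, 0), 0, now=now))   # False
# A row updated 10 days ago passes a -d 10 cutoff.
print(is_old_enough(datetime(2009, 7, 13, 12, 0, 0), 10, now=now))  # True
```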
In which file can I find the SQL command that actually deletes and purges the data?
I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
no, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) )
Have the purge_*.sh scripts been tested with MySQL?
Yes
last question (already asked before): are there any plans to support Oracle?
Not sure why it wouldn't already be supported, although we don't use it here. It just needs a different URL - SQLAlchemy supports Oracle.
good to know that, I will try to find some time to test it and let you know.
cheers, Erick
Hi Galaxy developers,

One more request... The cleaning of outdated stuff seems to work fine, but my logs are now filling up. The reason is that the cleanup scripts report the python path on STDERR like this:

python path is: /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4/Cheetah-1.0-py2.4-linux-x86_64.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4/bx_python-0.5.0_dev_r4bf1f32e6b76-py2.4-linux-x86_64.egg, ... (and so on through every egg and site-packages directory on the path)

In addition, when I run the scripts with cron I get e-mail, because cron thinks something went wrong due to the output on STDERR. I do appreciate getting e-mail in case something really went wrong, so redirecting STDERR to /dev/null isn't a nice solution, but I also don't need Galaxy to remind me daily of my python path :). I guess it's useful for debugging, but can this be switched off by default and hidden behind a verbose/log switch or something similar?

Cheers,
Pi
Hi Pieter,

This output has been removed in the latest commit. It should be available in the regular distribution repository in a few days.

--nate
Hi Galaxy developers,
One more request... The cleaning of outdated stuff seems to work fine, but my logs are now filling up. The reason is that the cleanup scripts report the python path on STDERR like this:
python path is: /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/ py2.4-linux-x86_64-ucs4/Cheetah-1.0-py2.4-linux-x86_64.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64- ucs4/bx_python-0.5.0_dev_r4bf1f32e6b76-py2.4-linux-x86_64.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64- ucs4/python_lzo-1.08_static-py2.4-linux-x86_64.egg, /mnt/npc01d01/prog/ galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ docutils-0.4-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/elementtree-1.2.6_20050316- py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/ py2.4-noplatform/wchartype-0.1-py2.4.egg, /mnt/npc01d01/prog/ galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Paste-1.5.1- py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/ py2.4-noplatform/Routes-1.6.3-py2.4.egg, /mnt/npc01d01/prog/ galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ flup-0.5.dev_r2311-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/WebOb-0.8.5-py2.4.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ simplejson-1.5-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/WebHelpers-0.2-py2.4.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ PasteDeploy-1.3.1-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/Mako-0.1.10-py2.4.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ Beaker-0.5-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/MyghtyUtils-0.52-py2.4.egg, /mnt/ npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/ Babel-0.9.4-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11- c021935a25ac/eggs/py2.4-noplatform/SQLAlchemy-0.4.7p1-py2.4.egg, /mnt/ 
npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform, / mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/lib, /usr/lib64/ python24.zip, /usr/lib64/python2.4, /usr/lib64/python2.4/plat-linux2, / usr/lib64/python2.4/lib-tk, /usr/lib64/python2.4/lib-dynload, /usr/ lib64/python2.4/site-packages, /usr/lib64/python2.4/site-packages/ Numeric, /usr/lib64/python2.4/site-packages/gtk-2.0, /usr/lib/ python2.4/site-packages
In addition, when I run the scripts with cron I get e-mail, because cron thinks something went wrong due to the output on STDERR. I do appreciate getting e-mail in case something really went wrong, so redirecting STDERR to /dev/null isn't a nice solution, but I also don't need Galaxy to remind me daily of my python path :). I guess it's useful for debugging, but can this be switched off by default and hidden behind a verbose/log switch or something similar?
Cheers,
Pi
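One possible workaround for the cron-mail noise described above, until the output is removed upstream, is to filter just the known banner line out of STDERR while letting real errors through. This is a sketch, not part of Galaxy: the `python path is` prefix and the `purge_datasets.sh` script name are taken from this thread; everything else is an assumption, and the usage line relies on bash process substitution.

```shell
# filter_noise: drop the "python path is: ..." banner that the cleanup
# scripts print on STDERR, but pass every other line through unchanged,
# so cron only sends mail when something actually went wrong.
filter_noise() {
  grep -v '^python path is'
}

# Hypothetical usage in a bash cron job (process substitution keeps the
# surviving lines on STDERR):
# sh purge_datasets.sh -d 10 -r 2> >(filter_noise >&2)
```

Since cron mails a job whenever it produces output, an empty filtered stream means no mail at all on clean runs.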
On 11•Aug•2009, at 10:32 AM, Pieter Neerincx wrote:
Hi Dan,
Thanks! I'll update right away :)...
Cheers,
Pi
On 10•Aug•2009, at 6:20 PM, Daniel Blankenberg wrote:
Hi Pi,
An updated wiki on this topic is available at http://bitbucket.org/galaxy/galaxy-central/wiki/PurgeHistoriesAndDatasets .
> Apparently the script does not check whether it is being executed with proper permissions to clean.
Executing the scripts using the -f flag will cause the script to attempt to re-purge datasets that are already marked as purged (changing the time delay would also be required, as the table entries would have been marked as updated at the failed attempt).
> It appears to me that Galaxy can not clean datasets unless the history or library it was once assigned to is also deleted. Is this correct?
Using the -6 flag / delete_datasets.sh script added in changeset 2551:5b405a43c406 will allow a base dataset to be marked as deleted without requiring a history/library/folder to be purged. This script could take considerable time, depending on the number of datasets in Galaxy.
> When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified, because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually only kept a single copy of these files in my libraries...
When history items are shared or library items are imported (to/from a history or between libraries), the base dataset (file) is shared, preventing duplication of file data on disk. Uploading a file several times creates a different base dataset (file) each time; each of the instances associated with each individually uploaded file will need to be deleted before a specific file will be purged from disk.
Thanks for using Galaxy,
Dan
Hi Dan and Erick,
Here's a follow up on my attempts to free up some disk space. I deleted all histories of all users (in a test installation :)). All I have left now is a few datasets in libraries. More than 10 days later I ran the scripts. At first this didn't help. Apparently the script does not check whether it is being executed with proper permissions to clean, because in the logs I found numerous entries like this one:
# Error, file has already been removed: [Errno 13] Permission denied: 'database/files'
At the end of the log it still claims:
# Freed disk space: 609064379
But it did not clean a single byte. Re-running the script as root does not help, because the script did modify the database... So I manually modified the database by setting the purged column to 0 and the update_time to something at least 10 days ago. Then re-running the script did free up some disk space :).
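For anyone hitting the same permission problem, the manual fix Pieter describes might look roughly like the SQL below. This is only a sketch: the `purged` and `update_time` columns are mentioned in this thread, but the `dataset` table name and the 10-day window are assumptions, and the date syntax shown is MySQL's. Back up the database before trying anything like this.

```sql
-- Sketch: un-purge the datasets whose files were never actually removed,
-- and back-date them past the cleanup window (here assumed to be -d 10)
-- so the re-run cleanup scripts will pick them up again.
UPDATE dataset
   SET purged = 0,
       update_time = DATE_SUB(NOW(), INTERVAL 11 DAY)
 WHERE purged = 1;
```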
When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified, because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually only kept a single copy of these files in my libraries... Maybe if I delete all libraries as well I'll be able to get rid of the binary trash.
It appears to me that Galaxy can not clean datasets unless the history or library it was once assigned to is also deleted. Is this correct? If so, this is quite a pain, because it is only natural for users to experiment resulting first in several failures before resulting in the datasets they want to keep. As long as they keep the histories / libraries with the good data all the intermediate failures associated with those histories / libraries will claim disk space too :(...
Cheers,
Pi
On 30•Jul•2009, at 1:33 PM, Erick Antezana wrote:
Hi Dan,
I have been facing the same problem as Pieter (as I reported some time ago) while trying to purge several NGS data files I was playing with. At that time I used the scripts directly with different options (-d 1, -d 0, etc.) and the deleted files were still there. I have just tried once again, executing them in the order you indicated, but unfortunately the files are still there...
I am using mysql to store my data. I have the same behaviour while using the default db (sqlite).
cheers, Erick
2009/7/29 Daniel Blankenberg <dan@bx.psu.edu> Hi Pi,
The wiki for deleting datasets is out of date, and I will be updating it shortly.
There is a collection of shell scripts included in the scripts/cleanup_datasets directory. In order to delete no longer needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):
delete_userless_histories.sh
purge_histories.sh
purge_datasets.sh
I will send a message after the wiki has been updated.
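The three-script order above could be wrapped for unattended runs. A hedged sketch follows: the script names and the -d/-r flags come from this thread, while the `GALAXY_HOME` default and the 10-day window are illustrative assumptions.

```shell
# run_cleanup: run Galaxy's cleanup scripts in the dependency order
# described above (histories must go before datasets), stopping at the
# first failure. Usage: run_cleanup [days]
run_cleanup() {
  days=${1:-10}
  dir=${GALAXY_HOME:-/opt/galaxy}/scripts/cleanup_datasets
  cd "$dir" || return 1
  for s in delete_userless_histories.sh purge_histories.sh purge_datasets.sh; do
    sh "$s" -d "$days" -r || return 1   # abort the sequence on error
  done
}
```

From cron this could then be a single entry invoking `run_cleanup 10` nightly.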
> In addition:
> 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete?
This is an excellent feature for us to add to the script.
> 2. If I understand correctly, I should be able to remove associated data sets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-Ok-I-accept-I-know-what-I'm-doing-delete outdated stuff :) ?
There are several database tables which Galaxy expects to exist (for job reporting, etc.) and whose entries should not be deleted. Datasets are an example of this: when a dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.
Thanks for using Galaxy,
Dan
Hi Erick, Greg et alia,
I've set up Galaxy with a MySQL DB too, but I cannot get rid of old stuff. According to the wiki, running the script with ... -1 or -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran with -1 it told me:
--------
# 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted
# Deleted 0 histories.
Elapsed time: 0.21
--------
That was a bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup and voilà:
--------
# 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged
1 4 5 6 7 8 9 10 11 12 13 14
<..cut a lot of white space..>
15 16
# Purged 14 histories.
Elapsed time: 1.17
--------
Running with -3, -4 and -5 all gave me 0 in either purged data sets or folders, but I know there must be stuff associated with user accounts older than 1 day that should be purged... The -6 option does not seem to work at all, as I got this error: "cleanup_datasets.py: error: no such option: -6". Am I missing something?
In addition: 1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know what files to delete? 2. If I understand correctly, I should be able to remove associated data sets -r, but even when purging stuff the entries will still remain in the database... How do I really, really, Yes-Ok-I- accept- I- know-what-I'm-doing-Delete outdated stuff :) ?
Cheers,
Pi
On 23•Jul•2009, at 5:17 PM, Erick Antezana wrote:
> Greg,
>
> please see in-line:
>
> 2009/7/23 Greg Von Kuster <ghv2@psu.edu>
> Hi Erick,
>
> Erick Antezana wrote:
> Greg,
>
> I managed to set my connection string so that we could use a remote mysql server. Thanks.
>
> w.r.t. the datasets purging, I used the scripts to clean deleted libraries, folders, datasets, userless histories ... I've seen that one must specify the span of time in days. What about data that was added mistakenly, for instance today, that we want to delete immediately? I tried to launch the script with "-d 0" but the data is still there... Am I missing something?
>
> No, I don't think so. It's possible that your system clock is off from your database time.
>
> both servers (mysql and the one where galaxy is running) have the same time.
>
> Is your database storing time as local time?
>
> how can I see that?
>
> The cleanup script uses the update_time for the objects being deleted.
>
> In which file can I find the SQL command that actually deletes and purges the data?
>
> I am no longer using the sqlite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the directory database?
>
> Maybe. Did you keep any data that refers to them in your tables when you migrated to mysql? If so, you'll need to keep them.
>
> no, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) )
>
> have the purge_*.sh scripts been tested with mysql?
>
> Yes
>
> last question (already asked before): are there any plans to support Oracle?
>
> Not sure why it wouldn't already be supported, although we don't use it here. Just needs a different URL - sqlalchemy supports Oracle.
>
> good to know that, I will try to find some time to test it and let you know.
> cheers,
> Erick
>
> thanks,
> Erick
>
> 2009/7/22 Greg Von Kuster <ghv2@psu.edu <mailto:ghv2@psu.edu>>
>
> Erick,
>
> To use a database other than the sqlite that comes with the Galaxy distribution, all that is needed is to change the config setting, providing the URL that points to your mysql database. See the mysql documentation for the connection URL, as the URL differs depending upon whether your database is installed locally or not.
>
> The config setting is the "database_connection" setting, and could look something like this:
>
> database_connection = mysql:///greg_test?unix_socket=/var/run/mysqld/mysqld.sock
>
> Greg Von Kuster
> Galaxy Development Team
>
> Erick Antezana wrote:
>
> Hello,
>
> I would like to use MySQL instead of sqlite to store my data. I couldn't find on the Galaxy web site a HOWTO or some guidelines to do it. I only found some lines that might need to be changed/enabled in the universe_wsgi.ini file:
>
> #database_file = database/universe.sqlite
> database_connection = mysql:///galaxy
> #database_engine_option_echo = true
> #database_engine_option_echo_pool = true
> #database_engine_option_pool_size = 10
> #database_engine_option_max_overflow = 20
>
> Could you point out some doc or briefly describe what I need to do in order to go for mysql?
>
> Are there any plans to support other DBMS's (like Oracle, for instance)?
>
> thanks,
> Erick
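For a MySQL server on another host, the connection string takes the usual SQLAlchemy URL form. A hypothetical universe_wsgi.ini fragment follows: the host name, user and password are placeholders, only database_connection is strictly required, and the pool options are the same ones shown commented out in the original message.

```ini
# universe_wsgi.ini - sketch for a remote MySQL server (placeholders!)
database_connection = mysql://galaxy_user:secret@dbhost.example.org/galaxy
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20
```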
> _______________________________________________
> galaxy-user mailing list
> galaxy-user@bx.psu.edu
> http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user

-------------------------------------------------------------
Biomolecular Mass Spectrometry and Proteomics
Utrecht University
Visiting address:
H.R. Kruyt building room O607
Padualaan 8
3584 CH Utrecht
The Netherlands

Mail address:
P.O. box 80.082
3508 TB Utrecht
The Netherlands

phone: +31 (0)6-143 66 783
email: pieter.neerincx@gmail.com
skype: pieter.online
------------------------------------------------------------
_______________________________________________ galaxy-user mailing list galaxy-user@bx.psu.edu http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
Thanks!
Pi

On 04 Nov 2009, at 19:15, Nate Coraor wrote:
Hi Pieter,
This output has been removed in the latest commit. It should be available in the regular distribution repository in a few days.
--nate
Pieter Neerincx wrote:
Hi Galaxy developers,

One more request... The cleaning of outdated stuff seems to work fine, but my logs are now filling up. The reason is that the cleanup scripts report the python path on STDERR.

<..quoted python path listing and the rest of the earlier thread cut; see the original messages above..>
-------------------------------------------------------------
mobile: +31 6 143 66 783
e-mail: pieter.neerincx@gmail.com
skype: pieter.online
-------------------------------------------------------------
participants (5)
- Daniel Blankenberg
- Erick Antezana
- Greg Von Kuster
- Nate Coraor
- Pieter Neerincx