Assaf Gordon wrote:
Hello Sergei,
I'm experimenting with the clean-up scripts myself, so perhaps I can offer some information (the galaxy team is welcome to correct me and/or explain better).
1. If you look at the output of your query, you'll notice that the "purged" field is 0 for all datasets (I assume 0 is "false" in MySQL). This means that the actual files were *not* purged (i.e. physically deleted) - at least not by the "purge_datasets.sh" or "cleanup_datasets.py -3" step. Since you did use the "-r" parameter, it means those datasets were not picked up as possible deletion candidates by this script.
2. (The following I found by reading the source code, it's not really well explained - so if I'm wrong - correct me). The "dataset" table has an "update_time" field, and this field is updated automatically whenever the dataset record changes. This means that when you run the first cleanup script and set the "deleted" flag to true, the update_time is updated to "now". When you run the next clean-up script and ask for anything that is older than 1 day ("-d 1"), it looks for an update_time older than one day - so it will *not* find the dataset that was just marked as "deleted" in the first step (because its update_time is "now"). Only if you run the next clean-up script tomorrow will that dataset be deleted.
So, for example, running the following in succession:

  cleanup_datasets.py universe_wsgi.ini -d 1 -6      ( => delete datasets )
  cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r   ( => purge datasets + delete physical files)

both run with "-d 1" - but by design, files from yesterday (1 day old) will not be physically deleted.
Files that the user deleted yesterday (1 day old) will be marked as "deleted", but their update_time will be "now". Only files that were marked as deleted yesterday will be deleted today (meaning: they are 2 days old).
To really delete files now, use "-d 0" with all the scripts. Since this is quite scary, the "-i" (info only) mode will show what will be deleted (but that requires a recent version: 5770:a5e0a5d3c0a1).
3. The file_size=NULL issue happens when a job fails - on some occasions (I couldn't pinpoint exactly when) galaxy does not pick up the fact that an output file was generated even if the job failed, and so you get "ghost" files which exist on the disk but are NULL in the database. The "discarded" state means the job was discarded (by the galaxy user?) - not that the dataset was deleted/purged by the clean-up scripts.
Also, datasets created prior to the addition of the total_size column in changeset 5700:70e2b1c95a69 will have this unset - it can be set by running the script:

  % python ./scripts/set_dataset_sizes.py

Also, Sergei, it's possible to allow users to force datasets to be removed from disk after they "delete" them. See the 'allow_user_dataset_purge' option in universe_wsgi.ini. If set to True, users can select "Show Deleted Datasets" from the History's "Options" menu and then choose datasets to purge. Entire histories can be purged from the history list.

--nate
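To make the timing in point 2 concrete, here is a minimal Python sketch of the cutoff check as I understand it from the description above. It is only a model, not the actual cleanup_datasets.py code: the function name is_purge_candidate and its parameters are made up for illustration, and it assumes the script compares update_time against "now minus N days".

```python
from datetime import datetime, timedelta


def is_purge_candidate(deleted, purged, update_time, days, now):
    """Hypothetical model of the '-d N' check: a dataset qualifies only if it
    is marked deleted, not yet purged, and its update_time is older than the
    cutoff (now - N days)."""
    cutoff = now - timedelta(days=days)
    return deleted and not purged and update_time < cutoff


# The "-6" step marks the dataset as deleted, which also bumps update_time.
marked_at = datetime(2011, 7, 6, 14, 17, 49)

same_day = marked_at + timedelta(minutes=5)          # purge run right afterwards
next_day = marked_at + timedelta(days=1, minutes=5)  # purge run a day later

print(is_purge_candidate(True, False, marked_at, 1, same_day))  # False
print(is_purge_candidate(True, False, marked_at, 1, next_day))  # True
print(is_purge_candidate(True, False, marked_at, 0, same_day))  # True ("-d 0")
```

Under this model, the same-day "-d 1" purge finds nothing because the record was just touched, while either waiting a day or using "-d 0" makes the dataset eligible.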
Hope this helps, -gordon
Sergei Ryazansky wrote, On 07/06/2011 12:15 PM:
Hi, thank you for the answer. I have tried the mentioned scripts, but it seems that the order in which I ran them the first time was incorrect. As a result, the metadata in the database tables was modified, but the dataset files corresponding to the datasets deleted from the history remain on disk. Running the scripts afterwards in the right order (as indicated in the wiki) also didn't delete the unused dataset files. Is there any way to update the metadata in the tables according to the real state of the files? I think that the order in which I called the scripts the first time was the following:

  cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
  cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
  cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r

Also there are some strange things (imho) in the galaxy.dataset table: there are a lot of dataset ids with a NULL total size:
mysql> select * from dataset where (id="148" or id="53" or id="86" or id="146" or id="330");
+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
| id  | create_time         | update_time         | state     | deleted | purged | purgable | external_filename | _extra_files_path | file_size | total_size |
+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
|  53 | 2011-03-29 16:21:58 | 2011-07-06 14:17:49 | error     |       1 |      0 |        1 | NULL              | NULL              |         0 |       NULL |
|  86 | 2011-03-29 20:35:44 | 2011-07-06 14:17:52 | discarded |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
| 146 | 2011-05-26 01:38:14 | 2011-07-06 14:18:00 | error     |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
| 148 | 2011-05-26 02:20:44 | 2011-07-06 14:18:00 | discarded |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
| 330 | 2011-07-05 00:44:44 | 2011-07-05 00:44:44 | NULL      |       0 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+

I don't know what these records looked like before I called the cleanup scripts, but is it possible that this is because of the incorrect order of calling them? Does the "discarded" state mean that the corresponding file should be deleted? In my case all these files are still in the database folder. Please let me know if you need any further clarification of my questions.
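For anyone reading the archive: the "deleted but not purged" state pointed out in the reply at the top of this thread can be checked with a simple query. Below is a small, self-contained Python sketch that loads the five rows shown above into an in-memory SQLite table (a stand-in for the real MySQL database, with only a subset of the columns) and runs two such queries. The table layout here is an assumption made for illustration; only the values are taken from the output above.

```python
import sqlite3

# Values copied from the query output above:
# (id, state, deleted, purged, file_size, total_size)
rows = [
    (53, "error", 1, 0, 0, None),
    (86, "discarded", 1, 0, None, None),
    (146, "error", 1, 0, None, None),
    (148, "discarded", 1, 0, None, None),
    (330, None, 0, 0, None, None),
]

# In-memory stand-in for the real "dataset" table (subset of columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, state TEXT, deleted INTEGER, "
    "purged INTEGER, file_size INTEGER, total_size INTEGER)"
)
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?)", rows)

# Datasets flagged as deleted whose files have not been purged yet.
pending = [r[0] for r in conn.execute(
    "SELECT id FROM dataset WHERE deleted = 1 AND purged = 0 ORDER BY id")]

# Datasets whose total_size has never been set.
unsized = [r[0] for r in conn.execute(
    "SELECT id FROM dataset WHERE total_size IS NULL ORDER BY id")]

print(pending)  # [53, 86, 146, 148]
print(unsized)  # [53, 86, 146, 148, 330]
```

The same WHERE clauses should work unchanged against the MySQL table, since they only use the deleted, purged and total_size columns shown in the output.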
2011/7/6 Hans-Rudolf Hotz <hrh@fmi.ch>
Hi Sergei
This is a question better asked on 'galaxy-dev@bx.psu.edu' since you refer to your local Galaxy installation.
In order to remove the data from your file system, you need to run the 'cleanup scripts', as described on this wiki page:
https://bitbucket.org/galaxy/galaxy-central/wiki/Config/PurgeHistoriesAndDat...
Regards, Hans
On 07/06/2011 03:33 PM, Sergei Ryazansky wrote:
-------- Original Message --------
Subject: deleting datasets from history
Date: Tue, 5 Jul 2011 19:58:45 +0300
From: Sergei Ryazansky <s.ryazansky@gmail.com>
To: galaxy-user-request@lists.bx.psu.edu
Hello all,
After deleting datasets from the history panel in our Galaxy mirror, the indicator at the top right corner shows the same amount of used space as before deleting. Also, the files corresponding to the datasets remain in the Galaxy database/files/000 directory. It seems that deleting datasets from the history only deletes the link to the file but not the file itself. How can I configure the Galaxy mirror to delete not only the records in the history panel but also the corresponding files? Thank you in advance!
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: