Sebastian and Assaf,

On Mar 14, 2011, at 10:52 AM, Assaf Gordon wrote:
Sebastian J. Schultheiss wrote, On 03/14/2011 10:31 AM:
Is there a way to completely delete a user and their data at the same time? We would like to keep published pages and shared data though.
Do you really want to delete all the user's records from the database? I think the database size is tiny compared to the actual files on disk, so keeping all database records forever shouldn't be such a problem.
Regarding files, it's my understanding (Galaxy people, correct me if I'm wrong) that once a dataset is marked as "deleted" in the "history_dataset_association" table, it's as if the user had deleted the dataset themselves.
So if you run a query that sets "deleted=true", the Galaxy clean-up scripts will take it from there and will eventually delete the dataset ("eventually", because there are a couple of clean-up steps and scripts).
Something like:

update history_dataset_association set deleted=true where id = NNNNNN ;
(After finding the ID with previous queries). Same caveat as before: this is very messy, don't do it without testing and verification.
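To make the "testing and verification" step concrete, here is a minimal sketch of the same update applied to all of one user's datasets and run as a dry run first. It uses Python's sqlite3 with an in-memory stand-in database rather than a real Galaxy database. The schema is an assumption: only the table name "history_dataset_association" and its "deleted" column come from this thread, while the "history" table, the "user_id" and "history_id" columns, and both function names are hypothetical and should be checked against the actual Galaxy schema.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for the relevant tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY,
            history_id INTEGER,
            deleted INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO history VALUES (10, 1), (20, 2);
        INSERT INTO history_dataset_association VALUES
            (100, 10, 0), (101, 10, 0), (102, 10, 1), (200, 20, 0);
        """
    )
    return conn


def mark_user_datasets_deleted(conn, user_id):
    """Flag every not-yet-deleted dataset association in the user's histories.

    Returns the number of rows changed. Nothing is committed here, so the
    caller decides whether to commit() or rollback().
    """
    cur = conn.execute(
        """
        UPDATE history_dataset_association
           SET deleted = 1
         WHERE deleted = 0
           AND history_id IN (SELECT id FROM history WHERE user_id = ?)
        """,
        (user_id,),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    changed = mark_user_datasets_deleted(conn, user_id=1)
    print(f"would mark {changed} dataset association(s) as deleted")
    conn.rollback()  # dry run: check the count, then rerun and commit()
```

Leaving the commit to the caller means the affected row count can be inspected and rolled back before anything is changed for real.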
For the above, I recommend the approach discussed in my previous email: uncomment the Delete / Undelete / Purge operation buttons in the admin controller. This will do what you want, and you will not need to execute any SQL commands manually, which can be very dangerous.
I know it sort of counters the dogma of having everything reproducible, but our users upload really big datasets and we cannot keep adding new disk space every month. We would therefore like an expiration date on anything that is not published, or shared and in use. I guess we can modify the SQL queries to take some of the stored dates into account.
I agree, same problem for us.
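The expiration idea could be prototyped as a read-only query that lists candidates before anything is marked as deleted. The sketch below again uses Python's sqlite3 with an in-memory stand-in database. Everything about the schema beyond the "history_dataset_association" table and its "deleted" column is an assumption made for illustration: the "update_time" column, the "published" flag on "history", and the function names are hypothetical, and "shared" is left out entirely because how sharing is recorded is not described in this thread.

```python
import sqlite3


def make_expiry_demo_db():
    """Build a tiny in-memory stand-in database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE history (
            id INTEGER PRIMARY KEY,
            published INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY,
            history_id INTEGER,
            deleted INTEGER NOT NULL DEFAULT 0,
            update_time TEXT  -- ISO date, so string comparison follows date order
        );
        INSERT INTO history VALUES (1, 0), (2, 1);
        INSERT INTO history_dataset_association VALUES
            (1, 1, 0, '2010-06-01'),  -- old, unpublished: candidate
            (2, 1, 0, '2011-02-01'),  -- recent: kept
            (3, 2, 0, '2010-01-01'),  -- old but in a published history: kept
            (4, 1, 1, '2010-01-01');  -- already deleted: skipped
        """
    )
    return conn


def find_expired(conn, cutoff):
    """Return ids of live dataset associations in unpublished histories
    that were last updated before the cutoff date (ISO string)."""
    rows = conn.execute(
        """
        SELECT hda.id
          FROM history_dataset_association AS hda
          JOIN history AS h ON h.id = hda.history_id
         WHERE hda.deleted = 0
           AND h.published = 0
           AND hda.update_time < ?
         ORDER BY hda.id
        """,
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_expiry_demo_db()
    print(find_expired(conn, "2011-01-01"))
```

Keeping the selection separate from the update lets an administrator review the candidate list, and adjust what counts as published or shared, before anything is flagged.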
This is actually something I would like to request as a feature: "reproducibility" doesn't require all the files, all the time. Only the first file (let's say, a FASTQ file) and the metadata for downstream files (jobs, tools, parameters) are needed.
For the above, determining whether a dataset is shared is currently available, but what is your definition of a "published dataset"?
It would be great if there was a way for users to see all the datasets they have ever created (along with the jobs, parameters, etc.), even if I deleted the underlying physical file.
Where would users be "seeing the datasets, jobs, parameters, etc." whose underlying file has been removed from disk? Would this be in the history, where you can currently see "deleted" datasets, but not "purged" datasets?
Unfortunately, the current behavior is that once a dataset is marked as "purged" (meaning the file was deleted), it will never appear again inside Galaxy.
-gordon
Greg Von Kuster Galaxy Development Team greg@bx.psu.edu