On Mar 14, 2011, at 16:09, Greg Von Kuster wrote:
On Mar 14, 2011, at 10:52 AM, Assaf Gordon wrote:
Do you really want to delete all the user's records from the database? I think the database size is tiny compared to the actual files on disk, so keeping all database records forever shouldn't be a problem.
You're right, of course. There would be no problem with keeping the user's credentials and other records.
Regarding files, it's my understanding (Galaxy people, correct me if I'm wrong) that once a dataset is marked as "deleted" in the "history_dataset_association" table, it's as if the user had deleted the dataset himself.
So if you run a query that sets "deleted=true", the Galaxy clean-up scripts will take it from there and will eventually delete the dataset ("eventually", because there are a couple of clean-up steps and scripts).
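For concreteness, such an update might look like the sketch below. This is illustrative only, not something to run by hand on a production database: it assumes SQLAlchemy, a placeholder connection URL and user id, and the usual schema where "history.user_id" links histories to users.

# Illustrative sketch only -- do NOT run this by hand on a live
# Galaxy database. Assumes SQLAlchemy and Galaxy's standard schema.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://galaxy@localhost/galaxy")  # assumed DB URL
with engine.begin() as conn:
    # Flag every dataset in the user's histories as deleted; the Galaxy
    # clean-up scripts will then purge the underlying files over time.
    conn.execute(
        text("""
            UPDATE history_dataset_association
            SET deleted = true
            WHERE history_id IN (SELECT id FROM history WHERE user_id = :uid)
        """),
        {"uid": 42},  # 42 is a placeholder user id
    )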
For the above, I recommend the approach discussed in my previous email (uncommenting the Delete / Undelete / Purge operation buttons in the admin controller). This will do what you want, and you won't need to execute any SQL commands manually, which can be very dangerous.
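Roughly, re-enabling those operations in the admin controller's user grid would look like the sketch below. The class and GridOperation argument names are approximations from memory, so check your own lib/galaxy/web/controllers/admin.py before changing anything.

# Sketch of re-enabling the user Delete / Undelete / Purge buttons in
# Galaxy's admin user grid; names and arguments are approximate and
# may differ between Galaxy revisions.
from galaxy.web.framework.helpers import grids

class UserListGrid( grids.Grid ):
    # ... existing columns, filters, etc. ...
    operations = [
        # Uncommenting these exposes the buttons in the admin UI,
        # so no manual SQL is needed.
        grids.GridOperation( "Delete",
                             condition=( lambda item: not item.deleted ),
                             allow_multiple=True ),
        grids.GridOperation( "Undelete",
                             condition=( lambda item: item.deleted and not item.purged ),
                             allow_multiple=True ),
        grids.GridOperation( "Purge",
                             condition=( lambda item: item.deleted and not item.purged ),
                             allow_multiple=True ),
    ]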
Yes, I believe that's how we can do it. But there should be some kind of feedback for a user when this is about to happen... and a way to prevent it.
This is actually something I would like to request as a feature: "reproducibility" doesn't require all the files, all the time - only the first file (say, a FASTQ file) and the metadata for downstream files (jobs, tools, parameters) are needed.
For the above, the ability to determine whether a dataset is shared already exists, but what is your definition of a "published dataset"?
A history item that is used on a Galaxy page, as supplementary material to a publication, for example, or a history that is listed on Galaxy's "published histories" page.
It would be great if there were a way for users to see all of their datasets (and the jobs, parameters, etc.) ever created, even if the underlying physical file has been deleted.
Where are you talking about "seeing the datasets, jobs, parameters, etc." whose underlying file has been removed from disk? Would this be in the history, where you can currently see "deleted" datasets, but not "purged" datasets?
In an ideal world, we would be able to eventually remove all the intermediates and keep only the initial dataset from which all the results were generated. For that to work 100%, you would need a snapshot of the Galaxy installation with all the tools and their versions at the time of execution... it sounds very messy to implement and test, though.

The dataset without the physical file could get its own color, and all analysis steps could be re-run transparently if someone really wanted to access a specific intermediate file.

Thanks for discussing this with us!

-- Sebastian