I do use that script, though I hope it’s not causing this. The intent behind it was to simulate deletion from the HDA by the user, and then let the cleanup process progress as normal. Also, I only run on certain datasets, and many (all?) of the specific examples of this issue I’ve seen were not ones touched by the script, but datasets copied from a Data Library.
Looking a bit further:
d.deleted
and not hda.purged
= 66845d.deleted
and d.purged
and not hda.purged
= 54818d.deleted
and not d.purged
and not hda.purged
= 12027December 2, 2016 at 1:09 PMLance,Do you use the administrative deletion script you wrote? That would probably cause d.deleted and not hda.purged.d.deleted isn't for queries, it's sort of a deletion buffer. After all hda w/ hda.dataset_id = d.id are hda.purged, the next cleanup step is to mark d.deleted. Then after d.deleted, it's marked d.purged and the file(s) are actually removed from disk. It's a bit of a failsafe so that datasets can be intercepted if unintentionally deleted.--nateDecember 2, 2016 at 12:39 PMThanks Nate, that makes sense. However it seems I still have an issue:
```
select count(d.id)
from dataset d
join history_dataset_association hda on d.id = hda.dataset_id
where d.deleted = 't' and hda.purged = 'f';
count
-------
67464
(1 row)
```
Perhaps I'll need to write some script to check each of these to see if the data does, indeed, exist, and then set the flag appropriately... Hrmm.
Does anyone know what the `dataset.deleted` flag is used for? Is that just supposed to be set when all `hda.purged` are `t`. Sort of like a shortcut for queries?
- Lance
December 2, 2016 at 11:15 AMLance,usegalaxy.org has 4,652,912 such datasets. The cause here is that deleting an entire history does not mark the HDAs deleted (so that if you view a deleted history you can see what datasets were deleted and which were not at the time of deletion). There is a separate hda.purged column that indicates that an HDA is no longer user-recoverable by the user. I have 699 datasets that are d.deleted but not hda.purged, this number should be 0.--nateNovember 30, 2016 at 2:20 PMI've run into issues over the past year where some jobs would occasionally fail to start (stuck in a `new` state). I tracked them down to a situataion where `dataset.deleted` is set to `t` yet the `history_dataset_association.deleted` is `f`. Simply setting `dataset.deleted` to `f` in those instances resolved the issue and the jobs ran. The datasets have all still been on disk.
Since this is a pretty annoying situation, I thought I'd check to see if there are other datasets with this problem. Shockingly, I found many thousands of such datasets:
```
select count(d.id)
from dataset d
join history_dataset_association hda on d.id = hda.dataset_id
where d.deleted = 't' and hda.deleted = 'f';
count
-------
76977
(1 row)
```
I'm hesitant to update so many rows in my database so I thought I'd put this out there for comment. What do others see when running the above query? Has anyone run into this or a similar issue? Thanks.