That's right, I exaggerated: the complete database dump is currently 125 MB.
Yet that does not really change my question about the Galaxy database "philosophy". With time and instance "upscaling", 1 TB is not so unbelievable, don't you think? On the other hand, what is the point of keeping deleted histories, users, datasets, etc. in the database? Is it just that the database structure is so complicated that real deletion would be too risky? (Admittedly, I am no PostgreSQL guru at all.)
Apart from the lack of "aesthetics" in the chosen solution of keeping everything until the end of time (just my humble opinion), there are other aspects that are a bit irritating. For instance, when you manage users, as already discussed in previous posts, you get confused by the many users who have not existed for a long time. Just an example.

Sometimes, when I try to imagine the future of our Galaxy instance over the next 5 years, let's say, I get the feeling that the only solution would be to restart a Galaxy instance from scratch, asking users to register again, reimport their datasets, etc., which again goes against my sense of aesthetics.

I would be curious to know what the plans are for the future of https://main.g2.bx.psu.edu/, for instance.

Chris



Christophe Antoniewski


Drosophila Genetics and Epigenetics
Laboratoire de Biologie du Développement
9, Quai St Bernard, Boîte courrier 24
75252 Paris Cedex 05

Tel +33 1 44 27 34 39
Fax +33 1 44 27 34 45
Mobile +33 6 68 60 51 50

http://drosophile.org



2013/8/29 Dannon Baker <dannon.baker@gmail.com>
Can you get a dump of table sizes for us to compare with?

http://wiki.postgresql.org/wiki/Disk_Usage
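
For what it's worth, here is a minimal sketch of how such a dump could be produced and formatted. The query follows the general approach on the wiki page above and relies on PostgreSQL's `pg_total_relation_size`. The names `TABLE_SIZE_SQL`, `human_size` and `format_report` are made up for this illustration and are not part of Galaxy. Running the query needs a database driver, which is left out here on purpose.

```python
# Hypothetical helper, not part of Galaxy: rank PostgreSQL tables by size.

# Total on-disk size (table + indexes + TOAST) of every user table.
TABLE_SIZE_SQL = """
SELECT c.relname AS table_name,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY total_bytes DESC;
"""


def human_size(num_bytes):
    """Format a byte count using binary units (1 kB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("bytes", "kB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def format_report(rows, limit=20):
    """Return one line per table, largest first.

    rows: iterable of (table_name, total_bytes) tuples, for example the
    result of fetching all rows after executing TABLE_SIZE_SQL.
    """
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)[:limit]
    return "\n".join(
        f"{name:<40} {human_size(size):>12}" for name, size in ranked
    )
```

With a driver such as psycopg2 (an assumption, any DB-API driver should do), you would execute `TABLE_SIZE_SQL` on a cursor and pass `cursor.fetchall()` to `format_report`.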


On Thu, Aug 29, 2013 at 12:05 PM, Nate Coraor <nate@bx.psu.edu> wrote:
On Aug 29, 2013, at 11:50 AM, Nate Coraor wrote:

> On Aug 26, 2013, at 5:03 AM, Christophe Antoniewski wrote:
>
>> Hi everybody,
>>
>> The Python scripts to clean histories, datasets, users, etc. are fine...
>> However, the records are not really removed from the PostgreSQL database, so it keeps growing with unused records. Ours is above 1 TB after 2 years of production.
>>
>> Is there a safe way to clean unused records and their dependencies out of the database to reduce its size, without being a PostgreSQL guru?
>
> Hi Chris,
>
> The database maintains a permanent record of everything that was done, even though the underlying data can be removed.  There are a lot of dependencies between objects in Galaxy and removing records, especially anything with a foreign key, could easily result in a lot of problems with all kinds of things, from the UI to running jobs.  Because of this, records cannot be removed from the database.

Somehow I missed that you said your database is 1 TB - that should not be the case.  Unless your database is not being vacuumed or you create objects at an extreme rate, it seems as though something has been stored in it that should not have.

--nate

>
> --nate
>
>>
>> Chris
>> --
>> Christophe Antoniewski
>>
>>
>>
>> Drosophila Genetics and Epigenetics
>> Laboratoire de Biologie du Développement
>> 9, Quai St Bernard, Boîte courrier 24
>> 75252 Paris Cedex 05
>>
>> Tel   +33 1 44 27 34 39
>> Fax   +33 1 44 27 34 45
>> Mobile       +33 6 68 60 51 50
>>
>> http://drosophile.org
>>
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>> http://lists.bx.psu.edu/
>>
>> To search Galaxy mailing lists use the unified search at:
>> http://galaxyproject.org/search/mailinglists/
>
>

