huge trees in library slow

13 Aug 2009

      This is what my importer of our solexa runs currently creates:

Structure of the tree:
Currently each group (~8 ATM) has a library. Within each library each user has 
a folder, each sequenced sample of this user has a folder and within the 
folder of the sequenced sample each analysis (unique combination of 
image-analysis/basecall/alignment) has its own folder with some of the output 
files each as a dataset.

At the moment I store the paths (not the files) from ~250 solexa samples in 
galaxy.
Some samples are sequenced multiple times, for each run 4-5 files (I will also 
add some diagnostic files).
In total there are now 1200 library datasets in the database (the paths that 
is).

I have also tweaked the permission system a little bit. Each dataset has quite 
a lot of permissions associated with it.

So the tree is not that big, but for galaxy/firefox its big.
For one user it takes 20 seconds to bring up the library folder.
Then a little bit more than one minute to bring up the tree (sometimes
firefox thinks the script is hanging).
But its still in development not production and I don't know how much speedup
a change from sqlite to postgresql will bring.

Of course I guess it will only get worse because its always more data that 
will come in.

I don't know how to profile with python, but check_folder_contents in 
security/__init__.py could be one of the culprits.

best wishes,
ido

Ido M. Tamir

tags

participants (1)