My PhD work may be of interest for this subject, although the primary focus has been on generating databases comprising the changes from a specific timeframe, and was not designed specifically for Galaxy. The similarities between my system and the system you are proposing are that it can generate a BLAST database from any date (that has been added to the system), as well as "diffs" between two dates, and supports FASTA, the Uniprot EMBL variant, full files (which does not give compression benefits) and several others. The system uses delta compression to make sure that non-updated fields do not take up extra space. It uses the Hadoop stack (HBase, HDFS and MapReduce) for parallelism in generating the databases (the blast database generation from FASTA files is not parallel).


You can find one of the publications here: http://bdps.cs.uit.no/papers/hibb13.pdf


I hope this can be of some use to you.

Regards,

Edvard Pedersen