Hi @ all,

I would like to know if anyone could give me some guidance or a hint on how to migrate data 'correctly' if Galaxy has already written files, worked on the database, etc.

The situation is that we set up an instance, it was already in use, and it wrote files e.g. into the directory 'galaxy-dist/database/'. The local disk space is quite small, but more storage is available via an NFS share. I already wrote a bash script that sets up the connections (and edits the 'universe_wsgi.ini' file accordingly) for the entries 'genome_data_path', 'ftp_upload_dir', 'file_path', 'new_file_path' and 'job_working_directory'. Those should address the most bulky features... Applying that script is fine as long as the Galaxy system has never been started before. When applying it to a system that has already 'done something', the files in the 'database/' subdirectory remain where they are and are not transferred. Still, if Galaxy is started up after the new settings are applied, no error is reported.

I wonder how Galaxy now deals with that situation:

* Does it handle data from both sources (read and/or write)?
* Does it automatically move the 'historic' data when it is next touched?
* Does it crash when those older objects are to be read or edited?
* Does it silently remove them from the database?

=> Is there a procedure/module to be used in order to migrate the data? Is it sufficient (and appropriate) to just move all contents of the old folders to the new locations? Did I miss any existing documentation on that issue?

The answers may diverge depending on which of the five parameters named above the questions refer to...

Any help appreciated before blowing up our production instance :).

Thanks in advance,
Best regards,
Sebastian

--
Sebastian Schaaf, M.Sc. Bioinformatics
Faculty Coordinator NGS Infrastructure
Chair of Biometry and Bioinformatics
Department of Medical Informatics, Biometry and Epidemiology (IBE)
University of Munich
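For concreteness, the config-rewriting part of such a script might look roughly like the sketch below. This is a minimal illustration, not the script from the thread: the Galaxy root, the NFS base path and the GNU sed/grep usage are assumptions; only the five parameter names come from the message above.

    #!/usr/bin/env bash
    # Sketch only: point the five path settings in universe_wsgi.ini
    # at an NFS share. GALAXY_ROOT and NFS_BASE are placeholders;
    # GNU sed/grep assumed.
    GALAXY_ROOT=/path/to/galaxy-dist      # placeholder
    NFS_BASE=/mnt/galaxy-nfs              # placeholder
    CONF="$GALAXY_ROOT/universe_wsgi.ini"

    cp "$CONF" "$CONF.bak"                # back up the config before touching it

    for key in genome_data_path ftp_upload_dir file_path new_file_path job_working_directory; do
        mkdir -p "$NFS_BASE/$key"
        if grep -Eq "^#?[[:space:]]*$key[[:space:]]*=" "$CONF"; then
            # setting exists (possibly commented out): rewrite it in place
            sed -Ei "s|^#?[[:space:]]*$key[[:space:]]*=.*|$key = $NFS_BASE/$key|" "$CONF"
        else
            # NB: a naive append lands at the end of the file; in a real
            # ini the key belongs in the [app:main] section.
            echo "$key = $NFS_BASE/$key" >> "$CONF"
        fi
    done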
Hi Sebastian,

I am also not aware of any such documentation. It is probably impossible to write, as each such situation is different.

I don't quite understand what your 'script' is doing, but in your situation I recommend the following:

First of all, make sure you have a backup of your PostgreSQL (or MySQL) database.

Don't make any changes to 'universe_wsgi.ini'; instead, move the bulky directories to new locations on the NFS share and replace them with symbolic links. This also has the advantage that you can do it step by step (i.e. directory by directory).

Regards, Hans-Rudolf
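Spelled out for a single directory, that approach could look roughly like this sketch (both paths are placeholders; Galaxy should be stopped while the files are moved):

    # Move one bulky directory to the NFS share and leave a symlink behind.
    SRC=/path/to/galaxy-dist/database/files    # placeholder
    DEST=/mnt/galaxy-nfs/files                 # placeholder

    mkdir -p "$(dirname "$DEST")"
    rsync -a "$SRC/" "$DEST/"    # copy first, so nothing is lost on failure
    mv "$SRC" "$SRC.old"         # keep the original data until verified
    ln -s "$DEST" "$SRC"         # the old path still resolves, now onto NFS
    # once Galaxy has been verified against the new location:
    #   rm -rf "$SRC.old"

The point of the symlink is that the absolute paths Galaxy has already recorded (in the database and in the unchanged config) stay valid, so no migration logic inside Galaxy is needed.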
Hi Hans-Rudolf,

Thanks for that. I think you are right that there is no piece of code for that, but it could have been the case (no intent to trigger something like 'RTFM' ;) ).

My script basically creates the NFS mount points, mounts the network target directories, shuts down Galaxy if it is running, edits the universe_wsgi.ini file according to the new directories (here: the mount points, covering those five 'bulky' parameters) and starts Galaxy again if it was previously running. Its primary purpose is to set up a completely new system from scratch (the NFS section is just a part of a longer script), but I would like to redirect the bulky directories in our main instance as well (which is in use).

The symbolic links are an adequate workaround I was already aware of, but I am still interested in what other developers may have experienced. The situation should not be too exotic..?

Best, Sebastian
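As a rough outline, the surrounding flow of such a script might look like the sketch below. Server name, export path, Galaxy root and the process check are all illustrative assumptions; only the overall sequence (mount, stop, edit, restart) comes from the description above.

    # Outline only: mount the NFS export, stop Galaxy if running, rewrite
    # the config (see the sed sketch earlier in the thread), then restart.
    NFS_EXPORT="nfs-server:/export/galaxy"    # placeholder
    MOUNT_POINT="/mnt/galaxy-nfs"             # placeholder
    GALAXY_ROOT="/path/to/galaxy-dist"        # placeholder

    mkdir -p "$MOUNT_POINT"
    mountpoint -q "$MOUNT_POINT" || mount -t nfs "$NFS_EXPORT" "$MOUNT_POINT"

    WAS_RUNNING=0
    if pgrep -f "universe_wsgi.ini" > /dev/null; then    # crude liveness check
        WAS_RUNNING=1
        sh "$GALAXY_ROOT/run.sh" --stop-daemon
    fi

    # ... rewrite the five path entries in universe_wsgi.ini here ...

    if [ "$WAS_RUNNING" -eq 1 ]; then
        sh "$GALAXY_ROOT/run.sh" --daemon
    fi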
--
Sebastian Schaaf, M.Sc. Bioinformatics
Faculty Coordinator NGS Infrastructure
Chair of Biometry and Bioinformatics
Department of Medical Informatics, Biometry and Epidemiology (IBE)
University of Munich
Marchioninistr. 15, K U1 (postal)
Marchioninistr. 17, U 006 (office)
D-81377 Munich (Germany)
Tel: +49 89 2180-78178