details:   http://www.bx.psu.edu/hg/galaxy/rev/d31ab50dc8e0
changeset: 2812:d31ab50dc8e0
user:      Nate Coraor <nate@bx.psu.edu>
date:      Thu Oct 01 12:35:41 2009 -0400
description:
Add a new option, 'allow_library_path_paste', that adds a new upload page
("Upload files from filesystem paths") to the admin-side library upload
pages.  This form contains a textarea into which Galaxy admins can paste
any number of filesystem paths (files or directories) from which Galaxy
will import library datasets, preserving the directory structure if
desired.  Since this ability gives admins access to any file on the Galaxy
server that is readable by Galaxy's system user, the option is disabled by
default, and system administrators should take care in assigning Galaxy
administrators when this feature is enabled.  Controls on which files are
accessible to this tool, based on ownership or other properties, can be
added at a later date if there is sufficient interest.
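For reference, enabling the feature is a one-line change to the Galaxy
config file; the option ships commented out and defaults to False (see the
universe_wsgi.ini.sample change at the end of this diff):

    allow_library_path_paste = True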
is checked "No", the following files will be referenced by Galaxy: /home/nate/galaxy/test-data/1.bed /home/nate/galaxy/test-data/2.bed /galaxy/3.bed /galaxy/galaxy_symlink/test-data/4.bed /galaxy/galaxy_symlink/test-data/5.bed The Galaxy administrator may now safely delete /galaxy/import/link, but should take care not to remove the referenced symbolic links (/galaxy/3.bed, /galaxy/galaxy_symlink). Not all symbolic links are dereferenced because it is assumed that if an administrator links to a path in the import directory which itself is (or contains) links, that is the preferred path for accessing the data. 10 file(s) affected in this change: lib/galaxy/config.py lib/galaxy/tools/actions/upload_common.py lib/galaxy/util/__init__.py lib/galaxy/web/controllers/library_common.py templates/admin/library/browse_library.mako templates/admin/library/upload.mako templates/library/browse_library.mako templates/library/library_dataset_common.mako tools/data_source/upload.py universe_wsgi.ini.sample diffs (377 lines): diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/config.py --- a/lib/galaxy/config.py Thu Oct 01 11:39:55 2009 -0400 +++ b/lib/galaxy/config.py Thu Oct 01 12:35:41 2009 -0400 @@ -87,6 +87,7 @@ self.user_library_import_dir = kwargs.get( 'user_library_import_dir', None ) if self.user_library_import_dir is not None and not os.path.exists( self.user_library_import_dir ): raise ConfigurationError( "user_library_import_dir specified in config (%s) does not exist" % self.user_library_import_dir ) + self.allow_library_path_paste = kwargs.get( 'allow_library_path_paste', False ) # Configuration options for taking advantage of nginx features self.nginx_x_accel_redirect_base = kwargs.get( 'nginx_x_accel_redirect_base', False ) self.nginx_upload_store = kwargs.get( 'nginx_upload_store', False ) diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/tools/actions/upload_common.py --- a/lib/galaxy/tools/actions/upload_common.py Thu Oct 01 11:39:55 2009 -0400 +++ b/lib/galaxy/tools/actions/upload_common.py Thu Oct 01 12:35:41 2009 -0400 @@ -3,6 +3,7 @@ from galaxy import datatypes, util from galaxy.datatypes import sniff from galaxy.util.json import to_json_string +from galaxy.model.orm import eagerload_all import logging log = logging.getLogger( __name__ ) @@ -127,12 +128,29 @@ or trans.user.email in trans.app.config.get( "admin_users", "" ).split( "," ) ): # This doesn't have to be pretty - the only time this should happen is if someone's being malicious. raise Exception( "User is not authorized to add datasets to this library." 
10 file(s) affected in this change:

lib/galaxy/config.py
lib/galaxy/tools/actions/upload_common.py
lib/galaxy/util/__init__.py
lib/galaxy/web/controllers/library_common.py
templates/admin/library/browse_library.mako
templates/admin/library/upload.mako
templates/library/browse_library.mako
templates/library/library_dataset_common.mako
tools/data_source/upload.py
universe_wsgi.ini.sample

diffs (377 lines):

diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/config.py
--- a/lib/galaxy/config.py	Thu Oct 01 11:39:55 2009 -0400
+++ b/lib/galaxy/config.py	Thu Oct 01 12:35:41 2009 -0400
@@ -87,6 +87,7 @@
         self.user_library_import_dir = kwargs.get( 'user_library_import_dir', None )
         if self.user_library_import_dir is not None and not os.path.exists( self.user_library_import_dir ):
             raise ConfigurationError( "user_library_import_dir specified in config (%s) does not exist" % self.user_library_import_dir )
+        self.allow_library_path_paste = kwargs.get( 'allow_library_path_paste', False )
         # Configuration options for taking advantage of nginx features
         self.nginx_x_accel_redirect_base = kwargs.get( 'nginx_x_accel_redirect_base', False )
         self.nginx_upload_store = kwargs.get( 'nginx_upload_store', False )
diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/tools/actions/upload_common.py
--- a/lib/galaxy/tools/actions/upload_common.py	Thu Oct 01 11:39:55 2009 -0400
+++ b/lib/galaxy/tools/actions/upload_common.py	Thu Oct 01 12:35:41 2009 -0400
@@ -3,6 +3,7 @@
 from galaxy import datatypes, util
 from galaxy.datatypes import sniff
 from galaxy.util.json import to_json_string
+from galaxy.model.orm import eagerload_all
 import logging
 log = logging.getLogger( __name__ )
@@ -127,12 +128,29 @@
                      or trans.user.email in trans.app.config.get( "admin_users", "" ).split( "," ) ):
            # This doesn't have to be pretty - the only time this should happen is if someone's being malicious.
            raise Exception( "User is not authorized to add datasets to this library." )
+    folder = library_bunch.folder
+    if uploaded_dataset.get( 'in_folder', False ):
+        # Create subfolders if desired
+        for name in uploaded_dataset.in_folder.split( os.path.sep ):
+            folder.refresh()
+            matches = filter( lambda x: x.name == name, active_folders( trans, folder ) )
+            if matches:
+                log.debug( 'DEBUGDEBUG: In %s, found a folder name match: %s:%s' % ( folder.name, matches[0].id, matches[0].name ) )
+                folder = matches[0]
+            else:
+                new_folder = trans.app.model.LibraryFolder( name=name, description='Automatically created by upload tool' )
+                new_folder.genome_build = util.dbnames.default_value
+                folder.add_folder( new_folder )
+                new_folder.flush()
+                trans.app.security_agent.copy_library_permissions( folder, new_folder )
+                log.debug( 'DEBUGDEBUG: In %s, created a new folder: %s:%s' % ( folder.name, new_folder.id, new_folder.name ) )
+                folder = new_folder
     if library_bunch.replace_dataset:
         ld = library_bunch.replace_dataset
     else:
-        ld = trans.app.model.LibraryDataset( folder=library_bunch.folder, name=uploaded_dataset.name )
+        ld = trans.app.model.LibraryDataset( folder=folder, name=uploaded_dataset.name )
         ld.flush()
-        trans.app.security_agent.copy_library_permissions( library_bunch.folder, ld )
+        trans.app.security_agent.copy_library_permissions( folder, ld )
     ldda = trans.app.model.LibraryDatasetDatasetAssociation( name = uploaded_dataset.name,
                                                              extension = uploaded_dataset.file_type,
                                                              dbkey = uploaded_dataset.dbkey,
@@ -153,8 +171,8 @@
     else:
         # Copy the current user's DefaultUserPermissions to the new LibraryDatasetDatasetAssociation.dataset
         trans.app.security_agent.set_all_dataset_permissions( ldda.dataset, trans.app.security_agent.user_get_default_permissions( trans.user ) )
-    library_bunch.folder.add_library_dataset( ld, genome_build=uploaded_dataset.dbkey )
-    library_bunch.folder.flush()
+    folder.add_library_dataset( ld, genome_build=uploaded_dataset.dbkey )
+    folder.flush()
     ld.library_dataset_dataset_association_id = ldda.id
     ld.flush()
     # Handle template included in the upload form, if any
@@ -230,6 +248,10 @@
         is_binary = uploaded_dataset.datatype.is_binary
     except:
         is_binary = None
+    try:
+        link_data_only = uploaded_dataset.link_data_only
+    except:
+        link_data_only = False
     json = dict( file_type = uploaded_dataset.file_type,
                  ext = uploaded_dataset.ext,
                  name = uploaded_dataset.name,
@@ -237,6 +259,7 @@
                  dbkey = uploaded_dataset.dbkey,
                  type = uploaded_dataset.type,
                  is_binary = is_binary,
+                 link_data_only = link_data_only,
                  space_to_tab = uploaded_dataset.space_to_tab,
                  path = uploaded_dataset.path )
     json_file.write( to_json_string( json ) + '\n' )
@@ -276,3 +299,13 @@
     trans.app.job_queue.put( job.id, tool )
     trans.log_event( "Added job to the job queue, id: %s" % str(job.id), tool_id=job.tool_id )
     return dict( [ ( 'output%i' % i, v ) for i, v in enumerate( data_list ) ] )
+
+def active_folders( trans, folder ):
+    # Stolen from galaxy.web.controllers.library_common (importing from which causes a circular issues).
+    # Much faster way of retrieving all active sub-folders within a given folder than the
+    # performance of the mapper.  This query also eagerloads the permissions on each folder.
+    return trans.sa_session.query( trans.app.model.LibraryFolder ) \
+                           .filter_by( parent=folder, deleted=False ) \
+                           .options( eagerload_all( "actions" ) ) \
+                           .order_by( trans.app.model.LibraryFolder.table.c.name ) \
+                           .all()
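For reference, a dataset's line in the resulting JSON param file (consumed
by tools/data_source/upload.py) now carries the new flag; one line per
dataset, wrapped here for readability, all values illustrative:

    { "file_type": "auto", "ext": null, "name": "4.bed", "dbkey": "?",
      "type": "server_dir", "is_binary": false, "link_data_only": true,
      "space_to_tab": false, "path": "/galaxy/galaxy_symlink/test-data/4.bed" }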
diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/util/__init__.py
--- a/lib/galaxy/util/__init__.py	Thu Oct 01 11:39:55 2009 -0400
+++ b/lib/galaxy/util/__init__.py	Thu Oct 01 12:35:41 2009 -0400
@@ -178,7 +178,7 @@
     # better solution I think is to more responsibility for
     # sanitizing into the tool parameters themselves so that
     # different parameters can be sanitized in different ways.
-    NEVER_SANITIZE = ['file_data', 'url_paste', 'URL']
+    NEVER_SANITIZE = ['file_data', 'url_paste', 'URL', 'filesystem_paths']
 
     def __init__( self, params, safe=True, sanitize=True, tool=None ):
         if safe:
diff -r 2364764e3604 -r d31ab50dc8e0 lib/galaxy/web/controllers/library_common.py
--- a/lib/galaxy/web/controllers/library_common.py	Thu Oct 01 11:39:55 2009 -0400
+++ b/lib/galaxy/web/controllers/library_common.py	Thu Oct 01 12:35:41 2009 -0400
@@ -81,7 +81,9 @@
             tool_params = upload_common.persist_uploads( tool_params )
             uploaded_datasets = upload_common.get_uploaded_datasets( trans, tool_params, precreated_datasets, dataset_upload_inputs, library_bunch=library_bunch )
         elif upload_option == 'upload_directory':
-            uploaded_datasets = self.get_server_dir_uploaded_datasets( trans, params, full_dir, import_dir_desc, library_bunch, err_redirect, msg )
+            uploaded_datasets, err_redirect, msg = self.get_server_dir_uploaded_datasets( trans, params, full_dir, import_dir_desc, library_bunch, err_redirect, msg )
+        elif upload_option == 'upload_paths':
+            uploaded_datasets, err_redirect, msg = self.get_path_paste_uploaded_datasets( trans, params, library_bunch, err_redirect, msg )
         upload_common.cleanup_unused_precreated_datasets( precreated_datasets )
         if upload_option == 'upload_file' and not uploaded_datasets:
             msg = 'Select a file, enter a URL or enter text'
@@ -98,37 +100,86 @@
         json_file_path = upload_common.create_paramfile( uploaded_datasets )
         data_list = [ ud.data for ud in uploaded_datasets ]
         return upload_common.create_job( trans, tool_params, tool, json_file_path, data_list, folder=library_bunch.folder )
+    def make_library_uploaded_dataset( self, trans, params, name, path, type, library_bunch, in_folder=None ):
+        library_bunch.replace_dataset = None # not valid for these types of upload
+        uploaded_dataset = util.bunch.Bunch()
+        uploaded_dataset.name = name
+        uploaded_dataset.path = path
+        uploaded_dataset.type = type
+        uploaded_dataset.ext = None
+        uploaded_dataset.file_type = params.file_type
+        uploaded_dataset.dbkey = params.dbkey
+        uploaded_dataset.space_to_tab = params.space_to_tab
+        if in_folder:
+            uploaded_dataset.in_folder = in_folder
+        uploaded_dataset.data = upload_common.new_upload( trans, uploaded_dataset, library_bunch )
+        if params.get( 'link_data_only', False ):
+            uploaded_dataset.link_data_only = True
+            uploaded_dataset.data.file_name = os.path.abspath( path )
+            uploaded_dataset.data.flush()
+        return uploaded_dataset
     def get_server_dir_uploaded_datasets( self, trans, params, full_dir, import_dir_desc, library_bunch, err_redirect, msg ):
         files = []
         try:
             for entry in os.listdir( full_dir ):
                 # Only import regular files
-                if os.path.isfile( os.path.join( full_dir, entry ) ):
-                    files.append( entry )
+                path = os.path.join( full_dir, entry )
+                if os.path.islink( path ) and os.path.isfile( path ) and params.get( 'link_data_only', False ):
+                    # If we're linking instead of copying, link the file the link points to, not the link itself.
+                    link_path = os.readlink( path )
+                    if os.path.isabs( link_path ):
+                        path = link_path
+                    else:
+                        path = os.path.abspath( os.path.join( os.path.dirname( path ), link_path ) )
+                if os.path.isfile( path ):
+                    files.append( path )
         except Exception, e:
             msg = "Unable to get file list for configured %s, error: %s" % ( import_dir_desc, str( e ) )
             err_redirect = True
-            return None
+            return None, err_redirect, msg
         if not files:
             msg = "The directory '%s' contains no valid files" % full_dir
             err_redirect = True
-            return None
+            return None, err_redirect, msg
         uploaded_datasets = []
         for file in files:
-            library_bunch.replace_dataset = None
-            uploaded_dataset = util.bunch.Bunch()
-            uploaded_dataset.path = os.path.join( full_dir, file )
-            if not os.path.isfile( uploaded_dataset.path ):
+            name = os.path.basename( file )
+            uploaded_datasets.append( self.make_library_uploaded_dataset( trans, params, name, file, 'server_dir', library_bunch ) )
+        return uploaded_datasets, None, None
+    def get_path_paste_uploaded_datasets( self, trans, params, library_bunch, err_redirect, msg ):
+        if params.get( 'filesystem_paths', '' ) == '':
+            msg = "No paths entered in the upload form"
+            err_redirect = True
+            return None, err_redirect, msg
+        preserve_dirs = True
+        if params.get( 'dont_preserve_dirs', False ):
+            preserve_dirs = False
+        # locate files
+        bad_paths = []
+        uploaded_datasets = []
+        for line in [ l.strip() for l in params.filesystem_paths.splitlines() if l.strip() ]:
+            path = os.path.abspath( line )
+            if not os.path.exists( path ):
+                bad_paths.append( path )
                 continue
-            uploaded_dataset.type = 'server_dir'
-            uploaded_dataset.name = file
-            uploaded_dataset.ext = None
-            uploaded_dataset.file_type = params.file_type
-            uploaded_dataset.dbkey = params.dbkey
-            uploaded_dataset.space_to_tab = params.space_to_tab
-            uploaded_dataset.data = upload_common.new_upload( trans, uploaded_dataset, library_bunch )
-            uploaded_datasets.append( uploaded_dataset )
-        return uploaded_datasets
+            # don't bother processing if we're just going to return an error
+            if not bad_paths:
+                if os.path.isfile( path ):
+                    name = os.path.basename( path )
+                    uploaded_datasets.append( self.make_library_uploaded_dataset( trans, params, name, path, 'path_paste', library_bunch ) )
+                for basedir, dirs, files in os.walk( line ):
+                    for file in files:
+                        file_path = os.path.abspath( os.path.join( basedir, file ) )
+                        if preserve_dirs:
+                            in_folder = os.path.dirname( file_path.replace( path, '', 1 ).lstrip( '/' ) )
+                        else:
+                            in_folder = None
+                        uploaded_datasets.append( self.make_library_uploaded_dataset( trans, params, file, file_path, 'path_paste', library_bunch, in_folder ) )
+        if bad_paths:
+            msg = "Invalid paths:<br><ul><li>%s</li></ul>" % "</li><li>".join( bad_paths )
+            err_redirect = True
+            return None, err_redirect, msg
+        return uploaded_datasets, None, None
     @web.expose
     def info_template( self, trans, cntrller, library_id, response_action='library', obj_id=None, folder_id=None, ldda_id=None, **kwd ):
         # Only adding a new templAte to a library or folder is currently allowed.  Editing an existing template is
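To illustrate the preserve_dirs bookkeeping above (a standalone sketch with
hypothetical paths, not Galaxy code): if '/data/ngs' is pasted and os.walk()
finds a file two levels down, the library subfolder chain is derived from
the path remainder.

    import os

    path = '/data/ngs'                                # the pasted directory
    file_path = '/data/ngs/run1/lane2/reads.fastq'    # found by os.walk()

    # Strip the pasted prefix, then keep the directory portion.
    in_folder = os.path.dirname( file_path.replace( path, '', 1 ).lstrip( '/' ) )

    print in_folder   # 'run1/lane2' - new_upload() splits this on os.path.sep
                      # and creates a matching chain of library subfolders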
diff -r 2364764e3604 -r d31ab50dc8e0 templates/admin/library/browse_library.mako
--- a/templates/admin/library/browse_library.mako	Thu Oct 01 11:39:55 2009 -0400
+++ b/templates/admin/library/browse_library.mako	Thu Oct 01 12:35:41 2009 -0400
@@ -73,7 -73,7 @@
         // Make ajax call
         $.ajax( {
             type: "POST",
-            url: "${h.url_for( controller='library_dataset', action='library_item_updates' )}",
+            url: "${h.url_for( controller='library_common', action='library_item_updates' )}",
             dataType: "json",
             data: { ids: ids.join( "," ), states: states.join( "," ) },
             success : function ( data ) {
diff -r 2364764e3604 -r d31ab50dc8e0 templates/admin/library/upload.mako
--- a/templates/admin/library/upload.mako	Thu Oct 01 11:39:55 2009 -0400
+++ b/templates/admin/library/upload.mako	Thu Oct 01 12:35:41 2009 -0400
@@ -18,6 +18,9 @@
     %if trans.app.config.library_import_dir and os.path.exists( trans.app.config.library_import_dir ):
         <a class="action-button" href="${h.url_for( controller='library_admin', action='upload_library_dataset', library_id=library_id, folder_id=folder_id, replace_id=replace_id, upload_option='upload_directory' )}">Upload directory of files</a>
     %endif
+    %if trans.app.config.allow_library_path_paste:
+        <a class="action-button" href="${h.url_for( controller='library_admin', action='upload_library_dataset', library_id=library_id, folder_id=folder_id, replace_id=replace_id, upload_option='upload_paths' )}">Upload files from filesystem paths</a>
+    %endif
     <a class="action-button" href="${h.url_for( controller='library_admin', action='upload_library_dataset', library_id=library_id, folder_id=folder_id, replace_id=replace_id, upload_option='import_from_history' )}">Import datasets from your current history</a>
 </div>
 <br/><br/>
diff -r 2364764e3604 -r d31ab50dc8e0 templates/library/browse_library.mako
--- a/templates/library/browse_library.mako	Thu Oct 01 11:39:55 2009 -0400
+++ b/templates/library/browse_library.mako	Thu Oct 01 12:35:41 2009 -0400
@@ -105,7 +105,7 @@
         // Make ajax call
         $.ajax( {
             type: "POST",
-            url: "${h.url_for( controller='library_dataset', action='library_item_updates' )}",
+            url: "${h.url_for( controller='library_common', action='library_item_updates' )}",
             dataType: "json",
             data: { ids: ids.join( "," ), states: states.join( "," ) },
             success : function ( data ) {
diff -r 2364764e3604 -r d31ab50dc8e0 templates/library/library_dataset_common.mako
--- a/templates/library/library_dataset_common.mako	Thu Oct 01 11:39:55 2009 -0400
+++ b/templates/library/library_dataset_common.mako	Thu Oct 01 12:35:41 2009 -0400
@@ -1,11 +1,13 @@
 <%def name="render_upload_form( controller, upload_option, action, library_id, folder_id, replace_dataset, file_formats, dbkeys, roles, history )">
     <% import os, os.path %>
-    %if upload_option in [ 'upload_file', 'upload_directory' ]:
+    %if upload_option in [ 'upload_file', 'upload_directory', 'upload_paths' ]:
         <div class="toolForm" id="upload_library_dataset">
-            %if upload_option == 'upload_file':
+            %if upload_option == 'upload_directory':
+                <div class="toolFormTitle">Upload a directory of files</div>
+            %elif upload_option == 'upload_paths':
+                <div class="toolFormTitle">Upload files from filesystem paths</div>
+            %else:
                 <div class="toolFormTitle">Upload files</div>
-            %else:
-                <div class="toolFormTitle">Upload a directory of files</div>
             %endif
             <div class="toolFormBody">
                 <form name="upload_library_dataset" action="${action}" enctype="multipart/form-data" method="post">
@@ -103,6 +105,44 @@
                             %endif
                         </div>
                         <div style="clear: both"></div>
+                    </div>
+                %elif upload_option == 'upload_paths':
+                    <div class="form-row">
+                        <label>Paths to upload</label>
+                        <div class="form-row-input">
+                            <textarea name="filesystem_paths" rows="10" cols="35"></textarea>
+                        </div>
+                        <div class="toolParamHelp" style="clear: both;">
+                            Upload all files pasted in the box.  The (recursive) contents of any pasted directories will be added as well.
+                        </div>
+                    </div>
+                    <div class="form-row">
+                        <label>Preserve directory structure?</label>
+                        <div class="form-row-input">
+                            <input type="checkbox" name="dont_preserve_dirs" value="No"/>No
+                        </div>
+                        <div class="toolParamHelp" style="clear: both;">
+                            If checked, all files in subdirectories on the filesystem will be placed at the top level of the folder, instead of into subfolders.
+                        </div>
+                    </div>
+                %endif
+                %if upload_option in ( 'upload_directory', 'upload_paths' ):
+                    <div class="form-row">
+                        <label>Copy data into Galaxy?</label>
+                        <div class="form-row-input">
+                            <input type="checkbox" name="link_data_only" value="No"/>No
+                        </div>
+                        <div class="toolParamHelp" style="clear: both;">
+                            Normally data uploaded with this tool is copied into Galaxy's "files" directory
+                            so any later changes to the data will not affect Galaxy.  However, this may not
+                            be desired (especially for large NGS datasets), so use of this option will
+                            force Galaxy to always read the data from its original path.
+                            %if upload_option == 'upload_directory':
+                                Any symlinks encountered in the upload directory will be dereferenced once -
+                                that is, Galaxy will point directly to the file that is linked, but no other
+                                symlinks further down the line will be dereferenced.
+                            %endif
+                        </div>
                     </div>
                 %endif
                 <div class="form-row">
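A subtlety in the two new checkboxes above (an editorial note, not part of
the changeset): an unchecked HTML checkbox is simply absent from the POST,
while a checked one submits its value attribute - and any non-empty string,
including the string "No", is truthy.  So the controller only tests for the
field's presence:

    # Checked "No" box: the browser submits dont_preserve_dirs=No
    params = { 'dont_preserve_dirs': 'No' }
    preserve_dirs = True
    if params.get( 'dont_preserve_dirs', False ):   # 'No' is a truthy string
        preserve_dirs = False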
both"></div> + </div> + %elif upload_option == 'upload_paths': + <div class="form-row"> + <label>Paths to upload</label> + <div class="form-row-input"> + <textarea name="filesystem_paths" rows="10" cols="35"></textarea> + </div> + <div class="toolParamHelp" style="clear: both;"> + Upload all files pasted in the box. The (recursive) contents of any pasted directories will be added as well. + </div> + </div> + <div class="form-row"> + <label>Preserve directory structure?</label> + <div class="form-row-input"> + <input type="checkbox" name="dont_preserve_dirs" value="No"/>No + </div> + <div class="toolParamHelp" style="clear: both;"> + If checked, all files in subdirectories on the filesystem will be placed at the top level of the folder, instead of into subfolders. + </div> + </div> + %endif + %if upload_option in ( 'upload_directory', 'upload_paths' ): + <div class="form-row"> + <label>Copy data into Galaxy?</label> + <div class="form-row-input"> + <input type="checkbox" name="link_data_only" value="No"/>No + </div> + <div class="toolParamHelp" style="clear: both;"> + Normally data uploaded with this tool is copied into Galaxy's "files" directory + so any later changes to the data will not affect Galaxy. However, this may not + be desired (especially for large NGS datasets), so use of this option will + force Galaxy to always read the data from its original path. + %if upload_option == 'upload_directory': + Any symlinks encountered in the upload directory will be dereferenced once - + that is, Galaxy will point directly to the file that is linked, but no other + symlinks further down the line will be dereferenced. + %endif + </div> </div> %endif <div class="form-row"> diff -r 2364764e3604 -r d31ab50dc8e0 tools/data_source/upload.py --- a/tools/data_source/upload.py Thu Oct 01 11:39:55 2009 -0400 +++ b/tools/data_source/upload.py Thu Oct 01 12:35:41 2009 -0400 @@ -238,7 +238,9 @@ if ext == 'auto': ext = 'data' # Move the dataset to its "real" path - if dataset.type == 'server_dir': + if dataset.get( 'link_data_only', False ): + pass # data will remain in place + elif dataset.type in ( 'server_dir', 'path_paste' ): shutil.copy( dataset.path, output_path ) else: shutil.move( dataset.path, output_path ) diff -r 2364764e3604 -r d31ab50dc8e0 universe_wsgi.ini.sample --- a/universe_wsgi.ini.sample Thu Oct 01 11:39:55 2009 -0400 +++ b/universe_wsgi.ini.sample Thu Oct 01 12:35:41 2009 -0400 @@ -60,13 +60,22 @@ # Galaxy session security id_secret = changethisinproductiontoo -# Directories of files contained in the following directory can be uploaded to a library from the Admin view +# Directories of files contained in the following directory can be uploaded to +# a library from the Admin view #library_import_dir = /var/opt/galaxy/import -# The following can be configured to allow non-admin users to upload a directory of files. The -# configured directory must contain sub-directories named the same as the non-admin user's Galaxy -# login ( email ). The non-admin user is restricted to uploading files or sub-directories of files -# contained in their directory. -# user_library_import_dir = /var/opt/galaxy/import/users + +# The following can be configured to allow non-admin users to upload a +# directory of files. The configured directory must contain sub-directories +# named the same as the non-admin user's Galaxy login ( email ). The non-admin +# user is restricted to uploading files or sub-directories of files contained +# in their directory. 
diff -r 2364764e3604 -r d31ab50dc8e0 universe_wsgi.ini.sample
--- a/universe_wsgi.ini.sample	Thu Oct 01 11:39:55 2009 -0400
+++ b/universe_wsgi.ini.sample	Thu Oct 01 12:35:41 2009 -0400
@@ -60,13 +60,22 @@
 # Galaxy session security
 id_secret = changethisinproductiontoo
 
-# Directories of files contained in the following directory can be uploaded to a library from the Admin view
+# Directories of files contained in the following directory can be uploaded to
+# a library from the Admin view
 #library_import_dir = /var/opt/galaxy/import
-# The following can be configured to allow non-admin users to upload a directory of files. The
-# configured directory must contain sub-directories named the same as the non-admin user's Galaxy
-# login ( email ). The non-admin user is restricted to uploading files or sub-directories of files
-# contained in their directory.
-# user_library_import_dir = /var/opt/galaxy/import/users
+
+# The following can be configured to allow non-admin users to upload a
+# directory of files.  The configured directory must contain sub-directories
+# named the same as the non-admin user's Galaxy login ( email ).  The non-admin
+# user is restricted to uploading files or sub-directories of files contained
+# in their directory.
+#user_library_import_dir = /var/opt/galaxy/import/users
+
+# The admin library upload tool may contain a box allowing admins to paste
+# filesystem paths to files and directories to add to a library.  Set to True
+# to enable.  Please note the security implication that this will give Galaxy
+# Admins access to anything your Galaxy user has access to.
+#allow_library_path_paste = False
 
 # path to sendmail
 sendmail_path = /usr/sbin/sendmail