UCSC genome browser
by Rene Dreos (JIC)
Dear Developer,
My institute would like to run a local version of Galaxy, and I managed
to install it properly on my Mac. We focus on plants, and especially on
Arabidopsis. I upgraded our local Galaxy with the Arabidopsis genome and
I am able to run some analyses (ChIP-seq) from mapping through peak
detection with MACS. The next step would be to visualize the results in
the UCSC genome browser. We have a local version of it (with the latest
Arabidopsis genome assembly), but the MACS result has no link to the
UCSC GB, only to GeneTrack (which is not working). How can I make Galaxy
aware of our local GB?
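For what it's worth, the sample Galaxy config exposes a ucsc_display_sites option backed by tool-data/shared/ucsc/ucsc_build_sites.txt, which looks like the relevant hook; a local mirror entry might look roughly like the following (hypothetical host and build name; the exact tab-separated column layout should be checked against the sample file shipped with Galaxy):

```
# site_name<TAB>browser URL<TAB>comma-separated list of supported builds
local	http://ucsc.example.org/cgi-bin/hgTracks?	araTha1
```

The site name ("local") would then be added to the ucsc_display_sites list in universe_wsgi.ini.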
thank you very much
kind regards
r
11 years, 7 months
slow execution of set_metadata.py
by Matthias Gierth
Hello List,
I have a small problem with my own local Galaxy instance.
I am trying to set up some workflows for NGS. Everything is working fine
so far, but the process of setting the metadata on a file takes a lot
of time.
Currently I created this workflow: fastq file from library -->
fastx_groomer (convert from Illumina to Sanger format) --> mapping with bwa.
The grooming and mapping run fine, but after mapping,
set_metadata.py takes longer than the mapping of the 2 GB fastq file.
The testing server is a Dell R710 with 2x 6-core CPUs and 72 GB of memory.
Below is my config for Galaxy.
Does anybody have an idea what is going wrong with my setup?
many thanks
Matthias
#
# Galaxy is configured by default to be usable in a single-user development
# environment. To tune the application for a multi-user production
# environment, see the documentation at:
#
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer
#
# Throughout this sample configuration file, except where stated otherwise,
# uncommented values override the default if left unset, whereas commented
# values are set to the default value.
# Examples of many of these options are explained in more detail in the wiki:
#
# Config hackers are encouraged to check there before asking for help.
# ---- HTTP Server ----------------------------------------------------------
# Configuration of the internal HTTP server.
[server:main]
# The internal HTTP server to use. Currently only Paste is provided. This
# option is required.
use = egg:Paste#http
# The port on which to listen.
port = 8081
# The address on which to listen. By default, only listen to localhost (Galaxy
# will not be accessible over the network). Use '0.0.0.0' to listen on all
# available network interfaces.
host = 0.0.0.0
# Use a threadpool for the web server instead of creating a thread for each
# request.
use_threadpool = True
# Number of threads in the web server thread pool.
threadpool_workers = 8
# ---- Filters --------------------------------------------------------------
# Filters sit between Galaxy and the HTTP server.
# These filters are disabled by default. They can be enabled with
# 'filter-with' in the [app:main] section below.
# Define the gzip filter.
[filter:gzip]
use = egg:Paste#gzip
# Define the proxy-prefix filter.
[filter:proxy-prefix]
use = egg:PasteDeploy#prefix
prefix = /galaxy
# ---- Galaxy ---------------------------------------------------------------
# Configuration of the Galaxy application.
[app:main]
# -- Application and filtering
# The factory for the WSGI application. This should not be changed.
paste.app_factory = galaxy.web.buildapp:app_factory
# If not running behind a proxy server, you may want to enable gzip compression
# to decrease the size of data transferred over the network. If using a proxy
# server, please enable gzip compression there instead.
#filter-with = gzip
# If running behind a proxy server and Galaxy is served from a subdirectory,
# enable the proxy-prefix filter and set the prefix in the
# [filter:proxy-prefix] section above.
#filter-with = proxy-prefix
# If proxy-prefix is enabled and you're running more than one Galaxy instance
# behind one hostname, you will want to set this to the same path as the prefix
# in the filter above. This value becomes the "path" attribute set in the
# cookie so the cookies from each instance will not clobber each other.
#cookie_path = None
# -- Database
# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You
# may use a SQLAlchemy connection string to specify an external database
# instead. This string takes many options which are explained in detail in the
# config file documentation.
database_connection = mysql://xxx:xxx@localhost/galaxy?unix_socket=/data/mysql/mysql.sock
# If the server logs errors about not having enough database pool connections,
# you will want to increase these values, or consider running more Galaxy
# processes.
#database_engine_option_pool_size = 5
#database_engine_option_max_overflow = 10
# If using MySQL and the server logs the error "MySQL server has gone away",
# you will want to set this to some positive value (7200 should work).
database_engine_option_pool_recycle = 7200
# If large database query results are causing memory or response time issues in
# the Galaxy process, leave the result on the server instead. This option is
# only available for PostgreSQL and is highly recommended.
#database_engine_option_server_side_cursors = False
# Create only one connection to the database per thread, to reduce the
# connection overhead. Recommended when not using SQLite:
#database_engine_option_strategy = threadlocal
# -- Files and directories
# Dataset files are stored in this directory.
file_path = /galaxytemp/data
# Temporary files are stored in this directory.
new_file_path = /galaxytemp/tmp
# Tool config file, defines what tools are available in Galaxy.
#tool_config_file = tool_conf.xml
# Path to the directory containing the tools defined in the config.
#tool_path = tools
# Directory where data used by tools is located, see the samples in that
# directory and the wiki for help:
# http://bitbucket.org/galaxy/galaxy-central/wiki/DataIntegration
#tool_data_path = tool-data
# Datatypes config file, defines what data (file) types are available in
# Galaxy.
#datatypes_config_file = datatypes_conf.xml
# -- Mail and notification
# Galaxy sends mail for various things: subscribing users to the mailing list
# if they request it, emailing password resets, notification from the Galaxy
# Sample Tracking system, and reporting dataset errors. To do this, it needs
# to send mail through an SMTP server, which you may define here.
#smtp_server = None
# On the user registration form, users may choose to join the mailing list.
# This is the address of the list they'll be subscribed to.
#mailing_join_addr = galaxy-user-join(a)bx.psu.edu
# Datasets in an error state include a link to report the error. Those reports
# will be sent to this address. Error reports are disabled if no address is set.
#error_email_to = None
# -- Display sites
# Galaxy can display data at various external browsers. These options specify
# which browsers should be available. URLs and builds available at these
# browsers are defined in the specified files.
# UCSC browsers: tool-data/shared/ucsc/ucsc_build_sites.txt
#ucsc_display_sites = main,test,archaea,ucla
# GBrowse servers: tool-data/shared/gbrowse/gbrowse_build_sites.txt
#gbrowse_display_sites = wormbase,tair,modencode_worm,modencode_fly
# GeneTrack servers: tool-data/shared/genetrack/genetrack_sites.txt
#genetrack_display_sites = main,test
# -- UI Localization
# Append "/{brand}" to the "Galaxy" text in the masthead.
#brand = None
# The URL linked by the "Galaxy/brand" text.
#logo_url = /
# The URL linked by the "Galaxy Wiki" link in the "Help" menu.
#wiki_url = http://bitbucket.org/galaxy/galaxy-central/wiki
# The URL linked by the "Email comments..." link in the "Help" menu.
#bugs_email = None
# The URL linked by the "How to Cite..." link in the "Help" menu.
#citation_url = http://bitbucket.org/galaxy/galaxy-central/wiki/Citations
# Serve static content, which must be enabled if you're not serving it via a
# proxy server. These options should be self explanatory and so are not
# documented individually. You can use these paths (or ones in the proxy
# server) to point to your own styles.
static_enabled = True
static_cache_time = 360
static_dir = %(here)s/static/
static_images_dir = %(here)s/static/images
static_favicon_dir = %(here)s/static/favicon.ico
static_scripts_dir = %(here)s/static/scripts/
static_style_dir = %(here)s/static/june_2007_style/blue
# -- Logging and Debugging
# Verbosity of console log messages. Acceptable values can be found here:
# http://docs.python.org/library/logging.html#logging-levels
#log_level = DEBUG
# Print database operations to the server log (warning, quite verbose!).
#database_engine_option_echo = False
# Print database pool operations to the server log (warning, quite verbose!).
#database_engine_option_echo_pool = False
# Turn on logging of application events and some user events to the database.
#log_events = True
# Turn on logging of user actions to the database. Actions currently logged are
# grid views, tool searches, and use of the "recently used" tools menu. The
# log_events and log_actions functionality will eventually be merged.
#log_actions = True
# Debug enables access to various config options useful for development and
# debugging: use_lint, use_profile, use_printdebug and use_interactive. It
# also causes the files used by PBS/SGE (submission script, output, and error)
# to remain on disk after the job is complete. Debug mode is disabled if
# commented, but is uncommented by default in the sample config.
debug = True
# Check for WSGI compliance.
#use_lint = False
# Run the Python profiler on each request.
#use_profile = False
# Intercept print statements and show them on the returned page.
#use_printdebug = True
# Enable live debugging in your browser. This should NEVER be enabled on a
# public site. Enabled in the sample config for development.
use_interactive = True
# Write thread status periodically to 'heartbeat.log' (careful, uses disk
# space rapidly!). Useful to determine why your processes may be consuming a
# lot of CPU.
#use_heartbeat = False
# Enable the memory debugging interface (careful, negatively impacts server
# performance).
#use_memdump = False
# -- Data Libraries
# These library upload options are described in much more detail in the wiki:
# http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFiles
# Add an option to the library upload form which allows administrators to
# upload a directory of files.
library_import_dir = /data/
# Add an option to the library upload form which allows authorized
# non-administrators to upload a directory of files. The configured directory
# must contain sub-directories named the same as the non-admin user's Galaxy
# login ( email ). The non-admin user is restricted to uploading files or
# sub-directories of files contained in their directory.
#user_library_import_dir = None
# Add an option to the admin library upload tool allowing admins to paste
# filesystem paths to files and directories in a box, and these paths will be
# added to a library. Set to True to enable. Please note the security
# implication that this will give Galaxy Admins access to anything your Galaxy
# user has access to.
allow_library_path_paste = True
# -- Users and Security
# Galaxy encodes various internal values when these values will be output in
# some format (for example, in a URL or cookie). You should set a key to be
# used by the algorithm that encodes and decodes these values. It can be any
# string. If left unchanged, anyone could construct a cookie that would grant
# them access to others' sessions.
#id_secret = USING THE DEFAULT IS NOT SECURE!
# User authentication can be delegated to an upstream proxy server (usually
# Apache). The upstream proxy should set a REMOTE_USER header in the request.
# Enabling remote user disables regular logins. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy
#use_remote_user = False
# If use_remote_user is enabled and your external authentication
# method just returns bare usernames, set a default mail domain to be appended
# to usernames, to become your Galaxy usernames (email addresses).
#remote_user_maildomain = None
# If use_remote_user is enabled, you can set this to a URL that will log your
# users out.
#remote_user_logout_href = None
# Administrative users - set this to a comma-separated list of valid Galaxy
# users (email addresses). These users will have access to the Admin section
# of the server, and will have access to create users, groups, roles,
# libraries, and more. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Admin/AdminInterface
admin_users = xxxx@biotec.tu-dresden.de,xxxx@biotec.tu-dresden.de
# Force everyone to log in (disable anonymous access).
#require_login = False
# Allow unregistered users to create new accounts (otherwise, they will have to
# be created by an admin).
#allow_user_creation = True
# Allow administrators to delete accounts.
#allow_user_deletion = False
# By default, users' data will be public, but setting this to True will cause
# it to be private. Does not affect existing users and data, only ones created
# after this option is set. Users may still change their default back to
# public.
new_user_dataset_access_role_default_private = True
# -- Beta features
# Enable Galaxy's built-in visualization module, Trackster.
#enable_tracks = False
# Enable Galaxy Pages. Pages are custom webpages that include embedded Galaxy
# items, such as datasets, histories, workflows, and visualizations; pages are
# useful for documenting and sharing multiple analyses or workflows. Pages are
# created using a WYSIWYG editor that is very similar to a word processor.
#enable_pages = False
# Enable the (experimental! beta!) Web API. Documentation forthcoming.
#enable_api = False
# -- Job Execution
# If running multiple Galaxy processes, one can be designated as the job
# runner. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/WebApplicationScaling
enable_job_running = True
# Should jobs be tracked through the database, rather than in memory.
# Necessary if you're running the load balanced setup.
track_jobs_in_database = True
# Enable job recovery (if Galaxy is restarted while cluster jobs are running,
# it can "recover" them when it starts). This is not safe to use if you are
# running more than one Galaxy server using the same database.
enable_job_recovery = True
# Set metadata on job outputs in a separate process (or if using a
# cluster, on the cluster). Thanks to Python's Global Interpreter Lock and the
# hefty expense that setting metadata incurs, your Galaxy process may become
# unresponsive when this operation occurs internally.
set_metadata_externally = True
# Although it is fairly reliable, setting metadata can occasionally fail. In
# these instances, you can choose to retry setting it internally or leave it in
# a failed state (since retrying internally may cause the Galaxy process to be
# unresponsive). If this option is set to False, the user will be given the
# option to retry externally, or set metadata manually (when possible).
#retry_metadata_internally = True
# Number of concurrent jobs to run (local job runner)
local_job_queue_workers = 7
# Jobs can be killed after a certain amount of execution time. Format is in
# hh:mm:ss. Currently only implemented for PBS.
#job_walltime = None
# Clustering Galaxy is not a straightforward process and requires some
# pre-configuration. See the wiki before attempting to set any of these
# options:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
# Comma-separated list of job runners to start. local is always started. If
# left commented, no jobs will be run on the cluster, even if a cluster URL is
# explicitly defined in the [galaxy:tool_runners] section below. The runners
# currently available are 'pbs' and 'drmaa'.
#start_job_runners = None
# The URL for the default runner to use when a tool doesn't explicitly define a
# runner below.
#default_cluster_job_runner = local:///
# The cluster runners have their own thread pools used to prepare and finish
# jobs (so that these sometimes lengthy operations do not block normal queue
# operation). The value here is the number of worker threads available to each
# started runner.
#cluster_job_queue_workers = 3
# These options are only used when using file staging with PBS.
#pbs_application_server =
#pbs_stage_path =
#pbs_dataset_server =
# ---- Tool Job Runners -----------------------------------------------------
# Individual per-tool job runner overrides. If not listed here, a tool will
# run with the runner defined with default_cluster_job_runner.
[galaxy:tool_runners]
biomart = local:///
encode_db1 = local:///
hbvar = local:///
microbial_import1 = local:///
ucsc_table_direct1 = local:///
ucsc_table_direct_archaea1 = local:///
ucsc_table_direct_test1 = local:///
upload1 = local:///
# ---- Galaxy Message Queue -------------------------------------------------
# Galaxy uses the AMQP protocol to receive messages from external sources like
# bar code scanners. Galaxy has been tested against the RabbitMQ AMQP
# implementation. For Galaxy to receive messages from a message queue, the
# RabbitMQ server has to be set up with a user account and the other parameters
# listed below. The 'host' and 'port' fields should point to where the RabbitMQ
# server is running.
[galaxy_amqp]
#host = 127.0.0.1
#port = 5672
#userid = galaxy
#password = galaxy
#virtual_host = galaxy_messaging_engine
#queue = galaxy_queue
#exchange = galaxy_exchange
#routing_key = bar_code_scanner
--
Matthias Gierth
System Administrator
Applied Bioinformatics Group (ABG)
TU Dresden, Biotec
Tatzberg 47-51
01307 Dresden
Phone: +49 (0)351 463 40020
Fax: +49 (0)351 463 40087
Email: matthias.gierth(a)biotec.tu-dresden.de
uploading data from UCSC on scaled Apache configuration broken
by David Hoover
I have followed the directions for scaling at http://bitbucket.org/galaxy/galaxy-central/wiki/Config/WebApplicationScaling, trying both the single job-runner/single web-server and single job-runner/multiple web-server approaches.
In both cases, uploading data from UCSC failed with the following error message:
An error occurred running this job: The remote data source application may be off line, please try again later. Error: [Errno socket error] [Errno -3] Temporary failure in name resolution
What does this mean?
We have multiple web servers behind a load balancing switch. If I limit the number of web servers to one, then the data upload works. Must the remote data request return to the same server and port as it went out from?
David Hoover
Helix Systems Staff
http://helix.nih.gov
Error setting-up and running bowtie tasks
by Chris Cole
I have previously been able to run bowtie jobs successfully, but now I
get this error:
galaxy.jobs.runners.drmaa ERROR 2010-12-06 15:09:44,068 failure running job 2248
Traceback (most recent call last):
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/jobs/runners/drmaa.py", line 124, in queue_job
    job_wrapper.prepare()
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/jobs/__init__.py", line 368, in prepare
    self.command_line = self.tool.build_command_line( param_dict )
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/tools/__init__.py", line 1524, in build_command_line
    command_line = fill_template( self.command, context=param_dict )
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/util/template.py", line 9, in fill_template
    return str( Template( source=template_text, searchList=[context] ) )
  File "/homes/www-galaxy/galaxy_devel/eggs/Cheetah-2.2.2-py2.6-linux-x86_64-ucs2.egg/Cheetah/Template.py", line 1004, in __str__
    return getattr(self, mainMethName)()
  File "DynamicallyCompiledCheetahTemplate.py", line 182, in respond
IndexError: list index out of range
I have a feeling it has something to do with the bowtie_indices.loc file and
its associated entry in tool_data_table_conf.xml. I can see that the
format for the .loc file has changed (it now has four columns instead of
two):
h_sapiens37 NCBI37 Homo sapiens ncbi37 /db/bowtie/h_sapiens_37_asm
But the xml doesn't match (it only has two columns):
<!-- Locations of indexes in the Bowtie mapper format -->
<table name="bowtie_indexes">
<columns>name, value</columns>
<file path="tool-data/bowtie_indices.loc" />
</table>
What should there be in the 'columns' section?
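If the newer format follows the value/dbkey/name/path convention used by later Galaxy tool-data tables (an assumption worth verifying against the tool_data_table_conf.xml.sample in your tree), the entry would look something like:

```xml
<!-- Locations of indexes in the Bowtie mapper format -->
<table name="bowtie_indexes">
    <!-- assumed column names; check against tool_data_table_conf.xml.sample -->
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie_indices.loc" />
</table>
```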
Thanks,
Chris
Changing the FASTA to tabular converter
by Peter
Hello all,
I ran into a problem in a workflow manipulating FASTA and tabular files.
I traced this to an unexpected behaviour of the FASTA to tabular converter.
In my experience most command line tools which take FASTA files as
input treat the first word after the ">" as the identifier for each FASTA
record, and any subsequent text as an optional description. It could
then make sense to turn a FASTA file into a three column tabular file
(identifier, description, sequence). Currently Galaxy does not make this
distinction, so we have just two columns (identifier+description, seq).
Would you all be amenable to my extending this script to allow the user
to choose between 2 column output (current behaviour) and 3 column
output (splitting the FASTA ">" line at the first white space)?
Alternatively, I have written a less invasive patch to allow an easy
way to extract the identifier (first word) and sequence:
http://bitbucket.org/peterjc/galaxy-central/changeset/f57552b4f9fb
Note that currently the converter does allow the ">" line to be trimmed
which can achieve the same goal but ONLY when all the identifiers
are the same length (rarely the case in my experience).
Similarly, I'd like to extend the tabular to FASTA converter to allow
a third column to be selected as the description, giving for example
">c1 c3" as the ">" line, with the sequence coming from c2.
I look forward to comments,
Thanks,
Peter
P.S. All these comments apply equally to the FASTQ to/from tabular
converters.
tool with no inputs?
by Kostas Karasavvas
Hi all,
Is that possible? If there is no <inputs> tag, or if the inputs tag is
empty, I get an error. When I provide an input, it works.
Is there a way around it? I need it to access a server that can take zero
arguments.
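One workaround worth trying (an untested sketch; the tool id, script name, and output format are made up) is to give the tool a single hidden dummy parameter so the <inputs> block is non-empty:

```xml
<tool id="zero_arg_fetch" name="Fetch from server">
    <command interpreter="python">fetch_data.py > $output</command>
    <inputs>
        <!-- hidden dummy parameter; never shown to the user -->
        <param name="dummy" type="hidden" value="1" />
    </inputs>
    <outputs>
        <data name="output" format="tabular" />
    </outputs>
</tool>
```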
Cheers,
Kostas
2011 GMOD Spring Training, March 8-12
by Dave Clements, GMOD Help Desk
Applications are now being accepted for the 2011 GMOD Spring Training
course, a five-day hands-on school aimed at teaching new GMOD
administrators how to install, configure, and integrate popular GMOD
components. The course will be held March 8-12 at the US National
Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, as
part of GMOD Americas 2011.
Links:
* http://gmod.org/wiki/2011_GMOD_Spring_Training
* http://gmod.org/wiki/GMOD_Americas_2011
* http://www.nescent.org/
These components will be covered:
* Apollo - genome annotation editor
* Chado - biological database schema
* Galaxy - workflow system
* GBrowse - genome viewer
* GBrowse_syn - synteny viewer
* GFF3 - genome annotation file format and tools
* InterMine - biological data mining system
* JBrowse - next generation genome browser
* MAKER - genome annotation pipeline
* Tripal - web front end to Chado databases
The deadline for applying is the end of Friday, January 7, 2011.
Admission is competitive and is based on the strength of the
application, especially the statement of interest. The 2010 school had
over 60 applicants for the 25 slots. Any application received after the
deadline will be automatically placed on the waiting list.
The course requires some knowledge of Linux as a prerequisite. The
registration fee will be $265 (only $53 per day!). There will be a
limited number of scholarships available.
This may be the only GMOD School offered in 2011. If you are
interested, you are strongly encouraged to apply by January 7.
Thanks,
Dave Clements
--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Help_Desk_Feedback
pbs pro and galaxy (change status error)
by Laure QUINTRIC
Hello,
We have almost succeeded in using drmaa-0.4b3 with Galaxy and PBS Pro:
the job launched from Galaxy runs on our cluster, but when the job status
changes to finished, there is an error in the drmaa Python egg.
Here is the server log :
galaxy.jobs.runners.drmaa ERROR 2010-11-26 17:11:10,857 (21/516559.service0.ice.ifremer.fr) Unable to check job status
Traceback (most recent call last):
  File "/home12/caparmor/bioinfo/galaxy_dist/lib/galaxy/jobs/runners/drmaa.py", line 252, in check_watched_items
    state = self.ds.jobStatus( job_id )
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/__init__.py", line 522, in jobStatus
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/helpers.py", line 213, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/errors.py", line 90, in error_check
    raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
InternalException: code 1: pbs_statjob: Job %s has finished
galaxy.jobs.runners.drmaa WARNING 2010-11-26 17:11:10,861 (21/516559.service0.ice.ifremer.fr) job will now be errored
galaxy.jobs.runners.drmaa DEBUG 2010-11-26 17:11:10,986 (21/516559.service0.ice.ifremer.fr) User killed running job, but error encountered removing from DRM queue: code 1: pbs_deljob: Job %s has finished
Any idea?
Thanks a lot
Laure
Independent Assessment of Galaxy
by Anton Nekrutenko
Dear galaxy-dev members:
Professor Nils Christophersen (Department of Informatics, University of Oslo) has asked me to provide him with a few contacts who are independent of the core Galaxy team. He needs these to get an idea of why people have chosen Galaxy for their particular needs. If some of you would be willing to send him (he is CC'ed on this e-mail) a short blurb, we would very much appreciate it!
Thanks for your time!
anton
Anton Nekrutenko
http://usegalaxy.org