UCSC genome browser
by Rene Dreos (JIC)
Dear Developer,
My institute would like to run a local version of Galaxy, and I managed
to install it properly on my Mac. We focus on plants, and especially on
Arabidopsis. I upgraded our local Galaxy with the Arabidopsis genome and
I am able to run some analyses (ChIP-seq) from mapping through peak
detection with MACS. The next step would be to visualize the results in
the UCSC genome browser. We have a local version of it (with the latest
Arabidopsis genome assembly), but the MACS result has no link to the
UCSC GB, only to GeneTrack (which is not working). How can I make Galaxy
aware of our local GB?
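For what it's worth, the sample Galaxy config exposes a ucsc_display_sites option backed by tool-data/shared/ucsc/ucsc_build_sites.txt, which looks like the relevant hook; a local mirror entry might look roughly like the following (hypothetical host and build name; the exact tab-separated column layout should be checked against the sample file shipped with Galaxy):

```
# site_name<TAB>browser URL<TAB>comma-separated list of supported builds
local	http://ucsc.example.org/cgi-bin/hgTracks?	araTha1
```

The site name ("local") would then be added to the ucsc_display_sites list in universe_wsgi.ini.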
thank you very much
kind regards
r
11 years, 7 months
slow execution of set_metadata.py
by Matthias Gierth
Hello List,
I have a small problem with my own local Galaxy instance.
I am trying to set up some workflows for NGS. Everything is working fine
so far, but the process of setting the metadata on a file takes a lot
of time.
Currently I created this workflow: fastq file from library -->
fastx_groomer (convert from Illumina to Sanger format) --> mapping with bwa.
The grooming and mapping run fine, but after mapping,
set_metadata.py takes longer than the mapping of the 2 GB fastq file.
The testing server is a Dell R710 with 2x 6-core CPUs and 72 GB of memory.
Below is my config for Galaxy.
Does anybody have an idea what is going wrong with my setup?
many thanks
Matthias
#
# Galaxy is configured by default to be usable in a single-user development
# environment. To tune the application for a multi-user production
# environment, see the documentation at:
#
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer
#
# Throughout this sample configuration file, except where stated otherwise,
# uncommented values override the default if left unset, whereas commented
# values are set to the default value.
# Examples of many of these options are explained in more detail in the wiki:
#
# Config hackers are encouraged to check there before asking for help.
# ---- HTTP Server ----------------------------------------------------------
# Configuration of the internal HTTP server.
[server:main]
# The internal HTTP server to use. Currently only Paste is provided. This
# option is required.
use = egg:Paste#http
# The port on which to listen.
port = 8081
# The address on which to listen. By default, only listen to localhost (Galaxy
# will not be accessible over the network). Use '0.0.0.0' to listen on all
# available network interfaces.
host = 0.0.0.0
# Use a threadpool for the web server instead of creating a thread for each
# request.
use_threadpool = True
# Number of threads in the web server thread pool.
threadpool_workers = 8
# ---- Filters --------------------------------------------------------------
# Filters sit between Galaxy and the HTTP server.
# These filters are disabled by default. They can be enabled with
# 'filter-with' in the [app:main] section below.
# Define the gzip filter.
[filter:gzip]
use = egg:Paste#gzip
# Define the proxy-prefix filter.
[filter:proxy-prefix]
use = egg:PasteDeploy#prefix
prefix = /galaxy
# ---- Galaxy ---------------------------------------------------------------
# Configuration of the Galaxy application.
[app:main]
# -- Application and filtering
# The factory for the WSGI application. This should not be changed.
paste.app_factory = galaxy.web.buildapp:app_factory
# If not running behind a proxy server, you may want to enable gzip compression
# to decrease the size of data transferred over the network. If using a proxy
# server, please enable gzip compression there instead.
#filter-with = gzip
# If running behind a proxy server and Galaxy is served from a subdirectory,
# enable the proxy-prefix filter and set the prefix in the
# [filter:proxy-prefix] section above.
#filter-with = proxy-prefix
# If proxy-prefix is enabled and you're running more than one Galaxy instance
# behind one hostname, you will want to set this to the same path as the prefix
# in the filter above. This value becomes the "path" attribute set in the
# cookie so the cookies from each instance will not clobber each other.
#cookie_path = None
# -- Database
# By default, Galaxy uses a SQLite database at 'database/universe.sqlite'. You
# may use a SQLAlchemy connection string to specify an external database
# instead. This string takes many options which are explained in detail in the
# config file documentation.
database_connection = mysql://xxx:xxx@localhost/galaxy?unix_socket=/data/mysql/mysql.sock
# If the server logs errors about not having enough database pool connections,
# you will want to increase these values, or consider running more Galaxy
# processes.
#database_engine_option_pool_size = 5
#database_engine_option_max_overflow = 10
# If using MySQL and the server logs the error "MySQL server has gone away",
# you will want to set this to some positive value (7200 should work).
database_engine_option_pool_recycle = 7200
# If large database query results are causing memory or response time issues in
# the Galaxy process, leave the result on the server instead. This option is
# only available for PostgreSQL and is highly recommended.
#database_engine_option_server_side_cursors = False
# Create only one connection to the database per thread, to reduce the
# connection overhead. Recommended when not using SQLite:
#database_engine_option_strategy = threadlocal
# -- Files and directories
# Dataset files are stored in this directory.
file_path = /galaxytemp/data
# Temporary files are stored in this directory.
new_file_path = /galaxytemp/tmp
# Tool config file, defines what tools are available in Galaxy.
#tool_config_file = tool_conf.xml
# Path to the directory containing the tools defined in the config.
#tool_path = tools
# Directory where data used by tools is located, see the samples in that
# directory and the wiki for help:
# http://bitbucket.org/galaxy/galaxy-central/wiki/DataIntegration
#tool_data_path = tool-data
# Datatypes config file, defines what data (file) types are available in
# Galaxy.
#datatypes_config_file = datatypes_conf.xml
# -- Mail and notification
# Galaxy sends mail for various things: subscribing users to the mailing list
# if they request it, emailing password resets, notification from the Galaxy
# Sample Tracking system, and reporting dataset errors. To do this, it needs
# to send mail through an SMTP server, which you may define here.
#smtp_server = None
# On the user registration form, users may choose to join the mailing list.
# This is the address of the list they'll be subscribed to.
#mailing_join_addr = galaxy-user-join(a)bx.psu.edu
# Datasets in an error state include a link to report the error. Those reports
# will be sent to this address. Error reports are disabled if no address is set.
#error_email_to = None
# -- Display sites
# Galaxy can display data at various external browsers. These options specify
# which browsers should be available. URLs and builds available at these
# browsers are defined in the specified files.
# UCSC browsers: tool-data/shared/ucsc/ucsc_build_sites.txt
#ucsc_display_sites = main,test,archaea,ucla
# GBrowse servers: tool-data/shared/gbrowse/gbrowse_build_sites.txt
#gbrowse_display_sites = wormbase,tair,modencode_worm,modencode_fly
# GeneTrack servers: tool-data/shared/genetrack/genetrack_sites.txt
#genetrack_display_sites = main,test
# -- UI Localization
# Append "/{brand}" to the "Galaxy" text in the masthead.
#brand = None
# The URL linked by the "Galaxy/brand" text.
#logo_url = /
# The URL linked by the "Galaxy Wiki" link in the "Help" menu.
#wiki_url = http://bitbucket.org/galaxy/galaxy-central/wiki
# The URL linked by the "Email comments..." link in the "Help" menu.
#bugs_email = None
# The URL linked by the "How to Cite..." link in the "Help" menu.
#citation_url = http://bitbucket.org/galaxy/galaxy-central/wiki/Citations
# Serve static content, which must be enabled if you're not serving it via a
# proxy server. These options should be self explanatory and so are not
# documented individually. You can use these paths (or ones in the proxy
# server) to point to your own styles.
static_enabled = True
static_cache_time = 360
static_dir = %(here)s/static/
static_images_dir = %(here)s/static/images
static_favicon_dir = %(here)s/static/favicon.ico
static_scripts_dir = %(here)s/static/scripts/
static_style_dir = %(here)s/static/june_2007_style/blue
# -- Logging and Debugging
# Verbosity of console log messages. Acceptable values can be found here:
# http://docs.python.org/library/logging.html#logging-levels
#log_level = DEBUG
# Print database operations to the server log (warning, quite verbose!).
#database_engine_option_echo = False
# Print database pool operations to the server log (warning, quite verbose!).
#database_engine_option_echo_pool = False
# Turn on logging of application events and some user events to the database.
#log_events = True
# Turn on logging of user actions to the database. Actions currently logged are
# grid views, tool searches, and use of the "recently used" tools menu. The
# log_events and log_actions functionality will eventually be merged.
#log_actions = True
# Debug enables access to various config options useful for development and
# debugging: use_lint, use_profile, use_printdebug and use_interactive. It
# also causes the files used by PBS/SGE (submission script, output, and error)
# to remain on disk after the job is complete. Debug mode is disabled if
# commented, but is uncommented by default in the sample config.
debug = True
# Check for WSGI compliance.
#use_lint = False
# Run the Python profiler on each request.
#use_profile = False
# Intercept print statements and show them on the returned page.
#use_printdebug = True
# Enable live debugging in your browser. This should NEVER be enabled on a
# public site. Enabled in the sample config for development.
use_interactive = True
# Write thread status periodically to 'heartbeat.log' (careful, uses disk
# space rapidly!). Useful to determine why your processes may be consuming a
# lot of CPU.
#use_heartbeat = False
# Enable the memory debugging interface (careful, negatively impacts server
# performance).
#use_memdump = False
# -- Data Libraries
# These library upload options are described in much more detail in the wiki:
# http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFiles
# Add an option to the library upload form which allows administrators to
# upload a directory of files.
library_import_dir = /data/
# Add an option to the library upload form which allows authorized
# non-administrators to upload a directory of files. The configured directory
# must contain sub-directories named the same as the non-admin user's Galaxy
# login ( email ). The non-admin user is restricted to uploading files or
# sub-directories of files contained in their directory.
#user_library_import_dir = None
# Add an option to the admin library upload tool allowing admins to paste
# filesystem paths to files and directories in a box, and these paths will be
# added to a library. Set to True to enable. Please note the security
# implication that this will give Galaxy Admins access to anything your Galaxy
# user has access to.
allow_library_path_paste = True
# -- Users and Security
# Galaxy encodes various internal values when these values will be output in
# some format (for example, in a URL or cookie). You should set a key to be
# used by the algorithm that encodes and decodes these values. It can be any
# string. If left unchanged, anyone could construct a cookie that would grant
# them access to others' sessions.
#id_secret = USING THE DEFAULT IS NOT SECURE!
# User authentication can be delegated to an upstream proxy server (usually
# Apache). The upstream proxy should set a REMOTE_USER header in the request.
# Enabling remote user disables regular logins. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy
#use_remote_user = False
# If use_remote_user is enabled and your external authentication
# method just returns bare usernames, set a default mail domain to be appended
# to usernames, to become your Galaxy usernames (email addresses).
#remote_user_maildomain = None
# If use_remote_user is enabled, you can set this to a URL that will log your
# users out.
#remote_user_logout_href = None
# Administrative users - set this to a comma-separated list of valid Galaxy
# users (email addresses). These users will have access to the Admin section
# of the server, and will have access to create users, groups, roles,
# libraries, and more. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Admin/AdminInterface
admin_users = xxxx@biotec.tu-dresden.de,xxxx@biotec.tu-dresden.de
# Force everyone to log in (disable anonymous access).
#require_login = False
# Allow unregistered users to create new accounts (otherwise, they will have to
# be created by an admin).
#allow_user_creation = True
# Allow administrators to delete accounts.
#allow_user_deletion = False
# By default, users' data will be public, but setting this to True will cause
# it to be private. Does not affect existing users and data, only ones created
# after this option is set. Users may still change their default back to
# public.
new_user_dataset_access_role_default_private = True
# -- Beta features
# Enable Galaxy's built-in visualization module, Trackster.
#enable_tracks = False
# Enable Galaxy Pages. Pages are custom webpages that include embedded Galaxy
# items, such as datasets, histories, workflows, and visualizations; pages are
# useful for documenting and sharing multiple analyses or workflows. Pages are
# created using a WYSIWYG editor that is very similar to a word processor.
#enable_pages = False
# Enable the (experimental! beta!) Web API. Documentation forthcoming.
#enable_api = False
# -- Job Execution
# If running multiple Galaxy processes, one can be designated as the job
# runner. For more information, see:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/WebApplicationScaling
enable_job_running = True
# Should jobs be tracked through the database, rather than in memory.
# Necessary if you're running the load balanced setup.
track_jobs_in_database = True
# Enable job recovery (if Galaxy is restarted while cluster jobs are running,
# it can "recover" them when it starts). This is not safe to use if you are
# running more than one Galaxy server using the same database.
enable_job_recovery = True
# Set metadata on job outputs in a separate process (or if using a
# cluster, on the cluster). Thanks to Python's Global Interpreter Lock and the
# hefty expense that setting metadata incurs, your Galaxy process may become
# unresponsive when this operation occurs internally.
set_metadata_externally = True
# Although it is fairly reliable, setting metadata can occasionally fail. In
# these instances, you can choose to retry setting it internally or leave it in
# a failed state (since retrying internally may cause the Galaxy process to be
# unresponsive). If this option is set to False, the user will be given the
# option to retry externally, or set metadata manually (when possible).
#retry_metadata_internally = True
# Number of concurrent jobs to run (local job runner)
local_job_queue_workers = 7
# Jobs can be killed after a certain amount of execution time. Format is in
# hh:mm:ss. Currently only implemented for PBS.
#job_walltime = None
# Clustering Galaxy is not a straightforward process and requires some
# pre-configuration. See the wiki before attempting to set any of these
# options:
# http://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
# Comma-separated list of job runners to start. local is always started. If
# left commented, no jobs will be run on the cluster, even if a cluster URL is
# explicitly defined in the [galaxy:tool_runners] section below. The runners
# currently available are 'pbs' and 'drmaa'.
#start_job_runners = None
# The URL for the default runner to use when a tool doesn't explicitly define a
# runner below.
#default_cluster_job_runner = local:///
# The cluster runners have their own thread pools used to prepare and finish
# jobs (so that these sometimes lengthy operations do not block normal queue
# operation). The value here is the number of worker threads available to each
# started runner.
#cluster_job_queue_workers = 3
# These options are only used when using file staging with PBS.
#pbs_application_server =
#pbs_stage_path =
#pbs_dataset_server =
# ---- Tool Job Runners -----------------------------------------------------
# Individual per-tool job runner overrides. If not listed here, a tool will
# run with the runner defined with default_cluster_job_runner.
[galaxy:tool_runners]
biomart = local:///
encode_db1 = local:///
hbvar = local:///
microbial_import1 = local:///
ucsc_table_direct1 = local:///
ucsc_table_direct_archaea1 = local:///
ucsc_table_direct_test1 = local:///
upload1 = local:///
# ---- Galaxy Message Queue -------------------------------------------------
# Galaxy uses the AMQP protocol to receive messages from external sources like
# bar code scanners. Galaxy has been tested against the RabbitMQ AMQP
# implementation. For Galaxy to receive messages from a message queue, the
# RabbitMQ server has to be set up with a user account and the other parameters
# listed below. The 'host' and 'port' fields should point to where the RabbitMQ
# server is running.
[galaxy_amqp]
#host = 127.0.0.1
#port = 5672
#userid = galaxy
#password = galaxy
#virtual_host = galaxy_messaging_engine
#queue = galaxy_queue
#exchange = galaxy_exchange
#routing_key = bar_code_scanner
--
Matthias Gierth
System Administrator
Applied Bioinformatics Group (ABG)
TU Dresden, Biotec
Tatzberg 47-51
01307 Dresden
Phone: +49 (0)351 463 40020
Fax: +49 (0)351 463 40087
Email: matthias.gierth(a)biotec.tu-dresden.de
uploading data from UCSC on scaled Apache configuration broken
by David Hoover
I have followed the directions for scaling at http://bitbucket.org/galaxy/galaxy-central/wiki/Config/WebApplicationScaling, trying both the single job-runner/single web-server and single job-runner/multiple web-server approaches.
In both cases, uploading data from UCSC failed with the following error message:
An error occurred running this job: The remote data source application may be off line, please try again later. Error: [Errno socket error] [Errno -3] Temporary failure in name resolution
What does this mean?
We have multiple web servers behind a load balancing switch. If I limit the number of web servers to one, then the data upload works. Must the remote data request return to the same server and port as it went out from?
David Hoover
Helix Systems Staff
http://helix.nih.gov
Error setting-up and running bowtie tasks
by Chris Cole
I have previously been able to run bowtie jobs successfully, but now I
get this error:
galaxy.jobs.runners.drmaa ERROR 2010-12-06 15:09:44,068 failure running job 2248
Traceback (most recent call last):
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/jobs/runners/drmaa.py", line 124, in queue_job
    job_wrapper.prepare()
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/jobs/__init__.py", line 368, in prepare
    self.command_line = self.tool.build_command_line( param_dict )
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/tools/__init__.py", line 1524, in build_command_line
    command_line = fill_template( self.command, context=param_dict )
  File "/homes/www-galaxy/galaxy_devel/lib/galaxy/util/template.py", line 9, in fill_template
    return str( Template( source=template_text, searchList=[context] ) )
  File "/homes/www-galaxy/galaxy_devel/eggs/Cheetah-2.2.2-py2.6-linux-x86_64-ucs2.egg/Cheetah/Template.py", line 1004, in __str__
    return getattr(self, mainMethName)()
  File "DynamicallyCompiledCheetahTemplate.py", line 182, in respond
IndexError: list index out of range
I have a feeling it has something to do with the bowtie_indices.loc file and
its associated entry in tool_data_table_conf.xml. I can see that the
format for the .loc file has changed (it now has four columns instead of
two):
h_sapiens37 NCBI37 Homo sapiens ncbi37 /db/bowtie/h_sapiens_37_asm
But the xml doesn't match (it only has two columns):
<!-- Locations of indexes in the Bowtie mapper format -->
<table name="bowtie_indexes">
<columns>name, value</columns>
<file path="tool-data/bowtie_indices.loc" />
</table>
What should there be in the 'columns' section?
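If the newer format follows the value/dbkey/name/path convention used by later Galaxy tool-data tables (an assumption worth verifying against the tool_data_table_conf.xml.sample in your tree), the entry would look something like:

```xml
<!-- Locations of indexes in the Bowtie mapper format -->
<table name="bowtie_indexes">
    <!-- assumed column names; check against tool_data_table_conf.xml.sample -->
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie_indices.loc" />
</table>
```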
Thanks,
Chris
Changing the FASTA to tabular converter
by Peter
Hello all,
I ran into a problem in a workflow manipulating FASTA and tabular files.
I traced this to an unexpected behaviour of the FASTA to tabular converter.
In my experience most command line tools which take FASTA files as
input treat the first word after the ">" as the identifier for each FASTA
record, and any subsequent text as an optional description. It could
then make sense to turn a FASTA file into a three column tabular file
(identifier, description, sequence). Currently Galaxy does not make this
distinction, so we have just two columns (identifier+description, seq).
Would you all be amenable to my extending this script to allow the user
to choose between 2 column output (current behaviour) and 3 column
output (splitting the FASTA ">" line at the first white space)?
Alternatively, I have written a less invasive patch to allow an easy
way to extract the identifier (first word) and sequence:
http://bitbucket.org/peterjc/galaxy-central/changeset/f57552b4f9fb
Note that currently the converter does allow the ">" line to be trimmed
which can achieve the same goal but ONLY when all the identifiers
are the same length (rarely the case in my experience).
Similarly, I'd like to extend the tabular to FASTA converter to allow
a third column to be selected as the description, giving for example
">c1 c3" as the ">" line, with the sequence coming from c2.
I look forward to comments,
Thanks,
Peter
P.S. All these comments apply equally to the FASTQ to/from tabular
converters.
tool with no inputs?
by Kostas Karasavvas
Hi all,
Is that possible? If there is no <inputs> tag, or if the inputs tag is
empty, I get an error. When I provide an input, it works.
Is there a way around it? I need it to access a server that can take zero
arguments.
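One workaround worth trying (an untested sketch; the tool id, script name, and output format are made up) is to give the tool a single hidden dummy parameter so the <inputs> block is non-empty:

```xml
<tool id="zero_arg_fetch" name="Fetch from server">
    <command interpreter="python">fetch_data.py > $output</command>
    <inputs>
        <!-- hidden dummy parameter; never shown to the user -->
        <param name="dummy" type="hidden" value="1" />
    </inputs>
    <outputs>
        <data name="output" format="tabular" />
    </outputs>
</tool>
```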
Cheers,
Kostas
2011 GMOD Spring Training, March 8-12
by Dave Clements, GMOD Help Desk
Applications are now being accepted for the 2011 GMOD Spring Training
course, a five-day hands-on school aimed at teaching new GMOD
administrators how to install, configure, and integrate popular GMOD
components. The course will be held March 8-12 at the US National
Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, as
part of GMOD Americas 2011.
Links:
* http://gmod.org/wiki/2011_GMOD_Spring_Training
* http://gmod.org/wiki/GMOD_Americas_2011
* http://www.nescent.org/
These components will be covered:
* Apollo - genome annotation editor
* Chado - biological database schema
* Galaxy - workflow system
* GBrowse - genome viewer
* GBrowse_syn - synteny viewer
* GFF3 - genome annotation file format and tools
* InterMine - biological data mining system
* JBrowse - next generation genome browser
* MAKER - genome annotation pipeline
* Tripal - web front end to Chado databases
The deadline for applying is the end of Friday, January 7, 2011.
Admission is competitive and is based on the strength of the
application, especially the statement of interest. The 2010 school had
over 60 applicants for the 25 slots. Any application received after the
deadline will be automatically placed on the waiting list.
The course requires some knowledge of Linux as a prerequisite. The
registration fee will be $265 (only $53 per day!). There will be a
limited number of scholarships available.
This may be the only GMOD School offered in 2011. If you are
interested, you are strongly encouraged to apply by January 7.
Thanks,
Dave Clements
--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Help_Desk_Feedback
pbs pro and galaxy (change status error)
by Laure QUINTRIC
Hello,
We have almost succeeded in using drmaa-0.4b3 with Galaxy and PBS Pro:
the job launched from Galaxy runs on our cluster, but when the job status
changes to finished, there is an error in the drmaa Python egg.
Here is the server log :
galaxy.jobs.runners.drmaa ERROR 2010-11-26 17:11:10,857 (21/516559.service0.ice.ifremer.fr) Unable to check job status
Traceback (most recent call last):
  File "/home12/caparmor/bioinfo/galaxy_dist/lib/galaxy/jobs/runners/drmaa.py", line 252, in check_watched_items
    state = self.ds.jobStatus( job_id )
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/__init__.py", line 522, in jobStatus
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/helpers.py", line 213, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/errors.py", line 90, in error_check
    raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
InternalException: code 1: pbs_statjob: Job %s has finished
galaxy.jobs.runners.drmaa WARNING 2010-11-26 17:11:10,861 (21/516559.service0.ice.ifremer.fr) job will now be errored
galaxy.jobs.runners.drmaa DEBUG 2010-11-26 17:11:10,986 (21/516559.service0.ice.ifremer.fr) User killed running job, but error encountered removing from DRM queue: code 1: pbs_deljob: Job %s has finished
Any idea?
Thanks a lot
Laure
Independent Assessment of Galaxy
by Anton Nekrutenko
Dear galaxy-dev members:
Professor Nils Christophersen (Department of Informatics, University of Oslo) has asked me to provide him with a few contacts who are independent of the core Galaxy team. He needs these to get an idea of why people have chosen Galaxy for their particular needs. If some of you would be willing to send him (he is CC'ed on this e-mail) a short blurb, we would very much appreciate it!
Thanks for your time!
anton
Anton Nekrutenko
http://usegalaxy.org