I have an "I want a pony" idea that I would like to kick onto the mailing list:
It would be great if there were a way to share tool definitions between users.
At the moment, the main repo is maintained by the galaxy team, and
that is fine and makes sense. However, I'm sure there is a lot of
duplicated work between the users when adding other tools in. For
example, there was a conversation the other day about adding in awk.
Someone had already done this, so the best idea would be if I could
pull in that definition and enable it with minimum effort. I have
already added tools (exonerate, restriction mapper, etc, etc) that may
be of use to other people. Not sure the best way to go about this,
but if my understanding of mercurial is right, we can simply offer
another repo for people to pull changes from.
If this is of interest to you, could you please reply? If we get enough
interest, and preferably some support from the core team, I could set up
a free repo at, e.g., Bitbucket and add users to it. Or perhaps there
is a better way (e.g. patches submitted to Trac)? Another question is
what kind of tools the core team would accept for inclusion in the main repo.
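The Mercurial mechanics of this are indeed simple; a sketch of what pulling
from a second, community-maintained repo might look like (the repository URL
is hypothetical):

```shell
# From an existing Galaxy checkout, fetch changesets from a second
# remote and apply them to the working copy.
# (Repository URL is hypothetical.)
cd galaxy_dist
hg pull https://bitbucket.org/example/galaxy-community-tools
hg update
```

Users could keep pulling from both the official repo and the community repo,
since Mercurial merges changesets from any number of remotes.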
I modified the universe_wsgi.ini file as below, based on the FAQ input
# ---- HTTP Server ----------------------------------------------------------
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 10
# ---- Galaxy Web Interface -------------------------------------------------
# Specifies the factory for the universe WSGI application
paste.app_factory = galaxy.web.buildapp:app_factory
log_level = DEBUG
However I get the following error:
LookupError: Entry point 'urlmap[[BR]] /galaxy/=galaxy' not found in egg 'Paste' (dir: /var/www/html/galaxy_dist/eggs/py2.5-noplatform/Paste-1.5.1-py2.5.egg; protocols: paste.app_factory, paste.composite_factory, paste.composit_factory; entry_points: )
What is wrong with my modification of universe_wsgi.ini?
Thanks in anticipation
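One thing worth checking: the entry point name in the error contains the
literal text '[[BR]]', which is Trac's wiki line-break markup. If that marker
was copied verbatim from the FAQ into the config file, the urlmap section
needs to be split across real lines instead, roughly like this (section and
mount names taken from the error message; adjust to your setup):

```ini
[composite:main]
use = egg:Paste#urlmap
/galaxy/ = galaxy
```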
I have a tool which has an <options> element with a 'from_file' attribute:
<param name="database" type="select" label="Mapping Database">
  <options from_file="...">
    <column name="name" index="1"/>
    <column name="value" index="0"/>
  </options>
</param>
I want to use the database name in the label of the output dataset:
<data format="..." name="output_mapped" label="Mappers of $input.name to $database" />
The '$database' variable is replaced with the database's filename (the
'value' column). I'd like it to be the 'name' column, but using
"$database.name" throws a runtime exception:
NotFound: cannot find 'name' while searching for 'database.name'
(full backtrace is listed below).
Is there a way to display the name instead of the value in an <option>?
line 364 in respond
File 'build/bdist.solaris-2.11-i86pc/egg/paste/debug/prints.py', line 98
File 'build/bdist.solaris-2.11-i86pc/egg/paste/wsgilib.py', line 539 in
File 'build/bdist.solaris-2.11-i86pc/egg/beaker/session.py', line 103 in
File 'build/bdist.solaris-2.11-i86pc/egg/paste/recursive.py', line 80 in
File 'build/bdist.solaris-2.11-i86pc/egg/paste/httpexceptions.py', line 632 in __call__
line 126 in __call__
body = method( trans, **kwargs )
line 45 in index
template, vars = tool.handle_input( trans, params.__dict__ )
line 708 in handle_input
out_data = self.execute( trans, incoming=params )
line 915 in execute
return self.tool_action.execute( self, trans, incoming=incoming,
line 159 in execute
data.name = fill_template( output.label, context=params )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/util/template.py', line
9 in fill_template
return str( Template( source=template_text, searchList=[context] ) )
File '<string>', line 28 in respond
NotFound: cannot find 'name' while searching for 'database.name'
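One workaround (a sketch of a convention, not a built-in Galaxy feature) is
to pack both columns into the option's value with a delimiter, e.g.
"Human Mar. 2006 (hg18)|/depot/data/hg18.fa", and then split it apart in the
Cheetah template with $database.split('|')[0] for the label and
$database.split('|')[1] for the command line. The splitting itself is plain
Python:

```python
# Sketch of the workaround: the option's 'value' column carries
# "display name|file path" (a convention of this workaround,
# not a Galaxy feature); templates split it apart as needed.
value = "Human Mar. 2006 (hg18)|/depot/data/hg18.fa"

display_name, file_path = value.split("|", 1)
print(display_name)  # -> Human Mar. 2006 (hg18)
print(file_path)     # -> /depot/data/hg18.fa
```

The maxsplit argument of 1 keeps any further '|' characters inside the path
intact.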
I have a local Galaxy instance (http://bioanalysis.otago.ac.nz/
galaxy). A problem I'm having is that when I try to extract a multiple
alignment I get an error: "The MAF source specified
(28_WAY_MULTIZ_hg18) appears to be invalid."
The background is that I have downloaded the alignment from http://hgdownload.cse.ucsc.edu/goldenPath
and modified the maf_index.loc file according to the commented example in it.
I'm new to python but I did have a look in the debugger and it seems
to be missing an index file. The parameters it has loaded from
maf_index.loc appear to be as expected. I'm guessing with more
looking at the code it will become clear that a script to produce
these indexes from mafs needs to be run. Or should these indexes be
being built on the fly? Maybe someone can save me a load more
digging? Thanks if you can!
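In case it saves anyone else the digging: the indexes are not built on the
fly. bx-python, which Galaxy uses for its alignment tools, ships a
maf_build_index.py script that writes the .index file next to each MAF;
invocation would look something like this (the paths are hypothetical, and
you should confirm the script's location in your install):

```shell
# Build the interval index Galaxy's MAF tools expect, one per .maf file.
# (Script from bx-python; paths are hypothetical.)
python maf_build_index.py /data/maf/28way/chr1.maf
# -> creates /data/maf/28way/chr1.maf.index
```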
Software Developer - Integrated Genomics
University of Otago
Department of Biochemistry
Tel: +64(0)3 479 7863
Fax: +64(0)3 479 7866
As recently discussed on this mailing list, with long workflows it
becomes somewhat difficult to make sense of the resulting dataset names.
The 'label' option in the <output> tag is a good start, but there's still
room for improvement.
I propose a new 'tag' element which might help a little bit.
Here's an example:
I have five tools in a simple workflow:
Starting with a FASTQ file,
Each tool has an <output> option with a label. The label takes the
description of the tool, and the name of the input.
Example from the low-complexity-repeats-remove tool:
<data format="input" name="output"
      label="$input.name (without low-complexity repeats)" />
Similarly, most of the other tools have a 'label' option in the XML file.
The problem is that after running the workflow, the name of each dataset
becomes longer and longer, and more cumbersome.
Even if I start with a very short name for the initial FASTQ file (e.g.
'AG14'), the other datasets end up named like:
AG14 (Fasta) clipped,
AG14 (Fasta) clipped collapsed,
Filter sequences by length on data 5
Filter sequences by length on data 5 (without low complexity repeats)
(The filter-sequences-by-length tool doesn't have a 'label' option, which
makes the output even worse.)
I would like each data set to have a 'tag' - a short name which is
carried over from one dataset to the next, without taking the entire
dataset's name. This 'tag' is usually just the name of the initial dataset.
Going back to the previous example, if I use the initial library name as
the tag (=AG14), the output of the workflow would look like:
AG14 (Filtered Sequences)
AG14 (without low complexity repeats)
I tried to add this functionality to the galaxy source code.
I'm aware that the Right Thing to do is probably to add a new column to
the relevant database table, and setup the tools to automatically take
the tag from one dataset to the next. However, as an intermediate
solution, my hack works nicely (I'm attaching pictures of 'before' and
'after :-) ).
With this hack, one can use the following 'label' in the XML <output> tag:
The "$input.tag" extracts the tag from the input's name and puts it in
square brackets, so that the next tool will also be able to extract the tag.
Here's the added code ( ./lib/galaxy/model/__init__.py ):
class HistoryDatasetAssociation( object ):
    def tag( self ):
        # Hack by gordon:
        # add a '.tag' attribute (needs 'import re' at the top of the module).
        # Reuse an existing "[tag]" embedded in the dataset name; otherwise
        # fall back to using the whole name as the tag.
        tag_match = re.search( r'\[([^\]]+)\]', self.name )
        if tag_match:
            tag = "[" + tag_match.group(1) + "]"
        else:
            tag = "[" + self.name + "]"
        return tag
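Outside Galaxy, the tag-extraction behaviour of the hack can be sketched
standalone like this:

```python
import re

def extract_tag(name):
    # Pull an existing "[tag]" out of a dataset name; if none is
    # present, treat the whole name as the tag.
    m = re.search(r'\[([^\]]+)\]', name)
    return "[" + m.group(1) + "]" if m else "[" + name + "]"

print(extract_tag("AG14 (Fasta) clipped [AG14]"))  # -> [AG14]
print(extract_tag("AG14"))                         # -> [AG14]
```

So once the initial dataset carries "[AG14]", every downstream label that
uses the tag keeps it short instead of accumulating the full history of names.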
Thanks for reading so far,
Any comments are welcome,
I am running galaxy behind apache in proxy and using apache auth for login.
Works great. However, when I want to send data to, for example, UCSC as a
custom track, the request does not go through because the UCSC server cannot
send the auth credentials. Has anyone worked around this problem?
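One approach that has been suggested for this situation (a sketch based on
common Apache setups; the exact location path and UCSC host list are
assumptions, so verify them against your instance) is to exempt the
dataset-display URLs from authentication for requests coming from UCSC's
browser hosts:

```apache
# Inside the existing auth-protected proxy config: let UCSC's servers
# fetch displayed datasets without credentials.
# (Path and hostnames are assumptions - verify for your install.)
<Location "/galaxy/root/display_as">
    Satisfy Any
    Order deny,allow
    Deny from all
    Allow from hgw1.cse.ucsc.edu hgw2.cse.ucsc.edu hgw3.cse.ucsc.edu
</Location>
```

'Satisfy Any' means a request passes if it satisfies either the auth
requirement or the host-based allow rule, so your own users are unaffected.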
Recently, users (of our local galaxy server) started using workflows,
and are very pleased. However, as workflows get more complicated, it
gets harder to track the input and output of the workflows.
I'd like to share an example, to illustrate the problems that we encounter.
The workflow (pictured in the attached 'workflow.jpg') takes 4 input
datasets, and produces 4 output datasets.
The first problem is that there's no way to differentiate between the
input datasets (They appear simply as "Step 1: Input dataset", "Step 2:
Input Dataset", etc). Since each dataset has a specific role, I've had
to print the workflow and give the users instructions as to which
dataset (in their history) goes into which input (see attached screenshot).
The second problem is that whenever I change something in the workflow
and save it, the order of the datasets changes!
So what was once dataset 1, can now be dataset 2,3 or 4.
Users have no way of knowing this... (keen users might notice that the
description of the first tool changed from "Output dataset 'output' from
step 2" to "Output dataset 'output' from step 4" - but this is very subtle.)
The third problem is that once the workflow completes, the resulting
datasets have cryptic names such as "Join two queries on Data 10 and Data
2". Since "Data 10" is "Awk on Data 8", and data-8 is "Generic
Annotations on Data 7 and Data 1", and data-7 is "Intersect data 1 and
data 6", it gets a bit hard to know what's going on (see attached screenshot).
For the meantime, I've simply given written instructions on what each
dataset means (see attached 'crosstab_workflow_dataset_explnanations.jpg').
If I may suggest a feature - it would be great if I could name a dataset
inside the workflow. Instead of naming it "Input dataset" I could give
it a descriptive name, so even if the order of the input datasets
changes, users will know which dataset goes into which input.
Regarding the output dataset names, the 'label' option in the tools' XML
is a good start, but still creates very long, hard-to-understand names.
Another great feature would be the possibility to add an 'output label'
for each step in the workflow.
Regardless of the above, I'd like to say (once again) that Galaxy is a
great tool, and workflows are really cool - we have several long
workflows which do wonderful things.
Thanks for reading so far,
I'm moving quite a lot of large files (>2 GB) between Galaxy and our local
network for processing (mainly sequence retrieval and interval processing).
This is a tad on the slow side; will a local installation of Galaxy help
improve the speed? Or will the gain in transfer speed be lost in increased
processing time (I do not plan on installing on a cluster)?
Thank you in advance,
Uni. of Copenhagen, Denmark