An idea for sharing tools
by James Casbon
Hi Everyone,
I have an "I want a pony" idea that I would like to kick onto the mailing list:
It would be great if there were a way to share tool definitions between users.
At the moment, the main repo is maintained by the Galaxy team, and
that is fine and makes sense. However, I'm sure there is a lot of
duplicated work between users when adding other tools. For
example, there was a conversation the other day about adding in awk.
Someone had already done this, so the best idea would be if I could
pull in that definition and enable it with minimum effort. I have
already added tools (exonerate, restriction mapper, etc, etc) that may
be of use to other people. I'm not sure of the best way to go about this,
but if my understanding of Mercurial is right, we can simply offer
another repo for people to pull changes from (see the sketch below).
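A minimal sketch of what that might look like for a user, assuming a hypothetical shared repository (the Bitbucket URL below is made up for illustration):

# Pull shared tool definitions into an existing Galaxy clone, then update
# the working copy (repository URL is hypothetical).
hg pull https://bitbucket.org/example/galaxy-community-tools
hg update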
If this is of interest to you, please reply. If we get enough
interest, and preferably some support from the core team, I could set up
a free repo at, e.g., Bitbucket and add users to it. Or perhaps there
is a better way (e.g. patches submitted to Trac)? Another question is
what kind of tools the core team would accept for inclusion in the
main distribution.
Cheers,
James
Local Mirror *away* from server root
by Suraj Mukatira
Hi,
I modified the universe_wsgi.ini file as below, based on the FAQ:
# ---- HTTP Server ----------------------------------------------------------
[server:main]
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 10
# ---- Galaxy Web Interface -------------------------------------------------
#[app:main]
[composite:main]
use=egg:Paste#urlmap[[BR]] /galaxy/=galaxy
[app:galaxy]
# Specifies the factory for the universe WSGI application
paste.app_factory = galaxy.web.buildapp:app_factory
log_level = DEBUG
...rest same
However, I get the following error:
LookupError: Entry point 'urlmap[[BR]] /galaxy/=galaxy' not found in egg 'Paste' (dir: /var/www/html/galaxy_dist/eggs/py2.5-noplatform/Paste-1.5.1-py2.5.egg; protocols: paste.app_factory, paste.composite_factory, paste.composit_factory; entry_points: )
What is wrong with my modification of universe_wsgi.ini?
Thanks in anticipation,
Sm
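For reference, the entry point name in the error above still contains '[[BR]]', which looks like wiki line-break markup copied literally from the FAQ page. The composite section is normally written across separate lines, roughly as below (a sketch only, not verified against this Galaxy/Paste version):

[composite:main]
use = egg:Paste#urlmap
/galaxy/ = galaxy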
Workflows with Tools that use DBKEY
by Assaf Gordon
Hello,
Is there a way to add a tool that needs 'dbkey' information (that is,
genome metadata) in an <options> tag to a workflow?
For example: the Lift-Over tool?
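For context, the pattern in question looks roughly like the Lift-Over tool's target-genome select, where the available options are filtered by the input dataset's dbkey (reconstructed from memory; the .loc file name and column indexes are assumptions):

<param name="to_dbkey" type="select" label="To">
    <options from_file="liftOver.loc">
        <column name="name" index="1"/>
        <column name="value" index="2"/>
        <filter type="data_meta" ref="input" key="dbkey" column="0"/>
    </options>
</param>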
Thanks,
Gordon.
Tool XML Question - using option/from_file name in 'label'
by Assaf Gordon
Hello,
I have a tool which has an <options> element with a 'from_file' attribute:
...
<param name="database" type="select" label="Mapping Database">
<options from_file="mapping_filter_databases.txt">
<column name="name" index="1"/>
<column name="value" index="0"/>
</options>
</param>
...
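For context, mapping_filter_databases.txt would be a tab-separated file along these lines (the entries here are made up for illustration), with column 0 holding the value that gets substituted and column 1 the name shown in the select list:

#value	name
/data/mapping/hg18/hg18	Human (hg18)
/data/mapping/mm9/mm9	Mouse (mm9)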
I want to use the database name in the label of the output dataset:
<outputs>
    <data format="fasta" name="output_mapped"
          label="Mappers of $input.name to $database" />
</outputs>
The '$database' variable is replaced with the database's filename (the
'value' column). I'd like it to be the 'name' column, but using
"$database.name" throws a runtime exception:
NotFound: cannot find 'name' while searching for 'database.name'
(full backtrace is listed below).
Is there a way to display the name instead of the value in an <options>
clause?
Thanks!
Gordon.
URL: http://tango/tool_runner/index
File 'build/bdist.solaris-2.11-i86pc/egg/weberror/evalexception/middleware.py', line 364 in respond
File 'build/bdist.solaris-2.11-i86pc/egg/paste/debug/prints.py', line 98 in __call__
File 'build/bdist.solaris-2.11-i86pc/egg/paste/wsgilib.py', line 539 in intercept_output
File 'build/bdist.solaris-2.11-i86pc/egg/beaker/session.py', line 103 in __call__
File 'build/bdist.solaris-2.11-i86pc/egg/paste/recursive.py', line 80 in __call__
File 'build/bdist.solaris-2.11-i86pc/egg/paste/httpexceptions.py', line 632 in __call__
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/web/framework/base.py', line 126 in __call__
  body = method( trans, **kwargs )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/web/controllers/tool_runner.py', line 45 in index
  template, vars = tool.handle_input( trans, params.__dict__ )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/tools/__init__.py', line 708 in handle_input
  out_data = self.execute( trans, incoming=params )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/tools/__init__.py', line 915 in execute
  return self.tool_action.execute( self, trans, incoming=incoming, set_output_hid=set_output_hid )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/tools/actions/__init__.py', line 159 in execute
  data.name = fill_template( output.label, context=params )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/util/template.py', line 9 in fill_template
  return str( Template( source=template_text, searchList=[context] ) )
File '<string>', line 28 in respond
NotFound: cannot find 'name' while searching for 'database.name'
MAF alignments
by Stewart Stevens
Hi,
I have a local Galaxy instance (http://bioanalysis.otago.ac.nz/galaxy). A
problem I'm having is that when I try to extract a multiple alignment, I
get the error "The MAF source specified (28_WAY_MULTIZ_hg18) appears to
be invalid."
The background is that I have downloaded the alignment from http://hgdownload.cse.ucsc.edu/goldenPath
and modified the maf_index.loc file according to the commented
example.
I'm new to Python, but I did have a look in the debugger and it seems
to be missing an index file. The parameters it has loaded from
maf_index.loc appear to be as expected. I'm guessing that with more
digging in the code it will become clear that a script needs to be run
to produce these indexes from the MAFs. Or should these indexes be
built on the fly? Maybe someone can save me a load more digging?
Thanks if you can!
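In case it helps, here is a minimal sketch of the guess above, assuming the missing files are the .index files produced by bx-python's maf_build_index.py script (bx-python ships with Galaxy; the script name, how it is invoked, and the example path are assumptions worth double-checking against the local checkout):

# Hypothetical: build an interval index alongside a MAF referenced in
# maf_index.loc, producing chr1.maf.index for the MAF tools to read.
python maf_build_index.py /data/maf/28_WAY_MULTIZ_hg18/chr1.maf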
Cheers,
Stewart
--
Stewart Stevens
Software Developer - Integrated Genomics
University of Otago
Department of Biochemistry
Tel: +64(0)3 479 7863
Fax: +64(0)3 479 7866
Suggestion: new tag in tool's XML file (long)
by Assaf Gordon
Hello,
As recently discussed on this mailing list, with long workflows it
becomes somewhat difficult to make sense of the dataset names.
The 'label' option in the <outputs> tag is a good start, but there's still
room for improvement.
I propose a new 'tag' element which might help a little bit.
Here's an example:
I have five tools in a simple workflow:
Starting with a FASTQ file,
FASTQ-to-FASTA
FASTA-clipper,
FASTA-collapser,
FASTA-filter-by-length,
FASTA-low-complexity-repeats-remover.
Each tool has an <outputs> section with a label. The label combines the
description of the tool and the name of the input.
An example from the low-complexity-repeats-remover tool:
<outputs>
    <data format="input" name="output"
          label="$input.name (without low-complexity repeats)"
          metadata_source="input" />
</outputs>
Similarly, most of the other tools have a 'label' option in the XML file.
The problem is that after running the workflow, the name of each dataset
becomes longer and longer, and more cumbersome.
Even if I start with a very short name for the initial FASTQ file (e.g.
'AG14'), the other datasets are named like:
AG14 (Fasta),
AG14 (Fasta) clipped,
AG14 (Fasta) clipped collapsed,
Filter sequences by length on data 5
Filter sequences by length on data 5 (without low complexity repeats)
(The filter-sequences-by-length tool doesn't have a 'label' option, which
makes the output even worse.)
My suggestion:
I would like each data set to have a 'tag' - a short name which is
carried over from one dataset to the next, without taking the entire
dataset's name. This 'tag' is usually just the name of the initial dataset.
Going back to the previous example, if I use the initial library name as
the tag (AG14), the output of the workflow would look like:
AG14
AG14 (Fasta)
AG14 (Clipped)
AG14 (Collapsed)
AG14 (Filtered Sequences)
AG14 (without low complexity repeats)
I tried to add this functionality to the Galaxy source code.
I'm aware that the Right Thing to do is probably to add a new column to
the relevant database table, and set up the tools to automatically carry
the tag from one dataset to the next. However, as an intermediate
solution, my hack works nicely (I'm attaching 'before' and 'after'
pictures :-) ).
With this hack, one can use the following 'label' in the XML <output> tag:
<outputs>
    <data format="input"
          name="output"
          label="$input.tag clipped"
          metadata_source="input" />
</outputs>
The "$input.tag" extract the tag from the input's name, and puts in the
square brackets, so that the next tool will also be able to extract the tag.
Here's the added code ( ./lib/galaxy/model/__init__.py ):
...
class HistoryDatasetAssociation( object ):
    ...
    @property
    def tag( self ):
        # Hack by gordon: expose a '.tag' attribute.
        # (Uses the 're' module, so "import re" is needed at the top of
        # the file if it is not already there.)
        # Reuse an existing "[tag]" in the dataset's name if present;
        # otherwise wrap the whole name in brackets to start a new tag.
        tag_match = re.search( r'\[([^[]+)\]', self.name )
        if tag_match:
            tag = "[" + tag_match.group(1) + "]"
        else:
            tag = "[" + self.name + "]"
        return tag
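As a quick standalone illustration of what that regex does (just a sketch of the tag-carrying behaviour, not part of the Galaxy code):

import re

def extract_tag( name ):
    # Same logic as the .tag property above: reuse an existing "[tag]",
    # otherwise wrap the whole name in brackets.
    match = re.search( r'\[([^[]+)\]', name )
    return "[" + ( match.group(1) if match else name ) + "]"

print( extract_tag( "AG14" ) )            # -> [AG14]
print( extract_tag( "[AG14] clipped" ) )  # -> [AG14]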
Thanks for reading so far,
Any comments are welcome,
Gordon.
Apache auth and sending data
by Sean Davis
I am running Galaxy behind Apache as a proxy and using Apache auth for login.
It works great. However, when I want to send data to, for example, UCSC as a
custom track, the request does not go through because the UCSC server cannot
supply the auth credentials. Has anyone worked around this problem?
Thanks,
Sean
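A common workaround for this kind of setup (a sketch only; the location path and the UCSC host list are assumptions that should be checked against the Galaxy Apache proxy documentation and UCSC's published browser hosts) is to exempt Galaxy's dataset display URLs from authentication for UCSC's servers:

<Location "/root/display_as">
    # Hypothetical Apache 2.2-style exemption; adjust the path and host list.
    Satisfy Any
    Order deny,allow
    Deny from all
    Allow from hgw1.cse.ucsc.edu hgw2.cse.ucsc.edu hgw3.cse.ucsc.edu
</Location>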
Workflow improvement requests (long)
by Assaf Gordon
Dear all,
Recently, users (of our local Galaxy server) started using workflows,
and are very pleased. However, as workflows get more complicated, it
gets harder to track the inputs and outputs of the workflows.
I'd like to share an example, to illustrate the problems that we encounter.
The workflow (pictured in the attached 'workflow.jpg') takes 4 input
datasets, and produces 4 output datasets.
The first problem is that there's no way to differentiate between the
input datasets (they appear simply as "Step 1: Input dataset", "Step 2:
Input Dataset", etc.). Since each dataset has a specific role, I've had
to print the workflow and give the users instructions as to which
dataset (in their history) goes into which input (see attached
'crosstab_workflow_input_datasets.jpg').
The second problem is that whenever I change something in the workflow
and save it, the order of the datasets changes!
So what was once dataset 1 can now be dataset 2, 3 or 4.
Users have no way of knowing this... (keen users might notice that the
description of the first tool changed from "Output dataset 'output' from
step 2" to "Output dataset 'output' from step 4", but this is very
obscure...).
The third problem is that once the workflow completes, the resulting
datasets have cryptic names such as "Join two queries on Data 10 and Data
2". Since "Data 10" is "Awk on Data 8", data 8 is "Generic
Annotations on Data 7 and Data 1", and data 7 is "Intersect data 1 and
data 6", it gets a bit hard to know what's going on (see attached
'crosstab_history.png').
For the meantime, I've simply given written instructions on what each
dataset means (see attached 'crosstab_workflow_dataset_explnanations.jpg').
If I may suggest a feature - it would be great if I could name a dataset
inside the workflow. Instead of naming it "Input dataset" I could give
it a descriptive name, so even if the order of the input datasets
changes, users will know which dataset goes into which input.
Regarding the output dataset names, the 'label' option in the tools' XML
is a good start, but still creates very long, hard-to-understand names.
Another great feature would be the possibility to add an 'output label'
for each step in the workflow.
Regardless of the above, I'd like to say (once again) that Galaxy is a
great tool, and workflows are really cool - we have several long
workflows which do wonderful things.
Thanks for reading so far,
Gordon.
Moving and processing large files: will a local installation help?
by Johannes Waage
Hi all,
I'm moving quite a lot of large files (>2 GB) between Galaxy and our local
network for processing (mainly sequence retrieval and interval processing).
This is a tad on the slow side; will a local installation of Galaxy help
improve the speed? Or will the gain in transfer speed be lost in increased
processing time (I do not plan on installing on a cluster)?
Thank you in advance,
Johannes Waage,
Uni. of Copenhagen, Denmark