I have an "I want a pony" idea that I would like to kick onto the mailing list:
It would be great if there were a way to share tool definitions between users.
At the moment, the main repo is maintained by the galaxy team, and
that is fine and makes sense. However, I'm sure there is a lot of
duplicated work between the users when adding other tools in. For
example, there was a conversation the other day about adding in awk.
Someone had already done this, so the best idea would be if I could
pull in that definition and enable it with minimum effort. I have
already added tools (exonerate, restriction mapper, etc, etc) that may
be of use to other people. Not sure the best way to go about this,
but if my understanding of mercurial is right, we can simply offer
another repo for people to pull changes from.
If this is of interest to you, could you please reply? If we get enough
interest, and preferably some support from the core team, I could set up
a free repo at, e.g., Bitbucket and add users to it. Or perhaps there
is a better way (e.g. patches submitted to Trac)? Another question is
what kind of tools the core team would accept for inclusion in the main repo.
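The Mercurial mechanics of this are indeed simple; a sketch of what pulling
from a second, community-maintained repo might look like (the repository URL
is hypothetical):

```shell
# From an existing Galaxy checkout, fetch changesets from a second
# remote and apply them to the working copy.
# (Repository URL is hypothetical.)
cd galaxy_dist
hg pull https://bitbucket.org/example/galaxy-community-tools
hg update
```

Users could keep pulling from both the official repo and the community repo,
since Mercurial merges changesets from any number of remotes.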
I modified the universe_wsgi.ini file as below, based on the FAQ input
# ---- HTTP Server ----------------------------------------------------------
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 10
# ---- Galaxy Web Interface -------------------------------------------------
# Specifies the factory for the universe WSGI application
paste.app_factory = galaxy.web.buildapp:app_factory
log_level = DEBUG
However I get the following error:
LookupError: Entry point 'urlmap[[BR]] /galaxy/=galaxy' not found in egg 'Paste' (dir: /var/www/html/galaxy_dist/eggs/py2.5-noplatform/Paste-1.5.1-py2.5.egg; protocols: paste.app_factory, paste.composite_factory, paste.composit_factory; entry_points: )
What is wrong with my modification of universe_wsgi.ini?
Thanks in anticipation
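One thing worth checking: the entry point name in the error contains the
literal text '[[BR]]', which is Trac's wiki line-break markup. If that marker
was copied verbatim from the FAQ into the config file, the urlmap section
needs to be split across real lines instead, roughly like this (section and
mount names taken from the error message; adjust to your setup):

```ini
[composite:main]
use = egg:Paste#urlmap
/galaxy/ = galaxy
```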
I have a tool which has an <options> element with a 'from_file' attribute:
<param name="database" type="select" label="Mapping Database">
  <options from_file="...">
    <column name="name" index="1"/>
    <column name="value" index="0"/>
  </options>
</param>
I want to use the database name in the label of the output dataset:
<data format="..." name="output_mapped" label="Mappers of $input.name to $database" />
The '$database' variable is replaced with the database's filename (the
'value' column). I'd like it to be the 'name' column, but using
"$database.name" throws a runtime exception:
NotFound: cannot find 'name' while searching for 'database.name'
(full backtrace is listed below).
Is there a way to display the name instead of the value in an <option>?
line 364 in respond
File 'build/bdist.solaris-2.11-i86pc/egg/paste/debug/prints.py', line 98
File 'build/bdist.solaris-2.11-i86pc/egg/paste/wsgilib.py', line 539 in
File 'build/bdist.solaris-2.11-i86pc/egg/beaker/session.py', line 103 in
File 'build/bdist.solaris-2.11-i86pc/egg/paste/recursive.py', line 80 in
File 'build/bdist.solaris-2.11-i86pc/egg/paste/httpexceptions.py', line 632 in __call__
line 126 in __call__
body = method( trans, **kwargs )
line 45 in index
template, vars = tool.handle_input( trans, params.__dict__ )
line 708 in handle_input
out_data = self.execute( trans, incoming=params )
line 915 in execute
return self.tool_action.execute( self, trans, incoming=incoming,
line 159 in execute
data.name = fill_template( output.label, context=params )
File '/media/sdb1/galaxy/galaxy_devel/lib/galaxy/util/template.py', line
9 in fill_template
return str( Template( source=template_text, searchList=[context] ) )
File '<string>', line 28 in respond
NotFound: cannot find 'name' while searching for 'database.name'
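One workaround (a sketch of a convention, not a built-in Galaxy feature) is
to pack both columns into the option's value with a delimiter, e.g.
"Human Mar. 2006 (hg18)|/depot/data/hg18.fa", and then split it apart in the
Cheetah template with $database.split('|')[0] for the label and
$database.split('|')[1] for the command line. The splitting itself is plain
Python:

```python
# Sketch of the workaround: the option's 'value' column carries
# "display name|file path" (a convention of this workaround,
# not a Galaxy feature); templates split it apart as needed.
value = "Human Mar. 2006 (hg18)|/depot/data/hg18.fa"

display_name, file_path = value.split("|", 1)
print(display_name)  # -> Human Mar. 2006 (hg18)
print(file_path)     # -> /depot/data/hg18.fa
```

The maxsplit argument of 1 keeps any further '|' characters inside the path
intact.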
I have a local Galaxy instance (http://bioanalysis.otago.ac.nz/
galaxy). A problem I'm having is that when I try to extract a multiple
alignment I get an error: "The MAF source specified
(28_WAY_MULTIZ_hg18) appears to be invalid."
The background is that I have downloaded the alignment from http://hgdownload.cse.ucsc.edu/goldenPath
and modified the maf_index.loc file according to the commented example in it.
I'm new to python but I did have a look in the debugger and it seems
to be missing an index file. The parameters it has loaded from
maf_index.loc appear to be as expected. I'm guessing with more
looking at the code it will become clear that a script to produce
these indexes from mafs needs to be run. Or should these indexes be
being built on the fly? Maybe someone can save me a load more
digging? Thanks if you can!
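In case it saves anyone else the digging: the indexes are not built on the
fly. bx-python, which Galaxy uses for its alignment tools, ships a
maf_build_index.py script that writes the .index file next to each MAF;
invocation would look something like this (the paths are hypothetical, and
you should confirm the script's location in your install):

```shell
# Build the interval index Galaxy's MAF tools expect, one per .maf file.
# (Script from bx-python; paths are hypothetical.)
python maf_build_index.py /data/maf/28way/chr1.maf
# -> creates /data/maf/28way/chr1.maf.index
```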
Software Developer - Integrated Genomics
University of Otago
Department of Biochemistry
Tel: +64(0)3 479 7863
Fax: +64(0)3 479 7866
As recently discussed on this mailing list, with long workflows it
becomes somewhat difficult to make sense of the resulting dataset names.
The 'label' option in the <output> tag is a good start, but there's still
room for improvement.
I propose a new 'tag' element which might help a little bit.
Here's an example:
I have five tools in a simple workflow:
Starting with a FASTQ file,
Each tool has an <output> option with a label. The label takes the
description of the tool, and the name of the input.
Example from the low-complexity-repeats-remove tool:
<data format="input" name="output"
      label="$input.name (without low-complexity repeats)" />
Similarly, most of the other tools have a 'label' option in the XML file.
The problem is that after running the workflow, the name of each dataset
becomes longer and longer, and more cumbersome.
Even if I start with a very short name for the initial FASTQ file (e.g.
'AG14'), the other datasets end up named like:
AG14 (Fasta) clipped,
AG14 (Fasta) clipped collapsed,
Filter sequences by length on data 5
Filter sequences by length on data 5 (without low complexity repeats)
(The filter-sequences-by-length tool doesn't have a 'label' option, which
makes the output even worse.)
I would like each data set to have a 'tag' - a short name which is
carried over from one dataset to the next, without taking the entire
dataset's name. This 'tag' is usually just the name of the initial dataset.
Going back to the previous example, if I use the initial library name as
the tag (=AG14), the output of the workflow would look like:
AG14 (Filtered Sequences)
AG14 (without low complexity repeats)
I tried to add this functionality to the galaxy source code.
I'm aware that the Right Thing to do is probably to add a new column to
the relevant database table, and setup the tools to automatically take
the tag from one dataset to the next. However, as an intermediate
solution, my hack works nicely (I'm attaching pictures of 'before' and
'after :-) ).
With this hack, one can use the following 'label' in the XML <output> tag:
The "$input.tag" extracts the tag from the input's name and puts it in
square brackets, so that the next tool will also be able to extract the tag.
Here's the added code ( ./lib/galaxy/model/__init__.py ):
class HistoryDatasetAssociation( object ):
    def tag( self ):
        # Hack by gordon:
        # add a '.tag' attribute (needs 'import re' at the top of the module).
        # Reuse an existing "[tag]" embedded in the dataset name; otherwise
        # fall back to using the whole name as the tag.
        tag_match = re.search( r'\[([^\]]+)\]', self.name )
        if tag_match:
            tag = "[" + tag_match.group(1) + "]"
        else:
            tag = "[" + self.name + "]"
        return tag
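Outside Galaxy, the tag-extraction behaviour of the hack can be sketched
standalone like this:

```python
import re

def extract_tag(name):
    # Pull an existing "[tag]" out of a dataset name; if none is
    # present, treat the whole name as the tag.
    m = re.search(r'\[([^\]]+)\]', name)
    return "[" + m.group(1) + "]" if m else "[" + name + "]"

print(extract_tag("AG14 (Fasta) clipped [AG14]"))  # -> [AG14]
print(extract_tag("AG14"))                         # -> [AG14]
```

So once the initial dataset carries "[AG14]", every downstream label that
uses the tag keeps it short instead of accumulating the full history of names.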
Thanks for reading so far,
Any comments are welcome,
I am running galaxy behind apache in proxy and using apache auth for login.
Works great. However, when I want to send data to, for example, UCSC as a
custom track, the request does not go through because the UCSC server cannot
send the auth credentials. Has anyone worked around this problem?
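One approach that has been suggested for this situation (a sketch based on
common Apache setups; the exact location path and UCSC host list are
assumptions, so verify them against your instance) is to exempt the
dataset-display URLs from authentication for requests coming from UCSC's
browser hosts:

```apache
# Inside the existing auth-protected proxy config: let UCSC's servers
# fetch displayed datasets without credentials.
# (Path and hostnames are assumptions - verify for your install.)
<Location "/galaxy/root/display_as">
    Satisfy Any
    Order deny,allow
    Deny from all
    Allow from hgw1.cse.ucsc.edu hgw2.cse.ucsc.edu hgw3.cse.ucsc.edu
</Location>
```

'Satisfy Any' means a request passes if it satisfies either the auth
requirement or the host-based allow rule, so your own users are unaffected.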
Recently, users (of our local galaxy server) started using workflows,
and are very pleased. However, as workflows get more complicated, it
gets harder to track the input and output of the workflows.
I'd like to share an example, to illustrate the problems that we encounter.
The workflow (pictured in the attached 'workflow.jpg') takes 4 input
datasets, and produces 4 output datasets.
The first problem is that there's no way to differentiate between the
input datasets (They appear simply as "Step 1: Input dataset", "Step 2:
Input Dataset", etc). Since each dataset has a specific role, I've had
to print the workflow and give the users instructions as to which
dataset (in their history) goes into which input (see attached screenshot).
The second problem is that whenever I change something in the workflow
and save it, the order of the datasets changes!
So what was once dataset 1, can now be dataset 2,3 or 4.
Users have no way of knowing this... (keen users might notice that the
description of the first tool changed from "Output dataset 'output' from
step 2" to "Output dataset 'output' from step 4" - but this is very subtle.)
The third problem is that once the workflow completes, the resulting
datasets have cryptic names such as "Join two queries on Data 10 and Data
2". Since "Data 10" is "Awk on Data 8", and data-8 is "Generic
Annotations on Data 7 and Data 1", and data-7 is "Intersect data 1 and
data 6", it gets a bit hard to know what's going on (see attached screenshot).
For the meantime, I've simply given written instructions on what each
dataset means (see attached 'crosstab_workflow_dataset_explnanations.jpg').
If I may suggest a feature - it would be great if I could name a dataset
inside the workflow. Instead of naming it "Input dataset" I could give
it a descriptive name, so even if the order of the input datasets
changes, users will know which dataset goes into which input.
Regarding the output dataset names, the 'label' option in the tools' XML
is a good start, but still creates very long, hard-to-understand names.
Another great feature would be the possibility to add an 'output label'
for each step in the workflow.
Regardless of the above, I'd like to say (once again) that Galaxy is a
great tool, and workflows are really cool - we have several long
workflows which do wonderful things.
Thanks for reading so far,
I'm moving quite a lot of large files (>2 GB) between Galaxy and our local
network for processing (mainly sequence retrieval and interval processing).
This is a tad on the slow side; will a local installation of Galaxy help
improve the speed? Or will the gain in transfer speed be lost in increased
processing time (I do not plan on installing on a cluster)?
Thank you in advance,
Uni. of Copenhagen, Denmark