September 2014 - galaxy-dev - lists.galaxyproject.org

Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool
by Pedersen Edvard 05 Sep '14

My PhD work may be of interest here, although its primary focus has been on generating databases comprising the changes from a specific timeframe, and it was not designed specifically for Galaxy.

What my system has in common with the one you are proposing: it can generate a BLAST database for any date that has been added to the system, as well as "diffs" between two dates. It supports FASTA, the UniProt EMBL variant, full files (which give no compression benefit) and several other formats. It uses delta compression so that fields that have not been updated take up no extra space, and it uses the Hadoop stack (HBase, HDFS and MapReduce) to generate the databases in parallel (BLAST database generation from the FASTA files is not parallel).

You can find one of the publications here: http://bdps.cs.uit.no/papers/hibb13.pdf

I hope this can be of some use to you.

Regards,
Edvard Pedersen
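
The idea of a "diff" between two dates can be illustrated in plain Python. This is not Edvard's HBase/MapReduce implementation; it is a minimal sketch assuming each dated snapshot is held as a dict mapping a sequence ID to its record text (a hypothetical in-memory layout). It shows that storing only added, changed and removed entries is enough to rebuild the newer snapshot from the older one.

```python
from typing import Dict, List, NamedTuple


class SnapshotDiff(NamedTuple):
    added: Dict[str, str]    # ID -> record, present only in the newer snapshot
    changed: Dict[str, str]  # ID -> new record, present in both but different
    removed: List[str]       # IDs present only in the older snapshot


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> SnapshotDiff:
    """Compare two dated snapshots keyed by sequence ID (hypothetical layout)."""
    return SnapshotDiff(
        added={k: new[k] for k in new.keys() - old.keys()},
        # Unchanged records are not stored at all, which keeps deltas small.
        changed={k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]},
        removed=sorted(old.keys() - new.keys()),
    )


def apply_diff(old: Dict[str, str], diff: SnapshotDiff) -> Dict[str, str]:
    """Rebuild the newer snapshot from the older one plus its diff."""
    gone = set(diff.removed)
    result = {k: v for k, v in old.items() if k not in gone}
    result.update(diff.added)
    result.update(diff.changed)
    return result


jan = {"seq1": "ACGT", "seq2": "GGCC", "seq3": "TTAA"}
feb = {"seq1": "ACGT", "seq2": "GGCA", "seq4": "CCCC"}
print(diff_snapshots(jan, feb))
```

The toy sequences are made up; a real archive would also have to store the diffs durably and index them by date.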

Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool
by Dooley, Damion 04 Sep '14

Earlier in the project analysis I was pursuing a Git solution, because it seemed all its features would work with documents, code and files of any kind and so would be perfect for scientific reproducibility. But its ability to archive non-document files efficiently is quite hit and miss, and where it falls short, the file size limitation becomes a major problem on top of that.

I will try to design the system so that handlers for different types of databases/files can be called into play to retrieve versioned content. It's just that this fall I'll only have time to provide the handlers for fasta file archiving (the key-value database update approach enables fasta versioning and all the spinoff data from that). The next priority would be a handler for any type of file that needs to be replaced as a whole from version to version (one just needs hard drive space to accommodate this, since caching is pointless). A Git handler for well-behaved document content would also be a possibility.

Typo: yesterday I said "I wasn't going to leave that as just "fasta" datatype since it seems tools like makeblastdb don't allow anything else ...", but I meant 'I WAS going to leave that as just "fasta"...'

d.
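
The handler-per-storage-type design described above could take roughly the following shape. Everything here is hypothetical: the registry, the `register` decorator, the `retrieve` dispatcher and the example file names are illustrations, and only the whole-file strategy ("replaced as a whole from version to version") is taken from the message. A fasta key-value handler would register itself the same way under its own name.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Tuple

# Registry mapping a storage-type name to a handler class (hypothetical design).
HANDLERS: Dict[str, type] = {}


def register(kind: str) -> Callable[[type], type]:
    """Class decorator that makes a handler available under `kind`."""
    def wrap(cls: type) -> type:
        HANDLERS[kind] = cls
        return cls
    return wrap


@register("whole_file")
class WholeFileHandler:
    """Keeps each version as a complete copy; retrieval is a lookup, not a rebuild."""

    def __init__(self, versions: List[Tuple[str, str]]):
        # (ISO date, stored file) pairs; ISO dates sort chronologically as strings.
        self.versions = sorted(versions)
        self.dates = [date for date, _ in self.versions]

    def retrieve(self, date: str) -> str:
        # Latest version dated on or before the requested date.
        idx = bisect_right(self.dates, date) - 1
        if idx < 0:
            raise LookupError(f"no version on or before {date}")
        return self.versions[idx][1]


def retrieve(kind: str, date: str, *args) -> str:
    """Dispatch to whichever handler is registered for this database type."""
    return HANDLERS[kind](*args).retrieve(date)


archive = [("2014-06-01", "16S_2014-06.fasta"), ("2014-07-01", "16S_2014-07.fasta")]
print(retrieve("whole_file", "2014-06-15", archive))
```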

Examples of Galaxy tools in the toolsheds that install and run JAR files properly?
by Melissa Cline 04 Sep '14

Hi folks,

I'm attempting something that should be straightforward, but it's not. I have a tool that runs a JAR file, which I have bundled with the tool, and I simply want to run that JAR file. To paraphrase Thomas Edison, I've tried several thousand things that do not work (at least for me), from setting the JAVA_JAR_PATH environment variable in the tool_dependencies.xml file to copying the JAR file into the tool-data/shared/jars subdirectories (which is the closest I've got to working).

So, at long last, I'm doing the sensible thing and looking for one simple working example that I can use as a template. Can anyone suggest a good tool on the main or test Tool Shed that runs its own JAR file, and that works?

Thanks!
Melissa
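
One pattern that avoids relying on environment variables is a small wrapper script shipped next to the JAR, which resolves the JAR's location relative to itself and calls `java -jar`. This is a sketch under assumptions, not a confirmed Tool Shed convention: the JAR name `mytool.jar` is hypothetical, `java` is assumed to be on the job's `PATH`, and the 1g heap default is an arbitrary placeholder. The tool's command line would then call this wrapper instead of the JAR.

```python
import os
import subprocess
from typing import List, Optional, Sequence

# Hypothetical name of the JAR shipped in the same directory as this wrapper.
JAR_NAME = "mytool.jar"


def jar_path(script_file: Optional[str] = None, jar_name: str = JAR_NAME) -> str:
    """Resolve the bundled JAR relative to the wrapper, not the job's working directory."""
    here = os.path.dirname(os.path.abspath(script_file or __file__))
    return os.path.join(here, jar_name)


def build_command(jar: str, args: Sequence[str], max_heap: str = "1g") -> List[str]:
    # -Xmx caps the JVM heap size.
    return ["java", f"-Xmx{max_heap}", "-jar", jar, *args]


def main(argv: Sequence[str]) -> int:
    """Pass the tool's arguments through unchanged and return Java's exit status."""
    return subprocess.call(build_command(jar_path(), argv))

# Entry point when this file is used as the tool's wrapper script:
#     sys.exit(main(sys.argv[1:]))
```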

Startup error after restoring Galaxy DB from backup
by Graeme Grimes 04 Sep '14

I have tried to restart Galaxy after restoring my database from a backup. Here is the error message I get in the log file. Any idea what is wrong and how to fix this problem?

------------------------------
galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Loading job configuration from /export/users/galaxy/galaxy-test/universe_wsgi.ini
galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Done loading job configuration
Traceback (most recent call last):
  File "/export/users/galaxy/galaxy-test/lib/galaxy/webapps/galaxy/buildapp.py", line 35, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/app.py", line 102, in __init__
    self.toolbox = tools.ToolBox( tool_configs, self.config.tool_path, self )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 118, in __init__
    self.load_integrated_tool_panel_keys()
  File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 283, in load_integrated_tool_panel_keys
    tree = parse_xml( self.integrated_tool_panel_config )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/util/__init__.py", line 132, in parse_xml
    tree = ElementTree.parse(fname)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 859, in parse
    tree.parse(source, parser)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 583, in parse
    parser.feed(data)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 1242, in feed
    self._parser.Parse(data, 0)
ExpatError: not well-formed (invalid token): line 117, column 1
Removing PID file /var/run/paster.pid
-----------------

Thanks,
Graeme

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
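
The traceback shows the failure comes from parsing the integrated tool panel config (`parse_xml( self.integrated_tool_panel_config )`), and the `ExpatError` names a position in that file: line 117, column 1. A quick way to see what sits there is to reparse the file with the Python standard library and print the offending line. A minimal sketch; which file path to pass in depends on your installation's `integrated_tool_panel_config` setting.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def report_xml_error(path: str, context: int = 2) -> Optional[str]:
    """Describe the first XML syntax error in `path`, or return None if the file parses."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as err:
        line_no = err.position[0]  # (line, column) as reported by the expat parser
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = fh.read().splitlines()
        first = max(0, line_no - 1 - context)
        last = min(len(lines), line_no + context)
        # repr() makes stray control characters or odd bytes visible.
        shown = [
            f"{'>>' if n + 1 == line_no else '  '} {n + 1:5d}: {lines[n]!r}"
            for n in range(first, last)
        ]
        return f"{err}\n" + "\n".join(shown)
```

Calling `print(report_xml_error(path))` prints the parser's message followed by a few numbered lines around the error, with `>>` marking the line the parser complained about.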

Concept for a Galaxy Versioned Fasta Data Retrieval Tool
by Dooley, Damion 03 Sep '14

We are about to implement a fasta database (file) versioning system as a Galaxy tool. I wanted to get interested people's feedback first, before we roll ahead with the prototype implementation.

The versioning system aims to:

* Enable reproducible research: to recreate a search result from a certain point in time, we need versioning so that search and mapping tools can look at the sequence reference databases as they stood on a particular past date. This recall can also explain the difference between what was known in the past and what is known now.
* Reduce hard drive space: some databases are too big to keep N copies around, e.g. 5 years of 16S, updated monthly, is, say, 670 MB + 668 MB + 665 MB + .... But occasionally we want to access past archives fairly quickly.
* Integrate database versioning into Galaxy without adding a lot of complexity.

A bonus would be to enable the efficient sharing of versioned databases between computers/servers.

The solution we think would work centres around a "Versioned Data Retrieval" tool (draft image attached) that would work as follows:

1) User selects from a list of databases provided by "Shared Data > Data Libraries > Versioned Data".
   - Each database has a master file that keeps its various versions as a list of time-stamped insert/delete transactions of key (fasta id) / value (description & sequence) pairs.
   - Each master file is managed outside of Galaxy by a process triggered on regular fasta file imports from data sources like NCBI or other niche sources.
   - Given the nature of archived fasta sequence updates, we expect a master file to be only about 1.1x the size of the latest version (uncompressed).
2) User enters the date / version id to retrieve (validated).
3) If a cached version of that database exists, it is linked into the user's history.
4) Otherwise a new version of it is created, placed in the cache, and linked into the history.
   - The cached version itself then shows up as linked data under a Data Library > Versioned Data subfolder.
5) User can select preconfigured workflow(s) to execute on the selected retrieved fasta file, to regenerate any database products they need.
   - Workflow output data would also be cached in the same way the fasta data is, by linking the Galaxy Data Library to it.
   - Workflow execution will be skipped if the end data already exists in the cache.
   - Simple makeblastdb or bowtie-build commands, or more specific workflows that include dustmasker etc., can be implemented.

Does this sound attractive? We're hoping such a design could handle fasta databases from 12 MB to, e.g., 200 GB (probably requiring makeblastdb to run in parallel at that scale). Preliminary work suggests this project is doable via the Galaxy API without customizing Galaxy. Does that sound right?

Feedback really appreciated!

Regards,
Damion Dooley
Hsiao lab, BC Public Health Microbiology & Reference Laboratory, BC Centre for Disease Control
655 West 12th Avenue, Vancouver, British Columbia, V5Z 4R4 Canada
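
Steps 1 to 4 hinge on replaying a master file's time-stamped insert/delete transactions up to the requested date, and on reusing a cached build when one exists. A minimal in-memory sketch, assuming a hypothetical entry format of `(date, op, fasta_id, description, sequence)`; the real master file layout, on-disk cache and Galaxy Data Library linking are not shown, and the toy records are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical master-file entry:
# (ISO date, "insert" or "delete", fasta id, description, sequence)
Transaction = Tuple[str, str, str, str, str]


def effective_version(log: List[Transaction], as_of: str) -> Optional[str]:
    """Date of the last transaction on or before `as_of`, or None if there is none."""
    # ISO-8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
    dates = [entry[0] for entry in log if entry[0] <= as_of]
    return max(dates) if dates else None


def rebuild(log: List[Transaction], version: str) -> str:
    """Replay insert/delete transactions up to `version` and emit fasta text."""
    state: Dict[str, Tuple[str, str]] = {}
    for date, op, fasta_id, desc, seq in sorted(log, key=lambda entry: entry[0]):
        if date > version:
            break
        if op == "insert":
            state[fasta_id] = (desc, seq)  # re-inserting an id replaces its record
        elif op == "delete":
            state.pop(fasta_id, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return "".join(f">{fid} {desc}\n{seq}\n" for fid, (desc, seq) in sorted(state.items()))


class VersionCache:
    """Steps 3 and 4: reuse a cached build if one exists, otherwise build and store it."""

    def __init__(self, log: List[Transaction]):
        self.log = log
        self.built: Dict[str, str] = {}  # effective version -> fasta text

    def get(self, as_of: str) -> str:
        version = effective_version(self.log, as_of)
        if version is None:
            raise LookupError(f"no data on or before {as_of}")
        # Keying on the effective version means any two requested dates that
        # fall between the same pair of updates share one cached file.
        if version not in self.built:
            self.built[version] = rebuild(self.log, version)
        return self.built[version]


log = [
    ("2014-07-01", "insert", "seqA", "16S rRNA", "ACGT"),
    ("2014-07-01", "insert", "seqB", "16S rRNA", "GGCC"),
    ("2014-08-01", "delete", "seqB", "", ""),
    ("2014-08-01", "insert", "seqC", "16S rRNA", "TTAA"),
]
cache = VersionCache(log)
print(cache.get("2014-07-15"), end="")
```

For scale, using the figures above: 60 monthly copies of a roughly 670 MB file take up to about 40 GB, whereas a master file at the estimated 1.1x of the latest version would be around 740 MB.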

Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool
by Dooley, Damion 03 Sep '14

About the datatype: so you are thinking of a new datatype for files that hold the versioned database contents (in this case, structured key-value pairs of fasta identifier and sequence, right?). The fasta archive versioning tool would then accept only files of that datatype as input. That sounds good.

I was just going to have one folder (and its subfolders) in the data library hold all the versioned databases to choose from, so the versioned database tool would populate its input list from that subfolder tree. But ensuring that it lists only files of a certain datatype sounds beneficial.

Output in any case would be a fasta file, which other tools already expect; I wasn't going to leave that as just "fasta" datatype since it seems tools like makeblastdb don't allow anything else as input from user history.

I'm hoping that a global (admin) API key can be used by the tool so that all users can get versioned data, but maybe that is a pipe dream.

Sure, I'd like to see the old patches!

d.

________________________________________
From: Björn Grüning [bjoern.gruening(a)gmail.com]
Sent: Saturday, August 23, 2014 12:17 AM
To: Dooley, Damion; galaxy-dev(a)lists.bx.psu.edu
Cc: Hsiao, William
Subject: Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool

Hi Damion,

the idea sounds fantastic! Can we go a step further and use a specific datatype that keeps entire fasta files versioned, so the user can choose which version he wants to use in any tool? Please have a look at my talk at GCC2012. Maybe you are interested in the (old) patches. I would be very interested to restart this old project.

"when else" in <conditional> ? RE: refresh_on_change : is this a valid attribute? Any other ideas/options??
by Lukasse, Pieter 03 Sep '14

So I need to refresh on change... I see that if I have a conditional in my form, it causes a refresh of the page and a (re)evaluation of my dynamic_options methods, so I could misuse this "feature". However, it seems that when I have a <conditional>, I must have a <when> entry for every item in my select box. Is there no "when else" option?

Thanks,
Pieter

From: galaxy-dev-bounces(a)lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of Lukasse, Pieter
Sent: Wednesday, 27 August 2014 22:37
To: galaxy-dev(a)lists.bx.psu.edu
Subject: [galaxy-dev] refresh_on_change : is this a valid attribute? Any other ideas/options??

Hi,

I'm trying to get a wrapper from someone else working, and I found this "refresh_on_change" attribute in his select boxes, which are filled using the dynamic_options feature:

<param name="col_type" type="select" label="Select column type" refresh_on_change="true" display="radio" dynamic_options='get_column_type(library_file)' help="" />
<param name="polarity" type="select" label="Select polarity" refresh_on_change="true" display="radio" dynamic_options='filter_column(library_file,col_type)' help="" />
...

When searching the documentation/wiki I do not find any reference to this, but it would be a nice option to have ;)

Question: is there any way I can force a refresh when the user selects another option from such a select box? As you can see in the example above, this is needed because the next select box has its dynamic options built by a function that takes the value from the previous select (col_type, highlighted above) as an input parameter. Currently this tool only works by showing each select in its own <page>, which is a deprecated option and prevents the tool from being used in a workflow... :(

Thanks for your help!

Best regards,
Pieter Lukasse
Wageningen UR, Plant Research International
Department of Bioinformatics (Bioscience)
Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
T: +31-317481122; M: +31-628189540; skype: pieter.lukasse.wur
http://www.pri.wur.nl
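
The dependency described here (the options for `polarity` computed from the value chosen for `col_type`) comes down to two plain Python functions in the tool's code file. A sketch under stated assumptions: the library file is taken to be tab-separated with a header row `column_name`, `column_type`, `polarity` (a hypothetical format); options are returned as `(label, value, selected)` tuples, the shape Galaxy's `dynamic_options` functions use for their choices; and the functions receive a file path, whereas in Galaxy the argument would be the selected dataset object. This only shows the chained filtering; it does not by itself make the form refresh.

```python
import csv
from typing import Dict, List, Tuple

Option = Tuple[str, str, bool]  # (label shown to the user, submitted value, selected?)


def _rows(path: str) -> List[Dict[str, str]]:
    # Hypothetical layout: tab-separated, header "column_name  column_type  polarity".
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def get_column_type(path: str) -> List[Option]:
    """Options for the first select: each distinct column type, in file order."""
    types = list(dict.fromkeys(row["column_type"] for row in _rows(path)))
    return [(t, t, i == 0) for i, t in enumerate(types)]


def filter_column(path: str, col_type: str) -> List[Option]:
    """Options for the second select, restricted to the chosen column type."""
    values = list(dict.fromkeys(
        row["polarity"] for row in _rows(path) if row["column_type"] == col_type
    ))
    return [(v, v, i == 0) for i, v in enumerate(values)]
```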

directory as an input file
by Philippe Moncuquet 03 Sep '14

Hi,

I am trying to write a wrapper for a tool that takes a directory containing SAM/BAM files as input. I am not sure how to do that. Is there another tool that implements this that I can have a look at? Any suggestions would be greatly appreciated.

Regards,
Philip
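
One common workaround (not something confirmed in this thread) is to let the Galaxy tool accept several SAM/BAM datasets and have a small wrapper link them into a fresh directory, which is then handed to the underlying program. A sketch; passing display names alongside the paths is an assumption, needed because Galaxy stores datasets under generic `.dat` file names while many programs recognise inputs by extension.

```python
import os
import tempfile
from typing import Optional, Sequence


def stage_inputs(paths: Sequence[str], names: Sequence[str],
                 workdir: Optional[str] = None) -> str:
    """Symlink each input into a new directory under a meaningful name; return that directory."""
    if len(paths) != len(names):
        raise ValueError("need one name per input file")
    target = tempfile.mkdtemp(prefix="bam_inputs_", dir=workdir)
    for src, name in zip(paths, names):
        # Keep only the base name so a crafted name cannot escape the directory.
        link = os.path.join(target, os.path.basename(name))
        if os.path.lexists(link):
            raise ValueError(f"duplicate input name: {name}")
        os.symlink(os.path.abspath(src), link)
    return target
```

The wrapper would then pass the returned directory to the program in place of individual file arguments.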

millions of rows in galaxy_session
by Robert Baertsch 02 Sep '14

Is there any way to have Galaxy not insert a row into the galaxy_session table for failed logins? Every ping from China seems to generate a row in galaxy_session. We have anonymous login turned off.

Robert

Re: [galaxy-dev] error with multi dataset tool run
by Robert Baertsch 02 Sep '14

> Yes, I was the only user, but I guess it is too fragile for this kind of thing. Postgres wasn't queuing age jobs correctly, so I guess I should track down why.
>
> From: Hans-Rudolf Hotz <hrh(a)fmi.ch>
> Subject: Re: [galaxy-dev] error with multi dataset tool run
> Date: September 1, 2014 at 5:54:10 AM PDT
> To: Robert Baertsch <baertsch(a)soe.ucsc.edu>, <galaxy-dev(a)lists.bx.psu.edu>
>
> Hi Robert
>
> Are you using the built-in SQLite database?
>
> Hans-Rudolf
>
> On 08/31/2014 01:27 AM, Robert Baertsch wrote:
>> I submitted 13 fastq files to tophat2 using DRMAA and got this error.
>> Is it fatal? BTW: This is a super cool feature.
>>
>> I'm running the following version of galaxy-dist.
>>
>> changeset: 14212:91547729ffde
>> branch: stable
>> tag: tip
>> user: Nate Coraor <nate(a)bx.psu.edu <mailto:nate@bx.psu.edu>>
>> date: Fri Aug 29 14:00:23 2014 -0400
>> summary: Update tag latest_2014.08.11 for changeset ea12550fbc34
>>
>> -Robert
>>
>> There were errors setting up 2 submitted job(s):
>>
>> * *Error executing tool: (OperationalError) database is locked
>> u'UPDATE history_dataset_association SET update_time=?, name=?,
>> blurb=? WHERE history_dataset_association.id = ?' ('2014-08-30
>> 23:14:44.683957', 'Tophat2 on data 7: insertions', 'queued', 137)*
>> * *Error executing tool: (OperationalError) database is locked
>> u'UPDATE dataset SET update_time=?, state=? WHERE dataset.id = ?'
>> ('2014-08-30 23:15:57.204718', 'queued', 446)*
>>
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client. To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>> http://lists.bx.psu.edu/
>>
>> To search Galaxy mailing lists use the unified search at:
>> http://galaxyproject.org/search/mailinglists/
>
> From: Sandra Derozier <sandra.derozier(a)jouy.inra.fr>
> Subject: [galaxy-dev] Error with functional tests on cluster
> Date: September 1, 2014 at 6:54:41 AM PDT
> To: galaxy-dev(a)bx.psu.edu
>
> Hi all,
>
> I am trying to set up functional tests for different tools on my Galaxy portal.
>
> When I run the functional tests locally, everything works fine. But when I run them on the cluster, they fail with this message:
>
> /bin/sh: module: line 1: syntax error: unexpected end of file
> /bin/sh: error importing function definition for `module'
>
> The execution on the cluster is OK, but the dataset state is set to ERROR.
>
> The DIFF between the expected result and the obtained result is empty. Indeed, these two files are the same.
>
> As I do not know what the problem is: do you have an idea?
>
> Thanks,
> Sandra DEROZIER
>
> Sandra DEROZIER
> Unité Mathématique, Informatique et Génome (MIG)
> Plateforme MIGALE
> Bâtiment 233
> Domaine de Vilvert
> 78352 Jouy-en-Josas Cedex
>
> From: Hans-Rudolf Hotz <hrh(a)fmi.ch>
> Subject: [galaxy-dev] Solved - Re: testing the visualization plugins
> Date: September 1, 2014 at 8:13:30 AM PDT
> To: "<galaxy-dev(a)bx.psu.edu>" <galaxy-dev(a)bx.psu.edu>
>
> Hi all
>
> First of all, a big thanks to Carl, who helped me fix this problem.
>
> So, as a summary for all: the problem was caused by a datatype (an extension to tabular) that I had manually added to "datatypes_conf.xml".
>
> Removing the datatype fixed the problem. I couldn't identify a syntax problem, neither in "datatypes_conf.xml" nor in "~/lib/galaxy/datatypes/registry.py" and "~/lib/galaxy/datatypes/tabular.py". However, renaming it (in all three files) fixed it as well.
>
> Regards, Hans-Rudolf
>
> _______________________________________________
> galaxy-dev mailing list
> galaxy-dev(a)lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To search Galaxy mailing lists use the unified search at:
> http://galaxyproject.org/search/mailinglists/
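
Hans-Rudolf's question points at the likely cause: SQLite allows only one writer at a time, so when many jobs are set up at once (here 13 TopHat2 submissions), a writer that cannot obtain the lock within its timeout fails with `OperationalError: database is locked`. The effect can be reproduced with the Python standard library alone. This illustrates SQLite's locking behaviour, not Galaxy code, and the `dataset` table here is a made-up stand-in.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock(timeout: float = 0.2) -> str:
    """Hold a write lock on one connection and try to write from a second one."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    # isolation_level=None puts both connections in autocommit mode,
    # so transactions are opened and closed explicitly below.
    writer = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    other = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    try:
        writer.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, state TEXT)")
        writer.execute("INSERT INTO dataset (state) VALUES ('new')")
        writer.execute("BEGIN IMMEDIATE")  # take the write lock and keep holding it
        writer.execute("UPDATE dataset SET state = 'queued' WHERE id = 1")
        try:
            # The second writer waits up to `timeout` seconds, then gives up.
            other.execute("UPDATE dataset SET state = 'running' WHERE id = 1")
            return "second write succeeded"
        except sqlite3.OperationalError as err:
            return str(err)
        finally:
            writer.execute("COMMIT")
    finally:
        writer.close()
        other.close()


print(demonstrate_lock())
```

A client/server database avoids this single-writer limit; in Galaxy of this era that is configured through the `database_connection` setting in `universe_wsgi.ini`.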
