July 2010 - galaxy-dev - lists.galaxyproject.org

Collaborating on Galaxy - sharing and re-sharing histories
by Assaf Gordon 14 Jul '10

Hi all,

I recently had to work closely with someone on our Galaxy server, sharing histories back and forth with them. Each time they would run a couple of jobs and share the history with me; I would check their results, perhaps run a couple of jobs of my own, and re-share the new results with them.

The problem is that collaborating like that is quite annoying. Besides the need to go through so many clicks (publish/share, then list-histories-shared-with-me/clone/switch), another problem is that my histories list looks like this:

===
Clone of 'Clone of 'Clone of 'XXXX, BLAT' shared by 'gordon(a)cshl.edu' (active items only)' shared by 'xxxxxx(a)cshl.edu' (active items only)' shared by 'gordon(a)cshl.edu' (active items only)
===
Clone of 'Clone of 'xxxxxx, BLAT' shared by 'gordon(a)cshl.edu' (active items only)' shared by 'xxxxxxx(a)cshl.edu' (active items only)
===
Clone of 'xxxxxx, BLAT' shared by 'gordon(a)cshl.edu' (active items only)
===

and so on... Each history adds just one or two more useful datasets, and it becomes cumbersome to manage all of them (not to mention that my histories list and shared-histories list are littered with the same history over and over).

I think that in order to take Galaxy to the 'next level' as a collaborative framework, a truly shared history mechanism would be useful: a single history instance where, once it is shared, every change made to it (by any user) immediately appears for all users sharing the history. This is conceptually similar to http://piratepad.net/ (previously Etherpad), where all participants immediately see all the changes made to the content.

Thanks,
-gordon

XML definition
by Sébastien HARISPE 14 Jul '10

Hi,

I am trying to integrate some tools into Galaxy, but I am having difficulty understanding exactly how the XML tool definition works, despite the XML tag documentation available at http://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax .

Imagine a case where whether an argument appears on the command line depends on:
- its own value, e.g. we don't want to include it if its value is undefined
- other arguments' values, e.g. if the value of x is greater than the value of y, we don't want to include the x argument on the command line

How can I specify this using the XML definition?

A simplified case: a tool needs arguments A and B, or an input file F. We want to offer two modes to configure it:
- [1] a basic mode, where A and B are set graphically using fields such as text inputs
- [2] an advanced mode, where the user uploads a file containing a complex configuration

Tool command line:
[1] tool.py -A arg1 -B arg2
[2] tool.py -F confile

I don't want to include the -F command line argument if -A and -B are defined.

I currently handle these cases using wrappers, which is quite tedious.

Where can we find:
- advanced documentation for the XML definition
- commented advanced examples
- related discussions

Best regards,
Seb

[sorry for the bad English]
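
The two-mode choice described in this message can be pictured with a small Python sketch of the wrapper-style logic. It is only an illustration under stated assumptions: `build_command` and its parameter names are hypothetical, only `tool.py`, `-A`, `-B` and `-F` come from the message, and it does not show how Galaxy's XML tool definition would express the same choice.

```python
def build_command(mode, a=None, b=None, conf_file=None):
    """Build the argument list for tool.py in 'basic' or 'advanced' mode.

    Hypothetical helper: the function and parameter names are assumptions
    made for illustration; only tool.py, -A, -B and -F come from the message.
    """
    cmd = ["tool.py"]
    if mode == "basic":
        # Basic mode: A and B are both required and -F is never emitted.
        if a is None or b is None:
            raise ValueError("basic mode needs both A and B")
        cmd += ["-A", str(a), "-B", str(b)]
    elif mode == "advanced":
        # Advanced mode: only the configuration file is passed.
        if not conf_file:
            raise ValueError("advanced mode needs a configuration file")
        cmd += ["-F", str(conf_file)]
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return cmd


print(build_command("basic", a="arg1", b="arg2"))
print(build_command("advanced", conf_file="confile"))
```

Keeping the mode as an explicit input, rather than inferring it from which values happen to be set, makes it impossible to emit `-F` together with `-A` and `-B`.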

Re: [galaxy-dev] [galaxy-user] microbes data in local instance of galaxy
by Daniel Blankenberg 13 Jul '10

Hi Alex,

This data was obtained using the scripts found under $GALAXY_ROOT/scripts/microbes/; there is a README.txt file in that directory as well. However, it has been some time since these scripts were last used, so they may have become stale and may need some tweaking to work properly (IIRC there was some messy webpage scraping involved).

Thanks for using Galaxy,
Dan

On Jul 13, 2010, at 2:30 AM, Bossers, Alex wrote:
> Hi All,
> We have our local Galaxy instance running, which works fine.
> In the Get Data section, the Microbes tool has no local NCBI data. The public instance has it.
> What is the best/easiest way to get that data into our local instance of Galaxy? I have been browsing the wikis and looked through the library and dataset documentation, but was unable to resolve this at first glance.
> Any help/guidance appreciated.
> Thanks
> Alex

Workflow copy history item
by Dennis Gascoigne 13 Jul '10

The copy history item option is available from the Edit Attributes page for a history item. It would be more useful, and a bit more logical, if this option were available from the history drop-down menu on the right and from the history menu on the saved histories page.

The reasons/advantages are:
* The history item copy page lets you copy one or more items from the history; it is not limited to the history item you clicked through from.
* It is much easier to
  * copy items between histories by selecting the option from the history menu, or from the history menus on the saved histories page, than to
  * open a history, click on Edit Attributes, scroll to the bottom of the page, and click copy history item.

My apologies if this seems picky, but I do this a lot.

Cheers
Dennis

History link in main menu
by Dennis Gascoigne 13 Jul '10

This might seem minor, but a link to the History manager in the main menu at the top (between Analyze and Workflow) would be really handy. It is a much more common destination than workflows, but it is an extra click/select away through the workflow menu on the right. A couple of users in our group have brought this up, and I kind of agree.

Cheers
Dennis

Workflow Error
by Sumedha Ganjoo 12 Jul '10

Hi,

I am using a tool that interacts with Web services via Galaxy. Run on its own in the Galaxy GUI, it works fine. But when we run it in a workflow with other tools, it sometimes runs correctly and sometimes gives the following error:

OperationalError: (OperationalError) database is locked u'UPDATE job SET update_time=?, command_line=? WHERE job.id = ?' ['2010-07-12 14:10:18.398960', 'python ${installationDirectory}/tools/${............ .....................................................

I was wondering whether Galaxy implements some timeouts in its workflows, so that if the first step takes a while, the second step of the workflow is executed simultaneously. Or is there any other explanation for such behavior?

I would really appreciate a reply as soon as possible. Thanks.

Regards,
Sumedha

--
Sumedha Ganjoo
Graduate Assistant, Department of Computer Science,
University of Georgia, Athens, GA, USA
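
For context, "database is locked" is the message SQLite reports when one connection tries to write while another connection holds the write lock. Assuming the Galaxy instance in question is backed by SQLite (an assumption; the message does not say), the same error can be reproduced outside Galaxy with Python's standard `sqlite3` module. The `job` table and its values below are made up for the demonstration, and `demonstrate_lock` is a hypothetical name.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock(db_path):
    """Return the error a second writer sees while a first writer holds the lock."""
    # isolation_level=None means autocommit: BEGIN/ROLLBACK are issued explicitly.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute(
        "CREATE TABLE IF NOT EXISTS job (id INTEGER PRIMARY KEY, command_line TEXT)"
    )
    writer.execute("INSERT INTO job (command_line) VALUES ('python tool.py')")

    # The first writer opens a write transaction and leaves it open.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("UPDATE job SET command_line = 'step 1' WHERE id = 1")

    # timeout=0 makes the second connection fail at once instead of waiting.
    other = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        other.execute("UPDATE job SET command_line = 'step 2' WHERE id = 1")
        message = None
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        other.close()
        writer.execute("ROLLBACK")
        writer.close()
    return message


with tempfile.TemporaryDirectory() as tmp:
    print(demonstrate_lock(os.path.join(tmp, "demo.sqlite")))
```

This only shows how the error arises in SQLite itself; whether concurrent workflow steps are the cause in the reported case is not something the sketch can establish.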

TopHat and other tools with too many options
by Assaf Gordon 12 Jul '10

Hi all,

I'm in the process of adapting TopHat to our needs, and there are just too many options... That's OK if you run it from the command line, but in Galaxy it looks like a big mess. Similar to the situation with the Bowtie tool, the "common" options are not specific enough (for our needs) and the "full options" mode is too hard to use (which will result in users not using it at all, or using it wrong).

I'd like to request/propose a change in the way the GUI is rendered based on the XML tool. Mainly, I'd like to create logical parameter "groups": parameters which logically go together and are related to one another.

In the XML file, it could look like:

    <inputs>
      <group name="Introns" help="These settings control the intron sensitivity.">
        <param name="min_intron_length" type="integer" value="70" />
        <param name="max_intron_length" type="integer" value="500000" />
      </group>
      <group name="Quality" help="XXXXXX">
        <param name="max_multihits" type="integer" value="40" label="Maximum number of alignments to be allowed" />
        <param name="junction_filter" type="float" value="0.15" label="Minimum isoform fraction:" />
      </group>
      ...
    </inputs>

And in the HTML output, the groups would be visually distinct, with some nice hide/expand JavaScript trick; see a (fake) example here: http://cancan.cshl.edu/labmembers/gordon/files/galaxy_advanced_options.html

IMHO, there are a couple of advantages to this layout:
1. The "big picture" of the available settings is immediately visible (e.g. "introns", "quality", "segments" etc.).
2. Parameters are separated into logical groups, so it is easier to understand what's being changed (as opposed to one very long, cryptic list of parameters).
3. Advanced vs. simple options are clearly marked.
4. Since parameter groups can be hidden, when they are expanded they can contain a help paragraph. This is much easier for the user than scrolling up and down to see the help section below (also, the relevant help section now appears right next to the parameters).

I guess this is not a trivial change, but without it, it will get harder and harder to integrate complex tools.

Comments are welcome,
-gordon
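
As a rough illustration of how a renderer might read the proposed layout, the Python sketch below parses the XML from this message with the standard `xml.etree.ElementTree` module and collects parameter names per group. The `<group>` element is the proposal made in this message, not an existing Galaxy tag, and `params_by_group` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The proposed layout from the message (the "..." placeholder is omitted).
PROPOSED = """
<inputs>
  <group name="Introns" help="These settings control the intron sensitivity.">
    <param name="min_intron_length" type="integer" value="70" />
    <param name="max_intron_length" type="integer" value="500000" />
  </group>
  <group name="Quality" help="XXXXXX">
    <param name="max_multihits" type="integer" value="40" label="Maximum number of alignments to be allowed" />
    <param name="junction_filter" type="float" value="0.15" label="Minimum isoform fraction:" />
  </group>
</inputs>
"""


def params_by_group(xml_text):
    """Return [(group name, [param names])] in document order."""
    root = ET.fromstring(xml_text)
    return [
        (group.get("name"), [param.get("name") for param in group.findall("param")])
        for group in root.findall("group")
    ]


for name, params in params_by_group(PROPOSED):
    # A real renderer would emit one collapsible section per group here.
    print(name, params)
```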

Cleanup hidden datasets cleanup script
by Dennis Gascoigne 11 Jul '10

Hi guys,

We have taken advantage of your recent excellent additions to workflows and now have a lot of histories where intermediate datasets are hidden. It is becoming quite critical for us to catch these hidden history items in the cleanup scripts and delete them, as they consume a lot of space and we cannot identify another way of removing them.

The obvious place to remove hidden datasets appears to be an option in the cleanup procedures. In previous discussions you mentioned that this is something you propose doing, and at first glance it would seem relatively quick. What is your timing on this, or has it already been done? If it is a while away, is there anything I should consider in writing/amending a query to roll our own?

Cheers
Dennis

Re: [galaxy-dev] [galaxy-bugs] help about compress file
by Jeremy Goecks 09 Jul '10

Eric,

The galaxy-dev mailing list (cc'd) is a good place to ask your question. This isn't my area of expertise, so hopefully someone else can chime in and help you out.

J.

On Jul 9, 2010, at 11:12 AM, Eric Aguiar wrote:
> Jeremy,
>
> I followed all the steps in the Galaxy tutorial, but without success.
> I'm trying to create a datatype for megabase chromatograms (.esd), very similar to the Ab1 ones.
>
> Here is my configuration.
>
> 1 - Creating datatypes in datatypes_conf.xml
>
> <datatype extension="zip" type="galaxy.datatypes.binary:Esd" mimetype="application/zip" display_in_upload="true"/>
> <datatype extension="esd" type="galaxy.datatypes.binary:Esd" mimetype="application/octet-stream" display_in_upload="true"/>
>
> 2 - Defining types in lib/galaxy/datatypes/binary.py
>
> class Esd( Binary ):
>     """Class describing an ab1 binary sequence file"""
>     file_ext = "esd"
>
>     def set_peek( self, dataset, is_multi_byte=False ):
>         if not dataset.dataset.purged:
>             dataset.peek = "Binary chromatograms sequence file"
>             dataset.blurb = data.nice_size( dataset.get_size() )
>         else:
>             dataset.peek = 'file does not exist'
>             dataset.blurb = 'file purged from disk'
>     def display_peek( self, dataset ):
>         try:
>             return dataset.peek
>         except:
>             return "Binary esd sequence file (%s)" % ( data.nice_size( dataset.get_size() ) )
>
>
> class Zip( Binary ):
>     """Class describing a zip archive of binary sequence files"""
>     file_ext = "zip"
>
>     def set_peek( self, dataset, is_multi_byte=False ):
>         if not dataset.dataset.purged:
>             zip_file = zipfile.ZipFile( dataset.file_name, "r" )
>             num_files = len( zip_file.namelist() )
>             dataset.peek = "Archive of %s binary sequence files" % ( str( num_files ) )
>             dataset.blurb = data.nice_size( dataset.get_size() )
>         else:
>             dataset.peek = 'file does not exist'
>             dataset.blurb = 'file purged from disk'
>     def display_peek( self, dataset ):
>         try:
>             return dataset.peek
>         except:
>             return "Binary sequence file archive (%s)" % ( data.nice_size( dataset.get_size() ) )
>     def get_mime( self ):
>         """Returns the mime type of the datatype"""
>         return 'application/zip'
>
>
> When I send the file in zip format (compressed .esd files), it shows me the following error:
> "An error occurred running this job: Invalid 'File Format' for archive consisting of binary files - use 'Binseq.zip'"
>
> I tried a few things, but without success.
>
> Thank you,
>
> On 07/08/2010 05:50 PM, Jeremy Goecks wrote:
>>
>>> I would like to know about the use of compressed files (zip format) in the Get Data app. I'm trying to send compressed .esd files, but the program shows me the following error message: "The uploaded file contains inappropriate content".
>>
>> Hi Eric,
>>
>> Galaxy can accept zip files with a single compressed file but does not recognize .esd files. You'll need to convert the esd file into a format that Galaxy recognizes, or run your own Galaxy instance and write your own datatype for Galaxy:
>>
>> http://bitbucket.org/galaxy/galaxy-central/wiki/AddingDatatypes
>>
>> Thanks,
>> J.
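
When debugging an upload like the one quoted above, it can help to check what the archive actually contains before sending it. The sketch below uses only Python's standard `zipfile` module, the same module the quoted `Zip.set_peek` relies on. The names `summarize_archive` and `wanted_ext` are hypothetical, and nothing here calls Galaxy or explains the upload error.

```python
import zipfile


def summarize_archive(path, wanted_ext=".esd"):
    """Return (number of files, names that do not end with wanted_ext)."""
    with zipfile.ZipFile(path, "r") as archive:
        # Directory entries end with "/" and are not files, so skip them.
        names = [name for name in archive.namelist() if not name.endswith("/")]
    others = [name for name in names if not name.lower().endswith(wanted_ext)]
    return len(names), others
```

For example, `summarize_archive("reads.zip")` (a made-up file name) returns the file count and the list of members that are not `.esd` files, which shows quickly whether the archive mixes file types.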

Adding job
by Filip Balejko 08 Jul '10

Hi,

I'm working on a project for Google Summer of Code: https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code…

A demo instance can be found here: http://137.110.191.252:8080/ (under the "Phylogenetics" section).

As you can see, detailed results of the SLAC analysis are generated on demand. In this version, the computation is done when the request is handled (in the datamonkey controller). It bypasses the job queue, and I guess that might not be considered the best approach.

I was thinking about creating jobs for those on-demand computations. What do you think about this solution? I understand that to accomplish this, I have to create another tool and put my code there. Is that the only way to use the job queue? Is it possible to add a job which isn't displayed in the history box?

Best regards,
Filip Balejko
