January 2011 - galaxy-dev - lists.galaxyproject.org

Fwd: System built on Galaxy
by Marina Gourtovaia 13 Jan '11

FYI

-------- Original Message --------
Subject: System built on Galaxy
Date: Wed, 12 Jan 2011 13:29:11 +0000
From: Andy Brown <setitesuk(a)gmail.com>
To: David Jackson <dj3(a)sanger.ac.uk>, Marina Gourtovaia <mg8(a)sanger.ac.uk>, Guoying Qi <gq1(a)sanger.ac.uk>, svvd(a)sanger.ac.uk, John O'Brien <jo3(a)sanger.ac.uk>

Just saw this web post about a system built on top of galaxy. It does the lot from sample/project creation through tracking, launching the analysis pipeline and returning results.

http://tinyurl.com/65o9o9j

Looks interesting.

Andy

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

Easy download/setup of Bacterial Genbank and multiple alignment files from UCSC?
by Paszkiewicz, Konrad 12 Jan '11

Happy new year to you!

First of all, well done on a fabulous system. It really is going to make my life as a bioinformatician a lot easier and hopefully empower my wet-lab biologists.

I'm trying to set up the 'Get microbial data' tool. Is there an easy way to get hold of these datasets with the location files and tool configuration files preset? I wouldn't like to have to set up thousands of genomes manually. Better still, can I set up my Galaxy instance to send the request to your server directly?

Thanks!

Konrad.

Dr Konrad Paszkiewicz
Exeter Sequencing Service, Biosciences, Stocker Road, University of Exeter, Exeter EX4 4QD, UK.
http://biosciences.exeter.ac.uk/facilities/sequencing/

Fetch closest feature
by SHAUN WEBB 12 Jan '11

Hi,

I remember a post a while ago about the fetch closest feature tool, requesting that the output return overlapping features rather than just the nearest up/downstream feature. Was this ever implemented? Perhaps as a different tool?

Would the easiest alternative be to create a workflow from existing tools (join -> filter out intervals that have a join -> fetch closest feature -> concatenate output files)?

Thanks for your help

Shaun Webb

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
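
For discussion, here is a minimal Python sketch of the behaviour asked about above: report overlapping features when there are any, and fall back to the nearest feature otherwise. It is not an existing Galaxy tool. The function names, the tuple layout and the half-open BED-style coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: (chrom, start, end) with half-open, BED-style coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def distance(a: Interval, b: Interval) -> int:
    """Gap between two non-overlapping intervals on the same chromosome."""
    return max(a[1], b[1]) - min(a[2], b[2])


def overlapping_or_closest(query: Interval, features: List[Interval]) -> List[Interval]:
    """Return all overlapping features, or else the nearest one(s)."""
    same_chrom = [f for f in features if f[0] == query[0]]
    hits = [f for f in same_chrom if overlaps(query, f)]
    if hits:
        return hits
    if not same_chrom:
        return []
    best = min(distance(query, f) for f in same_chrom)
    # Ties (equally near upstream and downstream features) are all reported.
    return [f for f in same_chrom if distance(query, f) == best]
```

This mirrors the workflow idea in one pass: the "join" step corresponds to the overlap test, and the "fetch closest feature" step only runs for intervals with no join.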

[patch] Job Runner Eggs
by John Chilton 12 Jan '11

Hello,

Currently lib/galaxy/config.py requires that custom job runners be paired with exactly one egg for fetching. This seems less than ideal, since runners may need 0 eggs (as is the case for a runner I am implementing) or multiple eggs. Attached is a minor patch that addresses this. Please consider it for addition to galaxy-central.

Thanks for your time,

John

------------------------------------------------
John Chilton
Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
E-Mail: chilton(a)msi.umn.edu
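
To illustrate the idea (this is not the attached patch, and not the actual contents of lib/galaxy/config.py), a runner-to-eggs mapping that allows zero, one or several eggs per runner might look roughly like the sketch below. The runner names, egg names and the function `eggs_for_runners` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping: each runner lists the eggs it needs, possibly none.
RUNNER_EGGS: Dict[str, List[str]] = {
    "local": [],                           # needs no eggs at all
    "example_single": ["egg_a"],           # the "exactly one egg" case
    "example_multi": ["egg_a", "egg_b"],   # needs several eggs
}


def eggs_for_runners(enabled: List[str], runner_eggs: Dict[str, List[str]]) -> List[str]:
    """Collect the eggs required by the enabled runners, without duplicates."""
    needed: List[str] = []
    for runner in enabled:
        # Unknown runners are treated as needing no eggs (an assumption).
        for egg in runner_eggs.get(runner, []):
            if egg not in needed:
                needed.append(egg)
    return needed
```

Using a list per runner means the 0-egg and many-egg cases need no special handling.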

remote user environment variable?
by David Hoover 11 Jan '11

I'm trying to set up a mechanism to copy a local file using a setuid executable, based on whether the registered user has read access to the local file in question. Is there any means for a tool to know the registered user who is running it?

David Hoover
Helix Systems Staff
http://helix.nih.gov

Recommendation for Grid/Cluster management ?
by Assaf Gordon 11 Jan '11

Slightly off-topic, but for a new Galaxy installation: what would you recommend for a free (FOSS) grid/cluster management product?

SGE is not free any more (Oracle Grid Engine is "90-days evaluation, binaries only" free, starting 6.2u6).

Ease of administration is top priority, more than sophisticated scheduling policies (and of course it has to be well supported by Galaxy).

Any suggestions?

-gordon

Galaxy (and GMOD) at PAG 2011, Jan 15-19
by Dave Clements 09 Jan '11

Hello all,

Below is an announcement that went out to the larger GMOD community about GMOD-related content at Plant and Animal Genome in a week. Some particular highlights for the Galaxy community are

* Dan Blankenberg's hour-long seminar on Galaxy on Wednesday,
* Anne-Francoise Adam-Blondon's talk on the GnpIS portal,
* and the INRA posters on using Galaxy for SNP detection (#188) and using Galaxy in the GnpIS portal (#805).

See http://gmod.org/wiki/PAG_2011 for details and links.

Also, if you are presenting work that uses Galaxy at PAG (or any other conference), please feel free to post that to the list as well.

Thanks,

Dave C

---------- Forwarded message ----------
From: Dave Clements <clements(a)nescent.org>
Date: Sat, Jan 8, 2011 at 9:32 PM
Subject: [Gmod-announce] GMOD at PAG 2011, Jan 15-19
To: GMOD Announcements List <gmod-announce(a)lists.sourceforge.net>, GMOD Comparative Genomics and Phylogeny List <gmod-cogephy(a)lists.sourceforge.net>

Hello all,

GMOD will have a stronger presence than ever at the Plant and Animal Genome Conference (PAG 2011) <http://gmod.org/wiki/PAG_2011>, being held January 15-19 in San Diego.

This includes, for the first time, *an all day GMOD Workshop <http://www.intl-pag.org/19/19-gmod.html>* on Wednesday, January 19th, featuring extended talks on

- MAKER <http://gmod.org/wiki/MAKER>: An easy to use genome annotation pipeline <http://www.intl-pag.org/19/19-gmod.html#MAKER>
- Galaxy <http://gmod.org/wiki/Galaxy>: Analyze, Visualize, Communicate <http://www.intl-pag.org/19/19-gmod.html#Galaxy>
- JBrowse <http://gmod.org/wiki/JBrowse>: A Next Generation Genome Browser <http://www.intl-pag.org/19/19-gmod.html#JBrowse>
- Tripal <http://gmod.org/wiki/Tripal>: A Construction Toolkit for Online Genomic Databases <http://www.intl-pag.org/19/19-gmod.html#Tripal>
- GMODviews <http://gmod.org/wiki/GMODviews>: A web-based knowledge-base and user-configurable interface for GMOD databases and tools, including Tripal, GBrowse, and Chado DBs <http://www.intl-pag.org/19/19-gmod.html#GMODviews>
- GBrowse_syn <http://gmod.org/wiki/GBrowse_syn>: The Generic Synteny Browser <http://www.intl-pag.org/19/19-gmod.html#GBrowse_syn>

PAG 2011 <http://gmod.org/wiki/PAG_2011> will also feature workshops and tutorials covering

- BioMart <http://gmod.org/wiki/BioMart>: Ensembl and Ensembl Genomes <http://www.intl-pag.org/19/abstracts/W29_PAGXIX_188.html>
- GBrowse <http://gmod.org/wiki/GBrowse>: The Generic Genome Browser: A Hands on Workshop for Installing, Configuring and Using Your Own GBrowse <http://www.intl-pag.org/19/19-gbrowse.html>
- CMap <http://gmod.org/wiki/CMap>: Using Gramene: A Genomics and Genetics Resource for Rice and other Grasses <http://www.intl-pag.org/19/19-gramene.html>

Plus *at least 25 more talks <http://gmod.org/wiki/PAG_2011#Projects_Using_GMOD_Components> and posters <http://gmod.org/wiki/PAG_2011#Posters>* about or using Pathway Tools <http://www.intl-pag.org/19/abstracts/C02_PAGXIX_903.html>, GBrowse <http://www.intl-pag.org/19/abstracts/P08b_PAGXIX_828.html>, Galaxy <http://gmod.org/wiki/Galaxy>, CMap <http://gmod.org/wiki/Cmap>, Apollo <http://gmod.org/wiki/Apollo>, Chado <http://gmod.org/wiki/Chado>, GBrowse_syn <http://gmod.org/wiki/GBrowse_syn>, JBrowse <http://gmod.org/wiki/JBrowse>, BioMart <http://gmod.org/wiki/BioMart>, MAKER <http://gmod.org/wiki/MAKER>, Tripal <http://gmod.org/wiki/Tripal>, and the GMOD Evo Hackathon <http://www.intl-pag.org/19/abstracts/P08a_PAGXIX_814.html>. See http://gmod.org/wiki/PAG_2011 for a more complete list.

Many GMOD users <http://gmod.org/wiki/PAG_2011#Projects_Using_GMOD_Components> and GMOD developers, including Stephen Ficklin <http://gmod.org/wiki/User:Sficklin> (Tripal <http://gmod.org/wiki/Tripal>), Carson Holt <http://gmod.org/wiki/User:Carsonholt> (MAKER <http://gmod.org/wiki/MAKER>), Barry Moore <http://gmod.org/wiki/User:Bmoore> (SOBA <http://gmod.org/wiki/SOBA>, MAKER <http://gmod.org/wiki/MAKER>), Dan Blankenberg <http://gmod.org/wiki/User:DanB> (Galaxy <http://gmod.org/wiki/Galaxy>), Mitch Skinner <http://gmod.org/wiki/User:MitchSkinner> (JBrowse <http://gmod.org/wiki/JBrowse>), Ken Youens-Clark (CMap <http://gmod.org/wiki/CMap>), Mike Caudy <http://gmod.org/wiki/User:Mcaudy> (GMODviews <http://gmod.org/wiki/GMODviews>), Rob Buels <http://gmod.org/wiki/User:RobertBuels> (Bio::Chado::Schema <http://gmod.org/wiki/Bio::Chado::Schema>, GBrowse_syn <http://gmod.org/wiki/GBrowse_syn>), and Scott Cain <http://gmod.org/wiki/User:Scott> (Chado <http://gmod.org/wiki/Chado>, GBrowse <http://gmod.org/wiki/GBrowse>, ...) will be attending PAG this year. Please seek each other out.

Dave Clements <http://gmod.org/wiki/User:Clements>
Galaxy Project <http://gmod.org/wiki/Galaxy>

PS: In addition, three of the GMOD developers (Dan, Mitch, Scott) are also available to visit organizations in the San Diego area <http://gmod.org/wiki/GMOD_News#GMOD_Roadshow_in_San_Diego> during the meeting.

--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://usegalaxy.org/

_______________________________________________
Gmod-announce mailing list
Gmod-announce(a)lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gmod-announce

--
http://usegalaxy.org/

bowtie map to BED conversions
by Ry4an Brase 06 Jan '11

I've got a user request for a converter from bowtie's map output to BED format, and looking at the provided script it's mostly just an application of cut(1) and sort(1). Is this something Galaxy already does through some mechanism we're not finding, or is this 3-line conversion script something I should be adding and submitting back?

Thanks,

--
Ry4an Brase 612-626-6575
Software Developer Application Development
University of Minnesota Supercomputing Institute
http://www.msi.umn.edu
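
For readers unfamiliar with the conversion, the cut-and-sort idea can be sketched in Python as below. This is not the script mentioned above. It assumes bowtie's default tab-separated output (read name, strand, reference name, 0-based offset, read sequence, ...), takes the BED end as the offset plus the read length, and writes a placeholder score of 0. The function name `bowtie_map_to_bed` is made up for this example.

```python
from typing import Iterable, List


def bowtie_map_to_bed(lines: Iterable[str]) -> List[str]:
    """Convert bowtie default-format alignment lines to sorted 6-column BED lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Assumed column order: name, strand, reference, 0-based offset, sequence.
        name, strand, chrom, offset, seq = fields[:5]
        start = int(offset)
        end = start + len(seq)  # BED ends are exclusive
        records.append((chrom, start, end, name, 0, strand))
    # Equivalent of sort(1) on chromosome, then numeric start.
    records.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(str(x) for x in r) for r in records]
```

For example, the line `r2 - chr1 5 ACG III 0 2:A>C` (tab-separated) becomes the BED line `chr1 5 8 r2 0 -`.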

Re: [galaxy-dev] [galaxy-user] parallel processing of multiple files on a multi-processor box?
by Peter 06 Jan '11

On Thu, Jan 6, 2011 at 7:52 AM, Bossers, Alex <Alex.Bossers(a)wur.nl> wrote:
> Yury,
>
> If the software has this option its no problem to use them!
> Have a look at Peter's last blast+ wrappers. Blast+ of ncbi has
> the ability to specify a number of cores to use...and so can you
> by configuring it in a tool config. Regarding the parallelisation..
> no expert in this. Have a look in the tool shed for the signalp
> and TMHMM wrappers. There you find a piece of python to
> split large jobs in batches, process them in parallel and merge
> them back.
>
> No experience with cluster or grid jobs myself...
>
> Alex

Hi Yury (& Alex),

For a little clarification: like many computationally intensive command line tools, the NCBI BLAST+ tools have a switch for the number of processors. Currently (like most of the other Galaxy wrappers) this is specified in the XML wrappers, in this case hard coded at 8. Some of the other tools' XML files are hard coded with 4 threads (e.g. bwa).

In the case of TMHMM and SignalP, the tools themselves are single threaded, but I wrote a wrapper script (in Python) which divides the input FASTA file into chunks, runs multiple instances of the tool, and then collates the output. Again, my wrapper tool is told how many threads to use via the XML wrapper.

You can find my Galaxy wrappers for TMHMM and SignalP here at the "Galaxy Community Tool Shed" (Alex has been testing them - thanks!):
http://community.g2.bx.psu.edu/

Some of the provided Galaxy wrappers have a note in the XML saying the number of threads should be configurable, perhaps via a loc file. I have suggested to the Galaxy developers there should be a general setting for number of threads per tool accessible via the XML, so that this can be configured centrally (maybe I should file an enhancement issue for this):
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-September/003393.html
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003407.html
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003408.html

(I've CC'd the galaxy-dev list, since this discussion is heading in that direction)

Peter
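
The split/run/collate pattern described in this message can be sketched roughly as follows. This is not the TMHMM/SignalP wrapper code from the Tool Shed. The names `parse_fasta`, `run_in_chunks` and `fake_tool` are invented for illustration, and `fake_tool` is a stand-in for launching the real single-threaded program (which a real wrapper would presumably do by starting an external process per chunk).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def run_in_chunks(records: List[Record],
                  tool: Callable[[List[Record]], List[str]],
                  chunk_size: int, threads: int) -> List[str]:
    """Split records into chunks, run the tool on each chunk, collate in input order."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(tool, chunks))  # map() keeps chunk order
    return [row for chunk_rows in results for row in chunk_rows]


def fake_tool(chunk: List[Record]) -> List[str]:
    """Stand-in for the real tool: report each identifier and sequence length."""
    return [f"{header.split()[0]}\t{len(seq)}" for header, seq in chunk]
```

Here `threads` plays the role of the thread count that the XML wrapper passes in. Because results are collated in chunk order, the output matches a single sequential run.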

FASTA filtering by ID
by Peter 06 Jan '11

Hi all,

Something I want to do in several of my workflows is to filter a FASTA file (or potentially other format sequence files) using a list of desired identifiers (e.g. a column from a tabular file).

Right now I can achieve this with three steps in Galaxy. Suppose I have:

Dataset #1, FASTA file
Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits, or filtered output from a sequence analysis tool)

Then:

Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1, subject to the enhancement proposed here:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html

Create tabular Dataset #4 using join on Datasets #2 and #3 using the matched identifier columns. This does the filtering.

Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.

This works (at least for reasonably sized datasets), but requires three steps and the creation of at least two temporary files.

I'd like to introduce another tool under "FASTA manipulation" to do it in one step (rather than three). Am I going against the apparent Galaxy ideal that complex manipulations should be done with tabular files?

Would such a FASTA filter tool be of interest to add directly to Galaxy (e.g. under the "FASTA manipulation" section), or better off on the community tool shed? Here is my current implementation for discussion/consideration:
http://bitbucket.org/peterjc/galaxy-central/changeset/730b89c4da26

Thanks,

Peter
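
As a rough illustration of the one-step approach (this is not the implementation in the changeset linked above), a filter that reads the wanted identifiers from one tabular column and streams through the FASTA file could look like this. The function names, and the choice to match on the first word of each FASTA header, are assumptions.

```python
from typing import Iterable, Iterator, Set


def read_ids(tabular_lines: Iterable[str], column: int = 0) -> Set[str]:
    """Collect identifiers from one column of a tab-separated file."""
    ids: Set[str] = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if column < len(fields):
            ids.add(fields[column].strip())
    return ids


def filter_fasta(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose identifier is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumed: the identifier is the first whitespace-delimited word.
            parts = line[1:].split(None, 1)
            keep = bool(parts) and parts[0] in wanted
        if keep:
            yield line
```

Unlike the join-based workflow, this keeps records in their original FASTA order and needs no intermediate tabular datasets.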
