Is there a way in Galaxy to retrieve the promoter sequences for a list
of genes? I tried using the UCSC Genome Browser, but in many cases it
returns more than one promoter sequence per gene.
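One possible reason for getting several promoter sequences per gene is that a gene can have several annotated transcripts, each with its own start site. The sketch below is not a Galaxy or UCSC feature. It assumes a hypothetical list of transcript records and an arbitrary 1000 bp upstream window, and keeps one promoter interval per gene by picking the most upstream transcription start site:

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    gene: str
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # 0-based, inclusive
    tx_end: int    # 0-based, exclusive


def promoter_interval(t, upstream=1000):
    """Return (chrom, start, end, strand) for the window upstream of the TSS."""
    if t.strand == "+":
        tss = t.tx_start
        return (t.chrom, max(0, tss - upstream), tss, "+")
    # On the minus strand the TSS is at the high-coordinate end.
    tss = t.tx_end
    return (t.chrom, tss, tss + upstream, "-")


def one_promoter_per_gene(transcripts, upstream=1000):
    """Keep, for each gene, only the transcript with the most upstream TSS."""
    chosen = {}
    for t in transcripts:
        best = chosen.get(t.gene)
        if best is None:
            chosen[t.gene] = t
        elif t.strand == "+" and t.tx_start < best.tx_start:
            chosen[t.gene] = t
        elif t.strand == "-" and t.tx_end > best.tx_end:
            chosen[t.gene] = t
    return {g: promoter_interval(t, upstream) for g, t in chosen.items()}


# Hypothetical records: two transcripts per gene.
transcripts = [
    Transcript("geneA", "chr1", "+", 5000, 9000),
    Transcript("geneA", "chr1", "+", 5200, 9000),
    Transcript("geneB", "chr2", "-", 100, 800),
    Transcript("geneB", "chr2", "-", 100, 950),
]
promoters = one_promoter_per_gene(transcripts)
```

Picking the most upstream start site is only one possible rule. Keeping the longest transcript or a curated canonical one would work the same way with a different comparison.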
I filed an enhancement report, since a workflow conditional facility
does not appear to exist in Galaxy:
-------- Original Message --------
Subject: Workflows with conditional statements
Date: Wed, 18 May 2011 10:31:21 +1000
From: Florent Angly <florent.angly(a)gmail.com>
To: galaxy-user(a)lists.bx.psu.edu <galaxy-user(a)lists.bx.psu.edu>
I was wondering if there is a way to put conditional statements in a workflow.
This would be useful, for example, in the case of a workflow that has an
optional advanced option that the user can click. This advanced option
would add some extra steps to the data processing.
Another example of how this could be useful is if inside a workflow, the
data needs to be processed differently based on the results of previous
workflow steps. Say you have a workflow that takes some sequences and
calculates their average length. Using a conditional statement, the
workflow would put the data through a de Bruijn assembler if the reads are
short, but through a traditional Overlap-Layout-Consensus assembler if
the reads are long.
Are conditional statements possible in Galaxy workflows and I just don't
know how to use them?
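The branching described in this message can be sketched as ordinary Python. This only illustrates the requested logic and is not a Galaxy feature: the function names, the returned labels and the 100 bp cut-off are all hypothetical.

```python
def mean_length(seqs):
    """Average length of a non-empty list of sequences."""
    if not seqs:
        raise ValueError("no sequences given")
    return sum(len(s) for s in seqs) / len(seqs)


def choose_assembler(seqs, short_read_cutoff=100):
    """Pick an assembler family from the mean read length.

    The cut-off is an arbitrary placeholder, not a recommendation.
    """
    if mean_length(seqs) < short_read_cutoff:
        return "de_bruijn"
    return "overlap_layout_consensus"


short_reads = ["ACGT" * 10, "ACGT" * 12]   # 40 and 48 bp
long_reads = ["A" * 500, "C" * 700]        # 500 and 700 bp
short_choice = choose_assembler(short_reads)
long_choice = choose_assembler(long_reads)
```

A conditional workflow step would need the same two pieces: a step that produces a summary value, and a rule that routes the data based on it.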
A new version of CloudMan has just been released. The two most prominent
changes in this release include:
1. Support for automated persistence of modifications to the underlying file
2. Support for CloudBioLinux clusters on Ubuntu 11.04
1. CloudMan has supported customization of individual instances from its
initial release, but the most recent changes automate the process. This
advance continues to promote use of CloudMan as a platform; you can use
CloudMan with all of the infrastructure management capabilities as well as
installed and configured Galaxy to customize your own instance by adding
your own tools or tools that are not part of the default configuration.
Complete instructions on how to use this feature are now available on
2. CloudBioLinux offers an AMI on AWS with a pre-configured BioLinux and now
CloudMan as well. In the past, CloudMan's AMI has been built on top of the
CloudBioLinux and thus offered all of the features of CloudBioLinux. As of
recently, though, the CloudBioLinux AMI build process itself includes CloudMan.
As a result, there is now an AMI (ami-dfc502b6) with CloudBioLinux,
CloudMan, and Galaxy available on Ubuntu 11.04. Details about CloudBioLinux
are available at http://cloudbiolinux.org/.
(Warning: Galaxy newbie here.)
I'd like to run a series of Galaxy tools repeatedly on multiple input
files. In other words, I have input data files [data1, data2, ...,
dataN] and I want to apply [step1, step2] to each of these files to
produce [output1, output2, ..., outputN]. I know that I can automate the
application of the multiple steps to a single input file as a workflow,
but is there any way to automate running a workflow multiple times on
multiple inputs through the Galaxy web interface? (I see the scripts in
scripts/api, but I'm looking for something entirely driven by the web interface.)
For a second question, is there a recommended way of combining then
merging the results (e.g. from multiple workflows) into a single file?
The use case for this is that we're doing experiments on lots of
individual flies from several genotypes and we want to analyze data from
our individual fly experiments but then pool results across flies from
the same genotype and compare with other genotypes. I imagine this is
pretty similar to what a lot of people are doing, but somehow I'm not
getting how this might be done with Galaxy.
Andrew D. Straw, Ph.D.
Research Institute of Molecular Pathology
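The pooling step asked about here (merge per-fly results, then summarize by genotype) can be sketched outside Galaxy in plain Python. Everything in this example is an assumption made for illustration: the per-fly results are taken to be small CSV tables, and the genotype labels, fly IDs and the `speed` column are hypothetical.

```python
import csv
import io
from collections import defaultdict


def pool_by_genotype(per_fly_results):
    """Merge per-fly CSV tables into one list of rows per genotype.

    per_fly_results: iterable of (genotype, fly_id, csv_text) tuples.
    Each row keeps track of which fly it came from.
    """
    pooled = defaultdict(list)
    for genotype, fly_id, csv_text in per_fly_results:
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["fly_id"] = fly_id
            pooled[genotype].append(row)
    return dict(pooled)


def mean_per_genotype(pooled, column):
    """Mean of one numeric column across all rows of each genotype."""
    return {
        genotype: sum(float(r[column]) for r in rows) / len(rows)
        for genotype, rows in pooled.items()
    }


# Hypothetical per-fly outputs, e.g. one per workflow run.
results = [
    ("wt", "fly1", "trial,speed\n1,2.0\n2,4.0\n"),
    ("wt", "fly2", "trial,speed\n1,6.0\n"),
    ("mutant", "fly3", "trial,speed\n1,1.0\n2,3.0\n"),
]
pooled = pool_by_genotype(results)
means = mean_per_genotype(pooled, "speed")
```

Tagging each row with its fly ID before concatenating keeps the merged file usable for both per-fly and per-genotype comparisons.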
The cloud version is using Linux ip-10-68-42-15 2.6.32-308-ec2 #16-Ubuntu SMP Thu Sep 16 15:25:39 UTC 2010 x86_64 GNU/Linux
What is the latest Ubuntu version we can use for a full unified install using the tool and data scripts to populate both tools and indexes?
I'm not sure if this is the place to ask this, but if so, here goes. I have a list of genomic regions (from CNV gains and losses), each given as chromosome, start and stop (e.g. chr7 68000000 71000000) for a given genome build (hg18), and I want to add, on each line, the genes (ideally HUGO gene symbols or RefSeq IDs) that reside within that region.
So I want to input something like this:
JC 507 CD19
JC 507 CD19
JC 507 CD19
And get an output similar to this:
JC 507 CD19
CDC123, DHTKD1, NUDT5, SEC61A2, UPF2
JC 507 CD19
AGAP9, ANXA8, ANXA8L1, CTSL1P2, FAM25B, FAM25C, FAM25G, GDF10, GDF2, LOC642826, RBP3, ZNF488
JC 507 CD19
ATOH7, DNA2, HNRNPH3, MYPN, PBLD, RUFY2, SLC25A16
Application Scientist - Laboratory for Advanced Genome Analysis
Vancouver Prostate Centre - Vancouver General Hospital
2660 Oak Street
Vancouver BC V6H 3Z6
P:604-875-4111 ext. 63436
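The annotation requested in this message amounts to an interval overlap between the regions and a gene table. Below is a minimal sketch, assuming half-open coordinates and a gene table already loaded as tuples. The gene names and gene coordinates are made up for illustration and are not real hg18 annotations; only the example region comes from the message.

```python
def genes_in_region(region, genes):
    """Return sorted names of genes overlapping a (chrom, start, end) region.

    Coordinates are treated as half-open intervals [start, end).
    """
    chrom, start, end = region
    return sorted(
        name
        for name, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start < end and g_end > start
    )


def annotate_regions(regions, genes):
    """Append a comma-separated gene list to each region, one line per region."""
    lines = []
    for region in regions:
        chrom, start, end = region
        names = ", ".join(genes_in_region(region, genes))
        lines.append(f"{chrom}\t{start}\t{end}\t{names}")
    return lines


# Hypothetical gene table: (name, chrom, start, end).
genes = [
    ("GENE_A", "chr7", 68500000, 68600000),
    ("GENE_B", "chr7", 70990000, 71010000),   # straddles the region end
    ("GENE_C", "chr7", 72000000, 72100000),   # outside the region
    ("GENE_D", "chr10", 68500000, 68600000),  # same span, other chromosome
]
regions = [("chr7", 68000000, 71000000)]
annotated = annotate_regions(regions, genes)
```

This scans every gene for every region, which is fine for a short CNV list. A larger table would call for sorting or an interval index.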
We have alpha support for Tophat v1.3.0 and it appears to work fine with
Galaxy's Tophat wrappers. We'll upgrade and start full testing near term.
For now, when running a local instance, we do encourage users to try
it and see if it meets their needs. Perhaps try with an uncompressed
version of the same input file and see if that functions as expected or
if the same error is given when used with the Galaxy wrappers?
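Before retrying with an uncompressed copy, it may help to confirm what the input file really is, whatever its extension says. The sketch below is not part of TopHat or Galaxy. It checks the first two bytes against the standard magic numbers for gzip (`1f 8b`) and for Unix `compress` (`1f 9d`); the function names are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"      # gzip member header
COMPRESS_MAGIC = b"\x1f\x9d"  # Unix compress (.Z) header


def sniff_bytes(data):
    """Classify a byte string by its leading magic number."""
    head = data[:2]
    if head == GZIP_MAGIC:
        return "gzip"
    if head == COMPRESS_MAGIC:
        return "compress"
    return "none"


def sniff_file(path):
    """Classify a file on disk by reading only its first two bytes."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(2))


# Self-check on in-memory data rather than a real read file.
plain = b"@read1\nACGT\n+\nIIII\n"
packed = gzip.compress(plain)
kinds = (sniff_bytes(plain), sniff_bytes(packed))
```

If `sniff_file` reports "none" for a file named `*.gz`, or "gzip" for a file with a different extension, the extension and the content disagree, which is the kind of mismatch worth ruling out first.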
If you want to post testing details (Galaxy pull # from bitbucket, gzip,
OS version and anything else you feel is relevant) after this sort of
testing, the development team will use that information when full
integration testing is performed.
Perhaps others running local instances will also comment if you post
these detailed testing results to the galaxy-dev(a)bx.psu.edu mailing list
(will reach the primary external Galaxy development community).
Thanks for using Galaxy!
On 7/20/11 6:59 AM, Song Li wrote:
> Hi Jen,
> Thank you for the fast reply, however, tophat works fine with this
> command when used as stand alone software. I checked the version of gzip
> and it's the same on galaxy server and on the machine where I run tophat
> The broken pipe error is likely due to a file that is not closed by a
> previous step.
> Has anyone tested tophat 1.3 on Galaxy?
> On Tue 07/19/11 6:45 PM , Jennifer Jackson <jen(a)bx.psu.edu> wrote:
> Hello Song Li,
> The file extension seems to be a mismatch.
> file.gz <- "gzip" utility
> Exploring the use of gunzip or zcat are options to restore a "file.z".
> Hopefully this helps,
> Galaxy team
> On 7/19/11 11:52 AM, Song Li wrote:
> > Hello everyone,
> > I was trying to run tophat in a local version of galaxy, but I got the
> > following error:
> > gzip: stdout: Broken pipe
> > [Tue Jul 19 14:10:36 2011] Processing bowtie hits
> > Error: could not open pipe gzip -cd<
> > It seems that when tophat calls gzip, the pipe cannot be opened.
> > Can anyone suggest a fix to this problem?
> > Thanks,
> > Song Li
> > --
> > Postdoctoral Associate
> > Uwe Ohler Laboratory
> > Institute for Genome Sciences and Policy
> > 101 Science Drive
> > CIEMAS 2171
> > Phone: 919-68-2124
> > Duke University
> > ___________________________________________________________
> > The Galaxy User list should be used for the discussion of
> > Galaxy analysis and other features on the public server
> > at usegalaxy.org. Please keep all replies on the list by
> > using "reply all" in your mail client. For discussion of
> > local Galaxy instances and the Galaxy source code, please
> > use the Galaxy Development list:
> > http://lists.bx.psu.edu/listinfo/galaxy-dev
> > To manage your subscriptions to this and other Galaxy lists,
> > please use the interface at:
> > http://lists.bx.psu.edu/
> Jennifer Jackson
> http://usegalaxy.org/
> http://galaxyproject.org/
I have come across a problem where Cufflinks is reporting all FPKM values as zeroes (0). I have a unique RNA-Seq project from a collaborator who is studying eyesight using tree shrews. I found that Ensembl (http://useast.ensembl.org/Tupaia_belangeri/Info/Index) has the FASTA file for the tree shrew genome (only 2x coverage, so not very good in the first place) and had this file indexed in our local instance of Galaxy. I ran TopHat, and it appears to have run fine, because I'm getting anywhere from 71-80% properly paired when I check the stats using "Flagstat." I then take the accepted hits BAM file from TopHat plus the GTF RefGene file from Ensembl for tree shrew and load them into Cufflinks. Cufflinks seems to run okay, but when I inspect Cufflinks' three output files, all the FPKM values are 0.
I have two other RNA-Seq projects (human and mouse) and both of these projects worked fine through TopHat and Cuff(links/Compare/Diff) and with a RefGene GTF file on our local instance of Galaxy (as well as on the Galaxy instance at Penn State), so it makes me think that both TopHat and Cufflinks are working okay.
I'm wondering if it has something to do with the tree shrew reference genome. Has anyone encountered anything like this? If so, how did you solve the problem? If not, do you have any suggestions as to what I can do next? Any help/info would be greatly appreciated.
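The thread does not establish the cause. One thing that can lead to all-zero FPKM values and is cheap to rule out is a mismatch between the sequence names in the GTF and those in the reference FASTA the reads were aligned to. The sketch below assumes both files can be read as lines of text; the function names and the example names are hypothetical.

```python
def fasta_names(lines):
    """Sequence names from FASTA header lines (text up to first whitespace)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def gtf_names(lines):
    """Sequence names from the first column of a GTF, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which sequence names the two files share and which they do not."""
    f_names, g_names = fasta_names(fasta_lines), gtf_names(gtf_lines)
    return {
        "shared": sorted(f_names & g_names),
        "gtf_only": sorted(g_names - f_names),
        "fasta_only": sorted(f_names - g_names),
    }


# Hypothetical inputs in which the two files use different naming schemes.
fasta = [">scaffold_1 some description", "ACGTACGT", ">scaffold_2", "GGCC"]
gtf = [
    "#!genome-build example",
    "GeneScaffold_1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "scaffold_2\tensembl\texon\t5\t50\t.\t-\t.\tgene_id \"g2\";",
]
report = compare_names(fasta, gtf)
```

With real files, the same functions can be called on open file handles, for example `compare_names(open(fasta_path), open(gtf_path))`. A large `gtf_only` list would suggest the annotation and the index do not refer to the same sequences.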
I was trying to run tophat in a local version of Galaxy, but I got the following error:
gzip: stdout: Broken pipe
[Tue Jul 19 14:10:36 2011] Processing bowtie hits
Error: could not open pipe gzip -cd <
It seems that when tophat calls gzip, the pipe cannot be opened.
Can anyone suggest a fix to this problem?
Uwe Ohler Laboratory
Institute for Genome Sciences and Policy
101 Science Drive