I'm getting up to speed on Galaxy and couldn't find examples or discussion related to the architecture, so I was hoping an expert could give some quick pointers/guidance.
Where do I find info on whether the installed applications make use of multiple nodes via MPI (etc.), which would indicate the benefit of starting up X number of nodes for faster processing?
If a workflow has multiple initial inputs, say for processing NGS exome data from tumor and blood (which get compared later in the workflow), will each step without a dependency get sent to a different node, or will the entire workflow run on one node?
If I have NGS data for 20 patients sitting in an S3 bucket and want a specific workflow run against each patient's data input(s), does this require manual selection of files by a user, or can the workflow be automated?
Can I programmatically start a workflow remotely (via REST), given that I have automated the process of uploading NGS data to S3 and know the input file(s) per workflow?
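(For context: this kind of automation is what the Galaxy REST API is for. The sketch below assumes the BioBlend Python client; the workflow ID, input step ID, and URLs are placeholders, and the helper function is illustrative rather than anything in Galaxy itself.)

```python
def build_dataset_map(step_to_dataset_id):
    """Map workflow input step IDs to uploaded history datasets ('hda')."""
    return {step: {"src": "hda", "id": ds_id}
            for step, ds_id in step_to_dataset_id.items()}

def run_patient_workflow(gi, workflow_id, fastq_path, patient_name):
    """gi is a bioblend.galaxy.GalaxyInstance; uploads one patient's FASTQ
    into a fresh history and launches the workflow on it."""
    history = gi.histories.create_history(name=patient_name)
    upload = gi.tools.upload_file(fastq_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]
    return gi.workflows.run_workflow(
        workflow_id,
        dataset_map=build_dataset_map({"0": dataset_id}),  # "0" = first input step (placeholder)
        history_id=history["id"])

# Usage (requires a live Galaxy instance and an API key):
# from bioblend.galaxy import GalaxyInstance
# gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
# run_patient_workflow(gi, "WORKFLOW_ID", "patient01.fastq", "patient-01")
```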
Is it possible to present credentials in a workflow for downloading a file via S3, where I require authentication before a file can be downloaded? I'm working with NGS data for patients, so I'm trying to understand how I can keep security tight. Currently I'm planning on restricting downloads to the cluster's IP address, but that gets a little complicated given what Amazon is doing behind the scenes in its internal network.
I would also like to push results/output back to S3 and didn't see anything obvious for doing this. It gets a little complicated in that you would probably need to put results back in the same S3 bucket (in a new folder) where the original source files came from. I saw mention of using scp to move files, but that doesn't help put results back in S3.
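(One way this could work, sketched with the boto3 AWS SDK; the bucket layout and folder name here are made up. Credentials stay in the standard AWS credential chain, e.g. an IAM role on the instance, rather than being embedded in the workflow, and the result key is derived from the source key so outputs land beside their inputs:)

```python
def result_key(source_key, output_name, folder="results"):
    """Build an S3 key that puts an output in a new folder next to its source,
    e.g. patient01/sample.fastq + sample.vcf -> patient01/results/sample.vcf."""
    prefix = source_key.rsplit("/", 1)[0] if "/" in source_key else ""
    return "/".join(part for part in (prefix, folder, output_name) if part)

def push_result(s3, bucket, source_key, local_path, output_name):
    """s3 is a boto3 S3 client; credentials come from the environment/IAM role."""
    s3.upload_file(local_path, bucket, result_key(source_key, output_name))

# Usage:
# import boto3
# push_result(boto3.client("s3"), "my-ngs-bucket",
#             "patient01/sample.fastq", "/tmp/sample.vcf", "sample.vcf")
```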
So far I really like what I have seen and hope Galaxy becomes the future toolbox for our work.
Does a roadmap exist for what is planned in the future? For example, are any additional NGS tools like ABySS going to make it into the build? I'm interested in NGS software that handles the dynamics of cancer (gene fusion events, CNVs, etc.) when dealing with NGS data.
My goal is to add, in the XML file of one tool (MACS, for example),
a supplementary command to redirect the output to another directory
(plus creating a link between this directory and the directory of Galaxy outputs).
But I want to rename my output with the same name that the download
tool creates, in this form: GALAXY-NumOfDatasetInHistory[NameOfInput].bed.
I can't find where this download tool lives, and so I can't find how
this name is created.
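(For what it's worth, that naming pattern can be reproduced with a small helper like the one below. This is only a sketch of the pattern described above; the character whitelist and truncation length are assumptions, not Galaxy's actual sanitizer code.)

```python
import string

def galaxy_download_name(hid, input_name, ext):
    """Mimic the GALAXY-NumOfDatasetInHistory[NameOfInput].ext pattern.
    hid is the dataset's number in the history; the whitelist below is
    an assumption, not Galaxy's real rule."""
    valid = set(string.ascii_letters + string.digits + " -_.")
    safe = "".join(c if c in valid else "_" for c in input_name)[:150]
    return "Galaxy%d-[%s].%s" % (hid, safe, ext)
```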
(cc'ing the dev list and updating the subject line in case others are interested)
> I have been looking for Java-related APIs to run workflows externally and
> haven't found anything searching message forums etc. I would like to
> automate data coming off our HiSeq being uploaded to Amazon S3 and then,
> programmatically from an external process, import the fastq files and kick off
> a workflow to process them. If you know of any docs or a Java API for doing this
> kind of external control, can you point me to it?
John Chilton has a Java library to access the API,
which should cover lots of this. If you're interested in other JVM
languages, I built a small Clojure wrapper around this to simplify some
of the usage. We'd definitely love to have more people involved, so if any
functionality you need is missing, please feel free to submit it.
I'm trying to test out the functional testing mechanism by running it
on an existing Galaxy tool.
First I ran ./run_functional_tests.sh -list,
which produced a list of tools I can test. I chose 'vcf_annotate' and
tested it as follows:
./run_functional_tests.sh -id vcf_annotate
This produced a lot of output, including an exception trace. The
output was not conclusive as to whether the test ran or was skipped.
The output is too long for this mailing list but you can find it here:
I am reluctant to try and excerpt the relevant bits because it's hard
for me to know what is relevant and what is not.
I am running the latest Galaxy (just did hg pull/hg update and migrated).
This is on a Mac OS X 10.7.4 machine with Python 2.7.
When I run the same command on a linux machine, it works (though it
took me a while to find the test output; it was buried in a lot of
output that also contained (apparently irrelevant) stack traces).
So perhaps there is something wrong with my configuration.
Hope someone can help me out.
I also had a couple of newbie questions about the functional test framework.
1) Why does it use tool_conf.xml.sample instead of tool_conf.xml? Can
I change it to use tool_conf.xml? That way I would not need to add tools
in two places in order to test them. (Plus, the name
tool_conf.xml.sample suggests that it is just a demo file.)
2) run_functional_tests.sh -list lists tools (such as 'upload1') that
do not have functional tests and so cannot (if my understanding is
correct) be tested with this script. Perhaps it would make more sense
not to list these tools?
I am trying to configure my Galaxy instance and I have two problems. The first is that I cannot delete users: I created some users for testing and enabled the option in universe_wsgi.ini, and the button appears, but the users are only marked as deleted and don't disappear from the users list. Is that normal?
The second problem is that I am trying to set up email confirmation, to ensure that users' email addresses exist. Is there any way to do that? I have entered the email information in the ini file, but I cannot see any other option for enabling it.
Thanks to everyone for your help
I started up a cluster on Amazon using "Launch a Galaxy Cloud Instance" and got the following message. Since I don't have any control over where the instances are run, I'm not sure how I can control this. The last 4 or 5 times I have started up an existing instance, it has worked with no problem.
Messages (CRITICAL messages cannot be dismissed.)
1. [CRITICAL] Volume 'vol-f882ca85' is located in the wrong availability zone for this instance. You MUST terminate this instance and start a new one in zone 'us-east-1a'. (2012-10-31 14:25:20)
I ran into SSL certificate errors when using Java to connect to Galaxy
main via the API. My knowledge of this stuff is minimal, but I did some
searching and discovered that the certificate chain on Galaxy main is the problem.
Looking at the chain with openssl shows a swap of the AddTrust and Internet2 certificates:
$ openssl s_client -connect main.g2.bx.psu.edu:443
depth=2 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = AddTrust External CA Root
verify error:num=19:self signed certificate in certificate chain
0 s:/C=US/postalCode=16802/ST=PA/L=University Park/O=The Pennsylvania State University/OU=Center for Comparative Genomics and Bioinformatics/CN=bigsky.bx.psu.edu
i:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
1 s:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
2 s:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
As a result, pickier verification mechanisms fail because of the self-signed
certificate in the middle of the chain instead of at the root.
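(The misordering can be spotted mechanically. Here's a quick sketch, not tied to any SSL library, that flags a self-signed certificate sitting anywhere but at the end of the server-presented chain; the chain below is the one from the openssl output, reduced to CN fields:)

```python
def misplaced_self_signed(chain):
    """chain: list of (subject, issuer) pairs in the order the server sends them.
    A cert is self-signed when subject == issuer; it only belongs at the end
    of the chain (the root). Returns the offending positions."""
    return [i for i, (subj, issuer) in enumerate(chain)
            if subj == issuer and i != len(chain) - 1]

chain = [
    ("bigsky.bx.psu.edu", "InCommon Server CA"),                 # 0: leaf
    ("AddTrust External CA Root", "AddTrust External CA Root"),  # 1: root, out of place
    ("InCommon Server CA", "AddTrust External CA Root"),         # 2: intermediate
]
```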
It appears you can fix this by adjusting the order of the certificates served.
Hope this helps,
I contacted the vendor's tech support (Dell) with this question, but they could not answer (or did not want to) and directed me to the Galaxy developers. I am using RHEL 5.8 and SciLinux 5.5 and want to install a local instance of Galaxy. Both my systems are based on Python 2.4. Question: can I install Python 2.6/2.7 locally without messing up the system? I was advised earlier not to do a system install, but, being unhealthily curious, I did, and ended up reinstalling SciLinux 5.5 from scratch. How do I make sure 2.6/2.7 will not mess up the system's Python?
Is it possible to load a single gff file with the annotations of several
chromosomes for my custom build in one step (one gff file)?
With the current version of Galaxy, it seems that I can only load a gff file
referring to one chromosome. It's pretty tedious to load 43 gff
files separately for my custom build...
If I try, I get this error:
Traceback (most recent call last):
30, in main
for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 389,
feature = GFFFeature( None, intervals=intervals )
File "~/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 65,
( interval.chrom, self.chrom ) )
ValueError: interval chrom does not match self chrom: SAGS2 != SAGS1
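(As a workaround until multi-chromosome GFFs are handled, the file can be split per chromosome before loading. A minimal sketch, assuming a plain tab-separated GFF with the chromosome in column 1:)

```python
from collections import defaultdict

def split_gff_by_chrom(lines):
    """Group GFF feature lines by chromosome (column 1).
    Comment and blank lines are dropped. Returns {chrom: [lines...]}."""
    per_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        per_chrom[line.split("\t", 1)[0]].append(line)
    return dict(per_chrom)

# Usage: write each group to its own file, e.g. annotations.SAGS1.gff, ...
# for chrom, feats in split_gff_by_chrom(open("annotations.gff")).items():
#     open("annotations.%s.gff" % chrom, "w").writelines(feats)
```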
Plateforme Genome Transcriptome
Tel: 05 57 12 27 75
INRA-UMR BIOGECO 1202
69 route d'Arcachon