Is it possible to create a queue system for separate jobs?
I have a module that consumes roughly all of the CPUs on our machine. If another user runs the same module with other data, it should wait until all other runs are finished, to prevent overloading the machine.
Is there a setting for this?
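I don't know of a single global switch, but Galaxy's job configuration supports concurrency limits that give exactly this queueing behavior. A sketch, assuming a job_conf.xml-style setup (element names follow the job_conf.xml.sample shipped with recent Galaxy releases; the runner IDs and limit values here are illustrative, adjust them to your installation):

```xml
<!-- job_conf.xml: hypothetical sketch; jobs beyond the limits stay queued -->
<job_conf>
    <plugins>
        <!-- a single local worker means local jobs run one at a time -->
        <plugin id="local" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner" workers="1"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
    </destinations>
    <limits>
        <!-- cap concurrent jobs per registered user -->
        <limit type="registered_user_concurrent_jobs">1</limit>
        <!-- cap total concurrent jobs on this destination -->
        <limit type="destination_total_concurrent_jobs" id="local">1</limit>
    </limits>
</job_conf>
```

With a limit of 1, a second user's run of the CPU-heavy module waits in the queue until the first finishes.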
I'm trying to set up a virtualenv with JPype via tool_dependencies.xml. I've verified that JPype is available from PyPI via pip; however, when Galaxy tries to install the dependency, it cannot find a distribution.
JPype 0.5.4.2 https://pypi.python.org/pypi/JPype
Here is the <package> snippet from my tool_dependencies.xml file:
<package name="jpype" version="0.5.4.2">
In order to install JPype, you must agree to the Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)
Here is the error that I get when trying to install the dependency:
New python executable in /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/venv/bin/python
Downloading/unpacking JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1))
Could not find any downloads that satisfy the requirement JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1))
No distributions at all found for JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1))
Storing complete log in /home/mepcotterell/.pip/pip.log
Michael E. Cotterell
Ph.D. Student in Computer Science, University of Georgia
Instructor of Record, Graduate RA & TA, University of Georgia
Faculty Liaison, CS Graduate Student Association, University of Georgia
I am trying to integrate a tool into Galaxy. The tool runs in two parts:
1) Computes correlation and produces an output 'txt' file (Java). 2)
Takes the previously output txt file and produces a 'pdf' file (Python).
I am having trouble with the second tool. If I provide a sample 'txt' file
as input, it works fine. But if I run it in Galaxy, it does not work
(for some reason it is not reading the 'txt' file from the previous
output). Below is the code.
import sys, os
import numpy as np
import matplotlib
matplotlib.use('Agg')  # no display on the Galaxy server; render off-screen
import matplotlib.pyplot as plt
input_file = sys.argv[1]
output_file = sys.argv[2]
print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
txt_in = input_file + '.txt'
mydata = np.loadtxt(txt_in)
plt.plot(mydata[:,0], mydata[:,1], label="Feature")
plt.plot(mydata[:,0], mydata[:,2], label="Variance")
plt.xlabel('Distance from feature(bp)')
pdf_out = output_file + '.pdf'
plt.savefig(pdf_out)  # write the figure; without this no PDF is produced
data = file(pdf_out, 'rb').read()
fp = open(output_file, 'wb')  # copy the PDF to the Galaxy output path
fp.write(data)
fp.close()
Below is the XML code.
<tool id="archtex_massdata_extraction" name="Extract mass data">
<description> for the given BAM file </description>
<command> java -jar Extraction.jar
$input_bam_file $ref_filename $ref_filetype $output1
</command>
<command interpreter="python"> plot.py $output1 $out_file1 </command>
<param name="input_bam_file" type="data" format="BAM" label="Input BAM file" help="Choose an input BAM file"/>
<param name="ref_filename" type="data" format="gen,txt,gtf,bed" label="Input reference file" help="Choose a reference file"/>
<param name="ref_filetype" type="select" label="Choose the reference file
<data name="output1" format="txt" />
<data name="out_file1" format="pdf" />
I am forcing the input to be 'txt' and the output to be 'pdf' in the Python
script. When I run the code, it shows no errors, but it also shows no
output. I am able to download the first output 'txt' file, but there is no
download button option in Galaxy's right pane for the second part.
Any help is appreciated!
When I try to process my exome BAM files using DepthOfCoverage under the GATK tools (on the main instance), the program seems to run forever without completing. There is no error message output, but the job never ends even after running for several days. Could something be wrong with my files, or has anyone else reported an issue with this tool?
Department of Ophthalmology
Cincinnati Children's Hospital
3333 Burnet Avenue
Research Building R2409
Cincinnati, OH 45229
Hi, Galaxy Developers,
I have what I'm hoping is a fairly simple inquiry for the Galaxy community: basically, our production Galaxy server processes appear to be dying off over time. Our production Galaxy instance uses web scaling behind Apache, so I have a number of server processes; for example, my Apache configuration has:
Nothing unconventional, as I understand it. Similarly, my Galaxy config has matching [server:ws3], [server:ws2] configuration blocks for each of these processes. When I restart Galaxy, everything is fine: I'll see a server listening on each of these ports (if I do something like lsof -i TCP -P, for example). What appears to be happening is that, for whatever reason, these server processes die off over time (i.e. eventually nothing is listening on ports 8080-8085). This can take days, and once no servers are available, Apache begins throwing 503 Service Unavailable errors. I am fairly confident the process is gradual; for example, I just checked now and Galaxy was still available, but one server had died (the one on TCP port 8082). I do have a single separate job manager and two job handlers; at this point I believe the problem is limited to the web servers (i.e. the job manager and job handlers do not appear to be crashing).
Now, I believe that late last week I might have 'caught' the last server process dying, just by coincidence, although I am not 100% certain. Here is the Traceback as it occurred:
galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:47:12,011 (6822/39485.sc01) PBS job state changed from Q to R
galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:36,565 (6822/39485.sc01) PBS job state changed from R to C
galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:36,566 (6822/39485.sc01) PBS job has completed successfully
galaxy.jobs DEBUG 2013-07-02 08:54:36,685 Tool did not define exit code or stdio handling; checking stderr for success
galaxy.datatypes.metadata DEBUG 2013-07-02 08:54:36,812 loading metadata from file for: HistoryDatasetAssociation 6046
galaxy.jobs DEBUG 2013-07-02 08:54:38,153 job 6822 ended
galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:49,130 (6812/39473.sc01) PBS job state changed from R to E
galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:52,267 (6812/39473.sc01) PBS job state changed from E to C
galaxy.jobs.runners.pbs ERROR 2013-07-02 08:54:52,267 (6812/39473.sc01) PBS job failed: Unknown error: -11
galaxy.jobs.runners ERROR 2013-07-02 08:54:52,267 (unknown) Unhandled exception calling fail_job
Traceback (most recent call last):
File "/group/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 58, in run_next
File "/group/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 560, in fail_job
AttributeError: 'AsynchronousJobState' object has no attribute 'stop_job'
Now, I have some questions regarding this issue:
1) Although it is a sub-optimal solution, restarting Galaxy appears to solve this problem (i.e. server processes are listening again after a restart). Is it possible, or safe, or sane to restart just a single server on a single port? Ideally I would like to fix whatever is causing my server processes to crash, but I figured it wouldn't hurt to ask regardless.
2) Similarly, is it possible to configure Galaxy so that server processes re-spawn on their own (i.e. is this a feature of Galaxy, for example because server processes dying regularly is either a known issue or expected and tolerable (but undesired) behavior)?
3) To me, the error messages above aren't very meaningful, other than that the traceback appears to be PBS-related. Can anybody comment on the problem above (i.e. have you seen something like this), or on Galaxy server processes dying in general? I have done some brief searching of the Galaxy mailing list for server crashes and did not find anything suggesting this is a common problem.
4) I am not 100% confident at this point that the traceback above is what killed the server process. Does anybody know of a specific string (a literal) I can search for to identify when a server process actually dies? I have done some basic review of our log data (our Galaxy server generates lots of logs), and "Traceback" does not uniquely identify a server crash (tracebacks occur too frequently). I currently have logging configured at DEBUG.
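In the absence of a reliable log string, one workaround for detecting dead web processes is to poll the ports the [server:...] blocks listen on, much like the lsof check above but scriptable. A minimal sketch (the host and port range 8080-8085 are assumptions based on the setup described):

```python
import socket

def is_listening(port, host='127.0.0.1'):
    """Return True if something accepts TCP connections on host:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1.0)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

# Ports assumed from the Apache/Galaxy configuration described above.
dead = [p for p in range(8080, 8086) if not is_listening(p)]
if dead:
    print('No listener on ports: %s' % dead)
```

Run from cron, this could alert (or trigger a restart) as soon as one server dies, rather than waiting for all six to go down and Apache to start returning 503s.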
In case this is relevant, I am using the following change set for Galaxy:
> hg parents
user: Nate Coraor <nate(a)bx.psu.edu>
date: Wed May 01 09:50:31 2013 -0400
summary: Use Galaxy's ErrorMiddleware since Paste's doesn't return start_response. Fixes downloading tarballs from the Tool Shed when use_debug = false.
I appreciate the time you took in reading my email, and any expertise you could provide in helping me troubleshoot this issue.
I had been using an old Galaxy build, downloaded in 2010, on my local
server, and now I am updating it to the latest build. I had made some
changes to suit our needs: along with the output file, I added a piece of
code that let us download the stderr file as well. I made those changes
inside the display function
of lib/galaxy/webapps/galaxy/controllers/dataset.py. But Galaxy has
gone through significant changes since then. Now the display function
of lib/galaxy/webapps/galaxy/controllers/dataset.py calls display_data:
def display(self, trans, dataset_id=None, preview=False, filename=None, to_ext=None, chunk=None, **kwd):
    return data.datatype.display_data(trans, data, preview, filename, to_ext,
The display_data function is present in lib/galaxy/datatypes/data.py as well as
in .../tabular.py. The code I had changed in the previous Galaxy is in ..../data.py,
but the display function always calls the display_data function of
.../tabular.py. How can I make it call the display_data
function of ..../data.py?
Please tell me the solution to the above problem, or whether there is a better way
to download additional files like stdout and stderr apart from the regular
output file. Any help would be really appreciated.
P.S. I am creating the stderr and stdout files inside database/user_directory.
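Regarding why .../tabular.py always wins: Galaxy looks up display_data on the dataset's datatype instance, so ordinary Python method resolution applies, and a subclass override (tabular) shadows the base implementation (data). A minimal sketch, with simplified, hypothetical class bodies standing in for the real modules:

```python
# Simplified stand-ins for Galaxy's datatype classes; the method bodies
# are hypothetical, only the inheritance relationship matters here.
class Data(object):
    # stands in for lib/galaxy/datatypes/data.py
    def display_data(self):
        return 'data.py implementation'

class Tabular(Data):
    # stands in for lib/galaxy/datatypes/tabular.py, which subclasses Data
    def display_data(self):
        return 'tabular.py override'

# Galaxy calls data.datatype.display_data(...); a tabular dataset's
# datatype instance is a Tabular, so the override is what runs:
datatype = Tabular()
print(datatype.display_data())  # -> tabular.py override
```

The upshot is that editing only data.py cannot affect tabular datasets; the change would need to go into the override (or the override would need to delegate to the base class).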
I am running an instance of Galaxy that I set up using Cloudman.
We are trying to use the GATK Unified Genotyper, but we are missing the
reference genomes.
I saw on your website instructions on how to set up reference genomes for
some other tools, but not for the Unified Genotyper.
Where can I find instructions on how to set up reference genomes for this
tool?
After 1 year of using the same share string at the launch page
https://biocloudcentral.herokuapp.com/launch, the program no longer loads
my application. The instance loads, but my galaxyData folder is empty. I
am not sure whether the problem occurred because Galaxy has moved something
or because I deleted something critical. Any ideas? Thanks
I've set up the Galaxy application on a standard NFS mount without the
'noac' option, and the Galaxy datasets on an NFS mount with 'noac'
enabled. I updated the universe configuration to reflect the updated
locations. However, performance on the 'noac' mount is really poor,
something like 700 kB/sec on a 1 Gb interface.
Is this the only option available, or have alternatives been recommended
to improve the speed?
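One alternative sometimes suggested (whether it is safe for your setup depends on how many hosts share the datasets mount) is to keep attribute caching enabled but shorten its timeout with actimeo, rather than disabling it entirely with noac. A hypothetical /etc/fstab entry (server name and paths are placeholders):

```
# actimeo=3 bounds attribute-cache staleness to ~3 seconds, which is
# usually far faster than noac, which forces a GETATTR on every access.
nfsserver:/export/galaxy-data  /mnt/galaxy-data  nfs  rw,hard,actimeo=3  0  0
```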
University of Cape Town