July 2013 - galaxy-dev - lists.galaxyproject.org

queue system
by Jasper Jan Koehorst 11 Jul '13

11 Jul '13

Is it possible to create a queue system for separate jobs. I have a module which roughly consumes all CPUs on our machine and if another user decides to run the same module with other data it should wait till all other runs are finished to prevent overloading. Is there a setting for this? Jasper Koehorst Wageningen UR

2 2

virtualenv, jpype, and tool_dependencies.xml
by Michael E. Cotterell 11 Jul '13

11 Jul '13

I'm trying to setup a virtualenv with JPype via tool_dependencies.xml. I've verified that JPype is available from PyPI via pip, however, when Galaxy tries to install the dependency, it's not finding a distribution. JPype 0.5.4.2 https://pypi.python.org/pypi/JPype Here is the <package> snippet from my tool_dependies.xml file: <package name="jpype" version="0.5.4.2"> <install version="1.0"> <actions> <action type="setup_virtualenv">JPype==0.5.4.2</action> </actions> </install> <readme> In order to install JPype, you must agree to the Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/) </readme> </package> Here is the error that I get when trying to install the dependency: New python executable in /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/venv/bin/python Installing setuptools............done. Installing pip...............done. Downloading/unpacking JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1)) Could not find any downloads that satisfy the requirement JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1)) No distributions at all found for JPype==0.5.4.2 (from -r /home/mepcotterell/galaxy-tool-shed/tool-deps/jpype/0.5.4.2/mepcotterell/web_service_tools/65a0d7bb35d6/requirements.txt (line 1)) Storing complete log in /home/mepcotterell/.pip/pip.log Sincerely, Michael E. Cotterell Ph.D. Student in Computer Science, University of Georgia Instructor of Record, Graduate RA & TA, University of Georgia Faculty Liaison, CS Graduate Student Association, University of Georgia mepcotterell(a)gmail.com (mailto:mepcotterell@gmail.com) mepcott(a)uga.edu (mailto:mepcott@uga.edu) mec(a)cs.uga.edu (mailto:mec@cs.uga.edu) http://michaelcotterell.com/

1 0

problem with python output
by vijayalakshmi 10 Jul '13

10 Jul '13

Hi, I am trying to integrate a tool into galaxy. The tool runs in two parts - 1) Computes correlation and provides an ouput 'txt' file (Java) 2) Takes the previously ouput txt file and outputs a 'pdf' file. (Python) I am having trouble with the second tool. If I provide a sample 'txt' file as input, it's working fine. But if I run it in galaxy, it is not working (for some reason it is not reading the 'txt' file from the previous output). Below is the code. #!/usr/bin/python import sys, os import numpy as np import matplotlib.pyplot as plt input_file = sys.argv[1] output_file = sys.argv[2] print 'Number of arguments:', len(sys.argv), 'arguments.' print 'Argument List:', str(sys.argv) txt_in = input_file + '.txt' mydata = np.loadtxt(txt_in) plt.figure(1) plt.plot(mydata[:,0],mydata[:,1],label = "Feature"); plt.plot(mydata[:,0],mydata[:,2], label = "Variance"); plt.xlabel('Distance from feature(bp)'); plt.ylabel('Score'); plt.title(sys.argv[1]); plt.legend(loc="best") pdf_out = output_file + '.pdf' plt.savefig(pdf_out) data = file(pdf_out, 'rb').read() fp = open(output_file, 'wb') fp.write(data) fp.close() os.remove(txt_in) os.remove(pdf_out) Below is the xml code. <tool id="archtex_massdata_extraction" name="Extract mass data"> <description> for the given BAM file </description> <command> java -jar Extraction.jar $input_bam_file $ref_filename $ref_filetype $output1 </command> <command interpreter="python"> plot.py $output1 $out_file1 </command> <inputs> <param name="input_bam_file" type="data" format="BAM" label="Input BAM file" help="Choose an input BAM file"/> <param name="ref_filename" type="data" format="gen,txt,gtf,bed" label="Reference/Coordination file" help="Choose a reference file"/> <param name="ref_filetype" type="select" label="Choose the reference file type"> <option value="custom">Custom</option> <option value="refgene">refGene</option> <option value="GFF">GFF</option> <option value="bed">BED</option> </param> </inputs> <outputs> <data name="output1" format="txt" /> <data name="out_file1" format="pdf" /> </outputs> </tool> I am forcing the input to be 'tx't and the output to be 'pdf' in the python script. When I run the code, it is not showing any errors but it's not showing any output either. I am able to download the first output 'txt' file but there is no download button option in the right pane of galaxy for the second part. Any help is appreciated! Thanks, VJ.

4 4

GATK DepthofCoverage tool
by Jaworek, Thomas 10 Jul '13

10 Jul '13

Hi Galaxy, When I try to process my exome data bam files using DepthOfCoverage under GATK tools (on the main instance), the program seems to keep running forever without ever completing. There is no error message output but the program never ends after going for several days! Could it be something wrong with my files or has any one else reported having an issue with it? Regards, Tom --------------------------------------------- Thomas Jaworek Department of Ophthalmology Cincinnati Children's Hospital 3333 Burnet Avenue Research Building R2409 Cincinnati, OH 45229 thomas.jaworek(a)cchmc.org 513-803-2951

1 0

tool availability
by vijayalakshmi 10 Jul '13

10 Jul '13

Hi, Are any of the following tools available in galaxy? 1) IDR http://www.uwencode.org/software/hotspot 2) Hotspot https://sites.google.com/site/anshulkundaje/projects/idr 3) Wiggler https://code.google.com/p/align2rawsignal/ 4) PhantomPeakQualTools Regards, Vijayalakshmi.

1 0

Galaxy Server Processes Dying?
by Dan Sullivan 10 Jul '13

10 Jul '13

Hi, Galaxy Developers, I have what I'm hoping is a fairly simple inquiry for the Galaxy community; basically, our production Galaxy server processes appear to be dying off over time. Our production Galaxy instance implements apache web scaling features so I have a number of server processes, for example my apache Apache configuration has: BalancerMember http://127.0.0.1:8080 BalancerMember http://127.0.0.1:8081 BalancerMember http://127.0.0.1:8082 BalancerMember http://127.0.0.1:8083 BalancerMember http://127.0.0.1:8084 BalancerMember http://127.0.0.1:8085 Nothing unconventional as I understand it. Similarly, my galaxy config has matching [server:ws3], [server:ws2] configuration blocks for each of these processes. When I restart Galaxy, everything is all fine and good. I'll see a server listening on each one of these ports (if I do something like lsof -i TCP -P, for example). What appears to be happening, is that for whatever reason, these server processes seem to die off over time (i.e eventually nothing is listening on ports 8080-8085). This process can take days, and at the time when no servers are available, Apache will begin throwing 503 service unavailable errors. I am fairly confident this process is gradual, for example I just checked now and the Galaxy was still available, however one server had died (the one on TCP port 8082). I do do have a single separate job manager and two job handlers; at this point I believe this problem to be related to the servers only (i.e. the job manager and job handlers do not appear to be crashing). Now, I believe that late last week I might have 'caught' the last server process dying, just by coincidence, although I am not 100% certain. Here is the Traceback as it occurred: galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:47:12,011 (6822/39485.sc01) PBS job state changed from Q to R galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:36,565 (6822/39485.sc01) PBS job state changed from R to C galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:36,566 (6822/39485.sc01) PBS job has completed successfully galaxy.jobs DEBUG 2013-07-02 08:54:36,685 Tool did not define exit code or stdio handling; checking stderr for success galaxy.datatypes.metadata DEBUG 2013-07-02 08:54:36,812 loading metadata from file for: HistoryDatasetAssociation 6046 galaxy.jobs DEBUG 2013-07-02 08:54:38,153 job 6822 ended galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:49,130 (6812/39473.sc01) PBS job state changed from R to E galaxy.jobs.runners.pbs DEBUG 2013-07-02 08:54:52,267 (6812/39473.sc01) PBS job state changed from E to C galaxy.jobs.runners.pbs ERROR 2013-07-02 08:54:52,267 (6812/39473.sc01) PBS job failed: Unknown error: -11 galaxy.jobs.runners ERROR 2013-07-02 08:54:52,267 (unknown) Unhandled exception calling fail_job Traceback (most recent call last): File "/group/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 58, in run_next method(arg) File "/group/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 560, in fail_job if pbs_job_state.stop_job: AttributeError: 'AsynchronousJobState' object has no attribute 'stop_job' Now, I have some questions regarding this issue; 1) It appears to me that although this is a sub-optimal solution, restarting Galaxy solves this problem (i.e. server processes will be listening after restarting Galaxy). Is it possible, or safe, or sane to just restart a single server on a singe port? Ideally I would actually like to fix the problem that is causing my server processes to crash, although I figured it wouldn't hurt to ask this question regardless. 2) Similar to the question above, is it possible to configure Galaxy in a way that server processes re-spawn in a self-service manner (i.e. is this a feature of Galaxy, for example, because server processes dying regularly is either a known issue or expected and tolerable (but undesired) behaivor)? 3) To me, the error messages above aren't very meaningful, other than the Traceback appears to be PBS-related. Would anybody be able comment on the problem above (i.e. have you seen something like this), or comment on Galaxy server processes dying in general? I have done some brief searching of the Galaxy mailing list for server crashes and did not find anything suggesting this is a common problem. 4) I am not 100% confident at this point that the Traceback above is what killed the server process. Does anybody know of a specific string I can search for (a literal) to identify when a server process actually dies? I have done some basic review of log data (our Galaxy server generates lots of logs), and Traceback does not appear to be a valid string to uniquely identify a server crash (they occur too frequently). I currently have logging configured at DEBUG. In case this is relevant, I am using the following change set for Galaxy: > hg parents changeset: 9320:47ddf167c9f1 branch: stable tag: tip user: Nate Coraor <nate(a)bx.psu.edu> date: Wed May 01 09:50:31 2013 -0400 summary: Use Galaxy's ErrorMiddleware since Paste's doesn't return start_response. Fixes downloading tarballs from the Tool Shed when use_debug = false. > I appreciate the time you took in reading my email, and any expertise you could provide in helping me troubleshoot this issue. Dan Sullivan

3 2

datatype display_data function
by Himanshu Gupta 10 Jul '13

10 Jul '13

Hello everyone, I had been using an old galaxy build, downloaded in 2010, on my local server and now I am updating it to latest build. I had made some changes as per our need. Along with the output file I have added a piece of code which was making us able to download the stderr file as well in case. I had made those changes inside the display function of lib/galaxy/webapps/galaxy/controllers/dataset.py file. But galaxy have gone through significant changes. Now the display function of lib/galaxy/webapps/galaxy/controllers/dataset.py calls display_data function. def display(self, trans, dataset_id=None, preview=False, filename=None, to_ext=None, chunk=None, **kwd): ... return data.datatype.display_data(trans, data, preview, filename, to_ext, chunk, **kwd) display_data function is present in lib/galaxy/datatypes/data.py as well as in lib/galaxy/datatypes/tabular.py The code which I had changed in previous galaxy is present in ..../data.py but the display function is calling the display_data function of .../tabular.py always. How will I make it able to call display_data function of ..../data.py Please tell me the solution of the above problem or is there any better way to download some more file like stdout, stderr apart from the regular output file. Any help would be really appreciated. P.S. I am creating stderr and stdout file inside database/user_directory Thanks Himanshu

2 1

Setting up reference genomes for GATK Genotyper
by Marco Ocana 10 Jul '13

10 Jul '13

Hi, I am running an instance of Galaxy that I setup using Cloudman. We are trying to use the GATK Unified Genotyper, but we are missing the reference genomes. I saw in your website instructions on how to setup reference genomes for some other tools, but not for the Unified Genotyper. Where can I find instructions on how to setup reference genomes for this tool? Thanks Marco

1 1

trouble with share string
by Deniz Erezyilmaz 10 Jul '13

10 Jul '13

After 1 year of using the same share string at the launch page * https://biocloudcentral.herokuapp.com/launch, *the program no longer loads my application; I. The instance loads, but my galaxyData folder is empty. I am not sure if the problem has occurred because Galaxy has moved something or whether I have deleted something critical. Any ideas? Thanks

1 0

NFS noac option
by Timothy Carr 10 Jul '13

10 Jul '13

Hi All, I've setup the Galaxy application on a standard NFS mount without the 'noac' option and then the galaxy datasets on a enabled 'noac' NFS mount. I updated the universe configuration to reflect the updated locations. The performance however to the 'noac' mount is really poor, something like 700kb/sec on a 1GB interface. Is this the only option available or have alternatives been recommended to improve the speed ? Regards Timothy Carr University of Cape Town

1 0