December 2015 - galaxy-dev - lists.galaxyproject.org

How does one create a viewer for a new datatype to display a webpage if eyeball is clicked?
by rbrown1422＠comcast.net 04 Dec '15


Good afternoon, I am trying to create a display module for a special binary datatype. When someone requests to view the output file in the History using the eyeball icon, it should kick off my code to create an HTML display in the Galaxy center section. I am looking at binary.py, but I am not sure whether this is the right place. Is there sample or existing code to look at? I hope this makes sense to someone on the team. Thanks, bob


Re: [galaxy-dev] wolfpsort
by Olivier CLAUDE 04 Dec '15


Hello again,

I cannot launch it manually either. It gives me the same error:

>runWolfPsortSummary animal test.fasta
>Can't locate fastafmt/GetOptWarnHandler.pm in @INC (you may need to install the fastafmt::GetOptWarnHandler module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl . /bin) at /bin/runWolfPsortSummary line 37. BEGIN failed--compilation aborted at /bin/runWolfPsortSummary line 37.

I think I missed something somewhere. Any leads? Thanks

From: Peter Cock [mailto:p.j.a.cock@googlemail.com]
Sent: Wednesday, 2 December 2015 18:59
To: Olivier CLAUDE <o.claude(a)outlook.fr>
Cc: galaxy-dev(a)lists.galaxyproject.org; Björn Grüning <bjoern.gruening(a)gmail.com>
Subject: Re: [galaxy-dev] 3 questions

On Wed, Dec 2, 2015 at 5:07 PM, Olivier CLAUDE <o.claude(a)outlook.fr> wrote:

> Hello again, 1/ I managed to find WolfPsort on github.

You mean https://github.com/fmaguire/WoLFPSort ? I can see why Finlay Maguire did that, although I'm not 100% sure that is within the licence. In other news, it looks like someone else has bought the domain http://wolfpsort.org/ and is holding it to ransom (for sale, offers over $1190). Oh dear. :(

> I followed the readme file and it work with the command line but when I tried to run it on galaxy with the one included in the package “tmhmm and signal” from peterjc, it gave me an error: “can’t locate fastafmt/GetOptWarnHandler.pm” I put in my .bashrc the path of the file. I tried to put it directly in the /bin directory but it didn’t change anything. Any idea anyone?

Have you followed the INSTALL file instructions? Have you been able to run WolfPsort at the command line? Note that it only has precompiled binaries under bin/binByPlatform for i386 and sparc; however, we could use the 32-bit binaries on our 64-bit Linux machine.

According to my old comment inside the Python wrapper script for Galaxy, I had trouble running the tool from outside its home directory, and so used a simple (second) wrapper script to change directory before running the real binary. See: https://github.com/peterjc/pico_galaxy/blob/master/tools/protein_analysis/w…

> 2/in the python script I can see: Num_thread = thread_count(sys.argv[2], default=4) Does it means that it will use at the maximum 4 threads? Can I assume it will use 4 core in parallel?

This means my Python wrapper script will default to 4 threads if the number is not specified via the command line. When called via Galaxy, the XML wrapper will use the $GALAXY_SLOTS environment variable (if set). See the job runner settings taken from job_conf.xml for details. So unless you've set up something special in Galaxy for the number of slots/threads to use, the tool will default to four threads: it will break up the input file into chunks and run four copies of the single-threaded tool runWolfPsortSummary at once.

> 3/I use blast + from the devteam is there any possibility to use more than 1 core? If yes where? Thanks !

Again, the BLAST+ wrappers will use the $GALAXY_SLOTS environment variable (if set), although here they default to calling BLAST+ with 8 threads.

Peter
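The fallback behaviour Peter describes (an explicit thread count, otherwise $GALAXY_SLOTS, otherwise a default of 4, followed by splitting the input into chunks) can be sketched roughly as below. This is only an illustration under stated assumptions: `resolve_thread_count` and `split_fasta_records` are hypothetical names, not the functions in the actual wrapper, and the round-robin split is an assumed strategy.

```python
import os


def resolve_thread_count(cli_value=None, default=4, environ=None):
    """Pick a worker count: explicit value, then $GALAXY_SLOTS, then default.

    Hypothetical helper, not the wrapper's real thread_count().
    """
    environ = os.environ if environ is None else environ
    for candidate in (cli_value, environ.get("GALAXY_SLOTS")):
        try:
            count = int(candidate)
        except (TypeError, ValueError):
            continue  # missing or non-numeric, so try the next source
        if count > 0:
            return count
    return default


def split_fasta_records(lines, n_chunks):
    """Deal FASTA records round-robin into n_chunks lists of lines.

    Assumed splitting strategy; lines before the first '>' header are dropped.
    """
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header line starts the next record
        if index >= 0:
            chunks[index % n_chunks].append(line)
    return chunks
```

Each chunk would then be written to its own temporary file and handed to one copy of the single-threaded tool; that step is left out here because it depends on how the tool is installed.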


Re: [galaxy-dev] Publish tool with Python dependencies to the toolshed
by Dooley, Damion 03 Dec '15


Related to this thread: my report calc tool needs a few Python packages, but I think they are generic enough to be in Python 2.6, 2.7 or 3.x, so can I specify the packages without specifying the Python version? (I'm looking at the https://github.com/galaxyproject/tools-iuc/blob/master/packages/package_python_2_7_bcbiogff_0_6_2/tool_dependencies.xml file.) Or do I need to be doing something like:

    <action type="shell_command">pip install ...</action>

I'm able to upload this to our local toolshed, and it seems to install OK from there, but in the test server's Galaxy log I see

    galaxy.tools.deps WARNING 2015-12-03 22:23:02,513 Failed to resolve dependency on 'report_calc', ignoring

though the tool still runs (probably because the Python packages were already installed manually).

Much obliged,
Damion

Tool definition file has:

    <requirements>
        <requirement type="package" version="0.1.0">report_calc</requirement>
    </requirements>

tool_dependencies.xml:

    <?xml version="1.0"?>
    <tool_dependency>
        <package name="report_calc" version="0.1.0">
            <install version="1.0">
                <actions>
                    <action type="setup_python_environment">
                        <package>https://pypi.python.org/packages/source/d/dateutils/dateutils-0.6.6.tar.gz#md5=2ba7fcac03635f1f1cad0d94d785001b</package>
                        <package>https://pypi.python.org/packages/source/p/pyparsing/pyparsing-2.0.6.tar.gz#md5=a2d85979e33a6600148c6383d3d8de67</package>
                    </action>
                </actions>
            </install>
            <readme><![CDATA[
    Report Calc
    -----------
    A python tool to text-mine log and report files for variables and tabular data, and to create custom reports for this data, as well as triggering workflow halt based on quality control thresholds.

    This tool requires the following additional python packages.

    pip install dateutils pyparsing
    ]]>
            </readme>
        </package>
    </tool_dependency>
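Because the tool keeps running even when the dependency fails to resolve, it can be hard to tell whether the packages come from the Tool Shed install or from an earlier manual install. One rough way to check what the job environment can actually import is sketched below. `missing_modules` is a hypothetical helper, the import names `dateutils` and `pyparsing` are assumed to match the package names, and `importlib.util.find_spec` requires Python 3.4 or later, so this would not run under Python 2.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when no importable module of that name exists
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

Calling `missing_modules(["dateutils", "pyparsing"])` from inside the tool's environment returns an empty list only if both modules are importable there.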


3 questions
by Olivier CLAUDE 03 Dec '15


Hello again,

1/ I managed to find WolfPsort on GitHub. I followed the readme file and it worked from the command line, but when I tried to run it in Galaxy with the wrapper included in the "tmhmm and signal" package from peterjc, it gave me an error: can't locate fastafmt/GetOptWarnHandler.pm. I put the path of the file in my .bashrc. I also tried putting it directly in the /bin directory, but it didn't change anything. Any ideas, anyone?

2/ In the Python script I can see: Num_thread = thread_count(sys.argv[2], default=4). Does that mean it will use at most 4 threads? Can I assume it will use 4 cores in parallel?

3/ I use BLAST+ from the devteam. Is there any possibility of using more than 1 core? If yes, where?

Thanks!

Olivier CLAUDE
PhD Student
INSERM/UPMC UMRS ICAN 1166 Equipe 2
Faculté de médecine Pitié-Salpêtrière
91, bld de l'Hôpital - 3ème étage - Porte 305
75013 Paris


Publish tool with Python dependencies to the toolshed
by Ulf Schaefer 03 Dec '15


Dear all

I have a Python script that I would like to publish to the tool shed. It has a couple of dependencies, all Python modules (argparse, PyVCF, PyYAML). It also needs a Python version 2.7<=X<3, which is not available by default everywhere.

Firstly, how do I specify these dependencies in the tool <requirements> tag? And secondly, once I have a correct set of tool definition files with tests and test data, how do I best get it onto the tool shed?

I had a look through the wiki and the planemo documentation. Unfortunately, how to handle these dependencies did not become immediately obvious (to me).

Thanks for any pointers, and sorry if this has been answered some place I did not look.

Cheers
Ulf

--
Ulf Schaefer, PhD
Bioinformatics Scientist
Bioinformatics Unit - Infectious Disease Informatics
National Infection Service
Public Health England
61 Colindale Ave, London NW9 5EQ
ulf.schaefer(a)phe.gov.uk
http://www.gov.uk/phe


Galaxy job scheduler slows down, when too many jobs are in the queue
by Hans-Rudolf Hotz 02 Dec '15


Hi

Over the last few days, I have encountered some very bizarre behavior of the internal Galaxy job scheduler.

It all started with using the API to generate Data Libraries: instead of generating individual Data Libraries when requested, I decided to make available all the HiSEQ and MiSEQ data that has been produced in our institute over the last two years. I used the following call from BioBlend

    upload_from_galaxy_filesystem(library_id, filesystem_paths, folder[0]["id"], type, dbkey='?', link_data_only='link_to_files', roles='')

to link ~19000 (fastq and metadata) files into several sub-folders of either the 'HiSEQ' or the 'MiSEQ' Data Library. That all worked very well - btw a big Thank You to all the BioBlend developers! Using the "Data libraries Beta" page, I could nicely follow how my script was working through all the files.

Unfortunately, I realized too late that although the files were showing up correctly (i.e. with the right path to the original file) in the "Data libraries Beta" page, the actual 'upload' jobs had not finished. So, when my script was done, I ended up with ~16000 unfinished jobs waiting in the queue.

We use the internal scheduler, and job_conf.xml was set to <limit type="registered_user_concurrent_jobs">2</limit>. At the beginning, the 'upload' jobs were running one after the other. However, the more jobs there were in the queue, the longer the gap became before the next two jobs were started. At the height, two jobs were started only every ~60 minutes. During that hour, nothing happened and no job was set to "running". Even if someone else was using the Galaxy server, there was a wait of an hour for that job to be executed. Luckily I did all this on our development server, so no actual user was affected.

I changed the settings in the job_conf.xml file to allow 100 jobs per user, with a total of 105 concurrent jobs. I restarted the server, and now 100 'upload' jobs were executed every hour. But again, there were about 60 minutes in between when nothing happened. I played with the 'cache_user_job_count' setting ("True"/"False"), but that didn't change anything.

With 100 jobs executed every hour, the queue eventually became smaller and smaller. At about 5000 jobs to go, the gap was reduced to ~30 minutes; at about 2000 jobs to go, the waiting time was about 10 minutes; and eventually it went down to zero again.

Has anyone else seen such behavior before?

Thank you very much for any help or suggestions

Regards, Hans-Rudolf

PS: I am now modifying the script to make a call to the database to check whether all jobs have finished before making the call to upload more files to the Data Libraries.

--
Hans-Rudolf Hotz, PhD
Bioinformatics Support
Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland
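The workaround in the PS (wait until earlier jobs have finished before submitting more) could look roughly like the sketch below. It is an assumption-laden illustration, not the actual script: `upload_in_batches`, `submit_batch` and `count_unfinished` are hypothetical names. In practice `submit_batch` would wrap the BioBlend `upload_from_galaxy_filesystem` call shown above, and `count_unfinished` would wrap whatever database or API query reports the number of jobs still queued or running.

```python
import time


def upload_in_batches(paths, submit_batch, count_unfinished,
                      batch_size=100, max_pending=0, poll_seconds=30,
                      sleep=time.sleep):
    """Submit paths in batches, letting the job queue drain in between.

    submit_batch(list_of_paths) and count_unfinished() are caller-supplied
    callables (hypothetical; see the surrounding text).
    """
    submitted = 0
    for start in range(0, len(paths), batch_size):
        # Wait until the number of unfinished jobs is at or below the limit.
        while count_unfinished() > max_pending:
            sleep(poll_seconds)
        batch = paths[start:start + batch_size]
        submit_batch(batch)
        submitted += len(batch)
    return submitted
```

Keeping the queue short this way avoids the situation described above, where thousands of waiting jobs coincided with long gaps between job starts. Whether a short queue fully removes those gaps is not something the post establishes.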


Picard installation dependency issues
by Scott Szakonyi 01 Dec '15


Hi all,

I'm installing the latest revision of Picard tools (efc56ee1ade4). Picard installs successfully, and the tools are appearing in the appropriate menu, but several packages that install with Picard are reporting dependency issues, and I'm concerned that not all the tools will work appropriately. The packages reporting issues are:

- package_cairo_1_12_14 (last lines of long warning/error sequence)

    /vectorbase/web/Galaxy/galaxy-dist/dependency_dir/libpng/1.6.7/devteam/package_libpng_1_6_7/f48b920cae1f/lib: file not recognized: Is a directory
    collect2: ld returned 1 exit status
    make[3]: *** [libcairo.la] Error 1
    make[2]: *** [all] Error 2
    make[1]: *** [all-recursive] Error 1
    make: *** [all] Error 2

- package_pixman_0_32_4

    /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: ld returned 1 exit status
    make[2]: *** [clip-test] Error 1
    make[1]: *** [all-recursive] Error 1
    make: *** [all] Error 2

- package_readline_6_2

    Error downloading from URL ftp://ftp.gnu.org/gnu/readline/readline-6.2.tar.gz: <urlopen error ftp error: [Errno 110] Connection timed out>

Is anyone else having issues with these packages? Any suggestions for resolving these problems?

Thanks!

--
Scott B. Szakonyi
Research Programmer
Center for Research Computing
107 Information Technology Center
Notre Dame, IN 46556
http://crc.nd.edu


December 2015 Galactic News
by Dave Clements 01 Dec '15


Aberrant data allocation and SNP mapping problem
by Felix Mayr 01 Dec '15


Dear Galaxy dev,

I have two problems, which might be related to each other.

1) For a long time already, I have continuously had 'left over' data allocation after deleting and purging histories. Someone from your team has repeatedly resolved that problem, but the root of the problem still persists. I now have 91% allocation while only having two histories, of 67.48 and 59.45 GB. The fix for automatic recalculation doesn't seem to work…?

2) I recently performed SNP mapping using a CloudMap-based pipeline. The problem: the SNP mapping plots show exactly the same mapping of the SNPs for different inputs. They cannot be the same. What is not the problem: the input files, as they differ in size and in their first and last lines, and they produce very different SNP mapping mutation tables. Therefore, I think the workflow might somehow be using the same intermediate file for generating the SNP mapping plots. I have already re-uploaded the data and reran the workflows in new histories multiple times, but got the same result each time. I would upload and run another data set for which I already have SNP mapping plots, to see whether I get the same plots again or not, but that is not possible now as I am close to the data quota.

I hope you can help solve these problems.

All best,
Felix

Felix Mayr, MSc.
Max Planck Institute for Biology of Ageing
Research Group of Dr. Martin Denzel
Metabolic and Genetic Regulation of Ageing
Joseph-Stelzmann-Str. 9b
D-50931 Cologne
Germany
+49.221.3797.0465
www.age.mpg.de/science/research-labs/denzel/
