August 2012 - galaxy-dev - lists.galaxyproject.org

File upload via ftp: FileZilla
by suzan katie 15 Aug '12

15 Aug '12

Hello everyone, I am trying to upload files (size:6GB) via ftp and i am using FileZilla for ftp upload of files. Host/usernames and password details as follows: Host: main.g2.bx.psu.edu Username: My galaxy Username Password: Galaxy password I found this tutorial for FTP upload. http://screencast.g2.bx.psu.edu/quickie_17_ftp_upload/flow.html I am getting an error message: Could not connect to server Status: Connection attempt failed with "ECONNREFUSED - Connection refused by server". Is there any other method followed for ftp file upload? Any help or advice is appreciated Thanks Suzan

3 2

Upload File: how to see all registered formats ?
by Marc Logghe 15 Aug '12

15 Aug '12

Hi again, Emboss_datatypes is installad via tool shed. The 'View datatypes registry' in the Adminstration page, shows all registered data types, including the emboss ones, like e.g. genbank. However, the 'Upload File' tool does not show genbank in the 'File Format' dropdown. How can one configure the upload1 tool so that all registered formats are listed ? I don't understand the from_parameter hocus-pocus in this line (upload.xml): <options from_parameter="tool.app.datatypes_registry.upload_file_formats" transform_lines="[ "%s%s%s" % ( line, self.separator, line ) for line in obj ]"> The workaround which is used up to now is upload genbank sequences as txt format and change the format later via the history. But that is a 2-step operation, guess there should be a way taking only 1 step ? Regards, Marc ________________________________________ THIS E-MAIL MESSAGE IS INTENDED ONLY FOR THE USE OF THE INDIVIDUAL OR ENTITY TO WHICH IT IS ADDRESSED AND MAY CONTAIN INFORMATION THAT IS PRIVILEGED, CONFIDENTIAL AND EXEMPT FROM DISCLOSURE. If the reader of this E-mail message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately at ablynx(a)ablynx.com. Thank you for your co-operation. "NANOBODY" and "NANOCLONE" are registered trademarks of Ablynx N.V. ________________________________________

3 6

problems in integrating tools to GALAXY
by Jenny 15 Aug '12

15 Aug '12

Hi, Dear all, I am trying to integrate a perl script to GALAXY, and here are the problems. I have a tool configuration xml file called abc.xml located at tools/myTools directory, which calls a perl scripts called toolExample.pl. 1 <tool id="abc" name="abc"> 2 <command interpreter="perl">toolExample.pl $input $output</command> 3 <inputs> 4....... 5...... Inside the toolExample.pl, I am using some linux system command such as "wc" and also "BEDTools/windowBed". The BEDTools executables are configured correctly in my local linux machine, and when I type "which windowBed" it can give me back the path correctly. When I run toolExample.pl outside galaxy in the terminal, everything looks correct; but when I run galaxy, it gave me the nonsense results ( all 0) and it seems that GALAXY has problems in recognizing either the windowBed excutable file or the linux system command "wc" or maybe both. To let GALAXY recognize "windowBed", I tried to put the windowBed executable copied to myTools directory and add the following <requirements>tags to the abc.xml file, but it doesn't seem to be working. ..... <requirements> <requirement type="binary">windowBed</requirement> </requirements> .... Could anyone please help me figure out what was wrong? How can I fix the problem? Thank you very much in advance! Best, Jenny

2 1

how to edit attributes of tool
by Robert Chase 14 Aug '12

14 Aug '12

Hello, Is there a way to annotate the notes of a custom tool to include information about the parameters passed to the tool? Let us say a user specifies a number n as a parameter to the tool. Is it possible to put this parameter in the notes that appear when the pencil icon is pressed? Reading the notes under the <data> tag set, wiki.g2.bx.psu.edu/Admin/Tools/Tool Config Syntax I didn't see any hint about how to pass the script parameters to the metadata boxes under the pencil icon. -Rob

1 0

Local Galaxy concept system: hardware spec questions
by Sebastian Schaaf 14 Aug '12

14 Aug '12

Hi all, I have a couple of question around the topic "hardware requirements" for a server which is intended to be bought and used as concept machine for NGS-related jobs. It should be used for development of tools and workflows (using Galaxy, sure) as well as platform for some "alpha" users, who should learn to work on NGS data, which they just began to generate. This concept phase is planned to last 1-2 years. During this time main memory and especially storage could be extended, the latter on a per-project basis. We will start with a small team of 3 people for supporting and developing Galaxy and system due to the user's requirements, and the first group of users will bring in data, scientific questions and hands-on work on their own data. Main task (regarding system load) will be sequence alignment (BLAST, mapping tools like BWA/Bowtie), and after that maybe some experimental sequence clustering/de novo assembly for exome data. Additionally variant detection in whatever form are targeted. Only active projects will be stored locally, data no more in use will be stored elsewhere in the network. So far for the setting, regarding the specs the following is intended: - dual-CPU mainboard - 256 GB RAM - 20-30 TB HDD @ RAID6 (data) - SSDs @ RAID5 (system, tmp) Due to funding limitations it may be the case that RAM has to be decreased to 128 GB, not solved is currently the question, if it will be enough for those SSD bundle in RAID5, maybe we have to go for only two of them in RAID1. What we try to find out is, where in those described tasks the machine would run into bottlenecks. What's pretty clear is that I/O is everything, already by a theoretical point of view. But we also observed that on a comparable machine (2x 3,33 Ghz Intel 6-core, 100GB RAM, 450 MB/s R/W to data RAID6). The question of questions is right at the beginning of configuring a system, if one should go for an AMD or an Intel architecture system. The first offers more cores (8-12) at a lower frequency (~2,4 Ghz), the latter less cores (6) with higher frequency (~3,3 Ghz). Due to the data sheets, the Intel CPUs are on a per-core basis ~30% faster with integer operations and ~50% faster with floating point. The risk we see with the AMDs is on the one hand that the number of cores per socket could saturate the memory controller, and on the other hand those jobs, which can not or only poorly be parallelized need more time. To bring all this to some distinct questions (don't feel forced to answer all of them): 1. Using the described bioinformatics software: where are the potential system bottlenecks? (connections between CPUs, RAM, HDDs) 2. What is the expected relation of integer-based and floating point based calculations, which will be loading the CPU cores? 3. Regarding the architectural differences (strengths, weaknesses): Would an AMD- or an Intel-System be more suitable? 4. How much I/O (read and write) can be expected at the memory controllers? Which tasks are most I/O intensive (regarding RAM and/or HDDs)? 5. Roughly separated in mapping and clustering jobs: which amounts of main memory can be expected to be required by a single job (given e.g. Illumina exome data, 50x coverage)? As far as I know mapping should be around 4 GB, clustering much more (may reach high double digits). 6. HDD access (R/W) is mainly in bigger blocks instead of masses of short operations - correct? All those questions are a bit rough and improved (yes, it IS a bit of a chaos currently - sorry for that), but any clue to a single question would help. "Unfortunately" we got the money to place the order for our own hardware unexpectedly quick, and we are now forced to act. We want to make as few cardinal errors as possible... Thanks a lot in advance, Sebastian -- Sebastian Schaaf, M.Sc. Bioinformatics Chair of Biometry and Bioinformatics Department of Medical Information Sciences, Biometry and Epidemiology University of Munich

3 4

Re: [galaxy-dev] Local Galaxy concept system: hardware spec
by Paul-Michael Agapow 14 Aug '12

14 Aug '12

Some quick answers in the hopes that more qualified people will chip in: I have a couple of question around the topic "hardware requirements" for a server which is intended to be bought and used as concept machine for NGS-related jobs. First a comment - it sounds a bit like you are where we were 12 months ago in developing our Galaxy system and looking at similar needs. I think you'll almost always need more of everything, because people will always be analysing bigger datasets, building bigger assemblies, etc. 1. Using the described bioinformatics software: where are the potential system bottlenecks? (connections between CPUs, RAM, HDDs) While I/O is potentially a bottleneck (due to Galaxy copying and writing the datasets etc.), I wonder if in practice this is the case. Many of the NGS tasks are so long running that I/O issues may not be a significant hit. However, you may have a potential bottleneck in getting data onto the system. How does information get from the sequencer into the Galaxy instance? This may need some thinking about. 2. What is the expected relation of integer-based and floating point based calculations, which will be loading the CPU cores? I have no idea what this means. 3. Regarding the architectural differences (strengths, weaknesses): Would an AMD- or an Intel-System be more suitable? I don't think this will make any difference. If it's a question of the number of cores, that depends to some extent on how many concurrent users or tasks you'll have running. I suspect your number of concurrent users will be low (i.e. at any time, very few people and logged in and running stuff under Galaxy). 6. HDD access (R/W) is mainly in bigger blocks instead of masses of short operations - correct? That's my impression. ------ Paul Agapow (paul-michael.agapow(a)hpa.org.uk) Bioinformatics, Health Protection Agency (UK) ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk **************************************************************************

1 0

Set metadata from within the tool
by roberto.dilauro＠uniparthenope.it 14 Aug '12

14 Aug '12

Hi everyone, How can I set dataset's metadata from within the tool? For example: From the command tag (in the xml tool specification) i call myscript.sh: <command> myscript.sh -input=$input -output=$output </command> My script.sh produce a value that i want to store as metadata of $input. I've checked the tool config syntax http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax and seems that i can read metadata (${input.metadata.somemetadata}) but not write. Maybe <code> tag can help me but it is deprecated. Thanks for the help, Roberto

2 1

Old tool not showing up properly in Tool Shed
by Peter Cock 13 Aug '12

13 Aug '12

Hi all, I've just uploaded an update to my (previously protein only) sequence analysis bundle on the ToolShed, which is stuck with the original name: tmhmm_and_signalp This now offers Galaxy wrappers for several standalone command line sequence analysis tools: * SignalP 3.0 predicts signal peptides, see: http://www.cbs.dtu.dk/services/SignalP/ * TMHMM 2.0 predicts transmembrane helices, see: http://www.cbs.dtu.dk/services/TMHMM/ * Promoter 2.0 predicts eukaryotic PolII promoters [NEW WRAPPER], see: http://www.cbs.dtu.dk/services/Promoter/ * WoLF PSORT predicts Eukaryote protein sub-cellular localization, see: http://wolfpsort.org/ * Oomycete RXLR motif prediction - a Python script implementing three published methods, which calls SignalP 3.0 and HMMER 2.3.2 internally. In this update I added the new Promoter 2.0 wrapper, and made some small changes to the SignalP and TMHMM wrappers. The other tools are unchanged. Looking at the tool bundle via the Tool Shed, for revision 7:9b45a8743100 it only shows four "valid" tools: * Promoter 2.0 - Find eukaryotic PolII promoters in DNA sequences * RXLR Motifs - Find RXLR Effectors of Plant Pathogenic Oomycetes * SignalP 3.0 - Find signal peptides in protein sequences * TMHMM 2.0 - Find transmembrane domains in protein sequences Why is this missing?: * WoLF PSORT - Eukaryote protein subcellular localization prediction The missing tool is NOT showing up under "Browse my invalid tools", and the wrapper XML seems to still be there if I browse the tip files. Thanks, Peter

2 4

Re: [galaxy-dev] FastQC Tool Errors
by Robert Chase 13 Aug '12

13 Aug '12

Hello, We are having trouble running FastQC on our production server. The original stacktrace looked like this; Fastq failed. Error executing FastQC. Unknown option: djava.awt.headless Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication Caused by: java.lang.ClassNotFoundException: uk.ac.bbsrc.babraham.FastQC.FastQCApplication at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) Could not find the main class: uk.ac.bbsrc.babraham.FastQC.FastQCApplication. Program will exit. Then I tried to update java to the most current version. The current version is java -version java version "1.7.0_05" Java(TM) SE Runtime Environment (build 1.7.0_05-b06) Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode) This changed the error message we got when we ran FastQC. The new trace looks like this; Fastq failed. Error executing FastQC. Unknown option: djava.awt.headless Error: Could not find or load main class uk.ac.bbsrc.babraham.FastQC.FastQCApplication The strange thing is that the tools works on our test server, which is almost an exact copy of the production server. -Rob __________________________ Sure enough, this fixed the problem. I just installed the latest Sun JRE and it is working now. Thanks for the suggestion! I would have never guessed. -Josh On Mon, Jun 25, 2012 at 11:34 AM, simon andrews <[hidden email]<http://dev.list.galaxyproject.org/user/SendEmail.jtp?type=node&node=4655462…> > wrote: Yes, that's the broken version of gcj. I don't have a Centos machine here at the moment, but I think if you install OpenJDK and use the alternatives system to select that as the default JRE then that should fix things. Simon. On 25 Jun 2012, at 17:19, Josh Nielsen wrote: Hi Simon, I recently installed Java with the yum package manager on our compute nodes, and our cluster is a Centos 6 environment. Here is what the results of "java -version" returned on the compute nodes: "*bash# java -version* *java version "1.5.0"* *gij (GNU libgcj) version 4.4.4 20100726 (Red Hat 4.4.4-13)* * * *Copyright (C) 2007 Free Software Foundation, Inc.* *This is free software; see the source for copying conditions. There is NO* *warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE* ." Is this version of Java too old? Perhaps I need to install the JRE manually? Thanks! On *Sat Jun 23 04:20:53 EDT 2012*, Simon Andrews <[hidden email]<http://dev.list.galaxyproject.org/user/SendEmail.jtp?type=node&node=4655462…> wrote: > Are you by any chance running an older version of gcj as your java version? There is a known bug in some of these where they don't correctly configure the headless environment, even if the correct parameters are passed. This causes exactly the kind of errors you're seeing. > > If this is the case you'll need to install a more recent JRE (or update your path to point to one which is already present). > > Simon. > > > On Sat, Jun 23, 2012 at 6:30 AM, Josh Nielsen <[hidden email]<http://dev.list.galaxyproject.org/user/SendEmail.jtp?type=node&node=4655462…>> > wrote: > > Hello, > > > > I am having an issue with getting the FastQC tool to work with Galaxy on > our > > server. I downloaded the FastQC files (version 0.8.0) and changed the > > directory that the wrapper script looks for the 'fastqc' executable in, > but > > when we run a job with it we have been getting the following output: > > > > "Started analysis of Clip > > > > Approx 5% complete for Clip > > Approx 10% complete for Clip > > ... > > ... > > Approx 95% complete for Clip > > Approx 100% complete for Clip > > > > Analysis complete for Clip > > > > (.:9754): Gtk-WARNING **: cannot open display: " > > > > And then the job shows as failed in Galaxy. The output .dat file just has > > that same output/error message in it (though it seems to indicate it got > to > > 100%). Also when I try to execute the fastqc file directly (albeit with > no > > arguments) I get this: > > > > "Exception in thread "main" java.awt.HeadlessException: > > No X11 DISPLAY variable was set, but this program performed an operation > > which requires it. > > at > > java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173) > > at java.awt.Window.<init>(Window.java:437) > > at java.awt.Frame.<init>(Frame.java:419) > > at java.awt.Frame.<init>(Frame.java:384) > > at javax.swing.JFrame.<init>(JFrame.java:174) > > at > > > uk.ac.bbsrc.babraham.FastQC.FastQCApplication.<init>(FastQCApplication.java:271) > > at > > > uk.ac.bbsrc.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:102)" > > > > Both errors seem to have something to do with the graphical GUI > component of > > FastQC (which I have seen some screenshots for on the FastQC webpage). If > > this application is GUI-driven how did the online PSU Galaxy get it to > work > > with their wrapper script when the tools are run in a command-line > > environment with no X11 or Gtk? Essentially I'm just wondering what steps > > I'm missing here to getting this to work with our Galaxy mirror, other > than > > just dropping the executable in place? Any suggestions? > > > > Thanks, > > Josh > > > > > > ___________________________________________________________ > > Please keep all replies on the list by using "reply all" > > in your mail client. To manage your subscriptions to this > > and other Galaxy lists, please use the interface at: > > > > http://lists.bx.psu.edu/ > > > > -- > Ross Lazarus MBBS MPH; > Associate Professor, Harvard Medical School; > Head, Medical Bioinformatics, BakerIDI; Tel: <a > href="tel:%2B61%20385321444" value="+61385321444" target="_blank"> +61 > 385321444; > The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT *Registered Charity No. 1053902.* The information transmitted in this email is directed only to the addressee. If you received this in error, please contact the sender and delete this email from your system. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: www.babraham.ac.uk<http://www.babraham.ac.uk/email_disclaimer.html>

1 0

Deploy a tool from the Galaxy toolshed to my local version of Galaxy
by Edward Turk 13 Aug '12

13 Aug '12

Hello, I have a local instance, I am the only user, and I would like to use a tool that is in the galaxy tool shed but not in my local instance. After reading the toolshed wiki (http://wiki.g2.bx.psu.edu/Tool%20Shed) I have a few questions. 1. Do I need a local tool shed? I need to use an available tool but do not need to develop tools 2. According to this part of the wiki (http://wiki.g2.bx.psu.edu/Tool%20Shed#Automatic_installation_of_Galaxy_tool…) I need to be logged in as "Admin". How do I make myself an Admin? Thank you, Edward

2 1