March 2012 - galaxy-dev - lists.galaxyproject.org

Getting meta data from input files
by Frank Sørensen 15 Mar '12

Hi All,

We would like to use the new GATK modules in our DNA pipeline, so I tried to run the tools from the "Analyse data" menu. After setting up the appropriate tables in gatk_sorted_picard_index.loc, the tools ran as expected. However, when I tried to run them from a workflow, things didn't turn out so well; in fact, they couldn't run at all. I know that the GATK tools are still in beta, so I looked into the tool XML wrappers and fixed the error after some poking around. In the process of debugging the wrapper, however, I noticed one general thing that I found quite odd. It concerns the metadata in Galaxy's representation of input data files.

When a tool needs a reference genome (e.g. a mapping tool, or one of the BAM analysis tools), there are always exactly two options for the source of the reference genome: 1: get the reference data from the history, or 2: use a built-in reference (sometimes referred to as cached). Either way, the user has to select the reference genome before running the tool. This is fine, but what happens if two or more of these tools are called in the same workflow? Then the workflow designer can:

1: choose the "built-in" option and select the proper genome, so that the workflow only works on one genome, and then define one workflow per genome, i.e. by cloning the workflow and changing the parameters of each tool, or
2: set all genome selection fields to "set at runtime". This of course means that the user running the workflow must go through every genome selection field in every tool and select the proper genome before the workflow is executed, or
3: set all genome selections to "history" and specify a common workflow input as the genome reference for all tools in the workflow. This approach could be problematic, though, since not all tools use the same reference file format.

Clearly none of these methods is ideal when working with data from several genomes.

So there is a fourth option that I miss in the current Galaxy tool implementations (actually, I thought this was what "cached" meant, until I looked at the XML files): the ability to get the genome reference file at runtime from the input data file's metadata. This implies that the tools should have an extra reference source option: "From input meta data". This would allow workflow designers to forget all about reference data, since the tools would automatically pick the appropriate reference genome from the input file's metadata. In fact, this is not very difficult to implement with the operations currently available in the wrapper XML / command language. In the <inputs> section of the tool XML file, the genome reference tags could look like this (the example is from the fixed GATK "Count Covariates on BAM files" tool XML file, count_covariates.xml):

    . . .
    <param name="input_bam" type="data" format="bam" label="BAM file">
      <validator type="unspecified_build" />
      <validator type="dataset_metadata_in_data_table" table_name="gatk_picard_indexes" metadata_name="dbkey" metadata_column="2" message="Sequences are not currently available for the specified build." />
    </param>
    <conditional name="reference_source">
      <param name="reference_source_selector" type="select" label="Choose the source for the reference list">
        <option value="meta_data">From input file meta data</option>
        <option value="internal">Internal reference</option>
        <option value="history">History</option>
      </param>
      <when value="meta_data">
        <!-- no extra inputs: the reference is looked up from input_bam's dbkey -->
      </when>
      <when value="internal">
        <param name="ref_file" type="select" label="Select a reference genome">
          <options from_data_table="gatk_picard_indexes">
            <filter type="sort_by" column="2" />
            <validator type="no_options" message="No indexes are available" />
          </options>
        </param>
      </when>
      <when value="history">
        <param name="ref_file" type="data" format="fasta" label="Using reference file" />
      </when>
    </conditional>
    . . .

Note that no genome selection box is shown in the GUI when the "From input meta data" option is selected, since it wouldn't make much sense. The command section could then read something like this:

    . . .
    #if str( $reference_source.reference_source_selector ) == "internal":
        -R "${reference_source.ref_file.fields.path}"
    #end if
    #if str( $reference_source.reference_source_selector ) == "meta_data":
        -R "${ filter( lambda x: str( x[1] ) == str( $input_bam.metadata.dbkey ), $__app__.tool_data_tables['gatk_picard_indexes'].get_fields() )[0][3] }"
    #end if
    #if str( $reference_source.reference_source_selector ) == "history":
        -d "-R" "${reference_source.ref_file}" "${reference_source.ref_file.ext}" "gatk_input"
    #end if
    . . .

When "From input meta data" is selected as the reference source, the second if-statement above performs a lookup at run time: it reads the dbkey from the input dataset's metadata and retrieves the file path of the corresponding genome reference from the data table. The same approach carries over to other tools, since each tool can get the file path from its own auxiliary reference table; here the table is called 'gatk_picard_indexes', but other tools might use other reference files. This works fine in the GATK tools and could become standard in other tools as well. The metadata is already there; I just don't see it put to use anywhere except for informational purposes, which, in my humble opinion, is a pity.

The only problem I see is that it could be tricky to display a meaningful message when the input data doesn't contain the necessary metadata. In that case it is left to the underlying tool to complain about the error. An elegant solution to this problem could be an extension to the command scripting language, or perhaps someone has another idea?
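
The run-time lookup in the "meta_data" branch above can be sketched in plain Python, which also suggests one way to produce the meaningful error message asked for. This is a minimal, hypothetical sketch, not Galaxy code: it assumes a tab-separated .loc file whose columns are value, dbkey, display name and path (matching the x[1] and x[3] indices in the Cheetah expression), assumes "?" marks an unset build, and the names parse_loc and reference_for_dbkey as well as the example path are illustrative.

```python
from typing import Iterable, List

# Assumed placeholder Galaxy uses for an unset genome build.
UNSPECIFIED_BUILD = "?"


def parse_loc(lines: Iterable[str]) -> List[List[str]]:
    """Split .loc lines into tab-separated fields, skipping blank and '#' comment lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows


def reference_for_dbkey(rows: List[List[str]], dbkey: str,
                        table_name: str = "gatk_picard_indexes") -> str:
    """Return the path (column 3) of the first row whose dbkey (column 1) matches.

    Same lookup as filter(lambda x: str(x[1]) == dbkey, fields)[0][3], but a
    missing or unknown build raises a ValueError that names the problem
    instead of an anonymous IndexError.
    """
    if not dbkey or dbkey == UNSPECIFIED_BUILD:
        raise ValueError("The input dataset has no genome build (dbkey) set; "
                         "select the reference explicitly instead.")
    for row in rows:
        if len(row) > 3 and row[1] == dbkey:
            return row[3]
    raise ValueError(f"No reference for build '{dbkey}' in table '{table_name}'.")


if __name__ == "__main__":
    # Hypothetical .loc content: value, dbkey, display name, path.
    loc = [
        "#value\tdbkey\tname\tpath",
        "hg19\thg19\tHuman (hg19)\t/data/hg19/hg19.fa",
    ]
    print(reference_for_dbkey(parse_loc(loc), "hg19"))  # prints /data/hg19/hg19.fa
```

Raising a named error at lookup time, rather than indexing the first match directly, means a dataset with no usable build fails with a message the user can act on before the underlying tool ever starts.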

Hope this information was useful to some of you Galaxy tool nerds out there :-)

Kind regards, and thanks for a great framework,
Frank

--
Frank Sørensen, B.Sc., Programmer
Molecular Diagnostic Laboratory (MDL)
Molekylær Medicinsk Afdeling (MOMA)
Århus Universitetshospital Skejby, Brendstrupgårdsvej, 8200 Århus N
Tel. +45 7845 5363

Re: [galaxy-dev] Galaxy: enable uploading via FTP
by Nate Coraor 14 Mar '12

On Mar 12, 2012, at 5:15 PM, jiechenable1987(a)gmail.com wrote:
> Dear Nate,
>
> Sorry to trouble you. I know that you are busy, but this problem has annoyed me for a couple of days. Please help.
>
> I want to enable uploading via FTP for my local Galaxy instance. I followed the instructions on the wiki page strictly. However, it doesn't work.
>
> What I did:
> 1) set the directive "ftp_upload_dir = /home/jjc25/central/galaxy-central/database/ftp/" in universe_wsgi.ini
> 2) set the directive "ftp_upload_site = 127.0.0.1"
> 3) created the database user galaxyftp and granted select access to it
> 4) copied and pasted the proftpd configuration file from the page (http://wiki.g2.bx.psu.edu/Admin/Config/Upload%20via%20FTP) and only changed the directive "SQLConnectInfo galaxydb(a)dbserver.example.org dbuser dbpassword" to my own settings, which is "SQLConnectInfo galaxydb@localhost galaxyftp mypassword"
> 6) restarted proftpd with "sudo service proftpd restart"
> -> here comes the error: "Fatal: unknown configuration directive 'SQLPasswordEngine' on line 43 of '/etc/proftpd/proftpd.conf'"
>
> I can't restart the FTP server for some unknown reason. Can you help me out, please? Many thanks.
>
> By the way, I am using the copy of Galaxy with id: changeset: 6818:48b64ce958b4
>
> Looking forward to your reply.
>
> Thanks,
> JIE CHEN

Hi Jie,

It looks like your ProFTPd server does not include the mod_sql_passwd module. We compile ours by hand, so I can't tell you whether any of the prepackaged versions for Linux offer a way to install a precompiled version of that module. I do see that there is no separate package for proftpd-mod-sqlpasswd (or similar) in Debian, but you might be able to build it if you install proftpd-dev. I've updated the wiki page to include information about how we compile ProFTPd.

Please send questions to the mailing list rather than directly to individual people on the team. There are others who may be able to answer your question in a timelier manner (and the community as a whole can benefit from the public response).

--nate
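
Before rebuilding, one way to confirm this diagnosis is to ask the daemon which modules it was compiled with: "proftpd -l" lists the compiled-in modules. The Python helper below is a hypothetical convenience wrapper (compiled_modules and proftpd_has_module are illustrative names); it assumes the listing prints module names of the form mod_*.c, and it reads both output streams rather than relying on which one proftpd uses. Modules loaded at run time through LoadModule do not appear in that listing, so a False result means "not compiled in", not necessarily "unavailable".

```python
import shutil
import subprocess
from typing import Set


def compiled_modules(listing: str) -> Set[str]:
    """Collect module names such as 'mod_sql_passwd.c' from `proftpd -l` output."""
    return {
        token
        for line in listing.splitlines()
        for token in line.split()
        if token.startswith("mod_") and token.endswith(".c")
    }


def proftpd_has_module(module: str = "mod_sql_passwd.c", binary: str = "proftpd") -> bool:
    """Return True if `<binary> -l` lists `module` among its compiled-in modules."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    result = subprocess.run([binary, "-l"], capture_output=True, text=True)
    # Check both streams, since this sketch does not assume which one holds the listing.
    return module in compiled_modules(result.stdout + result.stderr)


if __name__ == "__main__":
    try:
        print("mod_sql_passwd compiled in:", proftpd_has_module())
    except FileNotFoundError as err:
        print(err)
```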

Re: [galaxy-dev] Error while uploading
by Dorset, Daniel C 14 Mar '12

Hi Luciano,

I had a similar problem with our local instance of Galaxy. In our case, the /tmp directory was filling up, causing the job to abort. We had Galaxy running on a partition with tons of disk space, but the partition on which /tmp was located had substantially less free disk space. Try running the "df" command from the Linux command line. If you don't know how to interpret its output, let me know. Certain tasks that run on behalf of Galaxy, like gzip for decompressing large datasets, store intermediate temporary data in /tmp instead of in a subdirectory of your Galaxy installation.

Let me know if that solves your issue,
Dan

________________________________________
Message: 1
Date: Sat, 10 Mar 2012 17:25:38 -0600
From: Luciano Cosme <cosme.simple(a)gmail.com>
To: Galaxy-dev <galaxy-dev(a)lists.bx.psu.edu>
Subject: [galaxy-dev] Error while uploading
Message-ID: <CAMASfsjdm0UDciVqk7P-mPijr8UaLxe0zX7SZuG4UPsQJ_NxBQ(a)mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I uploaded 11 files today (1 to 2 GB compressed files) to my local instance without a problem (as admin). Then I got the error message below, the upload got stuck at "This dataset is uploading", and I could not upload the last files. Is there anything I can do to solve it? What I notice is that I can still upload small files, before I concatenate them. I have 15 time points and some of them have up to 20 files, so it is easier to concatenate them and then upload.

Thank you.
Luciano
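
The "df" check Dan suggests can also be scripted, for example to compare the free space where temporary files land with the free space under the Galaxy installation. This is a small illustrative sketch (free_gib and temp_space_report are hypothetical names) using Python's standard shutil.disk_usage and tempfile.gettempdir. tempfile.gettempdir() consults TMPDIR, TEMP and TMP before falling back to /tmp, so exporting TMPDIR moves Python's own temporary files; whether an external program such as gzip follows TMPDIR depends on that program.

```python
import shutil
import tempfile
from typing import Dict


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def temp_space_report(galaxy_dir: str = ".") -> Dict[str, float]:
    """Free space where Python puts temporary files versus under the Galaxy directory."""
    return {
        "temp_dir": free_gib(tempfile.gettempdir()),
        "galaxy_dir": free_gib(galaxy_dir),
    }


if __name__ == "__main__":
    for label, gib in temp_space_report().items():
        print(f"{label}: {gib:.1f} GiB free")
```

If "temp_dir" reports far less than "galaxy_dir", large uploads are likely to hit the situation Dan describes.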

NFS, Cluster and working directories questions
by Louise-Amélie Schmitt 14 Mar '12

Hello,

We're currently trying to switch to a big cluster, but we have a lot of doubts and questions, especially since I/O is a serious issue for our NFS.

- We saw the outputs_to_working_directory option in the .ini file, but it only concerns output files. Is there a way to make a local copy of all the input files (including indices) in the job working directory?
- Can we set the job working directory as an absolute path so it's local to the nodes, like /tmp?
- If that's possible, will the job working directories created for each job be cleaned up properly at the end?

Thanks,
L-A
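
Whether Galaxy itself can be configured to do this is the open question in this thread. Independently of Galaxy, the general stage-in and clean-up pattern a job script could use looks roughly like the sketch below: copy the inputs into a fresh directory on node-local disk, run against the copies, and remove the directory afterwards. The function name staged_inputs is hypothetical, and index files are only copied if they are listed explicitly.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator, Optional


@contextmanager
def staged_inputs(inputs: Iterable[str],
                  scratch_root: Optional[str] = None) -> Iterator[Dict[str, str]]:
    """Copy inputs into a fresh node-local directory and delete it afterwards.

    scratch_root=None uses tempfile's default location (normally /tmp).
    Yields {original path: local copy}. Index files (e.g. a .bai next to a
    .bam) are copied only if they are listed explicitly in `inputs`.
    """
    workdir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        local = {}
        for i, src in enumerate(inputs):
            # Prefix with the position so two inputs with the same basename don't collide.
            dst = os.path.join(workdir, f"{i}_{os.path.basename(src)}")
            shutil.copy2(src, dst)
            local[src] = dst
        yield local
    finally:
        # Runs on normal exit and when the job body raises; it cannot run if the
        # process is killed outright, so a periodic sweep of stale job_* dirs is still useful.
        shutil.rmtree(workdir, ignore_errors=True)
```

Usage would be along the lines of wrapping the tool invocation in "with staged_inputs([bam_path, bai_path]) as local:" and pointing the tool at local[bam_path], where bam_path and bai_path are whatever NFS paths the job receives. Copying costs one sequential read per input over NFS, which is usually cheaper than the random I/O many tools perform directly on their inputs.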

HTML frames incompatible with galaxy
by Hanfei Sun 14 Mar '12

Hi,

We developed a custom tool on our Galaxy site. This tool's result is an HTML file with two frames (an example is attached). Previously I could view the result perfectly when I clicked the "Display data in browser" icon on that dataset in the history panel. However, we updated to the newest galaxy-dist code yesterday, and now the HTML result is not displayed normally in Galaxy. Neither frame can be seen in Galaxy's main panel (the middle frame; the URL looks like ..datasets/xxxx/display/?preview=True). The HTML file with two frames displays nicely with the old Galaxy code but doesn't display at all with the latest one. Does anyone know what may cause this problem?

Thanks!
--
Best wishes,
Hanfei Sun

Re: [galaxy-dev] [galaxy-user] How to Pass Parameter
by Carlos Borroto 14 Mar '12

On Tue, Mar 13, 2012 at 2:15 PM, La Chi <the_man.inmee(a)yahoo.com> wrote:
> hi galaxy developers, i have developed this parameter in an xml file and i am
> passing the "tl" value to a function which is present in my python file, but
> when i select type=text, it shows a text field with tl. all i want to
> know is how i can pass the kb value without showing a text field in galaxy. thanks
>
> <param name="site" type="text" label="site for" value="tl" help="Enter tl"/>

Hi,

I'm not quite sure I understood your question, but I wonder if type="hidden" is what you're looking for:
http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax#A.3Cparam.3E_t…

Hope it helps,
Carlos
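
On the Python side, a hidden parameter reaches the script the same way a visible one does: through the command line that the tool's <command> tag builds. The sketch below assumes a parameter declared as <param name="site" type="hidden" value="tl" /> and a hypothetical command such as python my_tool.py --site "$site"; the script name, the --site option and process_site are illustrative stand-ins for the user's own code.

```python
import argparse
from typing import List, Optional


def process_site(site: str) -> str:
    """Placeholder for the user's own function that consumes the 'site' value."""
    return f"processing site {site}"


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Example Galaxy tool script")
    # Galaxy substitutes the hidden parameter's value into the command line;
    # the default here only matters when the script is run by hand.
    parser.add_argument("--site", default="tl", help="value of the hidden 'site' parameter")
    args = parser.parse_args(argv)
    result = process_site(args.site)
    print(result)
    return result


if __name__ == "__main__":
    main()
```

Because the value is fixed in the XML, changing it (for example to kb) means editing the param's value attribute; the user never sees a field for it.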

Error starting Galaxy after update of March 2012
by Alban Lermine 14 Mar '12

Hi,

I ran the upgrade (hg pull -u -r 40f1816d6857) on our local instance, and now I'm unable to start Galaxy. Here is the content of the Galaxy log:

    Traceback (most recent call last):
      File "<galaxy_dir>/galaxy-dist/lib/galaxy/web/buildapp.py", line 81, in app_factory
        from galaxy.app import UniverseApplication
      File "<galaxy_dir>/galaxy-dist/lib/galaxy/app.py", line 11, in <module>
        from galaxy.objectstore import build_object_store_from_config
      File "<galaxy_dir>/galaxy-dist/lib/galaxy/objectstore/__init__.py", line 27, in <module>
        from galaxy.objectstore.s3_multipart_upload import multipart_upload
      File "<galaxy_dir>/galaxy-dist/lib/galaxy/objectstore/s3_multipart_upload.py", line 22, in <module>
        eggs.require('boto')
      File "<galaxy_dir>/galaxy-dist/lib/galaxy/eggs/__init__.py", line 415, in require
        raise EggNotFetchable( str( [ egg.name for egg in e.eggs ] ) )
    EggNotFetchable: ['boto']
    Removing PID file <galaxy_dir>/galaxy-dist/galaxy.pid

Is this error specific to our instance? Is there a way to downgrade Galaxy?

Thanks in advance for your answer.
Alban

--
Alban Lermine
Unité 900 : Inserm - Mines ParisTech - Institut Curie
« Bioinformatics and Computational Systems Biology of Cancer »
11-13 rue Pierre et Marie Curie (1st floor) - 75005 Paris - France
Tel : +33 (0) 1 56 24 69 84

Memory leaks with pbs job runner
by David Matthews 14 Mar '12

Hi,

We emailed previously about possible memory leaks in our installation of Galaxy here on the HPC at Bristol. We can run Galaxy just fine on our login node, but when we integrate it into the cluster using the pbs job runner, the whole thing falls over, almost certainly due to a memory leak. In essence, every attempt to submit a TopHat job (with 2x5 GB paired-end reads against the full human genome) results in the whole thing falling over, but not when Galaxy is restricted to the login node. We saw that Nate responded to Todd Oakley about a week ago, saying that there is a memory leak in libtorque or pbs_python when using the pbs job runner. Have there been any developments on this?

Best Wishes,
David.

__________________________________
Dr David A. Matthews
Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine, School of Medical Sciences
University Walk, University of Bristol
Bristol. BS8 1TD
U.K.
Tel. +44 117 3312058
Fax. +44 117 3312091
D.A.Matthews(a)bristol.ac.uk

Authentic problem about uploading files to GenomeSpace
by Hanfei Sun 14 Mar '12

Hi,

I'm trying to write a tool that can send (upload) data from Galaxy to GenomeSpace. I've developed a prototype that can be accessed at the following link:
http://cistrome.org/ap/tool_runner?tool_id=send_genomespace
(It is located at Galaxy Toolbox -> Get Data -> Send data to Genome Space.)

The current problem is that the Galaxy server doesn't have the authentication information (user name, password or token) for GenomeSpace, so the user needs to submit their GenomeSpace account and password to the Galaxy server. What's more, every time they want to upload, they need to submit the GenomeSpace password to Galaxy again, because Galaxy cannot retain this information. Entering the password again and again is quite inconvenient and unsafe. (If Galaxy could bind a user's Galaxy account to their GenomeSpace account, this problem might be solved, but as far as I know, this is not currently available.) Does anyone know how to deal with this problem?

Thanks!
--
Best wishes,
Hanfei Sun

How to Pass Parameter
by Jennifer Jackson 13 Mar '12

Hello La Chi,

I am going to forward your question over to the galaxy-dev(a)bx.psu.edu mailing list, where the development community is more likely to see it (the galaxy-user list is generally about using Galaxy on the Main public server).
http://wiki.g2.bx.psu.edu/Support#Mailing_Lists

Take care,
Jen
Galaxy team

On 3/13/12 11:15 AM, La Chi wrote:
> hi galaxy developers, i have developed this parameter in an xml file and i am
> passing the "tl" value to a function which is present in my python file, but
> when i select type=text, it shows a text field with tl. all i want to
> know is how i can pass the kb value without showing a text field in galaxy. thanks
>
> <param name="site" type="text" label="site for" value="tl" help="Enter tl"/>
