June 2011 - galaxy-dev - lists.galaxyproject.org

output data filter
by Bram Slabbinck 30 Jun '11


Hi,

I have an internal data source that is connected to Galaxy (version: second-to-last distribution) through the data connection protocol. Some parameters returned from my data source website are parsed with a parameter request translation. Everything works fine except for establishing variable static output data (https://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput).

I want to label my output data set according to a parameter value that is returned from my data source website. How can I use this parameter as a filter? The following statement does not work:

    <request_param_translation>
        <request_param galaxy_name="dummy" remote_name="output" missing="test" />
    <outputs>
        <data name="output" label="labelX">
            <filter>$dummy == "valueX" </filter>
        </data>
        <data name="output" label="labelY">
            <filter>$dummy != "valueX" </filter>
        </data>
    </outputs>

I have also tried it without the dollar character (dummy = "valueX"), as stated in the link above, but this gives a TypeError in the log. Using $dummy.value==valueX does not work either. If I print the Galaxy variables with this Cheetah code:

    #silent sys.stderr.write(" searchList = '%s'\n" % (str($searchList)))

I can see that the parameter dummy is created. So, how can I use this type of parameter as a filter?

Thanks for any suggestion.

Regards,
Bram


Re: [galaxy-dev] Fail to start galaxy
by Hans-Rudolf Hotz 30 Jun '11


Hi Colin

Please keep all replies on the list by using "reply all"

You are aware of the 'download' button (ie the disk icon), aren't you? Unless you have a log-in requirement for your galaxy server, you can also use this url to copy the data to a different server with 'wget'.

Or you try to find out the dataset number (eg by setting up 'reports' or just by looking at the filesystem) and you do an 'scp' to your other server.

Regards, Hans

On 06/29/2011 11:58 PM, colin molter wrote:
> Hi Hans,
> just one more small galaxy question.
> I finished the whole process of transforming the data using galaxy. Now i
> would like to get these data into my server's filesystem. Something exactly
> opposite to the adding of datafile to a datalibrary would be ideal.
> (so getting a dataset from the history to the filesystem).
>
> thx for your time,
> regards,
> colin
>
> 2011/6/29 colin molter<colin.molter(a)gmail.com>
>
>> it finished! and results seems perfect.
>> it is just very very slow: nearly 10 hours for 25Gb.
>>
>> it seems that in future version of illumina preprocessed data, there will
>> be no longer need of the groomer.
>> thx
>> colin.
>>
>> 2011/6/29 Hans-Rudolf Hotz<hrh(a)fmi.ch>
>>
>>> On 06/28/2011 07:06 PM, colin molter wrote:
>>>
>>>> hi hans,
>>>> you are right.
>>>> it seems that even if i stopped galaxy
>>>> (i just stopped the run.sh command)
>>>>
>>>> i still have a job running:
>>>> root 10441 97.0 0.0 77496 1772 pts/6 R 11:59 402:48 python
>>>> /opt/galaxy-central/tools/fastq/fastq_groomer.py
>>>> /fs1/GenomicsData/ERP0005
>>>>
>>>> it seems that the transformation of a fastq file is taking a lot of time
>>>> (it uses 100%of cpu since 7hours).
>>>> does it sound normal?
>>>>
>>> I am not very familiar with 'fastq_groomer.py' and without knowing the
>>> size of your fasta file, it is impossible to tell.
>>>
>>> Have you tried a subset of your fasta file? May be your data is corrupt
>>> resulting in an endless loop?
>>>
>>> Hans
>>>
>>> i think i have better to kill that job and to try another way.
>>>>
>>>> i stopped the job and it works
>>>> thx.
>>>>
>>>> 2011/6/28 Hans-Rudolf Hotz<hrh(a)fmi.ch>
>>>>
>>>> Hi Colin
>>>>>
>>>>> I launched a big job on my local galaxy server (for which I am admin).
>>>>> The
>>>>>> job was to put local dir in the datalist.
>>>>>> It took too long and i wanted to stop it .... but how?? i failed to
>>>>>> find a
>>>>>> solution.
>>>>>>
>>>>> go to the admin view -> 'Manage jobs' and there you can kill individual
>>>>> jobs.
>>>>>
>>>>> To stop the job i thought that stopping the galaxy running instance
>>>>> could
>>>>>> make it. (AM I RIGHT?)
>>>>>>
>>>>> possible, but sounds like an 'overkill' to me. I would first try the
>>>>> 'soft'
>>>>> method described above....and use this only as a last resort.
>>>>>
>>>>> Unfortunately, i failed to restart galaxy. I have the following error:
>>>>>>
>>>>>> are you sure you have killed the galaxy process? Since this part of
>>>>> the
>>>>> error message tells me that Galaxy (or another service) is still using
>>>>> the
>>>>> port number.
>>>>>
>>>>> socket.error: (98, 'Address already in use')
>>>>>>
>>>>> How did you stop galaxy?
>>>>>
>>>>> Regards, Hans
>>>>>
>>
>> --
>> Colin Molter
>> University of Brussels - InSilico Team - http://insilico.ulb.ac.be/
>>
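
The socket.error quoted in this thread ('Address already in use') means some process is still bound to the port Galaxy wants. A minimal Python sketch for checking that before a restart follows. The helper name port_in_use is made up for illustration, and the host and port you pass in are assumptions: use whatever your own server configuration specifies.

```python
import socket


def port_in_use(host, port):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper, not part of Galaxy.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(1.0)
    try:
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()
```

Calling port_in_use("127.0.0.1", 8080) (8080 being an assumed port) before run.sh would show whether an old Galaxy process, or another service, still holds the port.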


Re: [galaxy-dev] Refreshing/Reloading Files
by Hans-Rudolf Hotz 30 Jun '11


Hi Paul

Please keep all replies on the list by using "reply all"

A simple solution, which we use quite a lot, is the following: we use the "dynamic_options" attribute, eg:

    <inputs>
        <param name="foo" type="select" label="what" help="Use tickboxes to select " display="radio" dynamic_options="ds_fooOptions()"/>
    </inputs>
    <outputs>
        <data format="fasta" name="output" label="more foo" />
    </outputs>
    <code file="extra_code_for_foo_list.py" />
    <help>
    </help>
    </tool>

and then we have a little python script ("extra_code_for_foo_list.py") with the "ds_fooOptions" function, which can read your file (ie your list of databases), eg

    def ds_fooOptions():
        """List available foos as tuples of (displayName,value)"""
        foos = <whatever python code is required to generate the tuples>
        return foos

I hope this helps, Hans

On 06/29/2011 08:09 PM, Admins de Galaxy wrote:
> Hi Hans,
>
> yes that's it. We are offering the list of databases as options to select in
> the GUI,
> before executing the script which compares the selected database with the
> sequence.
>
> Paul
>
> 2011/6/29 Hans-Rudolf Hotz<hrh(a)fmi.ch>
>
>> Hi Paul
>>
>> You probably need to be a bit more specific...at what stage is this '.txt
>> file' read (or rather should be read)? - are you offering the (growing) list
>> of databases as options to select in the GUI?
>>
>> Hans
>>
>> On 06/29/2011 10:20 AM, Admins de Galaxy wrote:
>>
>>> Hello everyone,
>>> we have a problem with one of our selfwritten tools.
>>> We have a tool, that compares sequence with a database.
>>> The List of the available databases is loaded from a .txt file.
>>>
>>> One of our other tools, manages that a new database is
>>> added to the .txt file. But Galaxy doesn't recognize the change.
>>>
>>> It would be nice if someone could give us an advice.
>>>
>>> Best regards
>>>
>>> Paul K. Deuster
>>> @ Technische Hochschule Mittelhessen
>>>
>>> ___________________________________________________________
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client. To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>>>
>>> http://lists.bx.psu.edu/
>>>
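
The placeholder in Hans's ds_fooOptions could be filled in along the following lines. This is only a sketch under stated assumptions: the path in DB_LIST is hypothetical, the file layout (one database per line, optionally "display name<TAB>value") is invented for illustration, and the third tuple element (a "selected" flag) reflects my understanding of what Galaxy select parameters expect rather than anything stated in this thread. Drop it if your version only wants (displayName, value) pairs.

```python
import os

# Hypothetical location of the database list; adjust to your setup.
DB_LIST = "/path/to/databases.txt"


def read_db_list(path):
    """Parse 'display name<TAB>value' lines into option tuples.

    A line without a TAB is used as both display name and value.
    Blank lines and lines starting with '#' are skipped.
    """
    options = []
    if not os.path.exists(path):
        return options
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue
            name, _, value = line.partition("\t")
            # Third element: assumed "selected" flag, always False here
            options.append((name.strip(), (value or name).strip(), False))
    return options


def ds_fooOptions():
    """Called via dynamic_options; reads the file on every call."""
    return read_db_list(DB_LIST)
```

Because the file is opened inside the function rather than at import time, entries appended by another tool should show up the next time the function is called, which is the behaviour Paul was missing.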


Re: [galaxy-dev] How to set java heap size in Galaxy?
by Roman Valls 30 Jun '11


Hello Liu,

I'm keeping our mail thread on the mailing list, as advised by the guidelines in the footer of each mail.

Now, Galaxy runs on top of Paster, a minimal Python web server, and has nothing to do with Java, so it looks like your out-of-memory error comes from your app. Have you tried the settings Marco suggested? Namely:

    java -Xmx512m

I bet there's an environment variable or config setting for Tomcat too, if you're unsure about where your jars are called.

Hope that helps!
Roman

On 2011-06-29 12:02, liu bo wrote:
> Thank you, Roman.
> Our app invoke a workflow server.
> I guess Galaxy has a self container like Tomcat, that is why we can
> access it via 127.0.0.1 from browser.
> I know in Tomcat, there is a file "catalina.sh" for setting JAVA_OPTS.
> Is there a similiar file in Galaxy to set JAVA_OPTS?
> Thank you very much.
>
> Regards,
> Bo
>
> On Tue, Jun 28, 2011 at 4:18 AM, Roman Valls <brainstorm(a)nopcode.org> wrote:
>> Hi Liu,
>>
>> Does your app execute Picard/GATK at some point ? In that case those
>> would be the ones triggering your Java OOM error, in that case Bo's
>> suggestion is the way to go. That's my best guess since Galaxy itself
>> doesn't use java either (only python AFAIK).
>>
>> Regards,
>> Roman
>>
>> On 2011-06-27 18:44, liu bo wrote:
>>> Hi Marco,
>>>
>>> Thanks for your kind reply.
>>> My app is a python file, and there's not an explicit command to run java.
>>> I just wonder whether there is a file to configure Galaxy, in order to
>>> set the Java heap size.
>>> Thank you.
>>>
>>> Best wishes,
>>> Bo
>>>
>>> On Mon, Jun 27, 2011 at 7:51 PM, Marco Moretto <marco.moretto(a)gmail.com> wrote:
>>>> Hi Liu,
>>>> if you set up your application in the XML to run with Java, like java -jar
>>>> yourapplication.jar, it is sufficient to add the -Xmx parameter. Something
>>>> like
>>>> java -Xmx512m -jar yourapplication.jar
>>>> Greets
>>>> ---
>>>> Marco
>>>>
>>>> On 27 June 2011 12:15, liu bo <liubo03(a)gmail.com> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I have added an application to Galaxy's Tools.
>>>>> Now there occurs an error "java.lang.OutOfMemoryError: Java heap space".
>>>>> Do you know how to set java heap size in Galaxy?
>>>>> Thank you very much.
>>>>>
>>>>> Best regards,
>>>>> Bo
>>>>
>>
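
Since Bo's tool is a Python file that ends up starting Java somewhere, the -Xmx setting from this thread has to be added wherever that Java process is launched. A minimal sketch of doing so from Python, assuming the script starts the JVM itself via subprocess; the function names and the jar name are illustrative only, not anything from Bo's application.

```python
import subprocess


def java_command(jar_path, heap_mb=512, extra_args=()):
    """Build the argument list for running a jar with a maximum heap size."""
    return ["java", "-Xmx%dm" % heap_mb, "-jar", jar_path] + list(extra_args)


def run_jar(jar_path, heap_mb=512, extra_args=()):
    """Run the jar and return its exit code (requires java on the PATH)."""
    return subprocess.call(java_command(jar_path, heap_mb, extra_args))
```

With the defaults, java_command("yourapplication.jar") yields the same command Marco gave: java -Xmx512m -jar yourapplication.jar. If the JVM is started by a separate server process instead, the setting has to go into that server's own startup configuration.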


Re: [galaxy-dev] Question about installing NCBI BLAST+ onto Galaxy
by Peter Cock 30 Jun '11


On Thu, Jun 30, 2011 at 1:26 AM, George Yianni Michopoulos wrote:
> Hey Peter,
>
> Thanks so much for your help, your comments helped me realize what
> was wrong. The issue was that I wasn't including the name of the
> database within the path, just the folder it was in!
>
> Best,
> George Michopoulos
>

Easily done, I'm glad it's working now.

Peter
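
The mistake George describes (pointing the .loc entry at the folder instead of the database base name) can be caught with a small check before restarting Galaxy. The sketch below is an illustration, not part of Galaxy or BLAST+: the function names are made up, it assumes a three-column tab-separated blastdb.loc line, and it only looks for nucleotide index (.nin) and alias (.nal) files, which is what the "No alias or index file found" error refers to.

```python
import glob
import os


def blast_nucl_db_exists(base_name_path):
    """True if index or alias files exist for a nucleotide BLAST base name."""
    base = glob.escape(base_name_path)
    patterns = [
        base + ".nin",           # single-volume index
        base + ".nal",           # alias file for multi-volume databases
        base + ".[0-9]*.nin",    # numbered volumes such as nt.chunk.00.nin
    ]
    return any(glob.glob(pattern) for pattern in patterns)


def check_loc_line(line):
    """Return 'ok' or a short description of what looks wrong."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return "expected 3 tab-separated columns, got %d" % len(fields)
    path = fields[2]
    if os.path.isdir(path):
        return "path is a directory; append the database base name"
    if not blast_nucl_db_exists(path):
        return "no .nin/.nal files found for this base name"
    return "ok"
```

For a database stored as /some/dir/refseq_rna.00.nin and so on, the third column has to be /some/dir/refseq_rna; passing /some/dir alone triggers the "path is a directory" message.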


Question about installing NCBI BLAST+ onto Galaxy
by George Michopoulos 29 Jun '11


Hey everyone,

Hope all is well! I was wondering if someone could help me with another error I ran into. I recently downloaded the NCBI BLAST+ toolkit and it automatically installed itself into Galaxy. I'm just wondering if anyone knows where I am supposed to put the directory with the database files it needs to run correctly, or if it makes a difference. I have configured the blastdb.loc file as shown below, and the database now appears in the drop-down menu for the NCBI tools, but when I try executing any of the BLASTs Galaxy returns the following error, regardless of the path permutation I try:

When I tried putting the databases within the galaxy installation (within galaxy_test) and I used the whole path:

    BLAST Database error: No alias or index file found for nucleotide database [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/28::]
    Return error code 2 from command:
    tblastx -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db "/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna" -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_32.dat -outfmt 6 -num_threads 8

When I tried putting the databases within the galaxy installation (within galaxy_test) and I used the path from the galaxy root:

    BLAST Database error: No alias or index file found for nucleotide database [/blastdb/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/24::]
    Return error code 2 from command:
    blastn -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db /blastdb/refseq_rna -task megablast -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_28.dat -outfmt 6 -num_threads 8

When I tried putting the databases outside of the galaxy installation:

    BLAST Database error: No alias or index file found for nucleotide database [/Users/burtonigenomics/Rosa_Files/data/BLAST_databases/refseq_rna] in search path [/Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/job_working_directory/29::]
    Return error code 2 from command:
    tblastx -query /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_27.dat -db "/Users/burtonigenomics/Rosa_Files/data/BLAST_databases/refseq_rna " -evalue 0.001 -out /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/database/files/000/dataset_33.dat -outfmt 6 -num_threads 8

________________________________________________________________

The index file is in the folder specified in the nucleotide database [/blastdb/refseq_rna], but I don't know why/if the search path is actually going to the right place. job_working_directory doesn't seem to have any files; is it something that gets changed when galaxy starts running? Do you know why it's looking there?

________________________________________________________________

My current blastdb.loc file (which is supposed to point out the path) currently reads:

    #This is a sample file distributed with Galaxy that is used to define a
    #list of nucleotide BLAST databases, using three columns tab separated
    #(longer whitespace are TAB characters):
    #
    #<unique_id>   <database_caption>   <base_name_path>
    #
    #The captions typically contain spaces and might end with the build date.
    #It is important that the actual database name does not have a space in it,
    #and that the first tab that appears in the line is right before the path.
    #
    #So, for example, if your database is nt and the path to your base name
    #is /depot/data2/galaxy/blastdb/nt/nt.chunk, then the blastdb.loc entry
    #would look like this:
    #
    #nt_02_Dec_2009   nt 02 Dec 2009   /depot/data2/galaxy/blastdb/nt/nt.chunk
    #
    #and your /depot/data2/galaxy/blastdb/nt directory would contain all of
    #your "base names" (e.g.):
    #
    #-rw-r--r-- 1 wychung galaxy 23437408 2008-04-09 11:26 nt.chunk.00.nhr
    #-rw-r--r-- 1 wychung galaxy 3689920 2008-04-09 11:26 nt.chunk.00.nin
    #-rw-r--r-- 1 wychung galaxy 251215198 2008-04-09 11:26 nt.chunk.00.nsq
    #...etc...
    #
    #Your blastdb.loc file should include an entry per line for each "base name"
    #you have stored. For example:
    #
    #nt_02_Dec_2009   nt 02 Dec 2009   /depot/data2/galaxy/blastdb/nt/nt.chunk
    #wgs_30_Nov_2009   wgs 30 Nov 2009   /depot/data2/galaxy/blastdb/wgs/wgs.chunk
    #test_20_Sep_2008   test 20 Sep 2008   /depot/data2/galaxy/blastdb/test/test
    #...etc...
    #
    #See also blastdb_p.loc which is for any protein BLAST database.
    #
    #Note that for backwards compatibility with workflows, the unique ID of
    #an entry must be the path that was in the original loc file, because that
    #is the value stored in the workflow for that parameter.
    #
    refseq_rna   Reference RNA Sequence   /Users/burtonigenomics/Rosa_Files/bin/fastx_bin/galaxy_test/tool-data/blastdb/refseq_rna
    # I've also tried:
    # refseq_rna   Reference RNA Sequence   /tool-data/blastdb/refseq_rna
    # and refseq_rna   Reference RNA Sequence   /blastdb/refseq_rna
    # but it still seems to be sending to a default that's not working or something?

________________________________________________________________

I would really appreciate help with this. If there's anyone more knowledgeable about the BLAST tools and Galaxy at the NCBI that I should talk to, let me know, but I feel like this might be something you could help me with. Let me know if there's any other information you would need, and thanks for your time!

Best,
George Michopoulos
Fernald Lab
Stanford University


changing file storage location
by Sanka, Ravi 29 Jun '11


Greetings,

My name is Ravi Sanka. I have a question regarding Galaxy.

Recently, I had a local installation set up and am now trying to change the location where the install stores imported and created files. I opened the config file universe_wsgi.ini and did the following:

- Removed the '#' mark from '#file_path = database/files'
- Changed value of file_path to the absolute path of an accessible, readable/writable location
- Saved changes

Despite this, the install still stores files in database/files. Is there a step I'm missing? Does the setup procedure (setup.sh & run.sh) need to be run again?

I also intend to change the database from the SQLite default to an existing database, so I assume the steps to change the file_path also apply to database_connection.

Thank you for your time.

-----------------------------------------------
Ravi Sanka
ICS - Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
-----------------------------------------------
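
One way to double-check the edit Ravi describes is to read the ini file back and see which value a parser actually picks up. This is a minimal sketch using Python 3's standard configparser, not Galaxy's own loader; the helper name is made up, and the section name app:main is an assumption based on the usual layout of universe_wsgi.ini.

```python
import configparser


def effective_setting(ini_path, option, section="app:main"):
    """Return the uncommented value of option in section, or None."""
    # RawConfigParser avoids %(...)s interpolation; strict=False tolerates
    # options that appear more than once.
    parser = configparser.RawConfigParser(strict=False)
    parser.read(ini_path)
    if parser.has_option(section, option):
        return parser.get(section, option)
    return None
```

If effective_setting("universe_wsgi.ini", "file_path") returns None, the line is still commented out, sits in a different section, or (with Python's parser at least) has leading whitespace that makes it read as a continuation of the previous option. If it returns the new path, the file is fine and the running server most likely just has not been restarted since the change.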


bwa failure preparing job
by Branden Timm 29 Jun '11


Hi All,

I'm having issues running BWA for Illumina with the latest version of Galaxy (5433:c1aeb2f33b4a). It seems that the error is a python list error while preparing the job:

    Traceback (most recent call last):
      File "/home/galaxy/galaxy-central/lib/galaxy/jobs/runners/local.py", line 58, in run_job
        job_wrapper.prepare()
      File "/home/galaxy/galaxy-central/lib/galaxy/jobs/__init__.py", line 371, in prepare
        self.command_line = self.tool.build_command_line( param_dict )
      File "/home/galaxy/galaxy-central/lib/galaxy/tools/__init__.py", line 1575, in build_command_line
        command_line = fill_template( self.command, context=param_dict )
      File "/home/galaxy/galaxy-central/lib/galaxy/util/template.py", line 9, in fill_template
        return str( Template( source=template_text, searchList=[context] ) )
      File "/home/galaxy/galaxy-central/eggs/Cheetah-2.2.2-py2.6-linux-x86_64-ucs4.egg/Cheetah/Template.py", line 1004, in __str__
        return getattr(self, mainMethName)()
      File "DynamicallyCompiledCheetahTemplate.py", line 106, in respond
    IndexError: list index out of range

I checked the bwa_index.loc file for errors, and the line for the reference genome I'm trying to map against seems correct (all whitespace is tab characters):

    synpcc7002   synpcc7002   Synechococcus   /home/galaxy/galaxy-central/bwa_indices/SYNPCC7002

I'm not sure what the next troubleshooting step is, any ideas?

--
Branden Timm
btimm(a)glbrc.wisc.edu
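
Whether the traceback really comes from the .loc file is not established in this message, but the "all whitespace is tab characters" claim is easy to verify mechanically. The sketch below is a generic check, not a Galaxy utility: the function name is made up, and the expected count of four columns is an assumption taken from the four fields in Branden's entry.

```python
def loc_column_report(path, expected_columns=4):
    """List (line_number, column_count, visible_line) for suspicious entries.

    Comment lines (starting with '#') and blank lines are skipped.
    Tabs are shown as <TAB> so that stray spaces become visible.
    """
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) != expected_columns:
                problems.append((number, len(fields), line.replace("\t", "<TAB>")))
    return problems
```

An empty list from loc_column_report("bwa_index.loc") means every entry splits into four tab-separated fields; anything else points at the line where spaces crept in or a column is missing.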


assembly statistics dependencies
by Yann Surget-Groba 29 Jun '11


Dear all,

I'm trying to use the tool assembly statistics in the 'PacBio/Illumina Assembly' section but I have the following error message:

    Traceback (most recent call last):
      File "/home/galaxy/galaxy_dist/tools/ilmn_pacbio/assembly_stats.py", line 7, in <module>
        from pbpy.io.FastaIO import FastaEntry, SimpleFastaReader
    ImportError: No module named pbpy.io.FastaIO

Can someone tell me where to find the module pbpy? I couldn't find this information in the Galaxy wiki.

Thanks in advance,
Yann


Re: [galaxy-dev] User creation with upstream authentication
by Nate Coraor 29 Jun '11


Steve Thorn wrote:
> Hi Nate
>
> Thanks for the reply. Do you have any pointers to how we can achieve authorization in Apache (not my area of experience)? Perhaps you know of other Galaxy groups who do this sort of thing?

Hi Steve,

Instead of 'Require valid-user' in your Apache config, you can use either of:

    Require user [userid...]

or

    Require group [group-name...]

Anyone not listed in the Require directive would be shown a 403 error by Apache, which you can customize to contain any information necessary to direct users how to get access (contacting you).

--nate

>
> Many thanks
> Steve
>
> On 28 Jun 2011, at 19:31, Nate Coraor wrote:
>
> > Steve Thorn wrote:
> >> Hello
> >>
> >> We would like to force users to register even when they successfully
> >> pass through our University's single sign-on service (Apache +
> >> cosign).
> >>
> >> We have:
> >>
> >> use_remote_user = True
> >> allow_user_creation = False
> >>
> >> in the universe_wsgi.ini, but it appears that use_remote_user takes
> >> precedence over allow_user_creation.
> >>
> >> Ideally, we'd like users who get through the single sign-on to be
> >> presented with a message like "to use galaxy please register by
> >> emailing someone(a)ed.ac.uk". Is this possible?
> >
> > Hi Steve,
> >
> > This is not really possible in Galaxy without some hacking since as you
> > have discovered, remote_user takes precedence over all of the built-in
> > user controls. You can implement authorization in Apache, though, as a
> > workaround.
> >
> > --nate
> >
> >>
> >> Many thanks,
> >> Steve
> >> --
> >> Steve Thorn
> >> Research Systems Consultant - ECDF Middleware Team
> >> +44 (0)131 650 4941
> >> University of Edinburgh, JCMB, King's Buildings
> >> Edinburgh EH9 3JZ, UK
> >>
> >> The University of Edinburgh is a charitable body, registered in
> >> Scotland, with registration number SC005336.
> >>
