August 2011 - galaxy-dev - lists.galaxyproject.org

Integration with LIMS
by Sonderegger Bernhard 29 Nov '11

Hello,

I am exploring the possibility of using a local Galaxy installation for light bioinformatics on sequence data within the scope of an existing LIMS. Ideally, I would like to be able to do the following:

1. Push datasets from the LIMS to Galaxy (directly into the user's account or into a fresh temporary session).
2. Allow the user to perform tasks within Galaxy, backtracking and retrying as necessary.
3. Push the results, along with the Galaxy history (ideally converted to a workflow), back to the LIMS.
4. (Allow workflows stored in the LIMS to be pushed to Galaxy along with further datasets, or even launch these workflows completely automatically from within the LIMS.)

Is this feasible? How much work would be required? Since I am new to Galaxy (at least from the development side), I would very much appreciate comments, advice, and pointers to documentation or existing projects of a similar nature.

Thanks in advance,
Bernhard
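
For steps 1 and 3, Galaxy's REST API is the natural place for a LIMS to talk to Galaxy. The following is only a minimal sketch, assuming a Galaxy release whose API lives under /api/ and accepts a user's API key as the `key` query parameter (as the example scripts under scripts/api in the Galaxy distribution do). `api_url` and `api_get` are hypothetical helper names, and which endpoints exist (e.g. /api/histories) depends on your Galaxy version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def api_url(base_url, path, api_key, **params):
    """Build a Galaxy API URL, passing the API key as the `key` query parameter."""
    query = urlencode(dict(params, key=api_key))
    return f"{base_url.rstrip('/')}/api/{path.lstrip('/')}?{query}"


def api_get(base_url, path, api_key, **params):
    """GET a Galaxy API resource and decode the JSON response body."""
    with urlopen(api_url(base_url, path, api_key, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A LIMS could, for example, call `api_get("http://localhost:8080", "histories", key)` to list a user's histories before deciding where to push data.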

Dataset Cleanup Question
by Lance Parsons 17 Nov '11

I am running a local instance of Galaxy and I've been trying to sort out some issues with dataset cleanup. For the most part, things are working OK when I run the shell scripts in the recommended order:

delete_userless_histories.sh
purge_histories.sh
purge_libraries.sh
purge_folders.sh
delete_datasets.sh
purge_datasets.sh

I have the number of days set to 10. When I look at the reports webapp, however, it reports that "62 datasets were deleted more than 15 days ago, but have not yet been purged, disk space: 12975717335." These have now stuck around for 45 days (and counting). I have even tried running the scripts with the -f option to force Galaxy to re-evaluate the datasets, to no avail. Any suggestions?

Thanks.

--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University
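
Because the scripts depend on each other (histories and libraries are marked first, then datasets), it helps to run them from one driver that stops at the first failure. A minimal sketch, assuming the scripts live under scripts/cleanup_datasets/ relative to the Galaxy root (adjust `script_dir` if yours differ); the day count is whatever is configured inside each script, and `run_cleanup` is a hypothetical name.

```python
import subprocess
from pathlib import Path

# Order taken from the message above.
CLEANUP_ORDER = [
    "delete_userless_histories.sh",
    "purge_histories.sh",
    "purge_libraries.sh",
    "purge_folders.sh",
    "delete_datasets.sh",
    "purge_datasets.sh",
]


def run_cleanup(galaxy_root, script_dir="scripts/cleanup_datasets", dry_run=False):
    """Run each cleanup script in order from galaxy_root; return the commands used."""
    commands = []
    for name in CLEANUP_ORDER:
        cmd = ["sh", str(Path(script_dir) / name)]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later steps never run after an earlier one failed.
            subprocess.run(cmd, cwd=galaxy_root, check=True)
    return commands
```

Running it with `dry_run=True` first prints nothing but returns the exact command list, which is a cheap way to confirm the order before touching the database.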

Adding bowtie-build ONLY as its own tool
by Nikhil Joshi 10 Nov '11

Hi all,

I am trying to add a tool to our local Galaxy that needs a bowtie-build indexed reference, but NOT the alignment. The alignment step is done by the tool itself (using bowtie). My question is two-fold:

One, why does the bowtie tool do both indexing and mapping? It seems that you would want the indexing separate from the alignment so that you can index once and then align as many times as you need. Right now it seems (correct me if I am wrong) that EVERY time you do an alignment, Galaxy will index the reference. This seems to add a lot more time to the alignment step than necessary. Is this correct, or am I missing something?

Two, if I were to make bowtie-build into its own tool, how could I access the files it creates? Since bowtie-build takes a reference and a prefix as input and then creates multiple files that share the prefix, I guess I need to access the prefix name somehow. Any help would be highly appreciated.

Thanks!
- Nik.
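
One way to deal with the prefix problem is to write every index file into a single directory that belongs to the tool's output, using a fixed prefix name, and hand "directory + prefix" to the aligner later. A minimal sketch, assuming the classic Bowtie 1 layout of six .ebwt files per index; `build_index`, `index_files` and the default prefix "ref" are hypothetical names.

```python
import subprocess
from pathlib import Path

# Bowtie 1 index: bowtie-build writes six files that share one prefix.
EBWT_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"]


def index_files(prefix):
    """Return the paths bowtie-build is expected to create for `prefix`."""
    return [Path(str(prefix) + suffix) for suffix in EBWT_SUFFIXES]


def build_index(reference_fasta, out_dir, name="ref"):
    """Run bowtie-build into out_dir and return the prefix a later bowtie call needs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / name
    # Usage: bowtie-build <reference_in> <ebwt_outfile_base>
    subprocess.run(["bowtie-build", str(reference_fasta), str(prefix)], check=True)
    missing = [p for p in index_files(prefix) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"bowtie-build did not produce: {missing}")
    return prefix
```

The returned prefix is what bowtie expects as its index argument, so the alignment tool only needs to know the directory and the agreed prefix name.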

Re: [galaxy-dev] Inquiring
by Nate Coraor 07 Nov '11

Hi Yan,

I've again moved this back to the galaxy-dev list. Please reply to this list and not galaxy-user.

Yan Luo wrote:
> Dear Nate,
>
> Thanks for your quick response. I appreciate it; it is very important for us to use it. We have had a problem recently.
>
> (1) In fact, I can't find the file "paster.log". The problem is that we recently expanded our gluster (Linux server) and did a rebalance, which has some bugs. Some file permissions have been changed, so we can't use User/register right now; when we tried, we got the server error. Do you have any idea how to fix it? (We can change the ownership/read and write permissions for some files manually, but we don't know which files or where they are.)

If you are executing run.sh without the --daemon flag, the output will go to whatever terminal window you started Galaxy in, not to a file. You'll want to make sure that all of the files under Galaxy's root directory are owned by a single user, which is the same user that starts and runs the Galaxy process.

> (2) We want to reboot Galaxy. Should we first stop and start as follows?
> $ sh run.sh --stop
> $ sh run.sh

The full command to stop would be `sh run.sh --stop-daemon`, but that only applies if you originally started Galaxy with `run.sh --daemon`.

> Is there any difference if I use the "--daemon" flag? Last time, my colleague started it using "./run.sh". How do we stop it?
> If "$ sh run.sh --stop" doesn't work (I haven't tried it yet), how can I find the (Linux) process that Galaxy is running as and kill it?

The --daemon flag runs Galaxy in the background and redirects its output to paster.log. Without the --daemon flag, the process stays connected to the terminal in which you start it. Since Galaxy wasn't originally started with the --daemon flag, you'll need to find the process, either by locating the terminal window in which it was started (possibly by your colleague) and then hitting Ctrl-C, or by using the `ps` command (e.g. `ps auxwww | grep python`) and killing the Galaxy process. It usually looks something like:

nate 18213 0.9 0.4 438844 146260 pts/7 Sl+ 10:47 2:30 python ./scripts/paster.py serve universe_wsgi.ini

So you would do:

% kill 18213

> (3) Could you please let me know if there is a default administrator user/password for Galaxy? I want to add an administrator user; how can I do that?
> According to the instructions, I will change a line of "universe_wsgi.ini" as follows. How can I set/get my password?
> admin_users = yan.luo(a)email.com

There is no default. Any users set in admin_users will be administrators. If the account specified in admin_users does not yet exist, you can simply create it. Once created, that account will be an administrator.

> (4) I found that our "universe_wsgi.ini" contains the following setting. Should I remove the "#" before this line, then stop and start Galaxy?
> #allow_user_creation = True

No, True is the default setting, so with the line commented out it is still set to True.

> I plan to restart our Galaxy. If possible, could you please let me know your phone #? I can call you sometime today, or if you prefer, I can give you my phone #.

Unfortunately, we don't have the resources for phone support. We'll be happy to help via email as much as possible.

--nate

> Looking forward to hearing from you.
>
> Best Wishes,
>
> Yan Luo, Ph.D.
> NIH
>
> On Thu, Feb 10, 2011 at 10:01 AM, Nate Coraor <nate(a)bx.psu.edu> wrote:
> > Hi Yan,
> >
> > I've moved this discussion to the galaxy-dev list since it pertains to a local installation of Galaxy.
> >
> > Responses to your questions follow, in-line.
> >
> > Yan Luo wrote:
> > > Dear Sir,
> > >
> > > (1) We installed Galaxy, but recently users can't register and get the following error. How can we fix it?
> > >
> > > Server error
> > > An error occurred. See the error logs for more information. (To turn debug on to display ...).
> >
> > Since debug = False in universe_wsgi.ini, you should be able to find a more detailed error message in the log file. If you start Galaxy with:
> >
> > % sh run.sh --daemon
> >
> > the default log file is 'paster.log' in Galaxy's root directory.
> >
> > > (2) Could you please let me know if there is any command to stop Galaxy?
> >
> > If starting with the --daemon flag (as above), you can use:
> >
> > % sh run.sh --stop-daemon
> >
> > If running in the foreground, you can use Ctrl-C to terminate the process. There is a recent bug whereby Ctrl-C is ineffective on some platforms under Python 2.6; in this case you will have to kill/pkill the process manually. We are working on a fix for the latter.
> >
> > > (3) If I reset the universe_wsgi.ini file and want to set an administrator user (I can add a line in the above file), how can I get the password? Should I stop Galaxy (see question 2) first, then run "./setup.sh" and "./run.sh"?
> >
> > setup.sh would only have been necessary prior to running Galaxy the first time; however, this step has recently been removed. If you are referencing documentation that still refers to setup.sh, please let us know so we can update it. I did notice this was still on the "Production Server" page, so I removed it from there.
> >
> > You no longer need to run setup.sh at all.
> >
> > > (4) If I run "setup.sh", will a new "universe_wsgi.ini" file be generated? If I want to change this file, should I edit it before "run.sh" and after "setup.sh"? Is that right?
> >
> > setup.sh, its replacements in run.sh, and the Galaxy application itself never overwrite files; they only create files from sample files if they do not exist.
> >
> > > (5) I read some of your docs: which command is correct under Linux, "sh setup.sh" (sh run.sh) or "./setup.sh" (./run.sh)?
> >
> > Both syntaxes are effectively the same in most cases.
> >
> > --nate
> >
> > > Looking forward to hearing from you.
> > >
> > > Best Wishes,
> > >
> > > Yan Luo, Ph.D.
> > > NIH
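
Picking the right PID out of `ps auxwww` output by eye is error-prone, so here is a small sketch that does the same filtering. It assumes the standard `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and that the Galaxy process command line contains "paster.py serve", as in the example line above; `find_galaxy_pids` and `galaxy_pids_now` are hypothetical names. It only reports PIDs and never kills anything.

```python
import subprocess


def find_galaxy_pids(ps_output, marker="paster.py serve"):
    """Return PIDs from `ps auxwww` text whose COMMAND column contains marker."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the "USER PID ..." header row
        # Split into at most 11 fields so the COMMAND column keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) == 11 and marker in fields[10]:
            pids.append(int(fields[1]))
    return pids


def galaxy_pids_now():
    """Run `ps auxwww` on this machine and filter its output."""
    out = subprocess.run(["ps", "auxwww"], capture_output=True, text=True, check=True).stdout
    return find_galaxy_pids(out)
```

On Linux, `pgrep -f "paster.py serve"` gives a similar one-line answer; either way, check the PID before running `kill` on it.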

Error moving tool-data dir ... bowtie indexes missing
by Bossers, Alex 26 Oct '11

On our up-to-date galaxy-central version, I have successfully moved the database/files directory to mounted storage. After adapting universe_wsgi.ini, it works fine after a Galaxy restart.

Now the tool-data directory is getting bigger and bigger, and I wanted to move it to mounted storage as well. I moved the tool-data directory to the other location, made sure the file permissions were OK, and adapted tool_data_path in universe_wsgi.ini to match the new tool-data location. Galaxy restarts and the .loc files are read (since BLAST recognises its databases). However, the bowtie indexes are not read, since they do not show up in the NGS Bowtie mapper for Illumina. What could be wrong? When I restore the original universe_wsgi.ini, everything works fine. I am puzzled.

Alex
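
One thing worth checking after such a move is whether every index path listed in the bowtie .loc file still resolves on disk. A minimal sketch, assuming a tab-separated .loc file whose last column is the index base path (check this against the comment header of your own bowtie .loc file) and Bowtie 1 `.ebwt` index files; `stale_index_entries` is a hypothetical name.

```python
import glob
import os


def stale_index_entries(loc_path, index_suffix_glob="*.ebwt"):
    """Return (line_no, base_path) for .loc rows whose index files cannot be found."""
    stale = []
    with open(loc_path) as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            base = line.split("\t")[-1]  # assumption: last column is the index base path
            # glob.escape guards against [, ], * or ? appearing in real paths
            if not glob.glob(glob.escape(base) + index_suffix_glob):
                stale.append((line_no, base))
    return stale
```

Running it against the .loc file in the new tool-data location, as the user Galaxy runs as, would show which entries still point somewhere Galaxy cannot see.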

BWA wrapper error
by dhivya arasappan 26 Oct '11

Hi all,

I'm seeing the following error when I try to use the Galaxy BWA wrapper for SOLiD data:

An error occurred running this job: Could not determine BWA version
The alignment failed.
Error aligning sequence.
/bin/sh: bwa: not found

Any suggestions for figuring out what is causing this error would be appreciated.

Thanks,
Dhivya
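
The last line, "/bin/sh: bwa: not found", means the shell that ran the job could not find a `bwa` executable on its PATH; the environment Galaxy's jobs run in can differ from an interactive login shell. A quick sketch to run as the Galaxy user (ideally from the same environment that starts Galaxy) to see which executables are visible; `missing_executables` is a hypothetical name.

```python
import shutil


def missing_executables(names, path=None):
    """Return the names that shutil.which cannot find on PATH (or on `path` if given)."""
    return [name for name in names if shutil.which(name, path=path) is None]
```

For example, `missing_executables(["bwa", "samtools"])` returning `["bwa"]` would confirm that bwa needs to be installed or its directory added to the PATH Galaxy runs with.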

Re: [galaxy-dev] HOW TO RETRIEVE DATA FROM HISTORY??!!
by colin molter 20 Oct '11

> Is there a way to directly move/copy data from your Galaxy history to a given location in the filesystem of the same Galaxy server?
> Said differently, there is a nice way to import data from the server into Galaxy; is it possible to do the reverse?
>
> So far, I am obliged to download the file from Galaxy to my client machine and then upload it back to the server!!!! With huge BAM files of 3 GB, it is not so convenient!!

OK, I found a better way:
(a) go to the admin panel, push the 'add a new dataset in the library' button and select the one needed from the current history;
(b) move the selected dataset from the library to a location mounted to Galaxy.

That is OK for me. However, if someone has a better solution, any advice is welcome.

Thanks,
colin
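
If you have shell access on the server and know where the dataset's file lives on disk (Galaxy keeps dataset files under its database/files directory; locating the exact file for a given history item is up to you), a 3 GB BAM can be exported without any network round trip. A minimal sketch, where `export_file` is a hypothetical name: it tries a hard link first (instant, no extra disk space, same filesystem only) and falls back to a full copy.

```python
import os
import shutil
from pathlib import Path


def export_file(src, dest_dir, new_name=None):
    """Place src in dest_dir, hard-linking when possible and copying otherwise."""
    src = Path(src)
    dest = Path(dest_dir) / (new_name or src.name)
    if dest.exists():
        # Refuse to overwrite; otherwise the copy fallback would clobber it.
        raise FileExistsError(dest)
    try:
        os.link(src, dest)  # fails with OSError across filesystems
        return dest, "linked"
    except OSError:
        shutil.copy2(src, dest)  # copies data plus timestamps/permissions
        return dest, "copied"
```

Note that a hard link shares the same data as Galaxy's file: editing the exported file in place would also change the dataset inside Galaxy, so treat the exported file as read-only (or force the copy path).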

samtools mpileup update
by David Hoover 12 Oct '11

Is there any good reason why Galaxy can't use a more recent version of samtools? The only difference I can see is the mpileup command. The algorithm by which mpileup finds SNPs is a little different. Does anyone know if the file format has changed? The only files that need to change are sam_pileup.py and sam_pileup.xml. David Hoover Helix Systems Staff http://helix.nih.gov
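
Whether the wrapper needs changing depends mostly on the column layout it parses. As a hedged illustration (layouts recalled from the samtools documentation; verify against the version you install): plain per-sample mpileup output, like `samtools pileup` without `-c`, has 6 columns (chrom, pos, ref, depth, bases, quals), while legacy `samtools pileup -c` consensus output has 10. Options such as printing mapping qualities, or passing several BAMs to mpileup, add further columns, which this sketch rejects. `parse_pileup_line` is a hypothetical name.

```python
def parse_pileup_line(line):
    """Parse one pileup/mpileup text line into a dict of the shared fields.

    Assumed layouts:
      6 columns:  chrom, pos, ref, depth, bases, quals
      10 columns: chrom, pos, ref, consensus, cons_qual, snp_qual,
                  max_mapq, depth, bases, quals  (legacy `pileup -c`)
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 6:
        chrom, pos, ref, depth, bases, quals = fields
    elif len(fields) == 10:
        chrom, pos, ref, _cons, _cons_q, _snp_q, _max_mapq, depth, bases, quals = fields
    else:
        raise ValueError(f"unexpected pileup column count: {len(fields)}")
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based coordinate
        "ref": ref,
        "depth": int(depth),
        "bases": bases,
        "quals": quals,
    }
```

Anything downstream that only needs depth or read bases could go through a function like this and stay indifferent to which samtools produced the file.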

disk space and file formats
by Patrick Page-McCaw 06 Oct '11

I'm not a bioinformaticist or programmer, so apologies if this is a silly question.

I've been occasionally running Galaxy on my laptop and on the public server, and I love it. The issue I have is that my workflow requires many steps (what I do is probably very unusual). Each step creates a new large FASTQ file as the sequences are iteratively trimmed of junk. This fills my laptop, and the public server, with lots of unnecessary, very large files.

I've been thinking about the structure of the files and my workflow, and it seems to me that a more space-efficient system would be a single file (or an SQL database) on which each tool can work. Most of what I do is remove adapter sequences, extract barcodes, trim by quality, map to the genome, and then process my hits by type (exon, intron, etc.). Since the clean-up tools in FASTX aren't written with my problem in mind, it takes several passes to get the sequences trimmed before mapping. If I had a file with a format something like this (tab-delimited):

Header Seq Phred Start Len Barcode etc

each tool could read the Seq and Phred starting at Start and running Len nucleotides, and work on that. The tool could then write a new Start and Len to reflect the trimming it has done[1]. For convenience, let me call this the HSPh format.

It would no doubt be a real pain to rewrite all the tools. From the little I can read of the tools, the way input is handled internally seems to vary quite a bit. But it seems to me (naively?) that it would be relatively easy to write a conversion tool that takes the HSPh format and turns it into FASTQ or FASTA on the fly for the tools. Since most tools take FASTQ or FASTA, it should be a write-once, use-many-times plugin. The harder (and slower) part would be mapping the FASTQ output back onto the HSPh format, but again, this should be a write-once plugin usable for many tools. Both intermediate files would be deleted when done.

As a quick test, I checked how long it takes to run sed on a 1.35 GB FASTQ file, and it was so fast on my laptop (< 2 minutes) that it was done before I noticed. Then, as people are interested, the tools could be converted to take the new format as input.

It may well be true that in these days of $100 terabyte drives this is not useful, and that CPU cycles, not drive space, are the limiting factor. But I think that if the tools were rewritten to read and write the HSPh format, processing would be faster too. It seems some effort has been made to create the tab-delimited format, and maybe someone is already working on something like this (no doubt better designed).

I may have a comp sci undergrad working in the lab this fall. With help, we (well, he) might manage some parts of this. He is apparently quite a talented and hard-working C++ programmer. Is it worthwhile?

thanks

[1] It could even do something like:

Header Seq Phred Start Len Tool Parameter Start Len Tool Parameter Start Len etc

where Tool is the tool name, Parameter is a list of the parameters used, and Start and Len are the latest trim positions. The last Start/Len pair would be the one used by default by the next tool, but this would keep an edit history without doubling the space needed with each processing cycle. I wouldn't need this, but it might be friendlier for users: an "undo" means removing 4 columns. A format like this would probably be better as an SQL database.
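
The core of the proposed HSPh idea (store each read once, and let trimming only move a Start/Len window) can be sketched in a few lines, including the on-the-fly FASTQ conversion for unmodified tools. This is a hypothetical illustration, not an existing Galaxy or FASTX format: `HSPhRecord` and `parse_hsph` are made-up names, Start is assumed to be a 0-based offset, and extra columns such as Barcode are simply ignored.

```python
from dataclasses import dataclass


@dataclass
class HSPhRecord:
    header: str
    seq: str     # full, untrimmed sequence (stored once)
    phred: str   # quality string, same length as seq
    start: int   # assumed 0-based offset of the current window
    length: int  # length of the current window

    def window(self):
        """Return the (seq, phred) slice the next tool should see."""
        end = self.start + self.length
        return self.seq[self.start:end], self.phred[self.start:end]

    def trim(self, left=0, right=0):
        """Narrow the window from either end without rewriting seq or phred."""
        left = min(left, self.length)
        right = min(right, self.length - left)
        self.start += left
        self.length -= left + right

    def to_fastq(self):
        """Render the current window as a 4-line FASTQ record."""
        s, q = self.window()
        return f"@{self.header}\n{s}\n+\n{q}\n"


def parse_hsph(line):
    """Parse the first five tab-separated HSPh columns; extra columns are ignored."""
    header, seq, phred, start, length = line.rstrip("\n").split("\t")[:5]
    return HSPhRecord(header, seq, phred, int(start), int(length))
```

Only two integers change per trimming step, which is what keeps the file from growing with each pass; the "undo" history from the footnote would amount to keeping earlier (start, length) pairs as well.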

Re: [galaxy-dev] [galaxy-user] problem with displaying tracks from Galaxy
by Nate Coraor 03 Oct '11

Sergei Ryazansky wrote:
> Hello all,
>
> We have a UCSC genome browser mirror as well as a Galaxy mirror. Galaxy has a feature enabling a user to display data in the UCSC genome browser as custom tracks. I have configured Galaxy to display the data in our UCSC browser mirror, but it doesn't work properly: after redirecting to the genome browser page, the error message "redirected to non-http(s): /root" appears. At the same time, displaying Galaxy data at the official UCSC works perfectly. What are the possible reasons for this?
> Thank you in advance!

Hi Sergei,

If your Galaxy server is behind a proxy server serving via https, have you set the following header?

RequestHeader set X-URL-SCHEME https

Please see the "SSL" section of the ApacheProxy page for more information:

https://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy

--nate
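
Why the header matters: behind an https proxy, the Python application only sees plain http from the proxy, so any absolute URL it builds (presumably including the dataset link handed to the genome browser) can come out with the wrong scheme. The sketch below is a hypothetical WSGI middleware, not Galaxy's own code, showing the general mechanism: Apache's `RequestHeader set X-URL-SCHEME https` arrives in the WSGI environ as `HTTP_X_URL_SCHEME`, and copying it into `wsgi.url_scheme` makes scheme-aware URL building produce https links.

```python
class URLSchemeFromHeader:
    """Hypothetical WSGI middleware: trust an X-URL-SCHEME header set by the proxy."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes request header "X-URL-SCHEME" as HTTP_X_URL_SCHEME.
        scheme = environ.get("HTTP_X_URL_SCHEME")
        if scheme in ("http", "https"):  # ignore anything unexpected
            environ["wsgi.url_scheme"] = scheme
        return self.app(environ, start_response)
```

The RequestHeader directive itself comes from Apache's mod_headers, so that module must be enabled for the header to be sent at all.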
