stupid logging question
by Michael Pheasant
Using the run.sh script with the default 'log_level = DEBUG' setting in
universe_wsgi.ini gives a huge amount of output.
Changing the setting to log_level = INFO, WARNING, or ERROR, I still get
the same output, including the INFO messages every second.
I assumed that WARNING or ERROR would not give the INFO messages; is
there some way to not see these?
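For reference, a minimal sketch of how Python's stdlib logging filters messages by level - this is generic logging behaviour, not Galaxy-specific code, and whether a given message honours log_level depends on which logger or handler emits it:
===
# Minimal sketch of stdlib logging level filtering (not Galaxy code).
import logging

logging.basicConfig(level=logging.WARNING)  # roughly what log_level = WARNING should do

log = logging.getLogger("example")
log.info("suppressed: below the WARNING threshold")
log.warning("shown: at or above the threshold")

# A logger whose level is set explicitly ignores the inherited root level,
# which is one way INFO messages can keep appearing regardless of log_level.
chatty = logging.getLogger("example.chatty")
chatty.setLevel(logging.INFO)
chatty.info("shown anyway: this logger overrides the inherited level")
===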
Cheers
m
--
Michael Pheasant
Software Engineer
Queensland Facility for Advanced Bioinformatics
Level 6, QBP
University of Queensland, QLD 4072
T: +61 (0)7 3346 2070
F: +61 (0)7 3346 2101
www.qfab.org
12 years, 4 months
adding another UCSC mirror to display
by Davide Cittaro
Hi again, this may be a FAQ, sorry for that... How can I add an alternate UCSC mirror site (our local mirror) to Galaxy, so that everything that can be displayed at UCSC-main will also have a UCSC-localmirror option? Besides... how can I remove the bx main mirror?
I've tried adding a ${GALAXYROOT}/tool-data/shared/campus/campus_build_sites.txt file, but that's not enough... I've also modified the ucsc/bam.xml file, adding another dynamic_link section, but that's still not enough... any hints?
Thanks again
d
/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/
12 years, 5 months
Dynamically adding tools to Galaxy
by Sumedha Ganjoo
Hello,
I am a student at the University of Georgia. We developed a tool for Galaxy 1.3 that enables accessing various web services as tools from Galaxy.
We are currently working on a similar tool for the current version of Galaxy. This requires adding a tool dynamically, without having to restart Galaxy. I was wondering if someone has already implemented that feature? We do have most of the code for this in place. Is there a way that, once complete, this code could be added to the main Galaxy source code?
Also, I would really appreciate it if someone could tell me where I can find older versions (v1.3) of Galaxy.
Thanks in advance.
Regards,
Sumedha
Sumedha Ganjoo
Graduate Assistant
Computer Science Department
University Of Georgia
12 years, 6 months
TopHat and other tools with too many options
by Assaf Gordon
Hi all,
I'm in the process of adapting TopHat to our needs, and there are just too many options...
It's OK if you run it from the command line, but in Galaxy it looks like a big mess.
As with Bowtie's tool, the "common" options are not specific enough (for our needs) and the "full options" mode is too hard to use (which will result in users not using it at all, or using it wrong).
I'd like to request/propose a change in the way the GUI is rendered based on the XML tool.
Mainly, to create logical parameter "groups": parameters which logically go together, and are related to one another.
In the XML file, it could look like:
<inputs>
  <group name="Introns" help="These settings control the intron sensitivity.">
    <param name="min_intron_length" type="integer" value="70" />
    <param name="max_intron_length" type="integer" value="500000" />
  </group>
  <group name="Quality" help="XXXXXX">
    <param name="max_multihits" type="integer" value="40" label="Maximum number of alignments to be allowed" />
    <param name="junction_filter" type="float" value="0.15" label="Minimum isoform fraction" />
  </group>
  ...
</inputs>
And in the HTML output, the groups would be visually distinct, with some nice hide/expand JavaScript trick; see a (fake) example here:
http://cancan.cshl.edu/labmembers/gordon/files/galaxy_advanced_options.html
IMHO, there are a couple of advantages to this layout:
1. The "big picture" of the available settings is immediately visible (e.g. "introns", "quality", "segments", etc.).
2. Parameters are separated into logical groups, making it easier to understand what's being changed (as opposed to one very long, cryptic list of parameters).
3. Advanced vs. simple options are clearly marked.
4. Since parameter groups can be hidden, when they are expanded they can contain a help paragraph - this is much easier for the user than scrolling up/down to see the help section below (also, the relevant help section now appears right next to the parameters).
I guess this is not a trivial change, but without it, it will get harder and harder to integrate complex tools.
Comments are welcomed,
-gordon
12 years, 6 months
Recently used tools options
by Assaf Gordon
Again, very nice feature.
And of course, some comments:
1. The search is too general (IMHO).
If I search for "rna", I get (among other tools) the "lastz" tool, because it has the following line in the help section:
===
3. RNAME Reference sequence NAME
===
(Note the "RNA" in "RNAME"...)
To be useful, the basic search should be restricted to the tool name, possibly the description, and preferably some keywords (specified manually in the XML); an advanced search might offer a global text search. (A sketch of such a field-restricted search follows these comments.)
2. The "recently used tools" is not updated after running a new tool. One has to reload the page (and then the list is hidden by default).
3. a small bug:
1. reload galaxy (for a clean start)
2. click "options->Search Tools" (search box is shown)
3. click "options->Show recently used" (recently-used menu is shown)
4. Search for a tool which is NOT in the recently used menu (e.g. search for "foobar")
5. The "recently used" menu is forever gone, no way to get it back: the "options" menu still says "hide", but even clicking
multiple times on hide/show does not bring it back.
4. not necessarily a bug:
All workflows in the "workflow" menu (i.e. workflows that were configured to be on the user's tools menu) are always displayed, regardless of the active search. This is probably by design, but it's conceptually confusing (because you encourage users to use workflows as regular tools in that menu).
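Regarding the first comment, here is a hedged sketch (plain Python, not Galaxy's actual search code) of restricting the match to a whitelist of fields, so "rna" no longer matches help text that merely contains "RNAME"; the field names and dictionary layout are just assumptions for illustration:
===
# Field-restricted tool search sketch; not Galaxy's real data model.
def matches(query, tool, fields=("name", "description", "keywords")):
    q = query.lower()
    return any(q in (tool.get(field) or "").lower() for field in fields)

tools = [
    {"name": "lastz", "description": "align long reads",
     "help": "3. RNAME Reference sequence NAME"},
    {"name": "RNA folding", "description": "predict secondary structure"},
]

print([t["name"] for t in tools if matches("rna", t)])  # ['RNA folding']
===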
-gordon
12 years, 7 months
Problem with history association after upgrade to 4cdf4cca0f31
by Dennis Gascoigne
I think there may be a bug with regard to the history/job associations
introduced in the upgrade. Many of the users have histories where the
associated jobs are no longer shown. The histories themselves still exist.
The problem seems to affect mainly workflows with many jobs. The HistoryID/job
association is still in the database, though there is no history/dataset
association for these histories (I haven't investigated enough to know
whether this is expected).
This is definitely a result of the upgrade and has been reported by a number of
users. It does not occur for all histories - only some. The latest upgrade
does not help.
Cheers
Dennis
12 years, 7 months
Issue with manage_db.py
by Dennis Gascoigne
It appears that the ini file is hardcoded into manage_db.py. You may want
to make this an argument to the script, as we have two different databases
from two different configs and were wondering why the wrong one was updated.
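A minimal sketch (not the actual manage_db.py) of how the config file could be taken as a command-line option instead; the option name, default, and section/key names are assumptions based on a typical universe_wsgi.ini:
===
# Hypothetical illustration of passing the config file on the command line.
import argparse
from configparser import ConfigParser

parser = argparse.ArgumentParser(description="Run database migrations")
parser.add_argument("-c", "--config", default="universe_wsgi.ini",
                    help="Galaxy config file to read the database URL from")
args = parser.parse_args()

config = ConfigParser()
config.read(args.config)
db_url = config.get("app:main", "database_connection",
                    fallback="sqlite:///./database/universe.sqlite")
print("Operating on database: %s" % db_url)
===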
Cheers
Dennis
12 years, 7 months
User file access without uploading?
by Michael Siebauer
Hi,
is there a way/module or data Library that allows users to access their
files WITHOUT the need to upload them first? Since our Galaxy
instance is running within the intranet, all files could be accessed
directly via NFS.
Thx, Michael ;-)
12 years, 7 months
Workflow post-actions
by Assaf Gordon
AWESOME!
Truly amazing.
And yet, I have some comments:
Notification emails
===================
1. It seems the notification email is sent immediately when the workflow is submitted, before the job is completed.
I've only tested it a few times, but for a workflow with 5 steps (each a shell script that does "sleep 30") and the last step configured with an EmailAction, I get the email immediately, before the last step has completed.
2. Users don't really need 6-digit microsecond accuracy in the "completed" time.
3. The subject line contains a literal "%s" that is not replaced with a variable ( lib/galaxy/actions/post.py:77 ).
4. The email says "your job "X"..." but "X" is the history name. This implicitly assumes that users run a single workflow per history, which is not the case.
A more common use case (in our local instance): users load 5 FASTQ files into a single history and start 5 workflows on those files (all in the same history).
A friendlier message would be "Your workflow 'X' on dataset 'Y' in history 'Z' is complete",
with "X" being the workflow name and "Y" being the first dataset used as input to the workflow (if multiple files were used, take just one).
Also, remember that many (most, in my case) users still call their histories "Unnamed history", so the history name alone isn't helping.
5. The time reported in the emails is not (for me) the local server time.
This might be an indication of a deeper problem (e.g. my postgres time zone is wrong).
6. Link to the Galaxy server:
It's a good idea to say "at Galaxy instance XXX", but if I may ask for more:
instead of just the host name, use the complete "url_for()" link, so that mirrors that use the "prefix" option will get the full link.
If you add the "http://" prefix, most email clients automatically convert it to a clickable link (even in textual, non-HTML emails), so users will have an easier time getting to Galaxy.
Example:
===
Your job 'Unnamed history' at Galaxy instance rave.cshl.edu is complete as of 2010-06-29 23:18:53.271963.
===
Will be more helpful as:
===
Your job 'Unnamed history' at Galaxy instance http://rave.cshl.edu/galaxy is complete as of 2010-06-29 23:18:53.271963.
===
To ask for even more:
Construct a link that automatically switches to the relevant history - so users will get to their completed jobs with a single click.
Hidden Dataset
==============
1. There's no way to look at hidden datasets (you've mentioned that you're working on that).
2. Hidden datasets are still counted on the "history list" view - a bit confusing (but I'm not sure I have a good suggestion, because of the next item).
3. Two usability issues with hidden datasets:
A long-running job whose dataset is hidden is confusing.
Imagine a workflow with a paired-end mapping job that is hidden - it could take hours, but the history pane will show nothing: only green datasets (the previous workflow steps) and grey datasets (those that haven't run yet).
A failed hidden job (I didn't test this, just thought about it):
If a hidden job fails, its following (unhidden) job will also fail, but it would not be immediately obvious how to get to the origin of the failure.
Combining 2+3, it might be more useful if all datasets were displayed until the entire workflow completes successfully, and only then hidden (some sort of automatic post-action on the last workflow step).
If any step in the workflow fails (or if the workflow isn't complete yet), everything stays visible (I must admit I don't have a good solution for a workflow with multiple final outputs).
Dataset Rename
==============
Would be great to be able to use templates/variables in the new name (e.g. "$input"), so that each step could contain parameters or the name of its input. (Remember that my users run the same workflow multiple times in the same history, so a fixed name will create multiple datasets with the same name - a bit confusing.)
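A hedged sketch of what such template substitution could look like using Python's string.Template; the "$input" variable is just the example above, not an existing Galaxy feature:
===
# Illustrative only: substituting workflow variables into a rename template.
from string import Template

new_name = Template("Filtered $input (min quality $qual)").safe_substitute(
    input="sample1.fastq", qual=20)
print(new_name)  # Filtered sample1.fastq (min quality 20)
===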
Metadata changes
================
Would be great to be able to change the dbkey/organism as well (in addition to, or as an alternative to, columns/filetype).
Keep up the good work!
Thanks,
-gordon
12 years, 7 months
History Import/Export
by Assaf Gordon
Kudos for this nice feature. It would be very useful, both for backups and for exchanging histories between Galaxies.
But as usual ;) I have a couple of comments:
1. Exporting big files doesn't work.
You first create the tar file, then add the individual datasets, and only then stream the tar file to the user.
Very bad. Try exporting a history with 18M-paired-end-76nt-reads FASTQ files (and remember that they are duplicated, because you guys still insist on grooming).
The proxy (Apache, at least) will surely time out.
BTW, since you're using Python's native tarfile, the python process will stay at 100% CPU long after the client connection has timed out. 20 users exporting large histories at the same time would constitute a DDoS attack ;)
2. Importing large files (after you fix exporting large files...) will cause the same problems as uploading huge FASTQ files (and you're seeing more and more complaints about those).
Combining 1+2, I would suggest the following:
1. Make the export a new "job" (sketched below). This will allow it to run for a long time, in a separate process, not blocking python and not tying the user to the browser (also remember: downloading 3GB with your browser is no fun).
Since your code already stores "temporary" tarballs in a dedicated "export" directory, just keep them there for an arbitrary amount of time (let's say 4 days),
and provide the user with a link to download the history at his/her own convenience.
2. Allow importing a history from a URL.
This will also have the benefit of not transferring huge tarballs through the users' personal computers when they want to move histories between Galaxies.
They will go to one Galaxy, run the "export" job, get a download link, wait until the export job is done, then go to the other Galaxy and simply import the history with that link.
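Here is a hedged sketch of the first suggestion (export as a regular background job that writes the tarball to disk and hands back a link); the paths and names are illustrative, not Galaxy's actual export code:
===
# Illustrative export-as-a-job sketch; directory and naming are assumptions.
import os
import tarfile
import uuid

EXPORT_DIR = "/galaxy/database/export"  # assumption: a dedicated export directory

def export_history(dataset_paths):
    """Run as a normal job: pack the datasets, return the finished archive path."""
    archive = os.path.join(EXPORT_DIR, "history-%s.tar.gz" % uuid.uuid4().hex)
    with tarfile.open(archive, "w:gz") as tar:
        for path in dataset_paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive  # expose as a download URL; clean it up after a few days
===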
And a bonus:
You've checked the tarball for malicious files, but not the JSON attributes.
If I import a history with a dataset that has:
[ { "file_name": "../../../etc/passwd" } ]
I can get any file on the system into Galaxy (this happens in <galaxy>/lib/galaxy/web/controllers/history.py:580).
If I import "/dev/urandom" it'll be even more fun.
I would recommend against checking for ".." or "/" (as in line 457), because a determined person could possibly circumvent that with tricks like ".\u002E/etc/passwd" etc.).
Instead, construct the full path (as set in the dataset attributes), then check with os.path.abspath() that it resides in the temporary import directory.
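A small sketch of that containment check (using os.path.realpath, which also resolves symlinks, rather than plain abspath); this is illustrative, not Galaxy's actual import code:
===
# Illustrative path-containment check for imported dataset attributes.
import os

def is_inside(path, directory):
    """True if 'path' from the dataset attributes resolves inside 'directory'."""
    directory = os.path.realpath(directory)
    resolved = os.path.realpath(os.path.join(directory, path))
    return resolved == directory or resolved.startswith(directory + os.sep)

print(is_inside("datasets/dataset_1.dat", "/tmp/import_xyz"))  # True
print(is_inside("../../../etc/passwd", "/tmp/import_xyz"))     # False
print(is_inside("/dev/urandom", "/tmp/import_xyz"))            # False
===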
Comments are welcomed,
-gordon
12 years, 7 months