July 2009 - galaxy-dev - lists.galaxyproject.org

Feedback
by Pieter Neerincx 25 Sep '09

Hi Galaxy team,

The following are just some thoughts after using Galaxy for a while, which might be of interest for future development...

1. Interface consistency: "Save"

* There are three nice icons at the top of each dataset item in the history panel on the right, for view, edit and delete. So why is there no save icon in the same place, instead of a link further down?
* When I edit a workflow, there is a save button above the canvas and another one in the panel on the right when I edit the properties of a specific workflow item. As far as I can tell these buttons are not completely redundant, but why do I need two save buttons?

2. Provenance data

* Reproducibility is important, and it is nice that Galaxy automatically captures your analysis in histories. But I may want to take a second look at my data after, say, a few months, to figure out exactly what I did and how a certain combination of data and tools produced a certain result. For example, if I executed a workflow every two weeks on updated data for many months, I might want to retrieve the history for a certain version of a database. So I might want to say: give me the histories containing datasets tagged as Ensembl version 48, or UniProt 3, or some version of a reference assembly, etc. Or I might want to see how the results for a certain gene changed over time as a result of updated databases and/or tools. So I might want to say to Galaxy: show me all histories containing ENSGALG000012589 or NM_45689725. Hence, I'd love to be able to search histories.

  In addition, to make it a bit easier to trace things in browse mode, it would be nice if the date a history was last modified were visible. Currently I only see the age of the history in minutes, hours or days. That is convenient for recent items, but for things from longer ago a date makes more sense to me...

* There is a fixed "Database/Build" popup that I can use to tag my datasets, but this feels artificially limited. Is there any reason why the species and the database version cannot be separate items? If there were a popup to select a species first, followed by a second popup to select the genome assembly version, the lists could be a lot smaller and hence easier to navigate.

  In addition, there are cases where I do have a species but don't have an assembly, or where there are additional version numbers to keep track of. For example, I have lots of Ensembl data. Ensembl does not have a single version number, but three: one for the database schema, one for the assembly and one for the annotation/genebuild. The current version for mouse, for example, is 55 37 h, where 55 is the release and schema version number, 37 the assembly and "h" the version of the gene build. In addition, I recently moved to a proteomics group and might want to capture database version numbers for species without a reference assembly. For example, I might know the species name and the fact that I'm using UniProt 15.5... but currently I cannot easily capture that in a consistent way. (I know I could add this to the "info" field of a dataset, but that's free text, with all kinds of possible spelling variants as a result...)

Cheers,

Pi

-------------------------------------------------------------
Biomolecular Mass Spectrometry and Proteomics
Utrecht University

Visiting address:
H.R. Kruyt building room O607
Padualaan 8
3584 CH Utrecht
The Netherlands

Mail address:
P.O. box 80.082
3508 TB Utrecht
The Netherlands

phone: +31 (0)6-143 66 783
email: pieter.neerincx(a)gmail.com
skype: pieter.online
-------------------------------------------------------------
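A minimal sketch of what structured provenance fields could look like in place of the single "Database/Build" popup. Everything here is hypothetical: `EnsemblVersion`, `DatasetProvenance`, `History` and `find_histories` are illustrative names, not Galaxy's data model, and the versions and identifiers simply reuse the examples from the message. With source, version and an optional assembly stored as separate fields, a query such as "all histories with UniProt 15.5 data" becomes an exact match instead of a search through free-text "info" fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EnsemblVersion:
    """Hypothetical parse of an Ensembl version triple such as '55 37 h'."""
    release: int    # release and schema version, e.g. 55
    assembly: int   # assembly version, e.g. 37
    genebuild: str  # gene build version, e.g. 'h'

    @classmethod
    def parse(cls, text: str) -> "EnsemblVersion":
        release, assembly, genebuild = text.split()
        return cls(int(release), int(assembly), genebuild)


@dataclass
class DatasetProvenance:
    """Hypothetical structured replacement for the single 'Database/Build' tag."""
    species: str
    source: str                     # e.g. 'Ensembl' or 'UniProt'
    version: str                    # version string as the source publishes it
    assembly: Optional[str] = None  # None when there is no reference assembly
    identifiers: List[str] = field(default_factory=list)


@dataclass
class History:
    name: str
    datasets: List[DatasetProvenance] = field(default_factory=list)


def find_histories(histories: List[History], source: Optional[str] = None,
                   version: Optional[str] = None,
                   identifier: Optional[str] = None) -> List[History]:
    """Return histories with at least one dataset matching every given criterion."""
    def matches(d: DatasetProvenance) -> bool:
        return ((source is None or d.source == source)
                and (version is None or d.version == version)
                and (identifier is None or identifier in d.identifiers))
    return [h for h in histories if any(matches(d) for d in h.datasets)]


# Illustrative data only, reusing the examples from the message.
histories = [
    History("mouse run", [DatasetProvenance("Mus musculus", "Ensembl", "55 37 h",
                                            identifiers=["NM_45689725"])]),
    History("proteomics run", [DatasetProvenance("example species", "UniProt", "15.5")]),
]
```

Keeping `version` as a per-source string, with a parser such as `EnsemblVersion.parse` only where a source has a known structure, avoids forcing Ensembl's three-part versions and UniProt's single release number into one format.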

Sharing histories in galaxy [long]
by Assaf Gordon 13 Aug '09

Hello,

I've recently upgraded to the latest version, and after Greg and Ross explained the security model (in small words, so I'll understand), I played a bit with the new model. I have some usability issues with the new way sharing histories works.

It might be because we are a relatively small and tight user base, and the model was designed to work with dispersed teams from around the world, where security must be enforced by an appointed security administrator. We don't have a security administrator. We need Galaxy to be secure, but without the hassle of administration, and everything should 'just work'.

The following describes four issues (the description is exaggerated on purpose, please take no offense).

Regards,
gordon.

1) Steps needed to share histories
----------------------------------

Usage scenario: User A wants to share datasets with User B.

'Old way' (with public datasets, no security):
1. Login as User A.
2. Select History.
3. Click "options"
4. Click "share this history"
5. Enter email of user B
6. Click "Share".
Outcome: user B has the new history in his histories list, and can view all files.

'New way' (still with public datasets, no security):
1. Login as User A.
2. Select History.
3. Click "Options"
4. Click "share"
5. Enter email of user B.
6. Click "share"
Outcome: user B doesn't have the history in his histories list.

What User B needs to do:
1. Click "options"
2. Click the *other* list link (there are two of them, both named 'link', and one has to actually read the rest of the sentence to know what the difference between them is).
3. Find the needed shared history, with no dates or options to sort by.
4. Click on the tiny triangle button, which shows a pop-up menu with only one button, 'clone' (which he's going to click anyway, so why not make it available as a huge button on the main page?).
5. Click 'clone'.
6. User sees a technical question about cloning deleted items. There's no default selection, so you can't just click "clone".
   You have to read and understand what's going on. Don't underestimate the annoyance of this: you advertise Galaxy as easy to use for biologists, who don't need to understand the technical aspects, so why make them read technical details and choose these options?
7. Only then does the shared+cloned history appear in the history list.

With the security model in place (meaning datasets are not public), things get more confusing.

User A wants to share a history with user B. User A's implied wish: "I want User B to view my files". When User A clicks "share", enters the email and clicks "share", he is then presented with four permission-related options.

The last one is "Don't Share". This is kind of funny; it reminds me of clicking the "start" button to shut down the computer in you-know-which operating system. Selecting "Don't Share" and clicking the "Go" button is a no-op, so why is it needed? If this weren't a web application, then yes, this modal dialog would require a 'cancel' button. But this is a web application: just click on another link instead of clicking "go". A "go" button implies something will be done.

The second-to-last is "Share Anyway (don't change any permission)". This is a problem waiting to happen: you are explicitly allowing the user to do something that you know will not work. From a technical perspective, that's OK: a knowledgeable user can later set permissions manually, or can later whine to the administrator about permission problems (or my favorite: User A tells User B to ask the admin to give him permissions). IMHO, a program should not allow a user to do anything that will definitely not work.

The first option is "make datasets public". This is a bad option for two reasons:
1. User A doesn't want to make his datasets public (otherwise he would not have set permissions on them). All he wants is to allow User B to view his files.
2. If permission problems persist, users will get accustomed to just 'make everything public' because that's the easy solution, and this will make the whole security thing redundant.
Also remember that PUBLIC means anybody can view them: anybody, not just the people you shared it with. This is not obvious. It means "no security".

The second option is "make datasets private to me and the user...". This seems like what the user actually wants, but it has strange side effects (see item 3, below).

2) Deleting shared history
--------------------------

Scenario: User A wants to delete the current history. The history happens to be shared.
1. Click "options"
2. Click "delete current history"
3. Message box appears: "Are you sure?". Click yes.
4. Red warning appears: "History has been shared, unshare it before deleting it".
   1. There's no link to 'unshare' it, so how do I do that? There's no "unshare" button anywhere. You have to go to the list of histories, find the history in the list, click on the "shared" link, then "unshare" it. Quite unintuitive...
   2. If I shared the history with twenty users, I need to click the little triangle button and then the "unshare" link for each of them. Quite annoying. Why not have an "unshare all" button?
   3. There's no confirmation needed to unshare a history, and no user intervention. So, theoretically, there's no reason that a "delete" operation could not automatically remove all shares and then delete the history.
5. This is the really confusing part: to me, "share" means "allow other users to view the files in this history", and "unshare" means "don't allow other users to view the files in this history". But in the new Galaxy model, "share" means "allow other users to clone the history and then view my files", and "unshare" means "don't allow other users to clone (or clone again) the history". The old Galaxy had no concept of "unshare", and that's logical: once you have shared your files, you can't take them away.
   In the new Galaxy, "unshare" is not really unsharing: if another user has cloned the shared history, he still has access to your files (de facto, they are still shared). The other user simply can't clone your history again.

--------

3) Side Effect of Ad-Hoc sharing roles
---------------------------------------

Here are the exact steps I took, and two problems that happen with them.

##
## Creating base configuration
##
$ hg clone http://www.bx.psu.edu/hg/galaxy galaxy1
$ cd galaxy1
# (my default is python 2.6, too bad for me)
$ sed -i 's/^python /python2.5 /' run.sh
$ sed -i 's/^python /python2.5 /' setup.sh
$ sh setup.sh
$ sh run.sh

--- In galaxy

#
# First user
#
User -> Register
username: gordon1(a)cshl.edu
User -> Preferences -> Change Default Permissions for new histories
Roles Associated: [Added 'gordon1(a)cshl.edu'] (click 'save')
User -> Logout

#
# Second User
#
User -> Register
username: gordon2(a)cshl.edu
User -> Preferences -> Change Default Permissions for new histories
Roles Associated: [Added 'gordon2(a)cshl.edu'] (click 'save')
User -> Logout

#
# Third User
#
User -> Register
username: gordon3(a)cshl.edu
User -> Preferences -> Change Default Permissions for new histories
Roles Associated: [Added 'gordon3(a)cshl.edu'] (click 'save')
User -> Logout

-------------------
This is the base configuration. The following test cases start from this configuration.
-------------------

User->Login
username: gordon1(a)cshl.edu
# History 1
Options -> delete current history (because of a subtle permission issue, see item 4, below)
Rename History: "His 1 of Gordon1"
Get Data->Upload File: (pasted text: "Data 1 of His 1 of Gordon 1")
Rename Dataset to "Data 1 of His 1 of Gordon 1"
Options -> Share Current History
Username: 'gordon2(a)cshl.edu' (click 'submit')
Permissions dialog: How would you like to proceed?
- "Make datasets private to me and the users ..." (option 2) (click 'go')
# Note: the second file is uploaded AFTER the share.
Get Data->Upload File: (pasted text: "Data 2 of His 1 of Gordon 1")
Rename Dataset to "Data 2 of His 1 of Gordon 1"
User -> Logout

#
# Now as the second user
#
User->Login
username: gordon2(a)cshl.edu
Options->List Shared histories (the 'other' list link)
(seeing two histories shared from gordon1)
On "His1", click the little triangle button, then 'clone'.
Select "clone all history items" (option 1), click "clone"
Options -> List (two histories, 1 unnamed, and one cloned).
Switch to the cloned history.

#
# Problem 1:
#
In the second user's history pane: the first dataset is OK, the second one is blocked.

Technically this is correct, because the second dataset was created after the share, and the new ad-hoc role was not applied to it. But from a usability POV, this is confusing: the history was cloned by the second user AFTER the first user created the files.

In real life:
1. User A shares a History with User B.
2. User A adds more files to the history.
3. User A tells user B: "I've added more files, clone the history again".
4. User B clones the history (with the new items), but the new items are blocked.

In the 'previous' version of Galaxy (before the changeset that introduced the sharing mechanism), the "share" action literally meant "copy the current content of this history to other users", and you could do it multiple times, with different contents of the same history. This is not the case anymore.

Also, going back to the first user (gordon1) and re-sharing the history doesn't work (a red warning appears saying that this history is already shared).

A non-technical user (who doesn't care about roles/groups/permissions and just wants things to work) will be very frustrated. The technical solution of visiting every dataset and adding the role may be fine for one or two datasets, but not for twelve datasets that were created by a workflow.
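One way to read Problem 1 is that the ad-hoc sharing role is applied once, at share time, only to the datasets that exist at that moment. The toy model below is hypothetical and is not Galaxy's security agent: the history remembers its sharing role, so datasets added later inherit it instead of the owner's private role. Roles are modelled simply as sets of user names, and a user can view a dataset if they are a member of its access role.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

Role = FrozenSet[str]  # toy model: a role is just the set of its members' user names


@dataclass
class Dataset:
    name: str
    access_role: Role


@dataclass
class SharedHistory:
    owner: str
    datasets: List[Dataset] = field(default_factory=list)
    # Hypothetical: the history remembers the ad-hoc role created when it was shared.
    sharing_role: Optional[Role] = None

    def share_with(self, user: str) -> None:
        """Create an owner+user role and apply it to every existing dataset."""
        self.sharing_role = frozenset({self.owner, user})
        for ds in self.datasets:
            ds.access_role = self.sharing_role

    def add_dataset(self, name: str) -> Dataset:
        """New datasets inherit the sharing role if there is one, else stay private."""
        if self.sharing_role is not None:
            role = self.sharing_role
        else:
            role = frozenset({self.owner})
        ds = Dataset(name, role)
        self.datasets.append(ds)
        return ds


def can_view(user: str, ds: Dataset) -> bool:
    return user in ds.access_role


# Replaying Problem 1: one dataset uploaded before the share, one after.
history = SharedHistory("gordon1")
data1 = history.add_dataset("Data 1 of His 1 of Gordon 1")
history.share_with("gordon2")
data2 = history.add_dataset("Data 2 of His 1 of Gordon 1")
```

In this model the second user can view both datasets, which matches what the report expects from "share".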
#
# Problem 2:
# (this might be a bug)
#
Continuing from the previously described state:
User -> Logout
User -> Login, username: gordon1(a)cshl.edu (first user)
Options -> List Histories
Switch to "His 1 of Gordon1"

This history contains two datasets:

Dataset 1 ("Data 1 of His 1 Of Gordon 1") has:
[access]
roles associated: "Sharing role for: gordon1(a)cshl.edu, gordon2(a)cshl.edu"
roles not associated: "gordon1(a)cshl.edu"

Dataset 2 ("Data 2 of his 1 of gordon 1") has:
[access]
roles associated: "gordon1(a)cshl.edu"
roles not associated: "sharing role for: gordon1(a)cshl.edu, gordon2(a)cshl.edu"

Options -> Share
username: gordon3(a)cshl.edu (third user) (click 'submit')
Permissions dialog: How would you like to proceed?
- "Make datasets private to me and the users ..." (option 2) (click 'go')

The roles of all the datasets are reset to "sharing role for: gordon1(a)cshl.edu, gordon3(a)cshl.edu". This means that the second user (gordon2), who was previously allowed to view at least one dataset, is now deprived of even this privilege.

Even if this is technically correct, from a usability POV it is very confusing. What the first user did:
1. Shared the history with the second user (implied: I want the second user to view my files).
2. Shared the history with the third user (implied: I want the third user to also view my files).
But the outcome is that the second user is now blocked.

4) Subtle security bug/feature:
-------------------------------

The first time a NEW user logs on, an empty history is created. Going to User -> Preferences -> Change default permissions does not change the permissions of the current history, and there's no button to create a new history (because this is an empty history). So the files in the current history will be public.

While technically correct, this behavior is confusing. What the user wants is: "everything I create from now on has X permissions". But what is technically done is: "new histories will have X permissions, but you are currently in an old history which uses the old settings".

---------------------------

Thanks for reading so far.
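Problem 2 can be stated as "replace" versus "merge" semantics for re-sharing. The sketch below contrasts the two in a toy model where a role is just a set of user names; `share_replace` mimics the behaviour reported above, and `share_merge` is a hypothetical alternative, not an existing Galaxy option.

```python
from typing import FrozenSet, List

Role = FrozenSet[str]  # toy model: a role is just the set of its members' user names


def share_replace(dataset_roles: List[Role], owner: str, new_user: str) -> List[Role]:
    """What Problem 2 reports: each dataset gets a fresh owner+new_user role,
    so anyone the history was shared with earlier loses access."""
    return [frozenset({owner, new_user}) for _ in dataset_roles]


def share_merge(dataset_roles: List[Role], owner: str, new_user: str) -> List[Role]:
    """Hypothetical alternative: keep every user already allowed on any dataset
    in the history, then add the new user."""
    everyone = frozenset({owner, new_user}).union(*dataset_roles)
    return [everyone for _ in dataset_roles]


# Roles of the two datasets in "His 1 of Gordon1" just before sharing with gordon3.
before = [frozenset({"gordon1", "gordon2"}), frozenset({"gordon1"})]
```

Merging across the whole history also gives gordon2 access to the second dataset, which happens to address Problem 1 as well. The trade-off is that it widens access beyond what each dataset had individually; a per-dataset merge (`role | {owner, new_user}`) would be the more conservative variant.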

Bar charts
by Pieter Neerincx 05 Aug '09

Hi,

The bar chart generator does not work for me. It looks like gnuplot is complaining (see below or the attached screenshot), but I'm not sure if this is the problem. I'm on CentOS 5.3 and have gnuplot version 4.0.0-14.el5. In addition, I installed gnuplot-py 1.8. How do I fix this?

Cheers,

Pi

gnuplot> set style histogram clustered gap 5 title offset character 0, 0, 0
                   ^
line 0: expecting 'data', 'function', 'line', 'fill' or 'arrow'

line 0: undefined variable: in

line 0: undefined variable: invert

gnuplot> set style data histograms
                        ^
line 0: expecting 'lines', 'points', 'linespoints', 'dots', 'impulses', 'yerrorbars', 'xerrorbars', 'xyerrorbars', 'steps', 'fsteps', 'histeps', 'filledcurves', 'boxes', 'boxerrorbars', 'boxxyerrorbars', 'vectors', 'financebars', 'candlesticks', 'errorlines', 'xerrorlines', 'yerrorlines', 'xyerrorlines', 'pm3d'

gnuplot: unable to open display 'localhost:13.0'
gnuplot: X11 aborted.
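The first two errors suggest that the installed gnuplot does not recognise the histogram plot style at all. As far as I know that style first appeared in gnuplot 4.2, which would make the CentOS 5.3 package (4.0) too old; treat that version boundary as an assumption to confirm against the gnuplot release notes. A tool wrapper could check the version before plotting, roughly as below. The function names are hypothetical, and the input is whatever version text you capture, for example the "Version 4.0 patchlevel 0" banner gnuplot prints at startup. The last two lines of the output are a separate issue: gnuplot tried to open an X11 window, which fails on a server without a usable display, so a wrapper would normally write to a file-based terminal (such as `png`, if your build includes it) instead.

```python
import re
from typing import Optional, Tuple

# Assumption: the 'histograms' plot style first appeared in gnuplot 4.2.
MIN_HISTOGRAM_VERSION = (4, 2)


def parse_gnuplot_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'gnuplot 4.0 patchlevel 0'
    or a startup banner line like 'Version 4.0 patchlevel 0'."""
    m = re.search(r"(?:gnuplot|version)\s+(\d+)\.(\d+)", text, re.IGNORECASE)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def supports_histograms(text: str) -> bool:
    """True only if a version was found and it is at least MIN_HISTOGRAM_VERSION."""
    version = parse_gnuplot_version(text)
    return version is not None and version >= MIN_HISTOGRAM_VERSION
```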

Feature suggestion: administrative new-job lock
by Assaf Gordon 31 Jul '09

Hello,

I'd like to suggest a new feature: an administrative lock which prevents new jobs from starting.

Use case:
1. I need to stop my Galaxy server (to upgrade, to add a new tool, to add a new dbkey, etc.).
2. I want to stop it in a friendly way, without killing running jobs, but without allowing new jobs to start.
3. Once all the running jobs complete (and no new jobs are started), I can stop the server, fix it, and restart.
4. After restarting the server (and assuming jobs are tracked in the database), all the held jobs are started normally.

Code changes:
1. A new administrative method (job_lock() in ./lib/galaxy/web/controllers/admin.py)
2. A new template (./templates/admin/job_lock.mako)
3. Job lock code in lib/galaxy/jobs/schedulingpolicy/roundrobin.py

I'm still testing this patch, but comments are welcome.

-gordon.
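The core of the proposal, holding new jobs while letting running ones finish, can be sketched as a queue with a lock flag. This is a toy in-memory model with hypothetical names (`LockableJobQueue`, `set_lock`, `pending`), not the code in the patch or Galaxy's round-robin scheduling policy. In the proposal, held jobs survive a restart because they are tracked in the database; this sketch does not attempt that.

```python
import threading
from collections import deque
from typing import Deque, Optional


class LockableJobQueue:
    """Toy scheduler queue: while locked, queued jobs are held, not dispatched."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self._locked = False
        self._mutex = threading.Lock()  # the web thread and the scheduler may both touch this

    def set_lock(self, locked: bool) -> None:
        """Administrative switch: True holds new jobs, False releases them."""
        with self._mutex:
            self._locked = locked

    def put(self, job_id: str) -> None:
        """Jobs can still be submitted while locked; they simply wait."""
        with self._mutex:
            self._queue.append(job_id)

    def get(self) -> Optional[str]:
        """Return the next job to start, or None if locked or nothing is queued."""
        with self._mutex:
            if self._locked or not self._queue:
                return None
            return self._queue.popleft()

    def pending(self) -> int:
        with self._mutex:
            return len(self._queue)
```

Jobs that are already running are never touched by the lock, which matches step 2 of the use case: only dispatch of new work stops.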

Re: [galaxy-dev] Fresh install has wrong layout (From galaxy-user)
by Nate Coraor 31 Jul '09

Hi Chris,

Sorry about missing the reply, I was on vacation when you wrote back. Also, I've moved this over to the dev mailing list since it's regarding local installation issues.

Could you do the following:

Paste the <iframe> tags from "View Source" on the main page.

Paste the log output from the server (or paster.log if you're running as a daemon) of your client connecting to Galaxy.

Thanks,
--nate

Chris Cole wrote:
> Just realised I'd not sent this to the list. Whoops!
>
> This is still broken, anyone have any ideas?
> Cheers,
>
> Chris
>
> -------- Original Message --------
> Subject: Re: [galaxy-user] Fresh install has wrong layout
> Date: Tue, 21 Jul 2009 10:47:50 +0100
> From: Chris Cole <chris(a)compbio.dundee.ac.uk>
> Organisation: University of Dundee
> To: Nate Coraor <nate(a)bx.psu.edu>
> References: <4A4E147C.6010107(a)compbio.dundee.ac.uk>
> <4A4E1506.5030609(a)bx.psu.edu> <4A4E1716.2070800(a)compbio.dundee.ac.uk>
> <4A52AFFA.5020102(a)bx.psu.edu>
>
> Hi Nate,
>
> Sorry for not replying earlier, I've been on holiday. Attached should be
> my universe_wsgi.ini for the updated server. (The file has a .txt
> extension to avoid sending blockage).
>
> I've used the same settings (where applicable) for my currently working
> svn version 1686.
>
> Chris
>
> Nate Coraor wrote:
>> Chris,
>>
>> Could you send us your universe_wsgi.ini? We haven't seen this happen
>> without a proxy before.
>>
>> --nate
>>
>> Chris Cole wrote:
>>> Hi Nate,
>>>
>>> Nope. There's no proxy configured. The only things I changed in the HTTP
>>> config part of the universe_wsgi.ini are the host and the port.
>>> Cheers,
>>>
>>> Chris
>>>
>>> Nate Coraor wrote:
>>>> Hi Chris,
>>>>
>>>> It sounds like you may be running through a proxy, away from the
>>>> server root, without the proxy-prefix option set in universe_wsgi.ini?
>>>> If so, have a look at the documentation here:
>>>>
>>>> http://g2.trac.bx.psu.edu/wiki/HowToInstall/ApacheProxy#Apacheconfiguration…
>>>>
>>>> --nate
>>>>
>>>> Chris Cole wrote:
>>>>> Hi,
>>>>>
>>>>> I've been using the subversion version of Galaxy for a while, but now I
>>>>> want to update to the latest mercurial release.
>>>>>
>>>>> I installed it in a new location as per the wiki and set up the SGE
>>>>> settings as per my previous install, and all the tests pass. It launches
>>>>> fine, but when viewing the website the frames seem to be off-by-one,
>>>>> i.e., the 'Tools' frame is empty except for "This link may not be
>>>>> followed from within Galaxy.", the tools are in the middle window and
>>>>> the 'History' frame has general information.
>>>>>
>>>>> I can't see any errors in the paster.log file, so where can I look to
>>>>> sort out this error?
>>>>> Thanks,
>>>>>
>>>>> Chris
>>>>> _______________________________________________
>>>>> galaxy-user mailing list
>>>>> galaxy-user(a)bx.psu.edu
>>>>> http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
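Nate's first request, the `<iframe>` tags from the main page, can also be gathered with a few lines of Python's standard-library `html.parser`. This is a minimal sketch; the helper names are hypothetical, and the sample HTML in the test is made up and does not reflect Galaxy's actual frame names. To run it against a live server you could pass it the page fetched with `urllib.request.urlopen(url).read().decode()`, where `url` is your own Galaxy host and port.

```python
from html.parser import HTMLParser
from typing import Dict, List


class IframeCollector(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.iframes: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; valueless attributes come through as None.
        if tag == "iframe":
            self.iframes.append({name: (value or "") for name, value in attrs})


def list_iframes(html: str) -> List[Dict[str, str]]:
    parser = IframeCollector()
    parser.feed(html)
    parser.close()
    return parser.iframes
```

Comparing each frame's `src` with the frame it ends up in should show whether the "off-by-one" layout comes from the page markup or from how the URLs are resolved.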

[hg] galaxy 2515: Add new flag, allow_datatype_change, to Dataty...
by Greg Von Kuster 31 Jul '09

details: http://www.bx.psu.edu/hg/galaxy/rev/aabcc797c1da
changeset: 2515:aabcc797c1da
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Thu Jul 30 16:23:47 2009 -0400
description:
Add new flag, allow_datatype_change, to Datatypes. When set to False, datasets of this datatype cannot be changed from or into. This is needed, i.e. for Rgenetics Datatypes to prevent loss of metadata ('base_name') which occurs when changing to datatypes without this metadata parameter and would render the dataset unusable (composite filenames could be incorrect).

8 file(s) affected in this change:

lib/galaxy/datatypes/data.py
lib/galaxy/datatypes/genetics.py
lib/galaxy/web/controllers/admin.py
lib/galaxy/web/controllers/library.py
lib/galaxy/web/controllers/root.py
templates/admin/library/ldda_edit_info.mako
templates/dataset/edit_attributes.mako
templates/library/ldda_edit_info.mako

diffs (287 lines):

diff -r 904a72f5cf4c -r aabcc797c1da lib/galaxy/datatypes/data.py
--- a/lib/galaxy/datatypes/data.py Thu Jul 30 15:03:53 2009 -0400
+++ b/lib/galaxy/datatypes/data.py Thu Jul 30 16:23:47 2009 -0400
@@ -51,6 +51,8 @@
 copy_safe_peek = True
 is_binary = True #The dataset contains binary data --> do not space_to_tab or convert newlines, etc. Allow binary file uploads of this type when True.
+
+ allow_datatype_change = True #Allow user to change between this datatype and others. If False, this datatype cannot be changed from or into.
 #Composite datatypes
 composite_type = None
diff -r 904a72f5cf4c -r aabcc797c1da lib/galaxy/datatypes/genetics.py
--- a/lib/galaxy/datatypes/genetics.py Thu Jul 30 15:03:53 2009 -0400
+++ b/lib/galaxy/datatypes/genetics.py Thu Jul 30 16:23:47 2009 -0400
@@ -122,6 +122,7 @@
 file_ext="html"
 composite_type = 'auto_primary_file'
+ allow_datatype_change = False
 def missing_meta( self, dataset ):
 """Checks for empty meta values"""
@@ -255,6 +256,8 @@
 file_ext = None
 is_binary = True
+
+ allow_datatype_change = False
 composite_type = 'basic'
diff -r 904a72f5cf4c -r aabcc797c1da lib/galaxy/web/controllers/admin.py
--- a/lib/galaxy/web/controllers/admin.py Thu Jul 30 15:03:53 2009 -0400
+++ b/lib/galaxy/web/controllers/admin.py Thu Jul 30 16:23:47 2009 -0400
@@ -1094,7 +1094,7 @@
 replace_dataset = None
 # Let's not overwrite the imported datatypes module with the variable datatypes?
 # The built-in 'id' is overwritten in lots of places as well
- ldatatypes = [ x for x in trans.app.datatypes_registry.datatypes_by_extension.iterkeys() ]
+ ldatatypes = [ dtype_name for dtype_name, dtype_value in trans.app.datatypes_registry.datatypes_by_extension.iteritems() if dtype_value.allow_datatype_change ]
 ldatatypes.sort()
 if params.get( 'new_dataset_button', False ):
 upload_option = params.get( 'upload_option', 'upload_file' )
@@ -1247,17 +1247,20 @@
 elif action == 'edit_info':
 if params.get( 'change', False ):
 # The user clicked the Save button on the 'Change data type' form
- trans.app.datatypes_registry.change_datatype( ldda, params.datatype )
- trans.app.model.flush()
- msg = "Data type changed for library dataset '%s'" % ldda.name
- return trans.fill_template( "/admin/library/ldda_edit_info.mako",
- ldda=ldda,
- library_id=library_id,
- datatypes=ldatatypes,
- restrict=params.get( 'restrict', True ),
- render_templates=params.get( 'render_templates', False ),
- msg=msg,
- messagetype=messagetype )
+ if ldda.datatype.allow_datatype_change and trans.app.datatypes_registry.get_datatype_by_extension( params.datatype ).allow_datatype_change:
+ trans.app.datatypes_registry.change_datatype( ldda, params.datatype )
+ trans.app.model.flush()
+ msg = "Data type changed for library dataset '%s'" % ldda.name
+ return trans.fill_template( "/admin/library/ldda_edit_info.mako",
+ ldda=ldda,
+ library_id=library_id,
+ datatypes=ldatatypes,
+ restrict=params.get( 'restrict', True ),
+ render_templates=params.get( 'render_templates', False ),
+ msg=msg,
+ messagetype=messagetype )
+ else:
+ return trans.show_error_message( "You are unable to change datatypes in this manner. Changing %s to %s is not allowed." % ( ldda.extension, params.datatype ) )
 elif params.get( 'save', False ):
 # The user clicked the Save button on the 'Edit Attributes' form
 old_name = ldda.name
diff -r 904a72f5cf4c -r aabcc797c1da lib/galaxy/web/controllers/library.py
--- a/lib/galaxy/web/controllers/library.py Thu Jul 30 15:03:53 2009 -0400
+++ b/lib/galaxy/web/controllers/library.py Thu Jul 30 16:23:47 2009 -0400
@@ -430,7 +430,7 @@
 replace_dataset = None
 # Let's not overwrite the imported datatypes module with the variable datatypes?
 # The built-in 'id' is overwritten in lots of places as well
- ldatatypes = [ x for x in trans.app.datatypes_registry.datatypes_by_extension.iterkeys() ]
+ ldatatypes = [ dtype_name for dtype_name, dtype_value in trans.app.datatypes_registry.datatypes_by_extension.iteritems() if dtype_value.allow_datatype_change ]
 ldatatypes.sort()
 if id:
 if params.get( 'permissions', False ):
@@ -505,10 +505,14 @@
 if trans.app.security_agent.allow_action( trans.user,
 trans.app.security_agent.permitted_actions.LIBRARY_MODIFY,
 library_item=ldda ):
- trans.app.datatypes_registry.change_datatype( ldda, params.datatype )
- trans.app.model.flush()
- msg = "Data type changed for library dataset '%s'" % ldda.name
- messagetype = 'done'
+ if ldda.datatype.allow_datatype_change and trans.app.datatypes_registry.get_datatype_by_extension( params.datatype ).allow_datatype_change:
+ trans.app.datatypes_registry.change_datatype( ldda, params.datatype )
+ trans.app.model.flush()
+ msg = "Data type changed for library dataset '%s'" % ldda.name
+ messagetype = 'done'
+ else:
+ msg = "You are unable to change datatypes in this manner. Changing %s to %s is not allowed." % ( ldda.extension, params.datatype )
+ messagetype = 'error'
 else:
 msg = "You are not authorized to change the data type of dataset '%s'" % ldda.name
 messagetype = 'error'
diff -r 904a72f5cf4c -r aabcc797c1da lib/galaxy/web/controllers/root.py
--- a/lib/galaxy/web/controllers/root.py Thu Jul 30 15:03:53 2009 -0400
+++ b/lib/galaxy/web/controllers/root.py Thu Jul 30 16:23:47 2009 -0400
@@ -247,8 +247,11 @@
 params = util.Params( kwd, safe=False )
 if params.change:
 # The user clicked the Save button on the 'Change data type' form
- trans.app.datatypes_registry.change_datatype( data, params.datatype )
- trans.app.model.flush()
+ if data.datatype.allow_datatype_change and trans.app.datatypes_registry.get_datatype_by_extension( params.datatype ).allow_datatype_change:
+ trans.app.datatypes_registry.change_datatype( data, params.datatype )
+ trans.app.model.flush()
+ else:
+ return trans.show_error_message( "You are unable to change datatypes in this manner. Changing %s to %s is not allowed." % ( data.extension, params.datatype ) )
 elif params.save:
 # The user clicked the Save button on the 'Edit Attributes' form
 data.name = params.name
@@ -314,7 +317,7 @@
 data.metadata.dbkey = data.dbkey
 # let's not overwrite the imported datatypes module with the variable datatypes?
 # the built-in 'id' is overwritten in lots of places as well
- ldatatypes = [x for x in trans.app.datatypes_registry.datatypes_by_extension.iterkeys()]
+ ldatatypes = [ dtype_name for dtype_name, dtype_value in trans.app.datatypes_registry.datatypes_by_extension.iteritems() if dtype_value.allow_datatype_change ]
 ldatatypes.sort()
 trans.log_event( "Opened edit view on dataset %s" % str(id) )
 return trans.fill_template( "/dataset/edit_attributes.mako", data=data, datatypes=ldatatypes )
diff -r 904a72f5cf4c -r aabcc797c1da templates/admin/library/ldda_edit_info.mako
--- a/templates/admin/library/ldda_edit_info.mako Thu Jul 30 15:03:53 2009 -0400
+++ b/templates/admin/library/ldda_edit_info.mako Thu Jul 30 16:23:47 2009 -0400
@@ -99,24 +99,30 @@
 <div class="toolForm">
 <div class="toolFormTitle">Change data type of ${ldda.name}</div>
 <div class="toolFormBody">
- <form name="change_datatype" action="${h.url_for( controller='admin', action='library_dataset_dataset_association', library_id=library_id, folder_id=ldda.library_dataset.folder.id, edit_info=True )}" method="post">
- <input type="hidden" name="id" value="${ldda.id}"/>
+ %if ldda.datatype.allow_datatype_change:
+ <form name="change_datatype" action="${h.url_for( controller='admin', action='library_dataset_dataset_association', library_id=library_id, folder_id=ldda.library_dataset.folder.id, edit_info=True )}" method="post">
+ <input type="hidden" name="id" value="${ldda.id}"/>
+ <div class="form-row">
+ <label>New Type:</label>
+ <div style="float: left; width: 250px; margin-right: 10px;">
+ ${datatype( ldda, datatypes )}
+ </div>
+ <div class="toolParamHelp" style="clear: both;">
+ This will change the datatype of the existing dataset
+ but <i>not</i> modify its contents. Use this if Galaxy
+ has incorrectly guessed the type of your dataset.
+ </div>
+ <div style="clear: both"></div>
+ </div>
+ <div class="form-row">
+ <input type="submit" name="change" value="Save"/>
+ </div>
+ </form>
+ %else:
 <div class="form-row">
- <label>New Type:</label>
- <div style="float: left; width: 250px; margin-right: 10px;">
- ${datatype( ldda, datatypes )}
- </div>
- <div class="toolParamHelp" style="clear: both;">
- This will change the datatype of the existing dataset
- but <i>not</i> modify its contents. Use this if Galaxy
- has incorrectly guessed the type of your dataset.
- </div>
- <div style="clear: both"></div>
+ <div class="warningmessagesmall">${_('Changing the datatype of this dataset is not allowed.')}</div>
 </div>
- <div class="form-row">
- <input type="submit" name="change" value="Save"/>
- </div>
- </form>
+ %endif
 </div>
 </div>
diff -r 904a72f5cf4c -r aabcc797c1da templates/dataset/edit_attributes.mako
--- a/templates/dataset/edit_attributes.mako Thu Jul 30 15:03:53 2009 -0400
+++ b/templates/dataset/edit_attributes.mako Thu Jul 30 16:23:47 2009 -0400
@@ -102,27 +102,34 @@
 </div>
 <p />
 %endif
+
 <div class="toolForm">
 <div class="toolFormTitle">${_('Change data type')}</div>
 <div class="toolFormBody">
- <form name="change_datatype" action="${h.url_for( controller='root', action='edit' )}" method="post">
- <input type="hidden" name="id" value="${data.id}"/>
+ %if data.datatype.allow_datatype_change:
+ <form name="change_datatype" action="${h.url_for( controller='root', action='edit' )}" method="post">
+ <input type="hidden" name="id" value="${data.id}"/>
+ <div class="form-row">
+ <label>
+ ${_('New Type')}:
+ </label>
+ <div style="float: left; width: 250px; margin-right: 10px;">
+ ${datatype( data, datatypes )}
+ </div>
+ <div class="toolParamHelp" style="clear: both;">
+ ${_('This will change the datatype of the existing dataset but <i>not</i> modify its contents. Use this if Galaxy has incorrectly guessed the type of your dataset.')}
+ </div>
+ <div style="clear: both"></div>
+ </div>
+ <div class="form-row">
+ <input type="submit" name="change" value="${_('Save')}"/>
+ </div>
+ </form>
+ %else:
 <div class="form-row">
- <label>
- ${_('New Type')}:
- </label>
- <div style="float: left; width: 250px; margin-right: 10px;">
- ${datatype( data, datatypes )}
- </div>
- <div class="toolParamHelp" style="clear: both;">
- ${_('This will change the datatype of the existing dataset but <i>not</i> modify its contents. Use this if Galaxy has incorrectly guessed the type of your dataset.')}
- </div>
- <div style="clear: both"></div>
+ <div class="warningmessagesmall">${_('Changing the datatype of this dataset is not allowed.')}</div>
 </div>
- <div class="form-row">
- <input type="submit" name="change" value="${_('Save')}"/>
- </div>
- </form>
+ %endif
 </div>
 </div>
 <p />
diff -r 904a72f5cf4c -r aabcc797c1da templates/library/ldda_edit_info.mako
--- a/templates/library/ldda_edit_info.mako Thu Jul 30 15:03:53 2009 -0400
+++ b/templates/library/ldda_edit_info.mako Thu Jul 30 16:23:47 2009 -0400
@@ -99,24 +99,30 @@
 <div class="toolForm">
 <div class="toolFormTitle">Change data type</div>
 <div class="toolFormBody">
- <form name="change_datatype" action="${h.url_for( controller='library', action='library_dataset_dataset_association', library_id=library_id, folder_id=ldda.library_dataset.folder.id, edit_info=True )}" method="post">
- <input type="hidden" name="id" value="${ldda.id}"/>
+ %if ldda.datatype.allow_datatype_change:
+ <form name="change_datatype" action="${h.url_for( controller='library', action='library_dataset_dataset_association', library_id=library_id, folder_id=ldda.library_dataset.folder.id, edit_info=True )}" method="post">
+ <input type="hidden" name="id" value="${ldda.id}"/>
+ <div class="form-row">
+ <label>New Type:</label>
+ <div style="float: left; width: 250px; margin-right: 10px;">
+ ${datatype( ldda, datatypes )}
</div> + <div class="toolParamHelp" style="clear: both;"> + This will change the datatype of the existing dataset + but <i>not</i> modify its contents. Use this if Galaxy + has incorrectly guessed the type of your dataset. + </div> + <div style="clear: both"></div> + </div> + <div class="form-row"> + <input type="submit" name="change" value="Save"/> + </div> + </form> + %else: <div class="form-row"> - <label>New Type:</label> - <div style="float: left; width: 250px; margin-right: 10px;"> - ${datatype( ldda, datatypes )} - </div> - <div class="toolParamHelp" style="clear: both;"> - This will change the datatype of the existing dataset - but <i>not</i> modify its contents. Use this if Galaxy - has incorrectly guessed the type of your dataset. - </div> - <div style="clear: both"></div> + <div class="warningmessagesmall">${_('Changing the datatype of this dataset is not allowed.')}</div> </div> - <div class="form-row"> - <input type="submit" name="change" value="Save"/> - </div> - </form> + %endif </div> </div> <p/>
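The templates above render the "Change data type" form only when the dataset's datatype has `allow_datatype_change` set, and otherwise show a warning. The sketch below shows one way such a class-level flag can gate the form. It assumes that a base class allows changes by default and that a subclass opts out. The class names and the `change_datatype_section` helper are hypothetical illustrations, not Galaxy's actual definitions:

```python
class Data:
    """Hypothetical base datatype: changing the datatype is allowed by default."""
    allow_datatype_change = True


class Tabular(Data):
    """Hypothetical subclass that simply inherits the default."""


class LockedFormat(Data):
    """Hypothetical datatype that opts out with a single class attribute."""
    allow_datatype_change = False


def change_datatype_section(datatype):
    """Return what the 'Change data type' box would show, mirroring the %if/%else in the templates."""
    if datatype.allow_datatype_change:
        return "form"
    return "Changing the datatype of this dataset is not allowed."
```

Because the flag is a class attribute, a datatype can opt out in one line, and every template that checks `datatype.allow_datatype_change` picks up the change without further edits.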


[hg] galaxy 2514: Fix for trans.log_event when undeleting a hist...
by Dan Blankenberg 30 Jul '09


details: http://www.bx.psu.edu/hg/galaxy/rev/904a72f5cf4c
changeset: 2514:904a72f5cf4c
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Thu Jul 30 15:03:53 2009 -0400
description:
Fix for trans.log_event when undeleting a history.
1 file(s) affected in this change:
lib/galaxy/web/controllers/history.py
diffs (12 lines):
diff -r 3574137cf7fb -r 904a72f5cf4c lib/galaxy/web/controllers/history.py
--- a/lib/galaxy/web/controllers/history.py Thu Jul 30 14:13:04 2009 -0400
+++ b/lib/galaxy/web/controllers/history.py Thu Jul 30 15:03:53 2009 -0400
@@ -166,7 +166,7 @@
 default_permissions[ default_action ] = [ private_user_role ]
 trans.app.security_agent.history_set_default_permissions( history, default_permissions )
 n_undeleted += 1
-trans.log_event( "History (%s) %d marked as undeleted" % history.name )
+trans.log_event( "History (%s) %d marked as undeleted" % ( history.name, history.id ) )
 status = SUCCESS
 message_parts = []
 if n_undeleted:
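In the original line, the format string has two conversion specifiers, `%s` and `%d`, but `%` receives only `history.name`. Python therefore raises `TypeError` ("not enough arguments for format string") instead of logging the event. The standalone sketch below shows both versions, with a hypothetical `FakeHistory` class standing in for Galaxy's history model:

```python
class FakeHistory:
    """Hypothetical stand-in for a Galaxy history: a name and an integer id."""

    def __init__(self, name, history_id):
        self.name = name
        self.id = history_id


def undelete_message(history):
    # Fixed form: a 2-tuple supplies one value per specifier (%s <- name, %d <- id).
    return "History (%s) %d marked as undeleted" % (history.name, history.id)


def broken_undelete_message(history):
    # Original form: one argument for two specifiers, so this raises TypeError.
    return "History (%s) %d marked as undeleted" % history.name
```

For example, `undelete_message(FakeHistory("Unnamed history", 42))` returns `"History (Unnamed history) 42 marked as undeleted"`.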


[hg] galaxy 2513: Only require access permission to view private...
by Nate Coraor 30 Jul '09


details: http://www.bx.psu.edu/hg/galaxy/rev/3574137cf7fb
changeset: 2513:3574137cf7fb
user: Nate Coraor <nate(a)bx.psu.edu>
date: Thu Jul 30 14:13:04 2009 -0400
description:
Only require access permission to view private datasets at external sites.
1 file(s) affected in this change:
lib/galaxy/web/controllers/dataset.py
diffs (12 lines):
diff -r 315ac197ff33 -r 3574137cf7fb lib/galaxy/web/controllers/dataset.py
--- a/lib/galaxy/web/controllers/dataset.py Thu Jul 30 12:56:16 2009 -0400
+++ b/lib/galaxy/web/controllers/dataset.py Thu Jul 30 14:13:04 2009 -0400
@@ -144,7 +144,7 @@
 redirect_url = kwd['redirect_url'] % urllib.quote_plus( kwd['display_url'] )
 if trans.app.security_agent.allow_action( None, data.permitted_actions.DATASET_ACCESS, dataset = data ):
 return trans.response.send_redirect( redirect_url ) # anon access already permitted by rbac
-if trans.app.security_agent.allow_action( trans.user, data.permitted_actions.DATASET_MANAGE_PERMISSIONS, dataset = data ):
+if trans.app.security_agent.allow_action( trans.user, data.permitted_actions.DATASET_ACCESS, dataset = data ):
 trans.app.host_security_agent.set_dataset_permissions( data, trans.user, site )
 return trans.response.send_redirect( redirect_url )
 else:
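After this change, a logged-in user who can access a private dataset no longer also needs permission to manage it before the host security agent sets permissions for the external display site. The decision order in the patched controller can be summarized in a simplified sketch. The `has_access` callback stands in for `security_agent.allow_action(..., DATASET_ACCESS, ...)`, and the returned strings are hypothetical labels for the branches:

```python
from typing import Callable, Optional


def external_display_decision(user: Optional[str],
                              has_access: Callable[[Optional[str]], bool]) -> str:
    """Simplified decision order of the patched dataset controller (hypothetical names)."""
    # Public dataset: anonymous access is already permitted by RBAC, so redirect at once.
    if has_access(None):
        return "redirect"
    # Since changeset 2513 the user needs only DATASET_ACCESS (previously
    # DATASET_MANAGE_PERMISSIONS) before the external site is granted access.
    if has_access(user):
        return "grant_site_access_and_redirect"
    # The body of the else branch lies outside the diff hunk; this is a placeholder.
    return "not_permitted"
```

For a private dataset readable only by `alice`, `external_display_decision("alice", lambda u: u == "alice")` takes the second branch. Before the change, the same user would also have needed manage-permissions rights.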


[hg] galaxy 2512: Add a new flag -f/--force_retry to cleanup_dat...
by Nate Coraor 30 Jul '09


details: http://www.bx.psu.edu/hg/galaxy/rev/315ac197ff33
changeset: 2512:315ac197ff33
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Thu Jul 30 12:56:16 2009 -0400
description:
Add a new flag -f/--force_retry to cleanup_datasets.py. This flag causes the script to attempt the requested action on objects regardless of whether it has been performed before. This is useful, e.g., if purge_datasets was called without --remove_from_disk but it is later decided to remove these files: the purge_datasets script should then be called with both -r and -f.
1 file(s) affected in this change:
scripts/cleanup_datasets/cleanup_datasets.py
diffs (162 lines):
diff -r ec59d6bcf827 -r 315ac197ff33 scripts/cleanup_datasets/cleanup_datasets.py --- a/scripts/cleanup_datasets/cleanup_datasets.py Thu Jul 30 12:12:26 2009 -0400 +++ b/scripts/cleanup_datasets/cleanup_datasets.py Thu Jul 30 12:56:16 2009 -0400 @@ -24,6 +24,7 @@ parser.add_option( "-d", "--days", dest="days", action="store", type="int", help="number of days (60)", default=60 ) parser.add_option( "-r", "--remove_from_disk", action="store_true", dest="remove_from_disk", help="remove datasets from disk when purged", default=False ) parser.add_option( "-i", "--info_only", action="store_true", dest="info_only", help="info about the requested action", default=False ) + parser.add_option( "-f", "--force_retry", action="store_true", dest="force_retry", help="performs the requested actions, but ignores whether it might have been done before. Useful when -r wasn't used, but should have been", default=False ) parser.add_option( "-1", "--delete_userless_histories", action="store_true", dest="delete_userless_histories", default=False, help="delete userless histories and datasets" ) @@ -73,28 +74,32 @@ print "# Datasets will NOT be removed from disk.\n" if options.delete_userless_histories: - delete_userless_histories( app, cutoff_time, info_only = options.info_only ) + delete_userless_histories( app, cutoff_time, info_only = options.info_only, force_retry = options.force_retry ) elif options.purge_histories: - purge_histories( app, cutoff_time, options.remove_from_disk, info_only = options.info_only ) + purge_histories( app, cutoff_time, options.remove_from_disk, info_only = options.info_only, force_retry = options.force_retry ) elif options.purge_datasets: - purge_datasets( app, cutoff_time, options.remove_from_disk, info_only = options.info_only ) + purge_datasets( app, cutoff_time, options.remove_from_disk, info_only = options.info_only, force_retry = options.force_retry ) elif options.purge_libraries: - purge_libraries( app, cutoff_time, options.remove_from_disk, info_only = options.info_only ) + purge_libraries( app, 
cutoff_time, options.remove_from_disk, info_only = options.info_only, force_retry = options.force_retry ) elif options.purge_folders: - purge_folders( app, cutoff_time, options.remove_from_disk, info_only = options.info_only ) + purge_folders( app, cutoff_time, options.remove_from_disk, info_only = options.info_only, force_retry = options.force_retry ) sys.exit(0) -def delete_userless_histories( app, cutoff_time, info_only = False ): +def delete_userless_histories( app, cutoff_time, info_only = False, force_retry = False ): # Deletes userless histories whose update_time value is older than the cutoff_time. # The purge history script will handle marking DatasetInstances as deleted. # Nothing is removed from disk yet.
history_count = 0 print '# The following datasets and associated userless histories have been deleted' start = time.clock() - histories = app.model.History.filter( and_( app.model.History.table.c.user_id==None, - app.model.History.table.c.deleted==False, - app.model.History.table.c.update_time < cutoff_time ) ).all()# \ + if force_retry: + histories = app.model.History.filter( and_( app.model.History.table.c.user_id==None, + app.model.History.table.c.update_time < cutoff_time ) ).all() + else: + histories = app.model.History.filter( and_( app.model.History.table.c.user_id==None, + app.model.History.table.c.deleted==False, + app.model.History.table.c.update_time < cutoff_time ) ).all() for history in histories: if not info_only: history.deleted = True @@ -106,7 +111,7 @@ print "Elapsed time: ", stop - start, "\n" -def purge_histories( app, cutoff_time, remove_from_disk, info_only = False ): +def purge_histories( app, cutoff_time, remove_from_disk, info_only = False, force_retry = False ): # Purges deleted histories whose update_time is older than the cutoff_time. # The dataset associations of each history are also marked as deleted. 
# The Purge Dataset method will purge each Dataset as necessary @@ -115,10 +120,15 @@ history_count = 0 print '# The following datasets and associated deleted histories have been purged' start = time.clock() - histories = app.model.History.filter( and_( app.model.History.table.c.deleted==True, - app.model.History.table.c.purged==False, - app.model.History.table.c.update_time < cutoff_time ) ) \ - .options( eagerload( 'datasets' ) ).all() + if force_retry: + histories = app.model.History.filter( and_( app.model.History.table.c.deleted==True, + app.model.History.table.c.update_time < cutoff_time ) ) \ + .options( eagerload( 'datasets' ) ).all() + else: + histories = app.model.History.filter( and_( app.model.History.table.c.deleted==True, + app.model.History.table.c.purged==False, + app.model.History.table.c.update_time < cutoff_time ) ) \ + .options( eagerload( 'datasets' ) ).all() for history in histories: for dataset_assoc in history.datasets: _purge_dataset_instance( dataset_assoc, app, remove_from_disk, info_only = info_only ) #mark a DatasetInstance as deleted, clear associated files, and mark the Dataset as deleted if it is deletable @@ -136,7 +146,7 @@ print '# Purged %d histories.' % ( history_count ), '\n' print "Elapsed time: ", stop - start, "\n" -def purge_libraries( app, cutoff_time, remove_from_disk, info_only = False ): +def purge_libraries( app, cutoff_time, remove_from_disk, info_only = False, force_retry = False ): # Purges deleted libraries whose update_time is older than the cutoff_time. # The dataset associations of each library are also marked as deleted. 
# The Purge Dataset method will purge each Dataset as necessary @@ -145,9 +155,13 @@ library_count = 0 print '# The following libraries and associated folders have been purged' start = time.clock() - libraries = app.model.Library.filter( and_( app.model.Library.table.c.deleted==True, - app.model.Library.table.c.purged==False, - app.model.Library.table.c.update_time < cutoff_time ) ).all() + if force_retry: + libraries = app.model.Library.filter( and_( app.model.Library.table.c.deleted==True, + app.model.Library.table.c.update_time < cutoff_time ) ).all() + else: + libraries = app.model.Library.filter( and_( app.model.Library.table.c.deleted==True, + app.model.Library.table.c.purged==False, + app.model.Library.table.c.update_time < cutoff_time ) ).all() for library in libraries: _purge_folder( library.root_folder, app, remove_from_disk, info_only = info_only ) if not info_only: @@ -159,7 +173,7 @@ print '# Purged %d libraries .' % ( library_count ), '\n' print "Elapsed time: ", stop - start, "\n" -def purge_folders( app, cutoff_time, remove_from_disk, info_only = False ): +def purge_folders( app, cutoff_time, remove_from_disk, info_only = False, force_retry = False ): # Purges deleted folders whose update_time is older than the cutoff_time. # The dataset associations of each folder are also marked as deleted. 
# The Purge Dataset method will purge each Dataset as necessary @@ -168,9 +182,13 @@ folder_count = 0 print '# The following folders have been purged' start = time.clock() - folders = app.model.LibraryFolder.filter( and_( app.model.LibraryFolder.table.c.deleted==True, - app.model.LibraryFolder.table.c.purged==False, - app.model.LibraryFolder.table.c.update_time < cutoff_time ) ).all() + if force_retry: + folders = app.model.LibraryFolder.filter( and_( app.model.LibraryFolder.table.c.deleted==True, + app.model.LibraryFolder.table.c.update_time < cutoff_time ) ).all() + else: + folders = app.model.LibraryFolder.filter( and_( app.model.LibraryFolder.table.c.deleted==True, + app.model.LibraryFolder.table.c.purged==False, + app.model.LibraryFolder.table.c.update_time < cutoff_time ) ).all() for folder in folders: _purge_folder( folder, app, remove_from_disk, info_only = info_only ) print "%d" % folder.id @@ -179,17 +197,22 @@ print '# Purged %d folders.' % ( folder_count ), '\n' print "Elapsed time: ", stop - start, "\n" -def purge_datasets( app, cutoff_time, remove_from_disk, info_only = False ): +def purge_datasets( app, cutoff_time, remove_from_disk, info_only = False, repurge = False, force_retry = False ): # Purges deleted datasets whose update_time is older than cutoff_time. Files may or may # not be removed from disk. 
dataset_count = 0 disk_space = 0 print '# The following deleted datasets have been purged' start = time.clock() - datasets = app.model.Dataset.filter( and_( app.model.Dataset.table.c.deleted==True, - app.model.Dataset.table.c.purgable==True, - app.model.Dataset.table.c.purged==False, - app.model.Dataset.table.c.update_time < cutoff_time ) ).all() + if force_retry: + datasets = app.model.Dataset.filter( and_( app.model.Dataset.table.c.deleted==True, + app.model.Dataset.table.c.purgable==True, + app.model.Dataset.table.c.update_time < cutoff_time ) ).all() + else: + datasets = app.model.Dataset.filter( and_( app.model.Dataset.table.c.deleted==True, + app.model.Dataset.table.c.purgable==True, + app.model.Dataset.table.c.purged==False, + app.model.Dataset.table.c.update_time < cutoff_time ) ).all() for dataset in datasets: file_size = dataset.file_size _purge_dataset( dataset, remove_from_disk, info_only = info_only )


[hg] galaxy 2510: Added fastqsanger data format and required bwa...
by Nate Coraor 30 Jul '09


details: http://www.bx.psu.edu/hg/galaxy/rev/fab59b1e756d changeset: 2510:fab59b1e756d user: Kelly Vincent <kpvincent(a)bx.psu.edu> date: Thu Jul 30 12:06:24 2009 -0400 description: Added fastqsanger data format and required bwa_wrapper to take only that format as input 9 file(s) affected in this change: datatypes_conf.xml.sample lib/galaxy/datatypes/registry.py lib/galaxy/datatypes/sequence.py lib/galaxy/datatypes/test/1.fastqsanger lib/galaxy/datatypes/test/2.fastqsanger test-data/1.fastqsanger test-data/2.fastqsanger test/functional/test_sniffing_and_metadata_settings.py tools/sr_mapping/bwa_wrapper.xml diffs (298 lines): diff -r f06777cbd5bb -r fab59b1e756d datatypes_conf.xml.sample --- a/datatypes_conf.xml.sample Thu Jul 30 11:05:03 2009 -0400 +++ b/datatypes_conf.xml.sample Thu Jul 30 12:06:24 2009 -0400 @@ -20,6 +20,7 @@ <datatype extension="fasta" type="galaxy.datatypes.sequence:Fasta" display_in_upload="true"> <converter file="fasta_to_tabular_converter.xml" target_datatype="tabular"/> </datatype> + <datatype extension="fastqsanger" type="galaxy.datatypes.sequence:FastqSanger" display_in_upload="true"/> <datatype extension="fastqsolexa" type="galaxy.datatypes.sequence:FastqSolexa" display_in_upload="true"> <converter file="fastqsolexa_to_fasta_converter.xml" target_datatype="fasta"/> <converter file="fastqsolexa_to_qual_converter.xml" target_datatype="qualsolexa"/> @@ -195,6 +196,7 @@ <sniffer type="galaxy.datatypes.qualityscore:QualityScore454"/> <sniffer type="galaxy.datatypes.sequence:Fasta"/> <sniffer type="galaxy.datatypes.sequence:FastqSolexa"/> + <sniffer type="galaxy.datatypes.sequence:FastqSanger"/> <sniffer type="galaxy.datatypes.interval:Wiggle"/> <sniffer type="galaxy.datatypes.images:Html"/> <sniffer type="galaxy.datatypes.sequence:Axt"/> diff -r f06777cbd5bb -r fab59b1e756d lib/galaxy/datatypes/registry.py --- a/lib/galaxy/datatypes/registry.py Thu Jul 30 11:05:03 2009 -0400 +++ b/lib/galaxy/datatypes/registry.py Thu Jul 30 12:06:24 2009 
-0400 @@ -117,6 +117,7 @@ 'customtrack' : interval.CustomTrack(), 'csfasta' : sequence.csFasta(), 'fasta' : sequence.Fasta(), + 'fastqsanger' : sequence.FastqSanger(), 'fastqsolexa' : sequence.FastqSolexa(), 'gff' : interval.Gff(), 'gff3' : interval.Gff3(), @@ -144,6 +145,7 @@ 'customtrack' : 'text/plain', 'csfasta' : 'text/plain', 'fasta' : 'text/plain', + 'fastqsanger' : 'text/plain', 'fastqsolexa' : 'text/plain', 'gff' : 'text/plain', 'gff3' : 'text/plain', @@ -173,6 +175,7 @@ qualityscore.QualityScore454(), sequence.Fasta(), sequence.FastqSolexa(), + sequence.FastqSanger(), interval.Wiggle(), images.Html(), sequence.Axt(), diff -r f06777cbd5bb -r fab59b1e756d lib/galaxy/datatypes/sequence.py --- a/lib/galaxy/datatypes/sequence.py Thu Jul 30 11:05:03 2009 -0400 +++ b/lib/galaxy/datatypes/sequence.py Thu Jul 30 12:06:24 2009 -0400 @@ -176,16 +176,139 @@ except: qscore_int = False + # check length and range of quality scores if qscore_int: if len( headers[3] ) != len( headers[1][0] ): return False + if not self.check_qual_values_within_range(headers[3], 'int'): + return False + try: + if not self.check_qual_values_within_range(headers[7], 'int'): + return False + try: + if not self.check_qual_values_within_range(headers[11], 'int'): + return False + except IndexError: + pass + except IndexError: + pass else: if len( headers[3][0] ) != len( headers[1][0] ): - return False + return False + if not self.check_qual_values_within_range(headers[3][0], 'char'): + return False + try: + if not self.check_qual_values_within_range(headers[7][0], 'char'): + return False + try: + if not self.check_qual_values_within_range(headers[11][0], 'char'): + return False + except IndexError: + pass + except IndexError: + pass return True return False except: return False + def check_qual_values_within_range( self, qual_seq, score_type ): + if score_type == 'char': + for val in qual_seq: + if ord(val) < 59 or ord(val) > 104: + return False + elif score_type == 'int': + for val in 
qual_seq: + if int(val) < -5 or int(val) > 40: + return False + return True + + +class FastqSanger( Sequence ): + """Class representing a FASTQ sequence ( the Sanger variant )""" + file_ext = "fastqsanger" + + def set_peek( self, dataset ): + if not dataset.dataset.purged: + dataset.peek = data.get_file_peek( dataset.file_name ) + dataset.blurb = data.nice_size( dataset.get_size() ) + else: + dataset.peek = 'file does not exist' + dataset.blurb = 'file purged from disk' + + def sniff( self, filename ): + """ + Determines whether the file is in fastqsanger format (Sanger Variant) + For details, see http://maq.sourceforge.net/fastq.shtml + + Note: There are two kinds of FASTQ files, known as "Sanger" (sometimes called "Standard") and Solexa + These differ in the representation of the quality scores + + >>> fname = get_test_fname( '1.fastqsanger' ) + >>> FastqSanger().sniff( fname ) + True + >>> fname = get_test_fname( '2.fastqsanger' ) + >>> FastqSanger().sniff( fname ) + True + """ + headers = get_headers( filename, None ) + bases_regexp = re.compile( "^[NGTAC]*$" ) + try: + if len( headers ) >= 4 and headers[0][0] and headers[0][0][0] == "@" and headers[2][0] and headers[2][0][0] == "+" and headers[1][0]: + # Check the sequence line, make sure it contains only G/C/A/T/N + if not bases_regexp.match( headers[1][0] ): + return False + # Check quality score: integer or ascii char. 
+ try: + check = int(headers[3][0]) + qscore_int = True + except: + qscore_int = False + + # check length and range of quality scores + if qscore_int: + if len( headers[3] ) != len( headers[1][0] ): + return False + if not self.check_qual_values_within_range(headers[3], 'int'): + return False + try: + if not self.check_qual_values_within_range(headers[7], 'int'): + return False + try: + if not self.check_qual_values_within_range(headers[11], 'int'): + return False + except IndexError: + pass + except IndexError: + pass + else: + if len( headers[3][0] ) != len( headers[1][0] ): + return False + if not self.check_qual_values_within_range(headers[3][0], 'char'): + return False + try: + if not self.check_qual_values_within_range(headers[7][0], 'char'): + return False + try: + if not self.check_qual_values_within_range(headers[11][0], 'char'): + return False + except IndexError: + pass + except IndexError: + pass + return True + return False + except: + return False + def check_qual_values_within_range( self, qual_seq, score_type ): + if score_type == 'char': + for val in qual_seq: + if ord(val) >= 33 and ord(val) <= 126: + return True + elif score_type == 'int': + for val in qual_seq: + if int(val) >= 0 and int(val) <= 93: + return True + return False try: from galaxy import eggs diff -r f06777cbd5bb -r fab59b1e756d lib/galaxy/datatypes/test/1.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/lib/galaxy/datatypes/test/1.fastqsanger Thu Jul 30 12:06:24 2009 -0400 @@ -0,0 +1,8 @@ +@1831_573_1004/1 +AATACTTTCGGCGCCCTAAACCAGCTCACTGGGG ++ +><C&&9952+C>5<.?<79,=42<292:<(9/-7 +@1831_573_1050/1 +TTTATGGGTATGGCCGCTCACAGGCCAGCGGCCT ++ +;@@17?@=>7??@A8?==@4A?A4)&+.'&+'1, \ No newline at end of file diff -r f06777cbd5bb -r fab59b1e756d lib/galaxy/datatypes/test/2.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/lib/galaxy/datatypes/test/2.fastqsanger Thu Jul 30 12:06:24 2009 -0400 @@ -0,0 +1,8 @@ +@1831_573_1004/1 +AATACTTTCGGCGCCCTAAACCAGCTCACTGGGG ++ 
+29 27 34 5 5 24 24 20 17 10 34 29 20 27 13 30 27 22 24 11 28 19 17 27 17 24 17 25 27 7 24 14 12 22 +@1831_573_1050/1 +TTTATGGGTATGGCCGCTCACAGGCCAGCGGCCT ++ +26 31 31 16 22 30 31 28 29 22 30 30 31 32 23 30 28 28 31 19 32 30 32 19 8 5 10 13 6 5 10 6 16 11 \ No newline at end of file diff -r f06777cbd5bb -r fab59b1e756d test-data/1.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/1.fastqsanger Thu Jul 30 12:06:24 2009 -0400 @@ -0,0 +1,8 @@ +@1831_573_1004/1 +AATACTTTCGGCGCCCTAAACCAGCTCACTGGGG ++ +><C&&9952+C>5<.?<79,=42<292:<(9/-7 +@1831_573_1050/1 +TTTATGGGTATGGCCGCTCACAGGCCAGCGGCCT ++ +;@@17?@=>7??@A8?==@4A?A4)&+.'&+'1, \ No newline at end of file diff -r f06777cbd5bb -r fab59b1e756d test-data/2.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/2.fastqsanger Thu Jul 30 12:06:24 2009 -0400 @@ -0,0 +1,8 @@ +@1831_573_1004/1 +AATACTTTCGGCGCCCTAAACCAGCTCACTGGGG ++ +29 27 34 5 5 24 24 20 17 10 34 29 20 27 13 30 27 22 24 11 28 19 17 27 17 24 17 25 27 7 24 14 12 22 +@1831_573_1050/1 +TTTATGGGTATGGCCGCTCACAGGCCAGCGGCCT ++ +26 31 31 16 22 30 31 28 29 22 30 30 31 32 23 30 28 28 31 19 32 30 32 19 8 5 10 13 6 5 10 6 16 11 \ No newline at end of file diff -r f06777cbd5bb -r fab59b1e756d test/functional/test_sniffing_and_metadata_settings.py --- a/test/functional/test_sniffing_and_metadata_settings.py Thu Jul 30 11:05:03 2009 -0400 +++ b/test/functional/test_sniffing_and_metadata_settings.py Thu Jul 30 12:06:24 2009 -0400 @@ -216,6 +216,16 @@ assert latest_hda is not None, "Problem retrieving wig hda from the database" if not latest_hda.name == '1.wig' and not latest_hda.extension == 'wig': raise AssertionError, "wig data type was not correctly sniffed." 
+ def test_085_fastqsanger_datatype( self ): + """Testing correctly sniffing fastqsanger ( the Sanger variant ) data type upon upload""" + self.upload_file( '1.fastqsanger' ) + self.verify_dataset_correctness( '1.fastqsanger' ) + self.check_history_for_string( '1.fastqsanger format: <span class="fastqsanger">fastqsanger</span>, database: \? Info: uploaded fastqsanger file' ) + latest_hda = galaxy.model.HistoryDatasetAssociation.query() \ + .order_by( desc( galaxy.model.HistoryDatasetAssociation.table.c.create_time ) ).first() + assert latest_hda is not None, "Problem retrieving fastqsanger hda from the database" + if not latest_hda.name == '1.fastqsanger' and not latest_hda.extension == 'fastqsanger': + raise AssertionError, "fastqsanger data type was not correctly sniffed." def test_9999_clean_up( self ): self.delete_history( id=self.security.encode_id( history1.id ) ) self.logout() diff -r f06777cbd5bb -r fab59b1e756d tools/sr_mapping/bwa_wrapper.xml --- a/tools/sr_mapping/bwa_wrapper.xml Thu Jul 30 11:05:03 2009 -0400 +++ b/tools/sr_mapping/bwa_wrapper.xml Thu Jul 30 12:06:24 2009 -0400 @@ -71,7 +71,7 @@ <option value="history">Use one from the history</option> </param> <when value="history"> - <param name="ownFile" type="data" label="Select a reference genome" /> + <param name="ownFile" type="data" format="fasta" label="Select a reference genome" /> </when> <when value="indexed"> <param name="indices" type="select" label="Select a reference genome"> @@ -91,7 +91,7 @@ <option value="history">Use one from the history</option> </param> <when value="history"> - <param name="ownFile" type="data" label="Select a reference genome" /> + <param name="ownFile" type="data" format="fasta" label="Select a reference genome" /> </when> <when value="indexed"> <param name="indices" type="select" label="Select a reference genome"> @@ -111,11 +111,11 @@ <option value="paired">Paired-end</option> </param> <when value="single"> - <param name="input1" type="data" label="FASTQ file" 
/> + <param name="input1" type="data" format="fastqsanger" label="FASTQ file" /> </when> <when value="paired"> - <param name="input1" type="data" label="Forward FASTQ file" /> - <param name="input2" type="data" label="Reverse FASTQ file" /> + <param name="input1" type="data" format="fastqsanger" label="Forward FASTQ file" /> + <param name="input2" type="data" format="fastqsanger" label="Reverse FASTQ file" /> </when> </conditional> <conditional name="params">
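The Sanger variant stores each Phred quality as one character with an offset of 33. This is why `FastqSanger` accepts characters 33 to 126 (scores 0 to 93). The range check added earlier in the same diff, apparently to the existing Solexa sniffer, accepts 59 to 104 (offset 64, scores -5 to 40). As written in the diff, `FastqSanger.check_qual_values_within_range` returns `True` at the first in-range value. The sketch below instead follows the Solexa check's approach and rejects a record if any value is out of range. It decodes a quality string and runs a minimal four-line record check modeled on the sniffer. All function names are hypothetical:

```python
import re

SANGER_OFFSET = 33                 # Phred+33: '!' (ASCII 33) encodes quality 0
SANGER_MIN, SANGER_MAX = 0, 93     # printable ASCII 33..126
BASES = re.compile(r"^[NGTAC]*$")  # same alphabet as the sniffer's bases_regexp


def decode_sanger_qualities(qual_line):
    """Convert a character-encoded Sanger quality line to integer Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in qual_line]


def all_in_range(scores, low=SANGER_MIN, high=SANGER_MAX):
    """True only if every score lies in [low, high]."""
    return all(low <= s <= high for s in scores)


def looks_like_sanger_record(lines):
    """Minimal check of one 4-line FASTQ record with character-encoded qualities."""
    if len(lines) < 4:
        return False
    header, seq, plus, qual = lines[:4]
    if not header.startswith("@") or not plus.startswith("+"):
        return False
    if not seq or not BASES.match(seq):
        return False
    # One quality character per base, as in the sniffer's length check.
    if len(qual) != len(seq):
        return False
    return all_in_range(decode_sanger_qualities(qual))
```

With this sketch, decoding the quality line of the first read in `1.fastqsanger` gives 29, 27, 34, 5, 5, and so on. These are the same integers listed for that read in `2.fastqsanger`, so the two test files appear to encode identical reads in the two quality notations.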
