I have created a card for this: https://trello.com/card/external-display-application-enhancements/506338ce32...

Here is the follow up conversation that answers all of the questions from my previous e-mail.

(09:11:17 AM) natefoo: jmchilton1: i'm not sure that the displayonly REMOTE_USER was ever actually added to the documentation.
(09:14:46 AM) jmchilton1: natefoo: I thought you said yesterday it was? I must have misunderstood you, sorry about that.
(09:15:34 AM) natefoo: no, i think i was the one who misunderstood.
(09:15:55 AM) dannon: val_erie: Yep
(09:16:00 AM) natefoo: there's stuff for bypassing the proxy, but the part about adding the header wasn't there.
(09:16:24 AM) natefoo: because the logic to bypass authentication used to be in remoteuser.py.
(09:16:34 AM) natefoo: actually there's still stuff there, maybe the right routes aren't listed.
(09:16:52 AM) natefoo: you have to set a list of servers that are allowed to bypass auth in universe_wsgi.ini.
(09:17:38 AM) jmchilton1: Doesn't something like IGV require you to allow all servers through?
(09:18:26 AM) jmchilton1: We have a couple different setups, but the most recently created one I just allow anything access to /display_application assuming the user and dataset hashes are security enough.
(09:18:35 AM) jmchilton1: Am I wrong about that?
(09:18:49 AM) natefoo: `display_servers` in the config file.
(09:18:56 AM) natefoo: i'm not familiar enough with igv.
(09:19:08 AM) natefoo: it depends on what's fetching the data in to igv.
(09:19:14 AM) jmchilton1: End user is from a Java app
(09:19:38 AM) val_erie: dannon, I used pip install rpy and got 'Downloading/unpacking rpy. Could not find any downloads that satisfy the requirement rpy'
(09:19:57 AM) dannon: Was worried about that. rpy2 would be available, but not rpy
(09:20:17 AM) natefoo: hosted or desktop?
(09:20:56 AM) jmchilton1: desktop
(09:20:59 AM) dblank: IGV is java web launch, i would presume the request comes from the ip associated with the user's computer
(09:21:10 AM) jmchilton1: right
(09:21:30 AM) natefoo: ah.
(09:22:18 AM) jmchilton1: In our older genomics setup, I needed to completely open thoses paths for IGV which resulted in really specific access rules and then I realized they aren't really providing any extra security by allowing only IGV through, so I just have started opening up all of display_aplication
(09:23:58 AM) natefoo: i'm certain i am not up on how display_application works now, we're doing authentication via hashes in the dataset url?
(09:26:09 AM) dblank: The idea is that the hashes and user security should be enough for /display_application. However, if a dataset is public and you know it's ordinary hashed hda.id, you can get at it; private datasets will be restricted based upon the user that clicked/generated the link
(09:26:12 AM) dannon: val_erie: I'd probably just try to install it manually in your virtualenv if I were you.
(09:27:06 AM) dannon: http://rpy.sourceforge.net/rpy_download.html, and add it to <your_virtual_env>/lib/
(09:28:00 AM) natefoo: oh so the hash we're referring to is just the hashed hda id?
(09:28:27 AM) jmchilton1: dblank: So it is your opinion that just allowing everything through on /display_application at the proxy level is fine then?
(09:28:50 AM) dblank: theres multiple hashed datas, one for the hdaid and another for authenticating
(09:29:49 AM) jmchilton1: natefoo: Back to your original question, I have checked we are not setting display_servers on any of our instances.
(09:29:50 AM) natefoo: okay.
(09:30:09 AM) natefoo: jmchilton1: well in the case of igv, display_servers wouldn't help.
(09:30:29 AM) jmchilton1: display_servers = * :)
(09:30:45 AM) jmchilton1: Which I guess is what the displayonly hack is doing
(09:31:04 AM) natefoo: fairly close.
(09:31:20 AM) dblank: yes, as long as you are ok with non-permission restricted data being accessible by anyone with the hashed hda id
(09:32:39 AM) jmchilton1: dblank: Right, but those hashes are secure right. They are equivalent to like an API key right. No one has any reservations about opening /api to world. (All of this assumes SSL to some degree).
(09:33:22 AM) val_erie: dannon, I downloaded rpy-1.0.3 from http://sourceforge.net/. Do I add the whole rpy directory to galaxy-dist/venv/bin/lib? I do not need to go through all steps of installation such configuring path to R library that is documented in README file that came with rpy? any
(09:33:25 AM) mrscribe: Title: SourceForge.net: Find and Build Open Source Software (at sourceforge.net)
(09:33:47 AM) dblank: yes, that is the idea ;)
(09:34:01 AM) natefoo: val_erie: you have to run galaxy-dist/venv/bin/python setup.py install
(09:34:01 AM) jmchilton1: My e-mail holds then, you guys should rework these to there own URL (distinct from browser based accesses) and then the galaxy "recommended" approach should just be to open that to the world.
(09:34:15 AM) natefoo: jmchilton1: i agree.
(09:34:22 AM) jmchilton1: It would vastly simplify all of this complexity admins have to deal with.
(09:36:04 AM) dannon: val_erie: You'll need to follow the build instructions, just install (or link) the installed rpy python package into your virtualenv when done.
(09:36:25 AM) jmchilton1: That route and /api could actually have the same configuration actually. It would be wonderful.
(09:36:49 AM) dannon: I haven't tried to do this before, though, so you'll have to tinker with it.
(09:37:11 AM) natefoo: i'd think that the method could simply be an api method.
(09:37:12 AM) dannon: nate/someone may have better advice having potentially tried it :)
(09:37:38 AM) jmchilton1: Thought about that, but do you really want to give your API key to a third party application?
(09:37:42 AM) natefoo: dannon: val_erie: i'd install with the venv's python, which should take care of putting things in the right place.
(09:38:06 AM) natefoo: well the api is accessible without a key now.
(09:38:25 AM) jmchilton1: I guess you always need to give your API key to a third party application.
(09:39:19 AM) natefoo: ideally we'd be using single-use keys for this, which is sorta what i'd designed when i did the temporary authz system for the old-style applications.
(09:50:34 AM) jmchilton1: Agreed, to some degree that is exactly what the dataset hash will be though. It is a key that has the single use of giving you read access to that one dataset.
(09:51:53 AM) natefoo: but it gives anyone with that key access and cannot be invalidated.
(09:53:59 AM) jmchilton1: :) I said to some degree.

Sorry for all the spam,
-John

On Thu, Apr 18, 2013 at 8:10 AM, John Chilton <chilton@msi.umn.edu> wrote:
This message is a continuation of a conversation from IRC, but it is too long for that medium. It is probably only directly meant for 3 or 4 people, but I am sending it out widely and attaching the IRC conversation from yesterday because setting up and maintaining third-party display applications is very confusing and difficult (JJ and I have spent way more time than we would care to admit trying to get these things to work and keep them working), so the more search results that come up when you Google these things, the better.
My follow ups: 1) When that remote_user check was being skipped yesterday, my setup should have looked like one without external auth and it did not work. Makes me think something like IGV probably doesn't work against central in the non-remote user case right now. Just conjecture but you may want to test it.
2) Based on dblank's comments I am starting to understand the difference between /dataset/display_application and /display_application. It is still confusing that they are so close. I would recommend changing one route or the other for clarity.
3) I think it is important to have different URLs for when the URL is meant for consumption by the web browser (when the expectation is that the user will have session information) and for when it will be consumed by a Java application (IGV) or third-party web app (ProtVis, etc.), which will not have the user's session, so the displayonly REMOTE_USER hack must be employed. It would clean up proxy configuration and give admins a clearer sense of what traffic is coming from where.
Combining 2 + 3) maybe keep dataset/display_application as is but change /display_application to:
/prettified_browser_access
/prettified_third_party_access
4) The displayonly REMOTE_USER hack is no longer on the wiki but it is still needed I think (right?). Why was it removed?
5) A long-term goal should maybe be to get away from the displayonly REMOTE_USER hack altogether. For /prettified_third_party_access routes, Galaxy could set up a dummy user internally or just not require that a user be set at all.
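A rough sketch of what the split in points 3 and 5 could look like, assuming the two /prettified_* prefixes proposed above (they are placeholders from this e-mail, not routes that exist in Galaxy). The function names are also made up for illustration; a real setup would express this in the nginx/Apache configuration rather than in Python.

```python
# Hypothetical prefixes: the two /prettified_* names come from the proposal
# in this e-mail and do not exist in Galaxy.
BROWSER_PREFIXES = ("/dataset/display_application", "/prettified_browser_access")
THIRD_PARTY_PREFIXES = ("/prettified_third_party_access",)


def _matches(path, prefixes):
    """True if path equals one of the prefixes or sits underneath it."""
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def classify(path):
    """Label a request path as 'third_party', 'browser', or 'other'."""
    if _matches(path, THIRD_PARTY_PREFIXES):
        return "third_party"
    if _matches(path, BROWSER_PREFIXES):
        return "browser"
    return "other"


def proxy_requires_login(path):
    """In this sketch, only third-party routes skip proxy authentication."""
    return classify(path) != "third_party"
```

With a split like this, the proxy rule for third-party traffic reduces to a single prefix, and everything else keeps the normal REMOTE_USER handling.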
Thanks, -John
The IRC conversation:
(09:04:08 AM) jmchilton1: Ummm... are you guys no longer redirecting to /display_application URLs, are these all going to /dataset? Any ideas how is that going to work with REMOTE_USER?
(09:04:11 AM) dannon: Ahh, ok, so I did understand the error. This request won't (always) have a session.
(09:13:15 AM) dannon: Ok, will have a fix out for this in a second.
(09:22:55 AM) guerler left the room (quit: Quit: guerler).
(09:23:49 AM) dannon: And, I'm not informed on display_application urls, Dan or Carl would be the best bet there.
(09:23:52 AM) jmchilton1: Looking through my logs I guess that last question doesn't make sense.
(09:24:06 AM) dannon: Made enough sense for me to say "NOT IT!" :)
(09:24:36 AM) dannon: Just testing my fix to make sure it doesn't break anything else.
(09:24:55 AM) jmchilton1: :) I think the problem is related to the session stuff though. display_application/blah -> 302 to /dataset/blah use to work and now redirects the same but the display application doesn't have access to the data.
(09:25:25 AM) dannon: Ahh, that could be the case of the dataset api is relying on session being available.
(09:25:46 AM) jmchilton1: It wasn't the API though, it was the normal controller.
(09:26:27 AM) dannon: Ok, so how do I test this?
(09:26:38 AM) jmchilton1: I have no clue.
(09:26:52 AM) jmchilton1: I wish I understood how it was working in the first place
(09:27:18 AM) dannon: Me too, I've never touched it.
(09:29:07 AM) jmchilton1: It is all complicated greatly by the fact that nginx/apache need to be hacked for the display applications. Hmmm... I will keep poking at this and let you know if I discover anything.
(09:29:27 AM) dannon: Ok, sounds good. In the meanwhile, do you have a script that just uploads a simple file via library api?
(09:29:43 AM) dannon: Trying to test it and I remember how much of a pain this was now.
(09:30:15 AM) dannon: Actually, nm, I have one. wtach folder will do it
(09:30:20 AM) jmchilton1: I do not have a script.
(09:30:24 AM) jmchilton1: Great.
(09:40:50 AM) dannon: k, seems to work for me now, let me know if that doesn't fix it.
(10:20:13 AM) guerler [~user@wireless-aca-ndb-a.nat.emory.edu] entered the room.
(10:58:56 AM) clements [~clements@71-220-230-96.eugn.qwest.net] entered the room.
(11:08:23 AM) botton [~willie@router.isis.poly.edu] entered the room.
(11:08:23 AM) botton left the room.
(11:26:13 AM) ceberhard [~carleberh@107.194.88.162] entered the room.
(12:12:41 PM) jmchilton1: http://pastebin.com/07EikX6q
(12:12:43 PM) mrscribe: Title: AttributeError: 'NoneType' object has no attribute 'deleted' - Pastebin.com (at pastebin.com)
(12:13:41 PM) jmchilton1: dannon: It looks like you broke my display application :).
(12:13:47 PM) dannon: Aww.
(12:14:34 PM) jmchilton1: Do I just add another check that session has user before checking if user is deleted?
(12:15:14 PM) dannon: I need to see how display applications work and why session isn't being associated.
(12:15:17 PM) dannon: But maybe.
(12:15:34 PM) jmchilton1: The display action doesn't act as the user, it doesn't have that users session information
(12:15:54 PM) jmchilton1: That might also be nonsense
(12:16:25 PM) jmchilton1: There is this hack you need to do though to set a display only user at the proxy level for display application accesses.
(12:16:44 PM) jmchilton1: I will just try that hack for now and let you know how it goes.
(12:16:45 PM) dannon: Ahh
(12:16:58 PM) dannon: So it was a hack to get them to work before, and I broke the hack?
(12:17:30 PM) dannon: If so, I feel less bad now, but we should still be able to get this to work in a non-hacky way.
(12:17:34 PM) jmchilton1: I don't think MSI is the only one doing this.
(12:17:48 PM) jmchilton1: Good luck with that :)
(12:18:20 PM) jmchilton1: It is kind of a tough problem really, any way you slice it I think there is some complexity between the proxy and the display applications that is going to seem hacky.
(12:18:42 PM) jmchilton1: Would love to be proven wrong though, this has really been a large headache for us over the years :).
(12:19:20 PM) dannon: If you can explain how it worked before, I can probably at least translate that, just let me know what I can do.
(12:19:39 PM) jmchilton1: Here, let me see if I can find the wiki page about it.
(12:21:09 PM) jmchilton1: Maybe this is an MSI hack, I am not seeing anything about it on the wiki page.
(12:21:59 PM) jmchilton1: I will keep thinking about this and send you documentation if I can find some.
(12:22:13 PM) dannon: Ok, awesome. If it's broken, we can fix it, so just let me know.
(12:22:20 PM) dannon: We have the technology!
(12:27:42 PM) natefoo: pretty sure i had that in the apache page.
(12:28:05 PM) jmchilton1: http://gmod.827538.n3.nabble.com/galaxy-dev-ldap-integration-td839409.html
(12:28:27 PM) jmchilton1: Ry4an descirbe the displayonly hack somewhere in that thread.
(12:32:31 PM) jmchilton1: Also I am noticing a bunch of new histories are being created automatically now. Oh how I wish I would have gotten that last merge right, I am tired of tracking central in production :(.
(12:33:10 PM) dannon: Under what circumstances are new histories being created?
(12:35:21 PM) jmchilton1: I don't know, it is related to the display application viewing though I think.
(12:36:26 PM) dannon: Argh, ok. Sounds like it's time for me to learn how display applications should really work.
(12:39:15 PM) jmchilton1: Wait, the histories was probably my mistake.
(12:39:45 PM) jmchilton1: dannon: I wouldn't spend to much time on this until we have ruled out John's incompetence.
(12:40:05 PM) dannon: Haha, ok. I'm going to play with sqlalchemy for a little bit, then.
(01:04:22 PM) jmchilton1: Okay I made some misjudgments with the history change and stuff. But I am 43% confident there was a change related to this stuff that broke display applications and it is not just me.
(01:04:24 PM) jmchilton1: _ensure_logged_in_user is now redirecting /display_application/blah.... to /root, I don't think it could have been doing that before.
(01:05:11 PM) jmchilton1: I am confident those requests are being redicted, it is an assumption that _ensure_logged_in_user is the method redirecting
(01:06:00 PM) dannon: Ok, let me look again at what might have changed that redirect.
(01:07:04 PM) jmchilton1: Seems like according to blame, nothing in that function seems like it changed, I don't think _ensure_logged_in_user was called for these requests previously
(01:07:14 PM) jmchilton1: I could be wrong again, that is just my first impression.
(01:14:03 PM) dannon: Except, if I understand the displayonly user hack, it does create a galaxy_session and would have galaxy_session.user, so wouldn't redirect.
(01:14:35 PM) dannon: So the change is that somehow, now, it isn't creating/associating that user and/or creating a valid session.
(01:14:55 PM) jmchilton1: Is there a debug statement I can add to figure out which it is?
(01:16:11 PM) jmchilton1: I don't understand how the REMOTE_USER is still set after the next request.
(01:16:26 PM) dannon: Sure, just log the value of self.environ['HTTP_REMOTE_USER'], self.galaxy_session and galaxy_session.user in _ensure_logged_in_user
(01:16:27 PM) jmchilton1: That is irrespecitve of these changes though.
(01:19:21 PM) dannon: This would be easier to work with if I had a good test setup with remote user here, working on that now.
(01:24:43 PM) jmchilton1: There has to be a second non-dannon bug conflating this, because the display application is attempting to download a /dataset/display_application URL not a /display_application url. When I manually attempt to download the /display_application url though I get a remote user in env and session, but user is None.
(01:27:46 PM) dannon: I don't know if I feel better or worse about that.
(01:27:59 PM) jmchilton1: Also, it is the same controller
(01:28:21 PM) jmchilton1: The distinction must just be so that you can set remote_user on the /display_application version
(01:28:43 PM) jmchilton1: I can see why that might change accidentally, it is not too obvious
(01:42:02 PM) jmchilton1: Damnit! I am coming back around to thinking this is my own incompetence again. dannon, you should probably just pretend like you are not around for the rest of the day and see I have figured this out by tomorrow morning.
(01:43:04 PM) dannon: I can do that, but then if it was my fault I'm going to feel bad.
(01:43:25 PM) jmchilton1: Better than you wasting your day when it was my fault.
(01:44:58 PM) jmchilton1: I should say more of your day :).
(01:46:14 PM) dannon: Heh, ok, works for me. I'm going to do other things for a while.
(02:27:32 PM) acu_ [~acu@24-159-215-150.static.roch.mn.charter.com] entered the room.
(02:33:17 PM) natefoo left the room (quit: Read error: Operation timed out).
(02:33:34 PM) natefoo [~nate@victory.bx.psu.edu] entered the room.
(02:43:01 PM) acu_: can tool panel be customized differently for each user , that is each user would see different set of tools?
(03:07:31 PM) bag [~bag@HSI-KBW-046-005-177-036.hsi8.kabel-badenwuerttemberg.de] entered the room.
(03:12:03 PM) acu_ left the room (quit: Quit: Leaving).
(03:12:16 PM) acu_ [~acu@24-159-215-150.static.roch.mn.charter.com] entered the room.
(03:55:33 PM) jmchilton1: Is it possible that the url_for on line 43 of datatypes/display_applications/application.py use to generate a /display_application url but the change to mapper.explicit means it now generates a dataset/display_application url? The magic /display_application url is on line 57 of buildapp.py?
(03:56:39 PM) jmchilton1: If so is there a clear way to force the /display_application version of the URL?
(03:59:36 PM) ceberhard: jmchilton1: there are two styles of display applications. The newer version (AFAIK) is the only one of the two that produce the shorter, cleaner urls.
(03:59:54 PM) jmchilton1: Is this not the newer style?
(04:00:22 PM) ceberhard: Checking it out now.
(04:06:22 PM) ceberhard: You're seeing the longer, verbose urls on these links?
(04:06:51 PM) jmchilton1: Yes, I think so.
(04:07:31 PM) jmchilton1: I have an interesting setup, so I worry it might be a problem on my end. But the goal is for this code to produce the shorter link?
(04:09:04 PM) ceberhard: Yes. It successfully does this in web.base.controller (for the api) by removing the absolute slash before 'dataset'. It may be that a better fix would be to change the mapper to accept the absolute url instead.
(04:09:10 PM) ceberhard: Let me try that.
(04:11:24 PM) dannon: The leading slash disables route memory (which
(04:11:34 PM) dannon: is the default now)
(04:11:55 PM) dannon: Well. I should say, it's the only option now, with the removal of mapper.explicit == False
(04:12:39 PM) ceberhard: So the better fix would be to remove the redundant slash in application.py, then.
(04:12:59 PM) dannon: Unless there's hidden magic somewhere I'm not aware of and it's actually doing something. Doesn't *hurt* anything, though.
(04:13:28 PM) ceberhard: Seems to still work in the version used in the api.
(04:13:44 PM) dannon: So using the slash (or not) changes the url?
(04:13:47 PM) dannon: And, how so?
(04:14:13 PM) dannon: Or are you just saying that the display application links are working in the API?
(04:14:37 PM) ceberhard: (AFAIK) There's a mapper for display applications in buildapp that seems to create a cleaner url.
(04:14:49 PM) ceberhard: Yes. The api versions work.
(04:15:27 PM) ceberhard: The line 57 in buildapp John mentioned.
(04:15:51 PM) ceberhard: Not a mapper - a route.
(04:17:08 PM) ceberhard: I found in the API that when it's called with a leading slash on the controller it isn't found and generates a longer url.
(04:18:19 PM) dannon: Ok, so route memory does still work.
(04:18:48 PM) dannon: The slash disables it, so it doesn't find the pretty route, and goes oldschool with controller/action/etc
(04:19:00 PM) ceberhard: Makes sense.
(04:19:13 PM) dannon: So now the question is, should that longer link ever be used?
(04:20:07 PM) acu_ left the room (quit: Quit: Leaving).
(04:20:16 PM) dannon: Because, with the leading slash, it should be doing the same exact thing as it was before.
(04:20:36 PM) ceberhard: dblank: that may be a better question for you.
(04:21:22 PM) dblank: fwiw: I had to fix the stderr/stdout link generation recently due to the disabling of route memory https://bitbucket.org/galaxy/galaxy-central/commits/2cfc5c8223ef
(04:21:24 PM) mrscribe: Title: galaxy / galaxy-central / commit / 2cfc5c8223ef Bitbucket (at bitbucket.org)
(04:21:55 PM) dannon: dblank: Yep, that was expected, and why this went in after the -dist :)
(04:22:06 PM) dannon: We're going to find things like that here and there.
(04:23:23 PM) dblank: indeed, just mentioning it since disable it did, have an effect. the leading slash was used previously to disable it, if its not helping now, what does removing it do?
(04:23:51 PM) dannon: Well, I think carl was saying it did still change things?
(04:24:35 PM) ceberhard: Haven't tested the difference in a while. I removed it to get the new-style links to clean up a while back.
(04:24:36 PM) dannon: In any event, previously that display application url_for, starting with /dataset, was intentionally disabling route memory to get /dataset/display_application/blahblah urls, right?
(04:25:39 PM) dblank: urls used to not have the /dataset/ part, just start with /display_application
(04:26:35 PM) dannon: So is get_display_url in DisplayApplication not what's generating these? It hasn't been touched since 6714.
(04:27:08 PM) jmchilton1: I actually have ruled that line out as where the link is coming from, I cannot find where the link for my display application is being generated.
(04:27:18 PM) ceberhard: Just checked: adding in the API the slash produces the verbose url.
(04:27:35 PM) ceberhard: The api uses a different version of that call.
(04:27:48 PM) ceberhard: It was added to get around the two transactions/mappers.
(04:28:13 PM) ceberhard: Are you seeing the link in the history panel, John, or somewhere else?
(04:28:44 PM) jmchilton1: The remote display application is getting the link starting with /dataset/..., my hunch is this use to not happen
(04:28:50 PM) dannon: ceberhard: Where's the other version of url_for?
(04:29:05 PM) ceberhard: line 401, base.controller
(04:29:25 PM) dblank: the link in the history panel looks ok to me, but now when the user is actually sent to the external app, its using the verbose /dataset/display_app... link
(04:29:43 PM) dblank: jmchilton1: true, did not used to happen
(04:29:50 PM) jmchilton1: :)
(04:30:06 PM) jmchilton1: Progress?
(04:31:40 PM) jmchilton1: I have to go catch a bus, but thanks for all the help guys.
(04:32:26 PM) dannon: Anytime. So when you say "now when the user is actually sent to the external app", where does that happen?
(04:32:57 PM) dblank: we'll have to figure this out though, since a lot of displays will be broken with the 'verbose' form, since links won't be generated to look like filenames with extensions, etc
(04:33:41 PM) dblank: dannon: you click on the link in the history panel, then it goes off and generates the additional info, like viewport, format conversions etc
(04:34:25 PM) dblank: when the app is ready, the user is redirected to the final destination
(04:34:39 PM) dblank: the link in the history is ok, but the links afterwards are incorrect.
(04:34:40 PM) dannon: Got it. So the url generated for the redirect is broken.
(04:35:20 PM) dannon: ceberhard: You should be able to get rid of the extra url_for now, I think. It should do to exact same thing as routes.util.url_for used elsewhere.
(04:35:54 PM) ceberhard: Yep. Understood.
(04:36:21 PM) dblank: not just in the redirect, but also in the content that is generated
(04:37:22 PM) dblank: so e.g you have http://localhost:8080/display_application/a43779105051f6e9/ucsc_vcf/main/ which has http://localhost:8080/display_application/a43779105051f6e9/ucsc_vcf/main/Non..., which creates a track with content: track type="vcfTabix" name="BAM Coverage on data 1" bigDataUrl="http://localhost:8080/dataset/display_application?app_action=data&user_id=616066710fa1d0ac&app_name=ucsc_vcf&action_param=galaxy_a43779105051f6e9.vcf.gz&link_name=
(04:38:21 PM) guerler left the room (quit: Ping timeout: 245 seconds).
(04:38:22 PM) dblank: the /track link that is provided to ucsc via redirect in this case is being sent 'verbose', and the content inside of the track is also verbose
(04:40:17 PM) the_cull left the room (quit: Read error: Operation timed out).
(04:40:48 PM) ceberhard: Removing the slash before 'dataset' in application.py, line 43 seems to provide the redirect url more cleanly. Can you confirm, dblank?
(04:43:25 PM) dannon: Right, that allows it to match the defined route instead of :controller/:action, but how did this possibly work before, since that hasn't changed forever?
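
The leading-slash behavior described in the log above can be pictured with a toy model. This is not the Routes library and not Galaxy's code: toy_url_for, PRETTY_ROUTE, and the parameter names are invented for illustration, and the only thing it tries to mirror is the observation that a controller passed as "/dataset" misses the pretty route and falls back to a verbose controller/action URL.

```python
from urllib.parse import urlencode

# Hypothetical pretty route, loosely shaped like the short URLs in the log.
PRETTY_ROUTE = "/display_application/{dataset_id}/{app_name}/{link_name}"


def toy_url_for(controller, action, **params):
    """Toy URL generator: a leading slash on controller skips the pretty route."""
    if controller == "dataset" and action == "display_application":
        # Exact controller name: the short, filename-like route matches.
        return PRETTY_ROUTE.format(**params)
    # Anything else (including "/dataset") falls back to the verbose form.
    query = urlencode(sorted(params.items()))
    return "/%s/%s?%s" % (controller.strip("/"), action, query)


args = dict(dataset_id="abc", app_name="ucsc_vcf", link_name="main")
short_url = toy_url_for("dataset", "display_application", **args)
verbose_url = toy_url_for("/dataset", "display_application", **args)
```

In this model the two calls differ only in the leading slash, which matches what ceberhard reports seeing in the API version of the call. It says nothing about why the verbose form did not appear before the route-memory change.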