[OpenID] Demystifying 'black-box' security

SitG Admin sysadmin at shadowsinthegarden.com
Fri Apr 4 10:56:06 UTC 2008


Proposal: add to Consumer libraries an option for keeping track of 
the vital variables in each authentication (submitted, discovered, 
and generated), storing them with the same storage methods already 
available for associations.
Benefits: easier development for unexpected use cases, and an 
authentication process that is transparent regardless of how well 
anyone understands the code inside a given library.
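
To make the proposal concrete, here's a rough sketch (in PHP) of the 
kind of interface I have in mind. Every name in it is invented for 
illustration - it isn't taken from any existing Consumer library:

  <?php
  // Hypothetical interface - one record per authentication attempt,
  // stored the same way associations already are.
  interface OpenID_TransactionLog {
      // $phase is one of 'submitted', 'discovered', 'generated'.
      // $values holds the raw key/value pairs the library saw or produced.
      public function record($transaction_id, $phase, $values);
      public function fetch($transaction_id);
  }

  // A trivial file-backed implementation.
  class OpenID_FileTransactionLog implements OpenID_TransactionLog {
      private $dir;
      public function __construct($dir) { $this->dir = $dir; }

      public function record($transaction_id, $phase, $values) {
          $line = date('c') . "\t" . $phase . "\t" . serialize($values) . "\n";
          file_put_contents($this->path($transaction_id), $line, FILE_APPEND);
      }
      public function fetch($transaction_id) {
          $p = $this->path($transaction_id);
          return file_exists($p) ? file($p) : array();
      }
      private function path($transaction_id) {
          return $this->dir . '/' . md5($transaction_id) . '.log';
      }
  }

A library could accept one of these alongside its association store 
and call record() at each stage of an authentication.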

I've been experimenting with several Consumer libraries, and I 
sort-of understand each of them, to varying degrees. Modifying them 
to log what's going on (and satisfy myself regarding the security of 
OpenID as they implement it) has ranged from difficult to impossible. 
Rather than the rapid adoption I had initially anticipated, there has 
been frustration as I struggle to further my grasp of what's going on 
inside those libraries to a point where I can be confident that it IS 
secure in itself and that it can integrate with (instead of breaking) 
my other intended security measures.

I've tried to set up testing environments - and, ultimately*, failed 
because I didn't understand the libraries well enough to do so. I've 
used jkg.in/openid to supplement my efforts, and spent hours in a 
terminal bouncing between my (own) Consumer and the (jkg.in/openid) 
Provider with telnet, editing select areas of Location headers along 
the way before sending them to the next location as a GET string. (I 
later deemed these tests inconclusive because the jkg.in/openid 
Provider was TOO permissive - but I lacked the skill to set up a 
multi-user Provider that could be made to authenticate me 
automatically if the openid parameters were cryptographically valid.) 
I've tried to save the information provided by a user initially (when 
the OpenID URI is entered into my login field), but then I have no 
way of reliably associating that with a user who has successfully 
logged in once the library returns control to MY code.

*I'm sure that, on some more immediate level, there were any number 
of other reasons why I failed. But my inability to figure out what 
those reasons were, or devise a way to success, can ultimately be 
attributed to an inadequate understanding of the code.
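
For what it's worth, the workaround I keep circling back to for that 
last problem - remembering what the user actually typed - is just to 
stash it in the session myself before handing off to the library, 
roughly like this (the hand-off points are placeholders, not any 
library's real API):

  <?php
  session_start();

  // Step 1: my login form was submitted with the user's OpenID URI.
  // Save what they typed *before* the Consumer library takes over and
  // redirects them to the Provider.
  if (isset($_POST['openid_uri'])) {
      $_SESSION['submitted_openid'] = trim($_POST['openid_uri']);
      // ...hand off to the Consumer library here...
  }

  // Step 2: the Provider has redirected back, the library has verified
  // the response, and control returns to MY code with whatever identity
  // the library settled on.
  function on_login_success($verified_identity) {
      $submitted = isset($_SESSION['submitted_openid'])
          ? $_SESSION['submitted_openid']
          : null;
      // Now the submitted URI and the verified identity can at least be
      // compared, logged, or used to pick an ACL entry.
      error_log("submitted=$submitted verified=$verified_identity");
  }

That works for the simple flow, but it's exactly the sort of thing 
I'd rather the library record for me, consistently, alongside 
everything it discovered and generated.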

I could modify the libraries to do that - but, no. I could TRY, sure, 
but I'm probably not good enough to actually pull it off. And all 
this is a barrier to adoption. Surmountable, sure - in time. But 
while I'm determined enough to implement OpenID (as a Relying Party) 
that I'm willing to spend my spare time in pursuit of this goal, I've 
been putting off other things (like studying) for long enough that 
I'm beginning to run out of time. Others in my position - especially 
those with less skill, time, or motivation - might have already given 
up.

I want to promote OpenID. My original plan for doing this was to 
require visitors (to my personal site) to log in with an OpenID 
before viewing any content, and then use an Access Control List to 
determine which pages they could see. People that I knew (as 
associated with their OpenIDs) would be able to view sensitive 
information about me that they already knew - name, gender, location, 
etcetera - without requiring further authentication. All others would 
be denied that level of access. But, if OpenID alone would be proving 
that a visitor was indeed someone I knew, I had to make sure that it 
couldn't generate a false positive.
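
(As a sketch of what I mean by the ACL - the identifiers and page 
names here are invented:)

  <?php
  // Map verified OpenID identifiers to the pages they may view.
  $acl = array(
      'http://tom.example.com/bio' => array('about-me', 'contact', 'articles'),
      'http://alice.example.org/'  => array('articles'),
  );

  function allowed_pages($acl, $verified_openid) {
      // A real implementation would normalize the identifier exactly the
      // way the Consumer library does; trimming a trailing slash is only
      // a stand-in for that here.
      $key = rtrim($verified_openid, '/');
      foreach ($acl as $id => $pages) {
          if (rtrim($id, '/') === $key) {
              return $pages;
          }
      }
      return array(); // unknown visitors see only the public pages
  }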

I also wanted to be able to pre-authorize someone *before* they had 
an OpenID, knowing just their domain, so I could tell them "Go get 
one, it'll let you read my articles." and then add their ACL entry while 
they were still working on it. No further communication would be 
required - the moment they got OpenID working, they could log in, 
without having to get in touch with me again and say "Hey, this is my 
OpenID Identity, please let me in will ya?" (an additional step that, 
even if it only took a few moments, would be a discouraging 
interruption to the ease of using OpenID; and, if I was difficult to 
get ahold of or we'd met at a conference and exchanged no other 
contact information, might break it indefinitely). Now, you're 
probably thinking "Not a problem - even if they only put in a few 
headers and used delegation for a 3rd-party Provider, you can still 
recognize their claimed_id.", and for most cases, that's correct. 
(The beta Zend_OpenId_Consumer has a bug that forgets the claimed_id 
when the Provider uses version 1.1 of the protocol. I'm not using the 
Zend libraries anymore.)
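In code, that recognition could be as simple as checking the host of 
the claimed_id against domains I've whitelisted in advance (domain 
names made up):

  <?php
  // Domains I've pre-authorized before their owners even have an OpenID.
  $preauthorized_domains = array('tom.example.com');

  function is_preauthorized($claimed_id, $domains) {
      $host = parse_url($claimed_id, PHP_URL_HOST);
      return is_string($host) && in_array($host, $domains, true);
  }

  // is_preauthorized('http://tom.example.com/bio', $preauthorized_domains)
  // is true whether or not that page delegates to a third-party Provider,
  // because it's the claimed_id (not the Provider-local id) being checked.

But there's one case I'm still worried about: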

Let's say that my new friend (let's call him Tom, because it's a 
short name and I like Tom's Hardware) is a fast learner. He quickly 
latches on to this OpenID stuff and begins thinking, "Why don't I 
provide this as a service to my customers, too?". So, in addition to 
his personal bio on the business server being his OpenID Identity, 
some of his customers have their Profile pages extended with OpenID 
tags. That's okay with me, because Tom is a thoughtful kind of guy 
and notified me of his plans in advance so I could associate his ACL 
with his bio page instead of just any result from his page (he also 
didn't want his customers to be reading some of our private 
communications, which I've helpfully archived for him on my site so 
he can recover more quickly and easily from data loss, as well as 
check in to see if anyone is blocking our E-mails.) Next up, Tom is 
reading some feedback from one of his customers, who complains that 
sites accepting OpenIDs from Tom's server shouldn't need to know 
their Profile when in many cases it's enough just to know that this 
is one of Tom's customers, and asks if Tom can do what Yahoo! is 
doing - offer an anonymous "me.tom.com" OpenID with Directed 
Identity, then have Tom's server assign a random identity. Now, when 
Tom reads up on what Yahoo! is doing, he thinks this is a nifty idea 
- so he puts it in place right away.

But poor Tom had to leave on a business trip the next morning, and 
didn't have time to thoroughly test his implementation of this 
feature. He was smart enough to realize that keeping me.tom.com from 
assigning an *existing* Identity was necessary, but also smart enough 
to be so confident that, when the code looked good in his development 
environment, he was *certain* it would work as intended in the 
production environment. So he got some sleep rather than run the code 
through some tests, and the next morning he added it to the server 
just before rushing out the door.

Unfortunately for Tom (and his customers), there was a typo in his 
code. (It was late at night, he was tired, he was squinting, one 
letter looked like another - who could blame him?) It didn't actually 
do what it was supposed to do. (A decent test would have prevented 
this, but Tom's lack of programming experience betrayed him here.)

Someone (maybe one of his competitors) notices the new feature, knows 
of Tom's absence (it was announced on his blog/bio, anyway), and runs 
nothing but authentication attempts (against one of her own 
Consumers) from a computer conveniently operated by a disgruntled 
employee who can play the role of scapegoat if anything goes wrong 
(read: "plausible deniability"). If she's caught, she has an excuse 
for "letting go" the employee "responsible"; if she doesn't find 
anything, she won't lose anything more than the few minutes of her 
time it took to set this up.

But then she finds something. It's a match. (She was logging in with 
her employee's credentials, but her Consumer recognizes her as 
another of Tom's customers. She obtained these credentials through 
workplace surveillance, which may explain why her employee has been 
so disgruntled recently.) And, happily, she begins to exploit this to 
try breaking into other sites, using the trust granted to those 
customers to gather information about them (which she can later use 
to lure them away from Tom) and learn more about who Tom is 
associated with that might have *another* OpenID Consumer, so she can 
spread her web of access further.

Eventually someone may find out what's going on. But, at that point, 
we still won't be happy. If our Consumer implementations don't let us 
discriminate between users who started their login with me.tom.com 
and those who started with a specific Profile page, we are left with 
two bad choices: disallow ALL logins from tom.com (until Tom can fix 
his new Provider or we can rig a workaround), which would 
inconvenience his customers and possibly disrupt our own business, or 
leave things as they are (with attacks ongoing) until the problem can 
be fixed.

Now, the fix is easy enough; we could check when the user first 
submits their URI, and if it's me.tom.com, we print an error message 
saying "Anonymous logins from Tom's customers are not currently 
permitted, sorry for the inconvenience." instead of proceeding to the 
Consumer code. But if we don't have immediate access to our server, 
or the person who does is out sick (or maybe they went to the same 
conference as Tom did), or we aren't skilled enough coders to readily 
do this - even more time passes, while we frantically try to figure 
out what we can do about it.
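
(For completeness, that quick fix might look like this at the very 
top of the login handler - the host name comes from the example, and 
the scheme-prepending is only there because people rarely type 
"http://" into a login box:)

  <?php
  // Run before any Consumer code gets involved.
  $submitted = isset($_POST['openid_uri']) ? trim($_POST['openid_uri']) : '';
  if ($submitted !== '' && strpos($submitted, '://') === false) {
      $submitted = 'http://' . $submitted;
  }
  $host = parse_url($submitted, PHP_URL_HOST);

  if ($host === 'me.tom.com') {
      exit("Anonymous logins from Tom's customers are not currently "
         . "permitted, sorry for the inconvenience.");
  }
  // ...otherwise fall through to the normal Consumer hand-off...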

This (long example) can be prevented, in advance, merely by keeping 
track of the OpenID Identity originally entered, and letting the code 
outside of a Consumer decide what, if anything, to do with it. But 
why preserve just that?

It's all we need? But we *don't* need it - this was just a 
hypothetical (and highly improbable) example. I do have a real 
situation where not keeping track of claimed_id on the Consumer side 
(and I've tested this with the PHP OpenID library from 
openidenabled.com and some trickery to fake a Provider that does not 
meet spec) may make it more difficult for my users to log in, but 
let's not get bogged down in the details; I don't want to give the 
wrong impression here. This is why I've used an example that is 
hypothetical only; I want to demonstrate the *possibility* of 
something going wrong with an unanticipated situation, without 
offering a specific real case to focus on fixing. I think that if we 
wait until a real need arises before preserving a piece of 
information we already had in hand, we risk having that need manifest 
itself in a damaging way; and since there aren't that many pieces of 
information involved anyway, minimizing those odds still seems a good 
idea.

It might be more trouble than it's worth to implement, though. The 
cost should be mainly in extra storage space, per OpenID processed; 
the information being stored is nothing more than what the library 
just generated, anyway. The benefits are more of an investment than 
an immediate payoff - the potential to help out with future 
situations. But tangential projects might find it an enabler:

Imagine a program that accepted such a log file as input, and sent 
back to the user an explanation (in layman's terms) of exactly what 
each of these values was and how it was used in the protocol (at 
which step, and in which way), all laid out in a linear fashion so 
programmers and end-users alike could follow the protocol flow. 
Imagine any number of such programs. At the simple end, barebones 
data for programmers, so they could reconstruct afterward what was 
going on "under the hood", so to speak. Further along, some fancy 
formatting and paragraphs of explanation for the end-user who wanted 
to follow along. At the far end, built-in hashing tools and 
auto-comparison: after reading that the hashes needed to match up, 
the user could click a button and see the relevant pieces of 
information hashed, with the program displaying a warning if the 
calculated hash did not match the value recorded for that step of the 
process, and explaining the ways in which security was compromised as 
a result. Imagine the protocol enjoying this level of transparency 
regardless of the exact library being used - open to inspection by 
those who might not understand programming at all, and wouldn't need 
to.
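
To make the hashing and auto-comparison part concrete: for an 
HMAC-SHA1 association, openid.sig is the base64 HMAC of the signed 
fields in key:value form, so a tool fed a log containing those fields 
and the association's MAC key could recheck it with a few lines of 
PHP (the $log array is a made-up stand-in for whatever the log file 
records):

  <?php
  function check_signature($log) {
      // Rebuild the key:value token from the fields listed in
      // openid.signed, in the order the Provider signed them.
      $kv = '';
      foreach (explode(',', $log['openid.signed']) as $field) {
          $kv .= $field . ':' . $log['openid.' . $field] . "\n";
      }
      $mac_key  = base64_decode($log['assoc_mac_key']); // from the association store
      $expected = base64_encode(hash_hmac('sha1', $kv, $mac_key, true));

      if ($expected !== $log['openid.sig']) {
          echo "WARNING: the calculated signature does not match the recorded\n"
             . "one; this assertion should not have been trusted.\n";
          return false;
      }
      echo "Signature matches for this step of the protocol.\n";
      return true;
  }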

-Shade


