DRAFT 11 -> FINAL?

Thu Jan 25 18:51:14 UTC 2007

Since your list is long, I'm only going to address things I have an 
answer to. I'll leave the rest to other people. :)

Claus Färber wrote:
> -----
> | 4.1.1.  Key-Value Form Encoding
> |
> | A message in Key-Value form is a sequence of lines. Each line begins
> | with a key, followed by a colon, and the value associated with the
> | key. The line is terminated by a single newline (UCS codepoint 10,
> | "\n"). A key or value MUST NOT contain a newline and a key also MUST
> | NOT contain a colon.
> 
> While it's ok for keys to have a limited alphabet, not allowing them in
> the value may be a problem for future extensions (say, if AX wants to
> transfer a multiline postal address).
> 
> I don't think it's a good idea to invent a new format here. What's wrong
> with URI percent-encoding, which has to be implemented by OpenID
> software anyway?
> 

As far as I'm aware the protocol does not allow for extensions to add to 
the messages sent in this encoding. Extensions are only allowed to take 
part in the messages passed by redirecting the UA, in urlencoded format. 
The OpenID protocol itself has no need for newlines or other fancy 
characters here.

I agree with you that inventing a new format here was an odd choice, but 
unfortunately it's already in use by deployed OpenID 1.1 implementations 
and so OpenID 2 implementations will have to support it anyway.

> 
> | The response body MUST be a Key-Value Form (Key-Value Form Encoding)
> | message with the following fields:
> ...
> | *error
> |     Value: A human-readable message indicating the cause of the error.
> 
> What about different languages?
> 
> This is actually tricky. The Accept-Language header from the user is not  
> available to the OP if it is contacted by the RP directly, so the OP  
> can't choose an appropriate language.

This "human-readable message" is really addressed at the developer of 
the non-compliant implementation, not the end-user. Therefore support 
for multiple languages was thought not to be that important. All current 
implementations use straightforward English messages.

The end-user should never see such a message since these error responses 
only arise when there is a bug in someone's implementation. (In theory, 
at least!)

> 
> | 7.3.3.  HTML-Based Discovery
> |
> | HTML-Based discovery MUST be supported by Relying Parties. HTML-Based
> | discovery is only usable for discovery of Claimed Identifiers. OP
> | Identifiers must be XRIs or URLs that support XRDS discovery.
> |
> | To use HTML-Based discovery, an HTML document MUST be available at the
> | URL of the Claimed Identifier. In the HEAD section of the document:
> 
> It seems that it has been unclear to some implementers what "HEAD
> section" means. This is not surprising as the correct term is "HEAD
> element" (HTML 4.01, section 3.2.1).

Agreed. This is just a simple language change.

> |    A <LINK> tag MUST be included with attributes "rel" set to
> |        "openid2.provider" and "href" set to an OP Endpoint URL
> |    A <LINK> tag MAY be included with attributes "rel" set to
> |       "openid2.local_id" and "href" set to the end user's OP-Local
> |       Identifier
> 
> These should read "LINK element", not "<LINK> tag".

While you're correct, I don't think it's that erroneous to say "<LINK> 
tag" here. Still, it can't hurt to correct it.

> | The protocol version when HTML discovery is performed is "http://
> | specs.openid.net/auth/2.0/signon".
> |
> | The host of the HTML document MAY be different from the end user's
> | OP's host.
> |
> | The "openid2.provider" and "openid2.local_id" URLs MUST NOT include
> | entities other than "&amp;", "&lt;", "&gt;", and "&quot;". Other
> | characters that would not be valid in the HTML document or that cannot
> | be represented in the document's character encoding MUST be escaped
> | using the percent-encoding (%xx) mechanism described in [RFC3986]
> | (Berners-Lee, T., "Uniform Resource Identifiers (URI): Generic
> | Syntax," .).
> 
> This is incompatible with HTML parsing rules.
> 
[snip]
> 
> I think the whole section is flawed and should be replaced with a small
> note as a warning:
> 
>   Note: If the "openid2.provider" and "openid2.local_id" contain an
>   ampersand (&), this character MUST be escaped as "&amp;" according to
>   HTML rules.

In practice, few implementations actually use an HTML parser to find 
these elements. These extra restrictions are present to facilitate 
regex-based parsing.

Since there is no regression if an implementation should choose to use a 
*real* HTML parser, I don't see the harm here. It's imposing extra 
restrictions only on people who want to use OpenID HTML discovery; the 
rest of the document uses the normal HTML rules, though there is one 
exception which I'll describe below.

> 
> BTW, good HTML parsers are mandatory to protect against HTML injection
> attacks. An ad-hoc parser might simply fall for something like that:
> 
>   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <link rel=openid2.provider href='http://openid.example.com/~user'>
>   <title>Nice test</title>
>   <form action="doit">
>   <p>Send me your comment:
>   <input type=text name=comment value='<html><head><link rel=openid2.provider href="http://bogous.example.net"></head>'>
>   <input type=submit>
>   </form>
> 
> Yes, this is valid HTML and yes, a HTML "sanitizer" that intends to do
> the shortest possible representation would produce something like this.

The regex-based parsers employed by existing implementations require 
explicit <head> start and end tags. I agree that this is not ideal, but 
it's hardly an onerous requirement on document authors.

This is mostly an ideological argument founded on whether we're allowed 
to impose additional restrictions on HTML documents that are making use 
of OpenID discovery. There is certainly no *practical* reason why this 
shouldn't be done, assuming that the restrictions are sufficient to 
prevent the above attack.

(The spec is currently lacking specifics on how to do this safely, by 
the way. While I'm in two minds about the spec describing particular 
implementations, it *does* need some normative parsing requirements in 
sufficient detail that all complying implementations would not fall for 
the above attack.)

> 
> | 11.1.  Verifying the Return URL
> |
> | To verify that the "openid.return_to" URL matches the URL that is processing this assertion:
> |
> |   * The URL scheme, authority, and path MUST be the same between the two URLs.
> |   * Any query parameters that are present in the "openid.return_to"
> |     URL MUST also be present with the same values in the URL of the
> |     HTTP request the RP received.
> 
> http://example.com/do?dosomethingharmless=yes
> matches
> http://example.com/do?dosomethingharmless=yes&deleteverything=YES
> 

While this is true,
  * Good sites will not have a GET request performing an action
  * If an attacker has something to gain from sending the browser to 
that latter URL, you don't need to go through all the overhead of an 
OpenID request to send the browser there.