[OpenID] Canonical OpenID url form

Johnny Bufu johnny.bufu at gmail.com
Wed Jul 9 04:41:27 UTC 2008


On 08/07/08 03:01 PM, Andrew Arnott wrote:
> What is the canonical form of an OpenID URL? One with the %AB%CD hex 
> encoding for unicode chars in the URL or with the actual unicode chars? 
> For the purposes of displaying to the user and storing in the RP's database.
> 
> The spec doesn't seem to have anything to say on this.  

I believe it does:

4.1.  Protocol Messages
The OpenID Authentication protocol messages are mappings of plain-text 
keys to plain-text values. The keys and values permit the full Unicode 
character set (UCS). When the keys and values need to be converted 
to/from bytes, they MUST be encoded using UTF-8 [RFC3629].

http://openid.net/specs/openid-authentication-2_0.html#anchor4

> The reason I 
> think it's not a simple automatic answer is the unicode chars may be 
> what the user typed in and what exists on the server, but in transit, 
> these characters are translated to %AB%CD in order to be validly escaped 
> URI strings.  

The receiving party must decode them to the original form when they are 
extracted from the transport layer.
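
As a rough illustration of that round trip, here is a minimal Python sketch. 
The helper names to_transport and from_transport and the example.com 
identifier are hypothetical, chosen only for this example; the sketch 
assumes the identifier is percent-encoded as UTF-8 bytes for transit, in 
line with the UTF-8 requirement quoted above. It is not taken from any 
particular OpenID library.

```python
from urllib.parse import quote, unquote


def to_transport(identifier: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes for transit.

    Hypothetical helper: ':' and '/' are left alone so the URL
    structure survives; everything outside the unreserved set is
    escaped as %XX sequences.
    """
    return quote(identifier, safe=":/", encoding="utf-8")


def from_transport(wire_value: str) -> str:
    """Decode %XX sequences back to the original Unicode string."""
    return unquote(wire_value, encoding="utf-8")


# Made-up identifier containing a non-ASCII character.
original = "http://example.com/jos\u00e9"

wire = to_transport(original)        # escaped form used in transit
restored = from_transport(wire)      # form the receiving party works with

print(wire)
print(restored == original)
```

Under these assumptions the escaped string exists only in transit; the 
value the receiving party compares, stores, or displays is the decoded 
Unicode form.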

> So one could argue that the unicode characters are never 
> part of the protocol 

One would then be ignoring the parts of the protocol that do not deal 
with the transport layer directly.


Johnny



