[OpenID] Canonical OpenID url form
Andrew Arnott
andrewarnott at gmail.com
Thu Jul 10 20:02:10 UTC 2008
If XRIs allow Unicode characters and URIs do not, then prefixing
http://xri.net/ to an XRI does *not* guarantee a proper URI. It
merely makes it look like one. If the XRI contains non-ASCII characters,
they must be percent-encoded for the result to be a proper URI.
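
A minimal sketch of that point, assuming Python's standard urllib.parse. The i-name "=josé" and the helper name xri_to_http_uri are made up for illustration, and the XRI spec defines further escaping rules that this does not attempt to cover:

```python
from urllib.parse import quote

PROXY_PREFIX = "http://xri.net/"  # proxy resolver prefix mentioned in this thread

def xri_to_http_uri(xri: str) -> str:
    """Prefix an XRI with the proxy resolver and percent-encode non-ASCII characters.

    Hypothetical helper, simplified: it only handles the character-set issue.
    """
    # Keep common XRI/URI punctuation as-is; every other character is
    # encoded as UTF-8 and written as %XX escapes.
    return PROXY_PREFIX + quote(xri, safe="=@+$!*()/:;,.-_~")

naive = PROXY_PREFIX + "=josé"     # looks like a URI, still contains non-ASCII
encoded = xri_to_http_uri("=josé")

print(naive.isascii())   # False
print(encoded)           # http://xri.net/=jos%C3%A9
```

The prefixed string only becomes ASCII-only after the escaping step.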
On Thu, Jul 10, 2008 at 11:31 AM, Drummond Reed <drummond.reed at cordance.net>
wrote:
> Martin's right, Peter -- XRI is one option for Unicode. But you can also use
> an internationalized domain name
> (http://en.wikipedia.org/wiki/Internationalized_domain_name) in a regular
> URL. It uses Punycode (http://en.wikipedia.org/wiki/Punycode).
>
> You can also turn an XRI into a URL by adding an XRI proxy resolver prefix
> (such as http://xri.net/ -- see my sig below for an example). In that
> approach the proxy resolver prefix has nothing to do with the XRI itself, so
> there's no need to internationalize the domain name.
>
> =Drummond
> http://xri.net/=drummond.reed
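
For the internationalized domain name option, here is a small sketch using Python's built-in "idna" codec, which implements the older IDNA 2003 rules. The host name is made up for illustration:

```python
# Hypothetical host name with one non-ASCII label.
host = "bücher.example"

# Convert each label to its ASCII-compatible (Punycode, "xn--") form.
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bcher-kva.example

# Reverse the conversion to get the form shown to the user.
print(ascii_host.encode("ascii").decode("idna"))  # bücher.example
```

This covers only the host part of a URL; non-ASCII characters in the path still need percent-encoding.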
>
>
> > -----Original Message-----
> > From: Peter Williams [mailto:pwilliams at rapattoni.com]
> > Sent: Wednesday, July 09, 2008 11:40 PM
> > To: Drummond Reed; 'Johnny Bufu'; 'Andrew Arnott'
> > Cc: 'OpenID List'
> > Subject: RE: [OpenID] Canonical OpenID url form
> >
> > So the short form of the story is: use xri for unicode (and then transform
> > the xri into an https hxri).
> >
> > It's been a month since I studied xri (and thus have forgotten 80 percent
> > of it). I recall there was a syntax to identify the address of the initial
> > resolver. Is there a way that this became the domain name component of the
> > hxri?
> >
> > -----Original Message-----
> > From: Drummond Reed <drummond.reed at cordance.net>
> > Sent: Wednesday, July 09, 2008 11:34 PM
> > To: 'Johnny Bufu' <johnny.bufu at gmail.com>; 'Andrew Arnott'
> > <andrewarnott at gmail.com>
> > Cc: 'OpenID List' <general at openid.net>
> > Subject: Re: [OpenID] Canonical OpenID url form
> >
> >
> > Also for the record, XRIs (which use the IRI character set) have a very
> > simple defined transformation into IRIs. Thus when an XRI needs to be sent
> > over-the-wire in an HTTP(S) URI, it must first be transformed into an IRI,
> > then you follow the IRI spec (RFC 3987) to transform into a URI as Johnny
> > describes below. Reverse the process to display back to the user.
> >
> > See
> > http://docs.oasis-open.org/xri/xri-syntax/2.0/specs/cs01/xri-syntax-V2.0-cs.html
> > for all the gory details (and they are gory - Unicode is hard).
> >
> > =Drummond
> >
> > > -----Original Message-----
> > > From: general-bounces at openid.net [mailto:general-bounces at openid.net]
> > > On Behalf Of Johnny Bufu
> > > Sent: Wednesday, July 09, 2008 10:52 PM
> > > To: Andrew Arnott
> > > Cc: OpenID List
> > > Subject: Re: [OpenID] Canonical OpenID url form
> > >
> > > For the record, since this continued in an offline thread:
> > >
> > > The issue is around the User-Supplied Identifiers. OpenID defines them
> > > as a type of Identifier, which in turn is defined as an HTTP(S) URI or
> > > an XRI. HTTP(S) URIs do not allow non-ASCII characters.
> > >
> > > So, out of scope of OpenID, parties accepting IRIs (other than XRIs)
> > > should follow the respective authoritative recommendations (i.e.
> > > RFC3987) before presenting such strings to the OpenID layer as HTTP
> > > URIs, and convert them back to IRI form later on when they need to be
> > > displayed back to the users.
> > >
> > > Johnny
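
The conversion Johnny describes, in both directions, might look roughly like this in Python. The function names and the example.com URL are hypothetical, and only the path-level percent-encoding step of RFC 3987 is shown; non-ASCII host names would need separate handling:

```python
from urllib.parse import quote, unquote

def iri_to_uri(iri: str) -> str:
    # Percent-encode the UTF-8 bytes of non-ASCII characters. Reserved
    # characters and existing %-escapes are left untouched.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%-._~")

def uri_to_display(uri: str) -> str:
    # Decode %-escapes as UTF-8. For display only: this also decodes escaped
    # reserved characters, so the result should not be used for comparison.
    return unquote(uri)

iri = "http://example.com/users/josé"   # what the user typed
uri = iri_to_uri(iri)                   # what is handed to the OpenID layer

print(uri)                  # http://example.com/users/jos%C3%A9
print(uri_to_display(uri))  # http://example.com/users/josé
```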
> > >
> > > On 08/07/08 10:32 PM, Andrew Arnott wrote:
> > > > Thanks, Johnny. I've had some conversations with a few other people
> > > > who draw the opposite conclusion and believe that the %AB%CD notation
> > > > is the canonical form.
> > > >
> > > > You make a good point about having to unescape the characters from
> > > > the URI just above the transport layer, but I believe you're applying
> > > > section 4.1 to the URL when it should only be applied to the
> > > > key/value pairs. The OpenID ClaimedIdentifier, which by the spec is
> > > > the last URL to respond without an HTTP redirect, cannot be in
> > > > unicode by the URI specification because unicode characters are not
> > > > allowed, whether that is UTF8 or UTF16.
> > > >
> > > > Name/value pairs passed as part of a querystring may (and as the
> > > > section you quote requires) be encoded as UTF-8, but they are
> > > > subsequently URI encoded as %AB%CD hex characters (thus doubly
> > > > encoded) so they are actually no longer UTF-8 at the transport layer.
> > > > Since the OpenID URL, around which all the identity of OpenID is
> > > > focused (omitting XRIs, which don't suffer from this problem), /is/ at
> > > > the transport layer, given the way the security requirements force the
> > > > claimed identifier to be discovered, I believe it would be a mistake
> > > > to add semantics on top of that and call it canonical.
> > > >
> > > > What I also realized from some other conversations is that this
> > > > doesn't really matter. As long as an OP or RP is consistent within
> > > > itself in storing and comparing Claimed Identifiers, whether it
> > > > stores and compares %AB%CD or the unicode equivalent character won't
> > > > matter to anyone, since on the protocol/wire level it is always
> > > > %AB%CD. However, I think unescaping the URL and getting the original
> > > > unicode characters back is very useful and should be done for
> > > > purposes of displaying to the user.
> > > >
> > > > I think for the security and guaranteed identity of the protocol,
> > > > there is a meaningful side to this though. It has nothing to do with
> > > > how the claimed identifier is stored, but rather with how a unicode
> > > > string is escaped for URI transport. A given unicode string may be
> > > > represented by more than just one series of bytes. Some characters
> > > > can be written as more than one sequence of code points, and so have
> > > > multiple UTF-8 or UTF-16 byte sequences
> > > > /for the same character/. Therefore someone who is typing in their
> > > > OpenID url to a site using one method during one visit, and then
> > > > types it in to the same site using a different method on a subsequent
> > > > visit, will only be identified by the RP as the same visitor if
> > > > OpenID requires that the RP transforms whatever unicode string is
> > > > given by the user to the canonical byte form as defined by the
> > > > unicode standard before transit. For example, the letter 'Á' can be
> > > > encoded as a single character or using composition by adding an
> > > > accent to the A character. Both are legal, but the unicode standard
> > > > defines one as canonical (I think). But if a string containing this
> > > > character is not canonicalized first, then although the character is
> > > > equivalent to the user and to unicode, the encoded %AB%CD string will
> > > > be different, resulting in security problems for OpenID because
> > > > people could overload a single Identifier just by using different
> > > > encodings at an OP, or fail to log into an RP depending on how they
> > > > craft their string. By the way, I say 'unicode' in the strict sense,
> > > > applying to UTF-8, UTF-16, etc. Unicode is commonly used to refer to
> > > > just UTF-16, but this problem applies to all unicode character sizes.
> > > >
> > > >
> > > >
> > > >
> > > > So I think OpenID should be more explicit about its unicode support
> > > > for Identifiers, including mandating a canonical Unicode form.
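
To make the 'Á' example concrete, here is a short sketch with Python's unicodedata module. Choosing NFC as the canonical form is an assumption made for illustration; neither this thread nor the OpenID spec names a normalization form, and canonical_escape is a made-up helper:

```python
import unicodedata
from urllib.parse import quote

composed = "\u00c1"      # 'Á' as a single precomposed code point
decomposed = "A\u0301"   # 'A' followed by COMBINING ACUTE ACCENT

# Same character to the user, different bytes on the wire.
print(quote(composed))    # %C3%81
print(quote(decomposed))  # A%CC%81

def canonical_escape(text: str) -> str:
    # Normalize to NFC (assumed choice) before percent-encoding.
    return quote(unicodedata.normalize("NFC", text))

print(canonical_escape(composed) == canonical_escape(decomposed))  # True
```

Without the normalization step, the two spellings produce different escaped identifiers.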
> > > >
> > > > On Tue, Jul 8, 2008 at 9:41 PM, Johnny Bufu <johnny.bufu at gmail.com> wrote:
> > > >
> > > >
> > > > On 08/07/08 03:01 PM, Andrew Arnott wrote:
> > > >
> > > > What is the canonical form of an OpenID URL? One with the %AB%CD hex
> > > > encoding for unicode chars in the URL or with the actual unicode
> > > > chars? For the purposes of displaying to the user and storing in the
> > > > RP's database.
> > > >
> > > > The spec doesn't seem to have anything to say on this.
> > > >
> > > >
> > > > I believe it does say:
> > > >
> > > > 4.1. Protocol Messages The OpenID Authentication protocol messages
> > > > are mappings of plain-text keys to plain-text values. The keys and
> > > > values permit the full Unicode character set (UCS). When the keys and
> > > > values need to be converted to/from bytes, they MUST be encoded
> > > > using UTF-8 [RFC3629].
> > > >
> > > > http://openid.net/specs/openid-authentication-2_0.html#anchor4
> > > >
> > > >
> > > > The reason I think it's not a simple automatic answer is the unicode
> > > > chars may be what the user typed in and what exists on the server,
> > > > but in transit, these characters are translated to %AB%CD in order to
> > > > be validly escaped URI strings.
> > > >
> > > >
> > > > The receiving party must decode them to the original form when they
> > > > are extracted from the transport layer.
> > > >
> > > >
> > > > So one could argue that the unicode characters are never part of the
> > > > protocol
> > > >
> > > >
> > > > One would then be ignoring the parts of the protocol that do not deal
> > > > with the transport layer directly.
> > > >
> > > >
> > > > Johnny
> > > >
> > > >
> > > _______________________________________________
> > > general mailing list
> > > general at openid.net
> > > http://openid.net/mailman/listinfo/general