[Openid-specs-ab] unicode host names, issuer, and URLs in the discovery document
n-sakimura
n-sakimura at nri.co.jp
Mon Apr 2 03:53:54 UTC 2018
During the OpenID Connect standardization, many of us thought that allowing Unicode characters in URIs causes too much trouble, such as phishing of administrators through look-alike characters. So, for the authority section at least, I would opt for ASCII strings, but that is just a practice.
Spec-wise, the only provision that OpenID Connect has on Unicode is this:
14. String Operations
Processing some OpenID Connect messages requires comparing values in the messages to known values. For example, the Claim Names returned by the UserInfo Endpoint might be compared to specific Claim Names such as sub. Comparing Unicode strings, however, has significant security implications.
Therefore, comparisons between JSON strings and other Unicode strings MUST be performed as specified below:
1. Remove any JSON applied escaping to produce an array of Unicode code points.
2. Unicode Normalization [USA15]<http://openid.net/specs/openid-connect-core-1_0.html#USA15> MUST NOT be applied at any point to either the JSON string or to the string it is to be compared against.
3. Comparisons between the two strings MUST be performed as a Unicode code point to code point equality comparison.
In several places, this specification uses space delimited lists of strings. In all such cases, a single ASCII space character (0x20) MUST be used as the delimiter.
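As a minimal sketch of the three comparison steps quoted above, in Python: `json.loads` removes the JSON escaping, and `==` on Python `str` values compares code point by code point without any normalization. The function name `json_strings_equal` is my own and does not come from the spec.

```python
import json
import unicodedata


def json_strings_equal(raw_json_string: str, known_value: str) -> bool:
    """Compare a JSON string literal with a known value, code point by code point."""
    # Step 1: undo JSON escaping (e.g. \u00e9) to get a sequence of code points.
    decoded = json.loads(raw_json_string)
    if not isinstance(decoded, str):
        raise TypeError("expected a JSON string literal")
    # Steps 2 and 3: no normalization, plain code point equality.
    return decoded == known_value


precomposed = "caf\u00e9"   # ends with U+00E9
decomposed = "cafe\u0301"   # ends with 'e' followed by combining U+0301

same_form = json_strings_equal('"caf\\u00e9"', precomposed)
other_form = json_strings_equal('"caf\\u00e9"', decomposed)
# Only after normalization (which the spec forbids) would the two forms match.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(same_form, other_form, equal_after_nfc)
```

The two visually identical strings compare as different here, which is exactly the behavior the quoted rules require.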
Unfortunately, it does not say anything about the encoding of the document, but per RFC 8259, a JSON document that is not confined within a closed environment MUST be represented in UTF-8. (Note: this was not the case until December 2017! We need to update the reference in the Discovery spec.)
On the other hand, RFC3986 states:
The reg-name syntax allows percent-encoded octets in order to
represent non-ASCII registered names in a uniform way that is
independent of the underlying name resolution technology. Non-ASCII
characters must first be encoded according to UTF-8 [STD63<https://tools.ietf.org/html/rfc3986#ref-STD63>], and then
each octet of the corresponding UTF-8 sequence must be percent-
encoded to be represented as URI characters. URI producing
applications must not use percent-encoding in host unless it is used
to represent a UTF-8 character sequence. When a non-ASCII registered
name represents an internationalized domain name intended for
resolution via the DNS, the name must be transformed to the IDNA
encoding [RFC3490<https://tools.ietf.org/html/rfc3490>] prior to name lookup. URI producers should
provide these registered names in the IDNA encoding, rather than a
percent-encoding, if they wish to maximize interoperability with
legacy URI resolvers.
This makes it clear that a URL can actually contain UTF-8 encoded characters.
So, https://müsik.example.com/ is a valid URL. The requirement is that the user agent must transform the hostname to Punycode before submitting it to the name resolver.
So an RFC 3986 compliant client must be able to cope with UTF-8 URIs as long as the encoding is clear from the underlying context.
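A rough sketch of that host transformation in Python, assuming the standard library's built-in `idna` codec (which implements RFC 3490, the IDNA version cited by RFC 3986) is acceptable. The helper name `to_resolvable_url` is hypothetical, and userinfo and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def to_resolvable_url(url: str) -> str:
    """Return the URL with its host converted to the IDNA (Punycode) form."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host")
    # Python's built-in "idna" codec implements RFC 3490 (IDNA 2003).
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    netloc = ascii_host
    if parts.port is not None:
        netloc = f"{ascii_host}:{parts.port}"
    # Userinfo and IPv6 literals are ignored in this sketch.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


unicode_url = "https://m\u00fcsik.example.com/"  # \u00fc is the u-umlaut
ascii_url = to_resolvable_url(unicode_url)
print(ascii_url)
```

An all-ASCII URL passes through unchanged, so a client could apply this to every endpoint URL it reads before making a request.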
My personal conclusion is then:
1) Since the discovery document is UTF-8, it should use a UTF-8 encoded authority section.
2) Since the JWT header and body are JSON, they MUST be UTF-8.
3) The client library SHOULD transform the UTF-8 authority section to Punycode before submitting it to the DNS resolver. A client MUST make sure that the library it is using does so, unless it is using a modern, IDN-enabled DNS resolver.
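Points 1) and 2) could look roughly like this in Python. The discovery document below is a made-up example (the endpoint paths are illustrative only); the point is that the JSON is serialized with the host name as real UTF-8 rather than as `\uXXXX` escapes, and that both forms parse to the same string.

```python
import json

# Hypothetical discovery document; the endpoint paths are illustrative only.
discovery = {
    "issuer": "https://m\u00fcsik.example.com",
    "authorization_endpoint": "https://m\u00fcsik.example.com/authorize",
    "token_endpoint": "https://m\u00fcsik.example.com/token",
}

# ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes;
# the resulting text is then encoded as UTF-8 for the wire.
body = json.dumps(discovery, ensure_ascii=False).encode("utf-8")

# A client decodes the bytes as UTF-8 and gets the same code points back.
parsed = json.loads(body.decode("utf-8"))
print(parsed["issuer"] == discovery["issuer"])

# The escaped form is also valid JSON and parses to the identical strings.
escaped_body = json.dumps(discovery).encode("utf-8")
print(json.loads(escaped_body.decode("utf-8")) == parsed)
```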
Punycode strings are extremely user-unfriendly for Asian and other characters. I would advocate using UTF-8 in the JSON document and pointing client developers to 3) above if they get an error with it, but YMMV.
Best,
Nat
From: Openid-specs-ab [mailto:openid-specs-ab-bounces at lists.openid.net] On Behalf Of Brock Allen via Openid-specs-ab
Sent: Monday, April 02, 2018 9:57 AM
To: Mike via Openid-specs-ab <openid-specs-ab at lists.openid.net>
Subject: [Openid-specs-ab] unicode host names, issuer, and URLs in the discovery document
A question came up recently in our implementation of IdentityServer around unicode host names and punycode encoding of those host names. I looked thru the discovery spec and couldn't parse it well enough to know the answer to my question, so I thought I'd ask here.
Should the issuer in the discovery document be punycode for host names with unicode characters? The issuer is a URI but, AFAICT, the URI spec says that encoding is context-dependent. So in URLs unicode host names need to be punycode, but in a JSON document (either in discovery or a JWT) they don't seem like they need to be.
Should the URLs in the discovery document (e.g. authorize and token endpoints) be punycode for host names with unicode characters? From what I've seen, client/RP libraries don't do well with non-punycode URLs from discovery (meaning they don't encode them before trying to use them). But often pen-testers dislike the URLs not matching the original authority URL. Maybe this last point is pedantic.
It'd be best if there were a directive in the spec that simply tells me which way to do it, but in the absence of that, any insight would be appreciated.
Thanks.
-Brock