[OpenID] XRI I18N (Was Re: Benefits of XRI i-names/i-numbers as OpenIDs)

Tan, William William.Tan at neustar.biz
Wed Feb 14 01:01:30 UTC 2007

Martin Atkins wrote:
> Drummond Reed wrote:
>> * Internationalization: i-name syntax is fully internationalized (uses the
>> full Unicode character range) right from the start, without the need for
>> complicated punycode (http://en.wikipedia.org/wiki/Punycode).
> "Complicated" Punycode is just another Unicode encoding scheme, like 
> UTF-8. Just as with UTF-8, most developers don't need to care much about 
> it as the lower-layer libraries handle these implementation details.
> Since you bring it up, could you explain to me how XRI handles the 
> problem of the human confusion between the two distinct identifiers 
> "=martin" and "=mаrtin"?
The Punycode algorithm was designed for maximum compression while 
retaining some readability for Latin-script domain names, and it is 
complicated as a result. I don't think anyone would dispute that it was 
born out of the need to internationalize a stubbornly 7-bit DNS.

Having two parallel versions of an identifier is not always as clear-cut 
as was originally envisioned. The original intention was to use Punycode 
for network and storage, while input and display would use the Unicode 
counterpart. In practice, things get complicated during implementation 
because you essentially need to transport both copies of an identifier, 
so it is not really true that developers don't need to care much about it.
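
To make the "two copies" problem concrete, here is a minimal Python sketch. It assumes Python's built-in "idna" codec (which implements the older IDNA 2003 rules) as a stand-in for whatever library an application would really use; the helper names to_wire and to_display are made up for illustration.

```python
# Sketch only: Python's built-in "idna" codec (IDNA 2003) stands in for a
# real IDN library. to_wire/to_display are hypothetical helper names.

def to_wire(label: str) -> str:
    """Unicode (display) form -> ASCII Punycode form for network/storage."""
    return label.encode("idna").decode("ascii")

def to_display(ascii_label: str) -> str:
    """ASCII Punycode form -> Unicode form for input/display."""
    return ascii_label.encode("ascii").decode("idna")

display = "r\u00e9sum\u00e9"      # "résumé"
wire = to_wire(display)           # an ASCII string starting with "xn--"

# The application ends up juggling both strings for one identifier.
print(wire)
print(to_display(wire) == display)
```

Any code that logs, compares, or stores such a label has to know which of the two forms it is holding, which is the burden described above.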

Drummond's point is that XRI is built atop IRI, and therefore needs no 
intermediate transformation step (we did take the good parts from IDN, 
notably NFKC normalization).
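
As a rough illustration of what NFKC does and does not do, the following Python sketch uses the standard unicodedata module. The sample strings are my own and are not taken from any XRI specification.

```python
import unicodedata

def nfkc(text: str) -> str:
    """Apply Unicode Normalization Form KC (compatibility composition)."""
    return unicodedata.normalize("NFKC", text)

# Decomposed "e" + combining acute accent composes to the single code point "é".
print(nfkc("e\u0301") == "\u00e9")

# Compatibility characters fold to their plain equivalents:
# fullwidth "=" (U+FF1D) becomes "=", the "fi" ligature (U+FB01) becomes "fi".
print(nfkc("\uff1dmartin") == "=martin")
print(nfkc("\ufb01") == "fi")

# NFKC does NOT unify lookalikes across scripts: Cyrillic "а" (U+0430)
# stays distinct from Latin "a" (U+0061).
print(nfkc("\u0430") == "a")
```

The last line prints False, which is why normalization alone does not answer the "=martin" lookalike question and registry-level tables are still needed.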

Regarding your point about confusion, your example is a classic one that 
mixes Cyrillic and Latin characters. The GRS has a two-fold defense:
1. Each registrable character has a canonical version, e.g. "é" maps to 
"e"; the tables are defined at http://inames.net/lang/ . So, once you 
register "=résumé", no one but you can register "=resume".

Cyrillic characters are not yet open for registration, but when they 
are, they will most likely be mapped to their Latin lookalikes. In 
addition, all the characters of any i-name proposed for registration 
must fall into a single script (or a set of scripts, as in the 
Japanese case). Note that any detailed discussion of the tables and 
mappings is probably best taken offline so as not to clog this mailing 
list.

2. After you have registered "=résumé", you (and only you) can register 
additional aliases such as "=resume" and tie them to the same i-number. 
This has also been built into the GRS from day one. In the domain world, 
this mechanism is called "bundling", which is of course yet another 
retrofit onto the traditional DNS registry.
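
The two defenses above can be sketched roughly as follows. This is a toy model under stated assumptions, not the GRS implementation: the CANONICAL table is a one-entry stand-in for the real tables at http://inames.net/lang/, the script check guesses the script from the Unicode character name and ignores the multi-script Japanese case, and the Registry class, the owner names, and the i-number value are all invented for illustration.

```python
import unicodedata

# Hypothetical one-entry stand-in for the real per-language tables.
CANONICAL = {"\u00e9": "e"}          # "é" -> "e"

def canonical(label: str) -> str:
    """Normalize with NFKC, then map each character to its canonical version."""
    label = unicodedata.normalize("NFKC", label)
    return "".join(CANONICAL.get(ch, ch) for ch in label)

def scripts(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

class Registry:
    """Toy registry; not the actual GRS data model."""

    def __init__(self):
        self.owner_of = {}   # canonical label -> owner
        self.aliases = {}    # registered i-name -> i-number

    def register(self, iname: str, owner: str, inumber: str) -> None:
        label = iname.lstrip("=")
        # Single-script rule (multi-script sets such as Japanese not modeled).
        if len(scripts(label)) > 1:
            raise ValueError("i-name mixes scripts: " + iname)
        # Defense 1: the canonical form is reserved for its first registrant.
        holder = self.owner_of.setdefault(canonical(label), owner)
        if holder != owner:
            raise PermissionError("canonical form already held by someone else")
        # Defense 2: the same owner may bundle aliases onto one i-number.
        self.aliases[iname] = inumber

reg = Registry()
reg.register("=r\u00e9sum\u00e9", "alice", "=!1111.2222")   # made-up i-number
reg.register("=resume", "alice", "=!1111.2222")             # alias, same owner
```

With this model, a second owner trying to register "=resume" gets a PermissionError, and "=m\u0430rtin" (with a Cyrillic "а") is rejected by the single-script check.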

XRI also possesses a very desirable property that helps mitigate 
homograph attacks: big-endian ordering, i.e. the most significant part 
of the name comes first. While the script tables and canonicalization 
are not baked into the XRI syntax specifications, the GRS implements a 
very restrictive policy, so the most significant segment of an i-name is 
at least secure. In the case of DNS, a subdomain can create havoc by simply 
doing: http://ebay.com------------------------------bad.com
Or worse, using the division slash: http:∕∕ebay.com∕whatever.com
Note that the slashes above are U+2215, the mathematical symbol "division slash".
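
Both DNS tricks can be made visible with a few lines of Python. This is only a sketch: the function names are hypothetical, the list of slash lookalikes is illustrative rather than exhaustive, and no public-suffix logic is applied.

```python
import unicodedata
from urllib.parse import urlsplit

# Characters that resemble "/" but are not U+002F (illustrative list only).
SLASH_LOOKALIKES = {"\u2215", "\u2044", "\uff0f"}

def labels_most_significant_first(url: str) -> list:
    """Return the DNS labels of a URL's host, most significant label first."""
    host = urlsplit(url).hostname or ""
    return list(reversed(host.split(".")))

def slash_lookalikes(text: str) -> list:
    """Return (index, Unicode name) for every slash lookalike in text."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(text) if ch in SLASH_LOOKALIKES]

# In DNS the significant part is at the END, so "ebay" is only the
# least significant label of this host.
padded = "http://ebay.com" + "-" * 30 + "bad.com"
print(labels_most_significant_first(padded))

# The "slashes" here are U+2215, so the whole string is really one host name.
fake = "http:\u2215\u2215ebay.com\u2215whatever.com"
print(slash_lookalikes(fake))
```

Reading the labels in order of significance shows "ebay" coming last for the padded host, which is the opposite of what the eye sees in the address bar.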

