[OpenID] XRI I18N (Was Re: Benefits of XRI i-names/i-numbers as OpenIDs)
Tan, William
William.Tan at neustar.biz
Wed Feb 14 01:01:30 UTC 2007
Martin Atkins wrote:
> Drummond Reed wrote:
>
>> * Internationalization: i-name syntax is fully internationalized (uses the
>> full Unicode character range) right from the start, without the need for
>> complicated punycode (http://en.wikipedia.org/wiki/Punycode).
>>
>
> "Complicated" punycode is just another unicode encoding scheme, like
> UTF-8. Just as with UTF-8, most developers don't need to care much about
> it as the lower-layer libraries handle these implementation details.
>
> Since you bring it up, could you explain to me how XRI handles the
> problem of the human confusion between the two distinct identifiers
> "=martin" and "=mаrtin"

The Punycode algorithm was designed for maximum compression while
retaining some readability for Latin domains, and is therefore
complicated. I don't think anyone would dispute that it was born out of
the need to internationalize a stubbornly 7-bit DNS.

Having two parallel versions of an identifier is not always as clear-cut
as was originally envisioned. The original intention was to use Punycode
for network and storage, while input and display would use the Unicode
counterpart. However, things get complicated during implementation
because you essentially need to transport both copies of an identifier,
so it is not really true that developers don't need to care much about it.
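
To make the "two copies" problem concrete, here is a minimal sketch using
Python's built-in "idna" codec (which implements the older IDNA 2003
rules). The dictionary and its field names are illustrative assumptions
only, not part of any specification.

```python
# What the user types and sees: "résumé", written with escapes for clarity.
unicode_label = "r\u00e9sum\u00e9"

# What goes over the wire and into storage: the ASCII-compatible encoding.
ace_label = unicode_label.encode("idna")

# An application that wants to display the name again must either carry
# both forms around or convert back and forth at every boundary.
identifier = {
    "display": unicode_label,            # Unicode form for input and display
    "wire": ace_label.decode("ascii"),   # "xn--..." form for DNS and storage
}

round_trip = ace_label.decode("idna")
print(identifier)
print(round_trip == unicode_label)
```

Every layer that touches the identifier has to know which of the two
forms it is holding, which is where the implementation pain comes from.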

Drummond's point is that XRI is built atop IRI, and therefore needs no
intermediate transformation step (although we did take the good parts
from IDN, notably NFKC normalization).
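
As a rough illustration of what NFKC does and does not buy you, here is
a short sketch using Python's standard unicodedata module. The helper
name nfkc is my own; the sample strings are just examples.

```python
import unicodedata

def nfkc(text: str) -> str:
    """Apply Unicode Normalization Form KC (compatibility composition)."""
    return unicodedata.normalize("NFKC", text)

# Fullwidth Latin letters spelling "martin" fold to plain ASCII.
fullwidth = "\uff4d\uff41\uff52\uff54\uff49\uff4e"
print(nfkc(fullwidth) == "martin")

# "e" followed by a combining acute accent composes to the single
# precomposed character U+00E9.
decomposed = "re\u0301sume\u0301"
print(nfkc(decomposed) == "r\u00e9sum\u00e9")

# NFKC does NOT unify cross-script lookalikes: Cyrillic U+0430 stays
# distinct from Latin "a", so normalization alone does not answer the
# "=martin" question.
cyrillic_mix = "m\u0430rtin"
print(nfkc(cyrillic_mix) == "martin")
```

The last comparison is false, which is why the registry-level defenses
matter in addition to normalization.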

Regarding your point about confusion, your example is the classic one of
mixing Cyrillic and Latin characters. The GRS has a two-fold defense:

1. Each registrable character has a canonical version, e.g. "é" maps to
"e"; the tables are defined at http://inames.net/lang/ . So, once you
register "=résumé", no one else but you can register "=resume".
Cyrillic characters are not yet open for registration, but when they
are, they will most likely be mapped to their Latin lookalikes. In
addition, all characters of any i-name proposed for registration must
fall into a single script (or a fixed set of scripts, as in the
Japanese case). I would also like to point out that any discussion of
the tables and mappings is probably best taken offline so as not to
clog this mailing list.

2. After you have registered "=résumé", you (and only you) can register
additional aliases such as "=resume" and tie them to the same i-number.
This has also been built into the GRS from day one. In the domain world,
this mechanism is called "bundling", and it, too, had to be retrofitted
onto the traditional DNS registries.
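
The two rules above can be sketched roughly as follows. This is a toy
model under loud assumptions: the real canonical mappings live in the
tables at http://inames.net/lang/ and are not reproduced here, so
canonical_key approximates them by stripping combining accents;
scripts_used guesses a character's script from the first word of its
Unicode name; ToyRegistry and the "inumber-A" style values are
hypothetical placeholders, not the actual GRS interface or real
i-numbers.

```python
import unicodedata

def canonical_key(name: str) -> str:
    """Approximate a canonical form: NFKC, lowercase, strip accents."""
    folded = unicodedata.normalize("NFKC", name).lower()
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def scripts_used(name: str) -> set:
    """Heuristic: take the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in name if ch.isalpha()}

class ToyRegistry:
    def __init__(self) -> None:
        self.holder_by_key = {}     # canonical form -> i-number holding it
        self.inumber_by_name = {}   # registered i-name -> i-number

    def register(self, name: str, inumber: str) -> None:
        # Rule: all characters must come from a single script.
        if len(scripts_used(name)) > 1:
            raise ValueError("i-name mixes scripts: " + name)
        # Rule: the canonical form is reserved for its first registrant,
        # who alone may add further aliases ("bundling").
        holder = self.holder_by_key.setdefault(canonical_key(name), inumber)
        if holder != inumber:
            raise PermissionError("canonical form held by another registrant")
        self.inumber_by_name[name] = inumber

registry = ToyRegistry()
registry.register("=r\u00e9sum\u00e9", "inumber-A")  # first registration
registry.register("=resume", "inumber-A")            # alias by the same holder
```

In this sketch a second registrant asking for "=resume" is refused, and
a name that mixes Cyrillic U+0430 into Latin letters is rejected before
the canonical form is even consulted.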

XRI also possesses a very desirable property that helps mitigate
homographic attacks: it is big-endian, i.e. the most significant part of
the name comes first. While the script tables and canonicalization are
not baked into the XRI syntax specifications, the GRS implements a very
restrictive policy, so the most significant segment of an i-name is at
least secure. In the case of DNS, a subdomain can create havoc by simply
doing: http://ebay.com------------------------------bad.com
Or worse, using the division slash: http:∕∕ebay.com∕whatever.com
Note that the slashes above are U+2215, the mathematical operator
"division slash".
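
Here is a small sketch of why both DNS tricks work. The helper
naive_registered_domain is a hypothetical simplification that takes the
last two labels and ignores multi-label public suffixes; the run of 30
hyphens is just an illustrative length.

```python
import unicodedata

def naive_registered_domain(host: str) -> str:
    """Last two DNS labels (simplifying assumption: no public suffix list)."""
    return ".".join(host.split(".")[-2:])

# The eye reads the front of the host name, but DNS authority sits at
# the END of it.
host = "ebay.com" + "-" * 30 + "bad.com"
print(host[:8])                       # what a hurried reader notices
print(naive_registered_domain(host))  # who actually controls the name

# The second trick contains no ASCII slash at all, so everything after
# the scheme is a single lookalike string rather than a host plus path.
DIVISION_SLASH = "\u2215"
spoof = ("http:" + DIVISION_SLASH * 2 + "ebay.com"
         + DIVISION_SLASH + "whatever.com")
print("/" in spoof)
print(unicodedata.name(DIVISION_SLASH))
```

With the most significant segment in front, the part a reader sees
first is also the part the registry has vetted.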
=wil