[OpenID] UTF-8 processing
Mark Fowler
mark at twoshortplanks.com
Sat Apr 14 13:57:39 UTC 2007
On 11 Apr 2007, at 10:29, Tan, William wrote:
> May I suggest [section 4 of the spec is altered to] clarify it so
> that it reads:
>
> "The keys and values permit the full Unicode character set (UCS). When
> the keys and values need to be converted to/from bytes, they MUST
> first
> be represented in UTF-8 [RFC3629] and then percent-encoded
> [RFC3986]."
> Thoughts?
If I understand the spec right at a glance, this section describes
how a string should be converted into a byte sequence. That byte
sequence may later be converted into a digest. The point of
requiring UTF-8 is so that we can create the same byte sequence on
both client and server, right?
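The proposed wording above can be sketched in a few lines of Python.
This is just an illustration of the two-step rule (UTF-8 encode, then
percent-encode); `urllib.parse.quote` is one way to do the
percent-encoding, not something the spec itself names:

```python
from urllib.parse import quote

# Step 1: represent the value in UTF-8 (RFC 3629).
value = "caf\u00e9"                   # 'café'
utf8_bytes = value.encode("utf-8")    # b'caf\xc3\xa9'

# Step 2: percent-encode the resulting bytes (RFC 3986).
encoded = quote(utf8_bytes, safe="")
print(encoded)                        # caf%C3%A9
```

Both client and server performing these two steps in this order get
the identical byte sequence, which is what matters for the digest.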
If this is the case, I see another problem: there's more than one
way to represent the same text in Unicode, and hence more than one
possible UTF-8 byte sequence for what looks like the same string.
Should the spec also specify which Unicode normalisation form must
be used? I'd recommend Normalization Form C (NFC) because it's
a) the most common (in my limited experience) and b) the more
compact option, and we're sending this over the wire.
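To make the normalisation point concrete, here's a small sketch using
Python's standard `unicodedata` module: the same visible character
'é' has a precomposed and a decomposed representation, and the two
UTF-8-encode to different bytes, so they would digest differently:

```python
import unicodedata

precomposed = "\u00e9"    # 'é' as a single code point (NFC form)
decomposed = "e\u0301"    # 'e' + combining acute accent (NFD form)

# Visually identical, but not the same code point sequence...
assert precomposed != decomposed

# ...and therefore not the same UTF-8 bytes:
print(precomposed.encode("utf-8"))   # b'\xc3\xa9'   (2 bytes)
print(decomposed.encode("utf-8"))    # b'e\xcc\x81'  (3 bytes)

# Normalising to NFC before encoding makes them agree:
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

Note that NFC is also the shorter of the two here, which is the
compactness argument above.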
Similarly, should we also specify either that there is no BOM, or
that one is mandatory?
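The BOM issue is the same kind of ambiguity: an optional U+FEFF
prefix changes the byte sequence (and so any digest over it) without
changing the text. A quick sketch, using an arbitrary placeholder
string:

```python
text = "openid"                  # placeholder value, not from the spec
with_bom = "\ufeff" + text       # same text with a leading BOM

print(text.encode("utf-8"))      # b'openid'
print(with_bom.encode("utf-8"))  # b'\xef\xbb\xbfopenid'
```

Unless the spec says one way or the other, one implementation might
emit the three BOM bytes and another might not.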
Mark.