[OpenID] UTF-8 processing

Mark Fowler mark at twoshortplanks.com
Sat Apr 14 13:57:39 UTC 2007


On 11 Apr 2007, at 10:29, Tan, William wrote:

> May I suggest [section 4 of the spec is altered to] clarify it so  
> that it reads:
>
> "The keys and values permit the full Unicode character set (UCS). When
> the keys and values need to be converted to/from bytes, they MUST  
> first
> be represented in the UTF-8 [RFC3629] and then percent encoded  
> [RFC3986]."

> Thoughts?

If I understand the spec right from a simple glance, this section is  
describing how a string should be converted into a byte sequence.   
This byte sequence may then be later converted into a digest.  The  
point in saying it must be UTF-8 is so we can create the same byte  
sequence on both client and server, right?
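The conversion the spec describes can be sketched roughly like this (a minimal sketch; the value used here is hypothetical, not from the spec):

```python
from urllib.parse import quote

value = "café"                      # hypothetical OpenID value
utf8_bytes = value.encode("utf-8")  # b'caf\xc3\xa9'
encoded = quote(utf8_bytes)         # percent-encode the UTF-8 bytes
print(encoded)                      # caf%C3%A9
```

Both ends must produce the same `utf8_bytes` for any digest computed over them to match.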

If this is the case, I see another problem: there's more than one way  
to represent what looks like the same character in Unicode, and hence  
more than one possible UTF-8 byte sequence.  Should the spec be  
adjusted to also specify the Unicode normalisation form that must be  
used?  I'd recommend Normalisation Form C (NFC) because it's a) the  
most common (in my limited experience) and b) the more compact  
option, and we're sending this over the wire.

Similarly, should we also specify either that there must not be a  
BOM, or that one is mandatory?

Mark.





