Re-defining the Key-Value format

Mon May 28 22:58:19 UTC 2007

Johnny Bufu wrote:
> On 28-May-07, at 5:55 AM, Claus Färber wrote:
>> Johnny Bufu wrote:
>>> So I've rewritten the encoding section, such that:
>> This means that '%' characters need to be encoded up to three times:
> I'm not sure I follow your reasoning all the way; please see my  
> comments below and point where I'm wrong.
> 
>> For example:
>>
>> User name: 100%pure
>>
>> Embedded in a URI that is the value of the attribute:
>>    http://example.com/foo/100%25pure
> 
> This encoding happens outside of the OpenID / AX protocols.

Yes, it's just for illustration, but I did count it as the first
encoding. The other two of the three encodings happen in AX and OpenID.

Further, I should have mentioned one more step here:

Encoded as an AX value:
     openid.ax.foo.uri:http://example.com/foo/100%2525pure

>> Encoded for AX using Key-Value Form Encoding  (OID 2, 4.1.1.)
>>    openid.ax.foo.uri:http://example.com/foo/100%2525pure
> 
> AX has nothing to do directly with key-value encoding. I see no  
> reference to percent-encoding from OpenID2's section 4.1.1.

> But yes, using the AX 3.3.1 Default Encoding of a String Value [1],
> if user_name=100%pure the field in a key-value representation would be:
> 
> 	openid.ax.foo.value=100%25pure

This looks wrong. In Key-Value Form, it would be:

         ax.foo.value:100%25pure

(A colon, no "openid." prefix.)

In HTTP Encoding, it would be:

         openid.ax.foo.value=100%2525pure

(First encoding from AX, second encoding from HTTP Encoding.)

>> Encoded for AX using HTTP Encoding (OID 2, 4.1.2.)
>>    openid.ax.foo.uri=http%3A//example.com/foo/100%2525pure

I got this wrong; it should be:
     openid.ax.foo.uri=http%3A//example.com/foo/100%252525pure

> Yes, there would be a double-encoding of the % char, one done by AX  
> 3.3.1, and another x-www-form encoding as required by OpenID 4.1.2  
> for indirect messages.

(plus the one by URI encoding.)
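
To make the counting concrete, here is a rough Python sketch of the three
steps. `ax_encode` is only a hypothetical stand-in for the AX 3.3.1
default encoding (the sketch assumes it percent-encodes '%' and LF and
nothing else), and `urllib.parse.quote` stands in for both the URI
encoding and the HTTP Encoding:

```python
from urllib.parse import quote


def ax_encode(value):
    """Hypothetical stand-in for the AX 3.3.1 default string encoding.

    Assumption: only '%' and LF are percent-encoded. '%' is handled first
    so that the '%' introduced for LF is not encoded a second time.
    """
    return value.replace("%", "%25").replace("\n", "%0A")


user_name = "100%pure"

# 1st encoding: the user name is embedded in a URI (outside OpenID / AX).
uri = "http://example.com/foo/" + quote(user_name)

# 2nd encoding: the URI becomes an AX value.
ax_value = ax_encode(uri)

# 3rd encoding: the AX value is sent using HTTP Encoding (OID 2, 4.1.2.).
http_value = quote(ax_value)

for step in (uri, ax_value, http_value):
    print(step)
```

Each step turns every '%' into '%25', so the single '%' of the user name
ends up as '%252525' in the HTTP-encoded message.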

>> I don't think it's a good idea to introduce a solution to the "\n"
>> problem in AX only. It should be part of the base spec (OpenId 2
>> Authentication).
> 
> What do you see as pros / cons for each proposed solution?

AX is not the only OpenID extension that might need to encode "\n" 
characters.

If other specifications need to encode "\n" characters, it is easier to
write such specifications if the base specification (OpenID 2.0
Authentication) provides the encoding. It is also less likely that
writers of such specifications invent their own ad-hoc encoding (or miss
the problem altogether).

The same is true for binary data: If the OpenID 2.0 specification
RECOMMENDs base64, it's less likely that authors of extension specs
invent their own encoding (which might be incompatible with software
that expects UTF-8 and/or produce larger messages in HTTP Encoding).

>> What about changing section 4.1.1. from:
>>
>>          A message in Key-Value form is a sequence of lines.  Each
>>          line begins with a key, followed by a colon, and the value
>>          associated with the key.  The line is terminated by a
>>          single newline (UCS codepoint 10, "\n"). A key or value
>>          MUST NOT contain a newline and a key also MUST NOT contain
>>          a colon.
>>
>> to (wording adapted from RFC 2822):
>>
>>          A message in Key-Value form consists of fields composed of
>>          a key, followed by a colon (":"), followed by a value, and
>>          terminated by a single LF (UCS codepoint 10, "\n").
>>
>>          The key MUST be composed of printable US-ASCII characters
>>          except ":" (i.e. characters that have values between 33 and
>>          57, or between 59 and 126, inclusive). The key MUST NOT start
>>          with a '*' (codepoint 42).
>>
>>          The value MUST be composed of a sequence of characters
>>          encoded as UTF-8. If an extension to this specification allows
>>          values that contain LF (UCS codepoint 10, "\n") characters,
>>          these LF characters MUST be encoded as a sequence of LF, '*',
>>          ':' (UCS codepoints 10, 42, 58, "\n*:").
>>
>>     [Unlike the suggested %-encoding, this encoding is compatible with
>>     the current spec as long as LF characters are not actually allowed
>>     within the value.
> 
> What makes the proposed percent-encoding incompatible with the  
> current OpenID spec?

You can't use it as an encoding for _all_ Key-Value-Form messages, 
including those already specified in the base specification, as it 
encodes the '%' character differently:
   <openid.return_to=http://example.com/f%E4rber>
vs.
   <openid.ax.foo.return_to=http://example.com/f%25E4rber>.

If you want to change the encoding in the base specification (which I
want to do), it had better be identical for all characters except LF.
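
A small Python sketch of the difference. Both functions are illustrations
only, not text from either spec: `percent_encode` assumes the AX proposal
encodes just '%' and LF, and `fold_encode` is the "\n*:" encoding
suggested in the quoted wording above.

```python
def percent_encode(value):
    # Assumed AX-style encoding: '%' first, then LF.
    return value.replace("%", "%25").replace("\n", "%0A")


def fold_encode(value):
    # Proposed encoding: only LF changes, into LF '*' ':'.
    return value.replace("\n", "\n*:")


# A value that is already valid under the current base specification.
return_to = "http://example.com/f%E4rber"

print(percent_encode(return_to))            # the '%' is rewritten
print(fold_encode(return_to) == return_to)  # the value is untouched
```

With percent-encoding, an existing value containing '%' changes on the
wire; with the folding variant, only values containing LF change.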

>>     It's similar to the RFC 2822 folding mechanism but folding is only
>>     allowed (and mandated) where a LF is to be encoded. Further, the
>>     continuation line is compatible with the key-value format, using
>>     '*' as a pseudo key value.]
>>
>>          If an extension to this specification needs to allow binary
>>          data in values, i.e. if it allows arbitrary bytes not to be
>>          interpreted as UTF-8 characters, it MAY use Base64
>>          [<reference>] encoding for the specification of the format of
>>          that value.
> 
> I would be (mildly) ok with dealing with newline escaping in the core  
> if others agree, but:
> - it does add some extra stuff, which some may not like / approve

Otherwise, the extra stuff is still there but duplicated in every 
extension that needs it. No good.

> - it would add another item on the 'compatibility list', and another  
> thing that OpenID 1/2 implementations would need to deal with twice

It would be an addition to the compatibility list, yes. But it would not 
require different handling for OpenID 1.x and 2.0:
Unless there's a LF character (which can't happen in OpenID 1.x), the 
Key-Value Form message will be absolutely identical.
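
As a rough illustration, the proposed wording could be implemented along
these lines in Python. The function names are made up for this sketch,
and the error handling is an assumption; the proposal itself does not say
what a parser should do with malformed input.

```python
def check_key(key):
    # Printable US-ASCII except ':' (33..57 or 59..126), not starting with '*'.
    valid = all(33 <= ord(c) <= 126 and c != ":" for c in key)
    if not valid or key.startswith("*"):
        raise ValueError("invalid key: %r" % (key,))


def encode_kv(fields):
    """Encode (key, value) pairs; each LF in a value becomes LF '*' ':'."""
    lines = []
    for key, value in fields:
        check_key(key)
        lines.append(key + ":" + value.replace("\n", "\n*:") + "\n")
    return "".join(lines)


def decode_kv(message):
    """Decode a message; '*' lines continue the previous field's value."""
    if message and not message.endswith("\n"):
        raise ValueError("message must end with LF")
    fields = []
    for line in message.split("\n")[:-1]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("missing colon: %r" % (line,))
        if key == "*":
            if not fields:
                raise ValueError("continuation line without a preceding field")
            prev_key, prev_value = fields[-1]
            fields[-1] = (prev_key, prev_value + "\n" + value)
        else:
            fields.append((key, value))
    return fields


plain = [("mode", "id_res"), ("return_to", "http://example.com/f%E4rber")]
multi = [("ax.foo.value", "line one\nline two")]

print(repr(encode_kv(plain)))  # no LF in any value: nothing is rewritten
print(repr(encode_kv(multi)))  # the LF is followed by a '*:' continuation
print(decode_kv(encode_kv(multi)) == multi)
```

A parser that does not know about the '*' pseudo key still sees a
well-formed sequence of key:value lines.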

> - not sure what would be the net advantage of having it there (aside  
> from having consistency across all extensions).

Well, isn't that enough?

>>     [Note: Base64 is quite efficient when it comes to encoding the
>>     message in HTTP Encoding (OID 2, 4.1.2.). Unencoded bytes would
>>     have to use the %-encoding, roughly doubling the size. Unencoded
>>     bytes also create problems if implementations think they should be
>>     UTF-8, e.g. if Perl strings are used.]
>>
>>> - base64 must be used for encoding binary data, and defined
>>>    an additional field for this:
>>>    	openid.ax.encoding.<alias>=base64
>> I think it's much simpler if the specification of the field value
>> format just says UTF-8 or Base64
> 
> The receiving party would need to distinguish between the two  
> somehow, no? So a flag of some kind would need to be passed through.

No, the _spec_ should IMO say what encoding the value is in.
There should be no flag (just as there's no flag that says the value is
a URI.)

The specification should say what the string means and what
encoding/format it is in, for example:

- an integer (encoded with digits U+0030..U+0039)
- human-readable text
- a URI
- a PNG image (encoded as Base64)
- ...

There's no need for a flag. If the spec says that ax.foo.image is a "PNG 
image encoded with Base64", it is always encoded that way.

Actually, this means that the value is back to just being a UTF-8 
string. Binary data needs to be encoded to be transported as UTF-8 
characters.
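
A minimal Python sketch of that idea. The bytes below are arbitrary
stand-ins, not a real PNG, and the attribute definition ("PNG image
encoded as Base64") is assumed to be known to both sides from the spec:

```python
import base64
from urllib.parse import quote

# Arbitrary bytes standing in for a PNG image (not a real image).
raw = bytes(range(256))

# Value format "PNG image encoded as Base64": the value is plain ASCII
# text, so it is also a valid UTF-8 string and contains no LF.
value = base64.b64encode(raw).decode("ascii")
assert value.encode("utf-8").decode("utf-8") == value
assert "\n" not in value

# Size once sent in HTTP Encoding: the Base64 text versus
# percent-encoding the raw bytes directly.
base64_size = len(quote(value, safe=""))
raw_size = len(quote(raw, safe=""))
print(base64_size, raw_size)

# No flag is transmitted: the receiver decodes because the attribute's
# definition says the value is Base64.
assert base64.b64decode(value) == raw
```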

>> and if the same encoding is used for all actual values,
> 
> I'm not sure what's the difference in your wording between 'value  
> format' and 'encoding'. From the AX point of view attributes have  
> values, which need to be encoded (formatted?) before being put on top  
> of an OpenID message.

Basically, my suggestion would be that any Base64 encoding is part of 
the value format and not of the encoding.

Claus