Re-defining the Key-Value format (was: attribute exchange value encoding)

Johnny Bufu johnny at sxip.com
Mon May 28 18:28:28 UTC 2007


Hi Claus,

On 28-May-07, at 5:55 AM, Claus Färber wrote:

> Johnny Bufu schrieb:
>> So I've rewritten the encoding section, such that:
>>
>> - for strings, only the newline (and percent) characters are required
>> to be escaped,
>>    (to comply with OpenID's data formats), using percent-encoding;
>
> This means that '%' characters need to be encoded up to three times:

I'm not sure I follow your reasoning all the way; please see my
comments below and point out where I'm wrong.

> For example:
>
> User name: 100%pure
>
> Embedded in an URI that is the value of the attribute:
>    http://example.com/foo/100%25pure

This encoding happens outside of the OpenID / AX protocols. There's  
nothing we can do in the specs about it, if the value of an attribute  
is a URI like http://example.com/foo/100%25pure.

From the OpenID / AX point of view, I view the above as an unencoded
% character (AX doesn't know in this case that the payload is an  
URI); it's up to whoever consumes the attribute value to handle it  
properly.


> Encoded for AX using Key-Value Form Encoding  (OID 2, 4.1.1.)
>    openid.ax.foo.uri:http://example.com/foo/100%2525pure

AX has nothing to do directly with key-value encoding. I see no  
reference to percent-encoding from OpenID2's section 4.1.1.

But yes, using the AX 3.3.1 Default Encoding of a String Value [1],  
if user_name=100%pure the field in a key-value representation would be:

	openid.ax.foo.value=100%25pure
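
To make the rule concrete, here is a minimal Python sketch of that default string encoding, as an illustration only and not spec text. The function names are made up for this example, and the uppercase "%0A" / "%25" escapes are an assumption about how the percent-encoding would be written:

```python
def ax_encode_string(value):
    """Percent-encode only '%' and newline; leave every other character as-is."""
    # Escape '%' first, so the '%' in the '%0A' produced for a newline
    # is not escaped a second time.
    return value.replace("%", "%25").replace("\n", "%0A")


def ax_decode_string(encoded):
    """Reverse ax_encode_string()."""
    # Decode newlines first: in a correctly encoded string every '%' is
    # followed by '25' or '0A', so '%0A' can only be a newline escape.
    return encoded.replace("%0A", "\n").replace("%25", "%")


print(ax_encode_string("100%pure"))  # 100%25pure
```

Decoding in the opposite order would turn a literal "%0A" in the original value into a newline, which is why the order of the two replacements matters.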


> Encoded for AX using HTTP Encoding (OID 2, 4.1.2.)
>    openid.ax.foo.uri=http%3A//example.com/foo/100%2525pure

Yes, there would be a double-encoding of the % char, one done by AX  
3.3.1, and another x-www-form encoding as required by OpenID 4.1.2  
for indirect messages.
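
The two layers can be seen in a short Python sketch (illustrative only; ax_encode_string is a made-up helper that assumes the default string encoding escapes just '%' and newline, and the standard library's urlencode stands in for the x-www-form encoding step):

```python
from urllib.parse import parse_qs, urlencode


def ax_encode_string(value):
    # Assumed AX default string encoding: escape only '%' and newline.
    return value.replace("%", "%25").replace("\n", "%0A")


# Layer 1: AX string encoding of the attribute value.
ax_value = ax_encode_string("100%pure")

# Layer 2: form encoding of the whole field for an indirect message.
query = urlencode({"openid.ax.foo.value": ax_value})

# The receiver undoes layer 2 and is left with the AX-encoded value.
received = parse_qs(query)["openid.ax.foo.value"][0]

print(ax_value)  # 100%25pure
print(query)     # openid.ax.foo.value=100%2525pure
print(received)  # 100%25pure
```

Each layer is undone by the party that applied its counterpart, so the doubled "%2525" never reaches the consumer of the attribute value.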


> I don't think it's a good idea to introduce a solution to the "\n"
> problem in AX only. It should be part of the base spec (OpenId 2
> Authentication).

What do you see as pros / cons for each proposed solution?


> What about changing section 4.1.1. from:
>
>          A message in Key-Value form is a sequence of lines.  Each
>          line begins with a key, followed by a colon, and the value
>          associated with the key.  The line is terminated by a
>          single newline (UCS codepoint 10, "\n"). A key or value
>          MUST NOT contain a newline and a key also MUST NOT contain
>          a colon.
>
> to (wording adapted from RFC 2822):
>
>          A message in Key-Value form consists of fields composed of
>          a key, followed by a colon (":"), followed by a value, and
>          terminated by a single LF (UCS codepoint 10, "\n").
>
>          The key MUST be composed of printable US-ASCII characters
>          except ":" (i.e. characters that have values between 33 and
>          57, or between 59 and 126, inclusive). The key MUST NOT start
>          with a '*' (codepoint 42).
>
>          The value MUST be composed of a sequence of characters encoded
>          as UTF-8. If an extension to this specification allows values
>          that contain LF (UCS codepoint 10, "\n") characters, these LF
>          (UCS codepoint 10, "\n") characters MUST be encoded as a
>          sequence of LF, '*', ':' (UCS codepoints 10, 42, 58, "\n*:").
>
>     [Unlike the suggested %-encoding, this encoding is compatible with
>     the current spec as long as LF characters are not actually allowed
>     within the value.

What makes the proposed percent-encoding incompatible with the  
current OpenID spec?


>     It's similar to the RFC 2822 folding mechanism but folding is only
>     allowed (and mandated) where a LF is to be encoded. Further, the
>     continuation line is compatible with the key-value format, using
>     '*' as a pseudo key value.]
>
>          If an extension to this specification needs to allow binary
>          data in values, i.e. if it allows arbitrary bytes not to be
>          interpreted as UTF-8 characters, it MAY use Base64 [<reference>]
>          encoding for the specification of the format of that value.

I would be (mildly) ok with dealing with newline escaping in the core  
if others agree, but:
- it does add some extra stuff, which some may not like / approve
- it would add another item on the 'compatibility list', and another  
thing that OpenID 1/2 implementations would need to deal with twice
- not sure what would be the net advantage of having it there (aside  
from having consistency across all extensions).
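
For comparison, this is roughly what the continuation-line proposal quoted above would ask of a Key-Value implementation. It is a Python sketch under my reading of the proposed wording (each LF inside a value becomes LF followed by "*:"); kv_serialize and kv_parse are names invented for the example:

```python
def kv_serialize(fields):
    """Serialize a dict to Key-Value form, folding LF in values as LF + '*:'."""
    lines = []
    for key, value in fields.items():
        folded = value.replace("\n", "\n*:")
        lines.append(key + ":" + folded + "\n")
    return "".join(lines)


def kv_parse(message):
    """Parse Key-Value form, joining '*' continuation lines onto the previous value."""
    fields = {}
    last_key = None
    for line in message.split("\n"):
        if not line:
            continue  # the empty piece after the final LF
        key, _, value = line.partition(":")
        if key == "*" and last_key is not None:
            # Continuation line: restore the LF that was folded away.
            fields[last_key] += "\n" + value
        else:
            fields[key] = value
            last_key = key
    return fields


message = kv_serialize({"openid.ax.foo.value": "line one\nline two", "mode": "id_res"})
print(repr(message))
```

A parser that does not know about the '*' pseudo key would still split the message into well-formed lines; it would simply see an extra field named "*".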

>     [Note: Base64 is quite efficient when it comes to encoding the
>     message in HTTP Encoding (OID 2, 4.1.2.). Unencoded bytes would
>     have to use the %-encoding, roughly doubling the size. Unencoded
>     bytes also create problems if implementations think they should be
>     UTF-8, e.g. if Perl strings are used.]
>
>> - base64 must be used for encoding binary data, and defined
>>    an additional field for this:
>>    	openid.ax.encoding.<alias>=base64
>
> I think it's much simpler if the specification of the field value
> format just says UTF-8 or Base64

The receiving party would need to distinguish between the two  
somehow, no? So a flag of some kind would need to be passed through.


> and if the same encoding is used for all actual values,

I'm not sure what the difference is, in your wording, between 'value
format' and 'encoding'. From the AX point of view attributes have
values, which need to be encoded (formatted?) before being put on top
of an OpenID message.


> even those that would not need any encoding.

This is how it's specified right now for strings, so that we can get  
away with one flag with only one value (base64), and an implicit  
value (percent-encoding) if missing.
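
On the receiving side, that one flag would be handled along these lines. This is a Python sketch, not spec text: decode_ax_value is a hypothetical helper, the caller passes in the key that holds the value, and the default branch assumes the percent-encoding escapes only '%' and newline:

```python
import base64


def decode_ax_value(fields, alias, value_key):
    """Decode one AX value according to its optional encoding flag."""
    raw = fields[value_key]
    encoding = fields.get("openid.ax.encoding." + alias)
    if encoding is None:
        # No flag: implicit default, percent-encoded string.
        return raw.replace("%0A", "\n").replace("%25", "%")
    if encoding == "base64":
        # Flag present: binary data, returned as bytes.
        return base64.b64decode(raw)
    raise ValueError("unknown AX encoding: " + encoding)


text_fields = {"openid.ax.foo.value": "100%25pure"}
print(decode_ax_value(text_fields, "foo", "openid.ax.foo.value"))  # 100%pure
```

Binary attributes would carry openid.ax.encoding.<alias>=base64 next to the value and come back as bytes; anything else falls through to the string default.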


Thanks,
Johnny

[1] http://openid.net/svn/specifications/attribute_exchange/1.0/trunk/openid-attribute-exchange.html#string-default-encoding



