Re-defining the Key-Value format (was: attribute exchange value encoding)

Mon May 28 12:55:09 UTC 2007

Johnny Bufu wrote:
> So I've rewritten the encoding section, such that:
> 
> - for strings, only the newline (and percent) characters are required  
> to be escaped,
>    (to comply with OpenID's data formats), using percent-encoding;

This means that a '%' character may end up being encoded up to three times.

For example:

User name: 100%pure

Embedded in a URI that is the value of the attribute:
   http://example.com/foo/100%25pure

Encoded for AX using Key-Value Form Encoding  (OID 2, 4.1.1.)
   openid.ax.foo.uri:http://example.com/foo/100%2525pure

Encoded for AX using HTTP Encoding (OID 2, 4.1.2.)
   openid.ax.foo.uri=http%3A%2F%2Fexample.com%2Ffoo%2F100%252525pure
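
The three layers can be reproduced with a short Python sketch. It assumes
that the proposed AX escaping percent-encodes only '%' and newline, as
described in the quoted text, and that HTTP Encoding is strict form
encoding (which also escapes ':' and '/'). The helper name ax_escape is
made up for illustration and is not part of any spec or library.

```python
from urllib.parse import quote


def ax_escape(value: str) -> str:
    # Hypothetical helper: escape only '%' and LF, as the quoted AX
    # proposal describes. '%' must be handled first.
    return value.replace("%", "%25").replace("\n", "%0A")


user = "100%pure"

# First encoding: the user name becomes a URI path segment.
uri = "http://example.com/foo/" + quote(user, safe="")

# Second encoding: the URI becomes an AX value in Key-Value form.
kv_value = ax_escape(uri)

# Third encoding: the same value is sent in HTTP (form) encoding.
http_value = quote(kv_value, safe="")

print(uri)         # http://example.com/foo/100%25pure
print(kv_value)    # http://example.com/foo/100%2525pure
print(http_value)  # http%3A%2F%2Fexample.com%2Ffoo%2F100%252525pure
```

A receiver has to undo the layers in exactly the reverse order, which is
easy to get wrong when one of the layers is specific to a single extension.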

I don't think it's a good idea to introduce a solution to the "\n" 
problem in AX only. It should be part of the base spec (OpenID 2 
Authentication).

What about changing section 4.1.1. from:

         A message in Key-Value form is a sequence of lines.  Each
         line begins with a key, followed by a colon, and the value
         associated with the key.  The line is terminated by a
         single newline (UCS codepoint 10, "\n"). A key or value
         MUST NOT contain a newline and a key also MUST NOT contain
         a colon.

to (wording adapted from RFC 2822):

         A message in Key-Value form consists of fields composed of
         a key, followed by a colon (":"), followed by a value, and
         terminated by a single LF (UCS codepoint 10, "\n").

         The key MUST be composed of printable US-ASCII characters except
         ":" (i.e. characters that have values between 33 and 57, or
         between 59 and 126, inclusive). The key MUST NOT start with
         a '*' (codepoint 42).

         The value MUST be composed of a sequence of characters encoded
         as UTF-8. If an extension to this specification allows values
         that contain LF (UCS codepoint 10, "\n") characters, these LF
         characters MUST be encoded as a sequence of LF, '*', ':'
         (UCS codepoints 10, 42, 58, "\n*:").

    [Unlike the suggested %-encoding, this encoding is compatible with
    the current spec as long as LF characters are not actually allowed
    within the value.
    It's similar to the RFC 2822 folding mechanism but folding is only
    allowed (and mandated) where a LF is to be encoded. Further, the
    continuation line is compatible with the key-value format, using '*'
    as a pseudo key value.]
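
To make the continuation-line idea concrete, here is a minimal Python
sketch of an encoder and decoder for the wording proposed above. The
function names (valid_key, encode_kv, decode_kv) are made up for this
example, and the error handling is an assumption; the proposed text does
not say what a parser should do with malformed input.

```python
CONT = "\n*:"  # LF, '*', ':' (codepoints 10, 42, 58)


def valid_key(key: str) -> bool:
    # Printable US-ASCII (33..126) except ':', and not starting with '*'.
    return not key.startswith("*") and all(
        33 <= ord(c) <= 126 and c != ":" for c in key
    )


def encode_kv(pairs) -> bytes:
    """Serialize (key, value) pairs; an LF inside a value becomes LF '*' ':'."""
    out = []
    for key, value in pairs:
        if not valid_key(key):
            raise ValueError(f"invalid key: {key!r}")
        out.append(key + ":" + value.replace("\n", CONT) + "\n")
    return "".join(out).encode("utf-8")


def decode_kv(data: bytes):
    """Parse Key-Value form, folding '*' continuation lines into the value."""
    pairs = []
    # Every field ends with LF, so the last split element is always empty.
    for line in data.decode("utf-8").split("\n")[:-1]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line without colon: {line!r}")
        if key == "*":
            if not pairs:
                raise ValueError("continuation line without a preceding field")
            prev_key, prev_value = pairs[-1]
            pairs[-1] = (prev_key, prev_value + "\n" + value)
        else:
            pairs.append((key, value))
    return pairs


fields = [("openid.ax.foo", "line one\nline two"), ("openid.mode", "id_res")]
wire = encode_kv(fields)
print(wire)  # b'openid.ax.foo:line one\n*:line two\nopenid.mode:id_res\n'
print(decode_kv(wire) == fields)  # True
```

A parser written against the current 4.1.1 text would still accept this
message; it would simply see an extra field whose key is "*".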

         If an extension to this specification needs to allow binary
         data in values, i.e. if it allows arbitrary bytes not to be
         interpreted as UTF-8 characters, it MAY use Base64 [<reference>]
         encoding for the specification of the format of that value.

    [Note: Base64 is quite efficient when it comes to encoding the
    message in HTTP Encoding (OID 2, 4.1.2.). Unencoded bytes would have
    to use the %-encoding, roughly doubling the size. Unencoded bytes also
    create problems if implementations think they should be UTF-8, e.g.
    if Perl strings are used.]
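
The size argument can be checked with a few lines of Python. This is only
a rough sketch: the sample data (each byte value once) is an assumption
standing in for arbitrary binary content, and real payloads will give
different ratios.

```python
import base64
from urllib.parse import quote

# Every byte value once; a stand-in for arbitrary binary data.
data = bytes(range(256))

# Option 1: Base64 first, then HTTP (form) encoding of the Base64 text.
b64 = base64.b64encode(data).decode("ascii")
via_base64 = quote(b64, safe="")

# Option 2: percent-encode the raw bytes directly.
raw = quote(data, safe="")

print("raw bytes:       ", len(data))
print("Base64 + HTTP:   ", len(via_base64))
print("percent-encoded: ", len(raw))
```

For evenly distributed bytes the percent-encoded form comes out at more
than twice the original size, while Base64 stays close to 4/3 of it, plus
a small overhead for the '+', '/' and '=' characters that HTTP Encoding
has to escape again.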

> - base64 must be used for encoding binary data, and defined
>    an additional field for this:
>    	openid.ax.encoding.<alias>=base64

I think it's much simpler if the specification of the field value format 
just says UTF-8 or Base64 and if the same encoding is used for all 
actual values, even those that would not need any encoding.

Claus