[OpenID] URL normalization and capitalization

SitG Admin sysadmin at shadowsinthegarden.com
Tue Aug 5 04:32:36 UTC 2008


>If claimedID is essentially a byte array, then, obviously, any two 
>values (with identical encodings) are indeed different ...if any one 
>bit is different. It is intended to act as a "primary key", after 
>all.

So, just how many different possible URIs *can* we have, "in the wild"?
http://www.boutell.com/newfaq/misc/urllength.html
Let's assume that the servers involved are all running Microsoft's 
IIS; this lets us have sixteen thousand three hundred and eighty-four 
characters. Being generous, we'll allow that eight of these are in 
use for the prefix (https://) and two hundred fifty-six for the 
domain portion, plus one final character for the leading slash in the 
pathname. These add up to 265 bytes that are effectively reserved, in 
that toggling their capitalization (assuming a character range of 
a-z) won't affect anything. If every OTHER byte *is* such a letter, 
each one can be either upper or lower case, so the possibilities 
number 2 to the power of 16,119 (16,384 minus 265).

It would take 2,015 bytes just to store this value in binary; IPv6, by 
comparison, has an address space of 2 to the power of 128, taking 16 
bytes to store. So, in other words, I can discretely identify 2 TO THE 
POWER OF 15,991 times as many individual users as could possibly exist 
within the entire address space of IPv6, just by varying the 
capitalization on one pathname. I think server limits will change 
before we realistically see that many users ;)
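
As a rough check on the arithmetic, here is a minimal Python sketch 
that recomputes the count from the figures stated above (a 
16,384-character limit, with 8 + 256 + 1 characters reserved). The 
variable names are illustrative assumptions; the 128-bit width is the 
size of an IPv6 address.

```python
import math

MAX_URL_CHARS = 16_384          # IIS limit quoted from the linked FAQ
SCHEME_CHARS = len("https://")  # 8 characters for the prefix
DOMAIN_CHARS = 256              # generous allowance for the domain portion
LEADING_SLASH = 1               # the slash that starts the pathname

# Characters whose capitalization is assumed not to matter.
reserved = SCHEME_CHARS + DOMAIN_CHARS + LEADING_SLASH

# Every remaining character is assumed to be a letter a-z, so each one
# contributes one bit (upper or lower case) to the identifier.
free_chars = MAX_URL_CHARS - reserved

# Bytes needed to store one value out of 2**free_chars possibilities.
bytes_needed = math.ceil(free_chars / 8)

# An IPv6 address is 128 bits wide, so its space is 2**128.
IPV6_ADDRESS_BITS = 128
surplus_bits = free_chars - IPV6_ADDRESS_BITS

print(reserved)      # 265
print(free_chars)    # 16119
print(bytes_needed)  # 2015
print(surplus_bits)  # 15991, i.e. 2**15991 times the IPv6 space
```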

Considering what's possible with pathnames that vary by more than 
just capitalization, and how OpenID is supposed to be a 
"decentralized" identity system (i.e., it's very likely that URIs 
will vary by domain as well as path), do we really need to keep such 
an option available?
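
For illustration, the distinction assumed above (capitalization of 
the scheme and domain doesn't matter, capitalization of the pathname 
does) matches the case rules in RFC 3986, where the scheme and host 
are case-insensitive. A minimal sketch using Python's standard 
urllib.parse follows; normalize_case is a hypothetical helper name, 
not part of any OpenID library, and lowercasing the whole netloc is a 
simplification that would also lowercase any userinfo present.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url: str) -> str:
    """Lowercase the scheme and host; leave the path's case untouched."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),   # scheme is case-insensitive
        parts.netloc.lower(),   # host is case-insensitive (simplified)
        parts.path,             # path capitalization is preserved
        parts.query,
        parts.fragment,
    ))

a = normalize_case("HTTPS://Example.COM/Shade")
b = normalize_case("https://example.com/shade")

print(a)       # https://example.com/Shade
print(a == b)  # False: the two still differ in the pathname
```

Compared as byte strings, the two results remain distinct 
identifiers, which is the behaviour being questioned here.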

-Shade


