[security] Widespread Timing Vulnerabilities in OpenID implementations

Nate Lawson nate at rootlabs.com
Thu Jul 15 20:54:09 UTC 2010


Phillip Hallam-Baker wrote:
>> Once unrolled it will be optimized back to the original
>> That makes no difference. Assume you're comparing two HMACs that
>> are 20 bytes long. Then you will have 20 XORs in a row. That has no
>> impact on whether the logic is constant-time or not.
>> 
>> It *is* hard to guarantee timing in high-level languages. But even
>> with C, you have cache and branch prediction behavior that require
>> deep awareness of the platform to eliminate.
>> 
>> However, the above function does not have any logical flaws that 
>> introduce timing leaks. It isn't remotely exploitable on any of the
>>  platforms we analyzed. Isn't it better to start from there rather
>> than introducing vulnerabilities?

> It really is hard to predict what the loop unroll logic of a future
> compiler is going to produce. There are optimizing compilers that do
> have the intelligence to eliminate evaluation of expressions of the
> form (true OR f(x)).
> 
> To make the code robust I think you have to have a #pragma nooptimize
> type capability. And I am not even sure if those exist or are
> respected in many of the current compilers.
> 
> One very simple way of improving the robustness of the code would be
> to do the compare in 64 or 32 bit chunks. That way the chance of
> leaving an exploitable timing hole is much smaller. It also speeds up
> the code considerably.

I agree. Providing high assurance guarantees that there are no timing
leaks is quite hard and architecture-dependent. However, from a language
standpoint, there is a decent expectation that a compiler will not
generate data-dependent branches for the source code we provided.

Your word-size compare is already implemented in some other environments
but on architectures that don't support unaligned reads, requires a
separate bytewise step for the initial and/or trailing 1-3 bytes. This
introduces more timing variability again.

The best option is to have an architecture-specific asm fragment for
each platform that runs with caches disabled, etc, etc. Barring that,
the next best thing is to write code that does not create branch points
at the language level. The latter is what we've suggested.

On a separate note, I don't think any implementers have chimed in.
Anybody have an idea when they plan to fix this?

-- 
Nate Lawson
Root Labs :: www.rootlabs.com
+1 (510) 595-9505
Solving embedded security, kernel and crypto challenges



More information about the security mailing list