Support for ECB - proposal for a decision from GALINDO Virginie on 2012-09-24 (public-webcrypto@w3.org from September 2012)

From: GALINDO Virginie <Virginie.GALINDO@gemalto.com>
Date: Mon, 24 Sep 2012 10:17:58 +0200
To: Ryan Sleevi <sleevi@google.com>, Zooko Wilcox-OHearn <zooko@leastauthority.com>, Vijay Bharadwaj <Vijay.Bharadwaj@microsoft.com>, Emily Stark <estark@MIT.EDU>, Richard Barnes <richard.barnes@cleverdis.com>, Anthony Nadalin <tonynad@microsoft.com>
CC: "public-webcrypto@w3.org" <public-webcrypto@w3.org>
Message-ID: <076ED1F6CB375B4BB5CAE7873691360703B51FD33F55@CROEXCFWP04.gemalto.com>
Dear all,

Trying to progress on that one with a  proposal for consensus:

We integrate the ECB mode in our API to answer needs from developers.
We add a note on the best practice usage.

What do you think ?

Regards,
Virginie


From: Ryan Sleevi [mailto:sleevi@google.com]
Sent: lundi 17 septembre 2012 19:09
To: Zooko Wilcox-OHearn
Cc: public-webcrypto@w3.org
Subject: Re: Support for ECB


On Sat, Sep 15, 2012 at 1:56 AM, Zooko Wilcox-OHearn <zooko@leastauthority.com<mailto:zooko@leastauthority.com>> wrote:
On Fri, Sep 14, 2012 at 4:47 PM, Ryan Sleevi <sleevi@google.com<mailto:sleevi@google.com>> wrote:
>
> I'm not sure if mandating one-call-per-block does anything to prevent
> its misuse,
I'm sure it does. I've seen it many times. The most recent time was
about five minutes ago, when I saw this new advisory:

https://lwn.net/Alerts/515542/

It says that another ECB mode vulnerability has been found in software
that users are currently relying on in the wild. This advisory was
particularly interesting to me because I'm the author of one of the
libraries mentioned in the advisory - the one named "pycryptopp". The
advisory says that the software is vulnerable when it uses the
"pycrypto" library, and requires a patch, but that if instead it used
its preferred "pycryptopp" library it was not vulnerable. Why is that?
Because when I wrote pycryptopp, I deliberately didn't offer ECB mode.

>  but it makes it rather hard from an implementation
> standpoint to make any good optimizations.
Well, I've actually been in a position where I could have benefited
from your optimization. I didn't think of it, and I wonder if anyone
else who has been in the same position has ever thought of it.

I'm surprised that this is new. It's one of the basic optimizations afforded by ECB and CTR mode, because it can be trivially parallelized - whether through chunking and distributing to distinct compute engines or precomputing expansions and sending those through the entire stream. In order to attain a decent cycles-per-block, it's often necessary to at least fill a single cache line, in not more.


I was implementing a prototype of the "Zfone" software with Phil
Zimmermann, which was the first implementation of ZRTP ¹. I noticed a
substantial performance problem in the CTR mode. That code was making
a callback from C to Python once per AES block so that the Python code
could generate the next counter value. I fixed this by writing C code
to generate the counter values without calling back to Python. Had I
thought of it, I could have used your hack on top of ECB mode instead,
generating an array of counter values in Python and then passing that
as a the "plaintext" to ECB mode.

¹ http://zfoneproject.com/zrtp_ietf.html

Your hack of implementing CTR on top of ECB mode would have had the
advantage of not requiring new C code - it could have been implemented
solely in Python. It would have been much faster than the "one Python
callback per block" version, but substantially slower than the
"everything in C" version that I ended up with.

I suspect with modern JIT techniques the performance gap between the
one-call-per-block implementation and the CTR-on-top-of-ECB technique
would be narrower. It would be interesting to measure that...

That's simply not true, unfortunately.

Consider a multi-process browser that uses privilege separation (WebKit2, Chrome) or that uses privilege reduction (IE 10), it may be necessary for crypto operations to pass through process boundaries in order to perform the actual operation.

Such separation is important when considering the 'external' key access (eg: keys not generated by the origin), but it's equally important when considering the "I'm using native crypto primitives to implement" scenario. In the case of these APIs (PKCS#11, CDSA, CryptoAPI/CNG), there are varying degrees of locks acquired - even in the pure software implementation - due to consistency requirements of the underlying API.

Finally, even under the "lock free" scenario, as shown by a number of existing optimized implementations, pre-computing (or allowing the implementation to pre-compute, by way of providing a definition of the counter, rather than the counts itself) generally offers the best performance per-block.


> I agree, the ideal goal is to be able to declaratively describe how
> things should be done (eg: declare the counter's incrementing function
> in terms of bits and offsets via supporting a CTR mode of operation),
> but absent that, I don't think we should necessarily penalize the
> non-declarative approach.
I agree that a way to declaratively specify an arbitrary "counter"
sequence would be cool. I don't see how to do it, either. Hopefully
increasing sophistication of JIT techniques will make the simple
imperative approach efficient enough.

An example of how to do it is already included in the draft, and reflects the approach used in PKCS#11.

I don't think it's a good API to intentionally know something is inefficient, and relying exclusively on JIT to reduce those gaps. We need a solution that will be viable to a number of performance enhancing techniques - because over time, all will be necessary.


I still don't agree that we ought to standardize ECB mode for this
purpose, though. It sounds like innovating a new technique, which is a
questionable thing to do in a standard, where presumably we should
just be codifying "best practices". Especially since this innovation
is how to use a "worst practice" in a novel, no-longer-broken way, so
standardizing it requires codifying a worst practice. ;-)

I still haven't seen a single example *in practice* of someone using
ECB mode safely, either for your hack of implementing CTR mode, or for
Vijay's suggestion of "vectorizing" many executions of the AES block
function. As Vijay pointed out, CTR mode is in the process of being
supplanted by authenticated encryption modes like GCM, and when people
*do* use CTR mode they usually use a simple incrementing CTR, which
the webcrypto standard already supplies.

Again, when you say "supplies", I think we have to be very careful here.

The spec is simply specifying how, if an implementation supports CTR, it should expose it.

It does not require an implementation support CTR. As I mentioned, there are a variety of reasons beyond technical that could cause an implementation to not support it.


> It seems like your main objection is just whether or not we call the
> "AES function" as "ECB mode", since functionally they yield identical
> results (for single block or multiple blocks). I'm just wanting to
> make sure we've progressed to discussing semantics, or if there is
> still a functional objection being raised here.
No, the AES function is not the same as ECB mode. This isn't a matter
of preference, it is simply incorrect to conflate the two.

And if they yield the same outputs for the same inputs, then for sake of discussion, I think we should keep conflating the two.


My main objection to standardizing ECB mode is that I have often seen
it used unsafely, and I haven't seen evidence that anyone has ever
used it safely in practice.

As has been pointed out, ECB can be used on a one-block-at-a-time as an equivalent method to a raw AES function.

As has been pointed out, ECB can be used with pre-computed counter blocks to serve as a CTR mode.

I'm not denying that using ECB-in-and-of-itself is almost always completely unsafe. But I have yet to hear a compelling alternative to supporting the polyfill use case. If we're not exposing ECB, but exposing a 'raw' encrypt, then I think for performance reasons we simply *must* also specify it as taking 1-or-more-blocks, at which point, its outputs are no different from ECB.

And as I'd reiterate, our work here is not mandating implementations MUST support it (although I think it might be a good *should*, for polyfill/interop reasons), simply specifying how it should behave if it *is* implemented.
Received on Monday, 24 September 2012 08:19:09 UTC