Re: Support for ECB from Ryan Sleevi on 2012-09-17 (public-webcrypto@w3.org from September 2012)

From: Ryan Sleevi <sleevi@google.com>
Date: Mon, 17 Sep 2012 10:08:34 -0700
To: Zooko Wilcox-OHearn <zooko@leastauthority.com>
Cc: "public-webcrypto@w3.org" <public-webcrypto@w3.org>
Message-ID: <CACvaWvbO_AG9xeRfy2VJn+mumDScs8okgTjYKDOKTaLop5ugRw@mail.gmail.com>
On Sat, Sep 15, 2012 at 1:56 AM, Zooko Wilcox-OHearn <
zooko@leastauthority.com> wrote:

> On Fri, Sep 14, 2012 at 4:47 PM, Ryan Sleevi <sleevi@google.com> wrote:
> >
> > I'm not sure if mandating one-call-per-block does anything to prevent
> > its misuse,
>
> I'm sure it does. I've seen it many times. The most recent time was
> about five minutes ago, when I saw this new advisory:
>
> https://lwn.net/Alerts/515542/
>
> It says that another ECB mode vulnerability has been found in software
> that users are currently relying on in the wild. This advisory was
> particularly interesting to me because I'm the author of one of the
> libraries mentioned in the advisory — the one named "pycryptopp". The
> advisory says that the software is vulnerable when it uses the
> "pycrypto" library, and requires a patch, but that if instead it used
> its preferred "pycryptopp" library it was not vulnerable. Why is that?
> Because when I wrote pycryptopp, I deliberately didn't offer ECB mode.
>
> >  but it makes it rather hard from an implementation
> > standpoint to make any good optimizations.
>
> Well, I've actually been in a position where I could have benefited
> from your optimization. I didn't think of it, and I wonder if anyone
> else who has been in the same position has ever thought of it.
>

I'm surprised that this is new. It's one of the basic optimizations
afforded by ECB and CTR mode, because it can be trivially parallelized -
whether through chunking and distributing to distinct compute engines or
precomputing expansions and sending those through the entire stream. In
order to attain a decent cycles-per-block, it's often necessary to at least
fill a single cache line, in not more.


>
> I was implementing a prototype of the "Zfone" software with Phil
> Zimmermann, which was the first implementation of ZRTP ¹. I noticed a
> substantial performance problem in the CTR mode. That code was making
> a callback from C to Python once per AES block so that the Python code
> could generate the next counter value. I fixed this by writing C code
> to generate the counter values without calling back to Python. Had I
> thought of it, I could have used your hack on top of ECB mode instead,
> generating an array of counter values in Python and then passing that
> as a the "plaintext" to ECB mode.
>
> ¹ http://zfoneproject.com/zrtp_ietf.html
>
> Your hack of implementing CTR on top of ECB mode would have had the
> advantage of not requiring new C code — it could have been implemented
> solely in Python. It would have been much faster than the "one Python
> callback per block" version, but substantially slower than the
> "everything in C" version that I ended up with.
>
> I suspect with modern JIT techniques the performance gap between the
> one-call-per-block implementation and the CTR-on-top-of-ECB technique
> would be narrower. It would be interesting to measure that…
>

That's simply not true, unfortunately.

Consider a multi-process browser that uses privilege separation (WebKit2,
Chrome) or that uses privilege reduction (IE 10), it may be necessary for
crypto operations to pass through process boundaries in order to perform
the actual operation.

Such separation is important when considering the 'external' key access
(eg: keys not generated by the origin), but it's equally important when
considering the "I'm using native crypto primitives to implement" scenario.
In the case of these APIs (PKCS#11, CDSA, CryptoAPI/CNG), there are varying
degrees of locks acquired - even in the pure software implementation - due
to consistency requirements of the underlying API.

Finally, even under the "lock free" scenario, as shown by a number of
existing optimized implementations, pre-computing (or allowing the
implementation to pre-compute, by way of providing a definition of the
counter, rather than the counts itself) generally offers the best
performance per-block.


>
> > I agree, the ideal goal is to be able to declaratively describe how
> > things should be done (eg: declare the counter's incrementing function
> > in terms of bits and offsets via supporting a CTR mode of operation),
> > but absent that, I don't think we should necessarily penalize the
> > non-declarative approach.
>
> I agree that a way to declaratively specify an arbitrary "counter"
> sequence would be cool. I don't see how to do it, either. Hopefully
> increasing sophistication of JIT techniques will make the simple
> imperative approach efficient enough.
>

An example of how to do it is already included in the draft, and reflects
the approach used in PKCS#11.

I don't think it's a good API to intentionally know something is
inefficient, and relying exclusively on JIT to reduce those gaps. We need a
solution that will be viable to a number of performance enhancing
techniques - because over time, all will be necessary.


>
> I still don't agree that we ought to standardize ECB mode for this
> purpose, though. It sounds like innovating a new technique, which is a
> questionable thing to do in a standard, where presumably we should
> just be codifying "best practices". Especially since this innovation
> is how to use a "worst practice" in a novel, no-longer-broken way, so
> standardizing it requires codifying a worst practice. ;-)
>
> I still haven't seen a single example *in practice* of someone using
> ECB mode safely, either for your hack of implementing CTR mode, or for
> Vijay's suggestion of "vectorizing" many executions of the AES block
> function. As Vijay pointed out, CTR mode is in the process of being
> supplanted by authenticated encryption modes like GCM, and when people
> *do* use CTR mode they usually use a simple incrementing CTR, which
> the webcrypto standard already supplies.
>

Again, when you say "supplies", I think we have to be very careful here.

The spec is simply specifying how, if an implementation supports CTR, it
should expose it.

It does not require an implementation support CTR. As I mentioned, there
are a variety of reasons beyond technical that could cause an
implementation to not support it.


>
> > It seems like your main objection is just whether or not we call the
> > "AES function" as "ECB mode", since functionally they yield identical
> > results (for single block or multiple blocks). I'm just wanting to
> > make sure we've progressed to discussing semantics, or if there is
> > still a functional objection being raised here.
>
> No, the AES function is not the same as ECB mode. This isn't a matter
> of preference, it is simply incorrect to conflate the two.
>

And if they yield the same outputs for the same inputs, then for sake of
discussion, I think we should keep conflating the two.


>
> My main objection to standardizing ECB mode is that I have often seen
> it used unsafely, and I haven't seen evidence that anyone has ever
> used it safely in practice.
>

As has been pointed out, ECB can be used on a one-block-at-a-time as an
equivalent method to a raw AES function.

As has been pointed out, ECB can be used with pre-computed counter blocks
to serve as a CTR mode.

I'm not denying that using ECB-in-and-of-itself is almost always completely
unsafe. But I have yet to hear a compelling alternative to supporting the
polyfill use case. If we're not exposing ECB, but exposing a 'raw' encrypt,
then I think for performance reasons we simply *must* also specify it as
taking 1-or-more-blocks, at which point, its outputs are no different from
ECB.

And as I'd reiterate, our work here is not mandating implementations MUST
support it (although I think it might be a good *should*, for
polyfill/interop reasons), simply specifying how it should behave if it
*is* implemented.
Received on Monday, 17 September 2012 17:09:07 UTC