Re: Drastically cutting primary features [was Re: Last call for public comments on Web Crypto charter] from Mitch Zollinger on 2011-11-25 (public-identity@w3.org from November 2011)

From: Mitch Zollinger <mzollinger@netflix.com>
Date: Thu, 24 Nov 2011 23:19:30 -0800
To: <public-identity@w3.org>
Message-ID: <4ECF4182.2060103@netflix.com>
Apologies for coming late to this discussion. Mark Watson was kind 
enough to draw my attention to this email thread, and as I'm out on 
vacation (Happy Thanksgiving!) I haven't jumped in as soon as I should have.

After reading through the thread, I see two issues that I would like to 
address:

1. TLS. Can't TLS do everything that is needed for a secure protocol?
2. We shouldn't try to add in a full complement of crypto APIs because 
doing so is hard and produces a mess like the JCE, PKCS#11, or some other 
<insert your favorite here> complex, hard-to-understand set of APIs.

I'll take these in order:

1. TLS

Over the last 4 years we've found that a secure protocol without TLS 
is a Really Good Thing for our use cases. I can break the issues down 
into two main categories: operational issues and performance issues.

Operational:
* When using TLS as a security model, you have to manage a trust store. 
Anyone who has done this for any length of time knows that CAs change 
their root certs, CAs issue subordinate CA certs, and CAs are 
compromised. (Ask Comodo.) Managing the trust store "securely" leads to 
the need to constrain the certs contained in the store and/or (usually 
"and" if you really want to be secure) add CRLs & OCSP into the mix. 
We've had TLS failures on our devices because of these operational 
issues, which are out of our control. (Example: a CDN decides to change 
CA provider and doesn't tell us.)
* As referenced above, you need to manage CRLs & OCSP to do things 
right. We're working with a CDN partner right now that hosts its CRLs 
on a server in Europe which sometimes doesn't answer HTTP requests for 
the CRL. What do you do? If you're "secure", you fail the SSL handshake 
(an error case the app may never see, as it's deep in the networking 
code), and if you ignore the CRL retrieval failure & go ahead anyway, 
you've compromised yourself in the worst case and slowed things down in 
the best case (see the performance issues below for more on this).
* Time. All SSL certs have validity periods (NotBefore & NotAfter 
values). When a CA issues a cert on 12/25/2011, its NotBefore value is 
12/25/2011. When an embedded device (think about that new LED flat 
screen TV "under" the Christmas tree) first comes up, it doesn't know 
what time it is. In fact, most of these devices don't even have 
battery-backed clocks! So, if I plug in that new TV on Christmas day and 
the firmware has a "birth date" of 6/1/2011, the device thinks it is 
6/1/2011, months before the cert's NotBefore date, and the SSL handshake 
will fail without any sort of user-visible error. (We're not in a "real" 
web browser that will pop up a dialog to complain about the cert.) What 
a disappointing user experience.
* When running on embedded devices, the flash space & cache needed just 
to maintain CA certs, CRLs and OCSP responses sometimes pushes you into 
a place where device manufacturers balk.

Performance:
* Assuming you get through all the issues above (which we have), you'll 
find that when you want a really high-performance user experience, 
it's just not going to happen in many cases.
* CRL / OCSP retrieval & response issues. As mentioned above, we have a 
CRL distribution point managed by a major CA provider, used by a major 
CDN, that simply fails to respond sometimes. Let's say, for the sake of 
argument, that it fails for 5% of all requests during peak Netflix 
viewing hours. That means that if I watch movies & TV during peak hours, 
1 in 20 times I use my device I will actually hit my socket timeout 
value (1 minute on a lot of devices). I'm going to sit twiddling my 
thumbs wondering why things are so slow, and this will happen 
non-deterministically.
* The above is a worst-case (but real-world) issue that we've seen. But 
even in the "normal" case, climbing the X.509 certificate chain to 
validate an SSL server cert usually involves several calls out to CRL 
distribution points and OCSP responders. For some devices where the CAs 
are managed by 3rd parties, a 5-10 second SSL handshake is not unusual. 
In the case of Netflix, we want startup to happen in a second or less 
(imagine if we were BETTER than digital cable. That's a worthwhile goal, 
yes?), and using TLS means we can't get there.
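The two performance cases above can be put into a rough latency budget. A minimal sketch, assuming revocation fetches happen one after another and that a fetch which never gets an answer always burns the full socket timeout; the function names and the RTT, handshake, lookup-count and per-lookup figures are hypothetical illustrations, not measurements. Only the 5% failure rate, the 1-minute timeout and the one-second goal come from the text above:

```python
BUDGET_S = 1.0  # target: startup in a second or less


def normal_handshake_s(rtt_s: float, handshake_rtts: int,
                       lookups: int, lookup_s: float) -> float:
    """Handshake time when every CRL/OCSP fetch answers, fetched serially."""
    return handshake_rtts * rtt_s + lookups * lookup_s


def expected_timeout_penalty_s(failure_rate: float, timeout_s: float) -> float:
    """Average extra wait per startup if a fraction of startups hang until
    the socket timeout because the CRL server never answers."""
    return failure_rate * timeout_s


# Illustrative inputs only (hypothetical, not measurements).
normal = normal_handshake_s(rtt_s=0.15, handshake_rtts=3, lookups=4, lookup_s=1.2)
penalty = expected_timeout_penalty_s(failure_rate=0.05, timeout_s=60.0)

print(round(normal, 2))    # 5.25
print(round(penalty, 2))   # 3.0
print(normal <= BUDGET_S)  # False
```

With these inputs the normal case alone is about 5 s, and the flaky CRL server adds 0.05 × 60 s = 3 s per startup on average, concentrated in the 1 in 20 startups that stall for a full minute.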

2. Crypto APIs

We're also flummoxed by "standard" APIs like PKCS#11, the JCE, OpenSSL 
and others. Crypto can be hard if you try to create abstractions for 
every single type of cryptographic primitive and every single type of 
cryptographic operation. (Ever tried to create common APIs for RSA, ECC 
& DSA? Oh wait, DSA can't do encryption, or something, right? Ever tried 
to create a common MAC API that included something exotic like UMAC?) We 
don't have to introduce this level of complexity, and in fact we've 
created our own Netflix "cryptocommon" Java lib, which strikes a very 
good balance between sufficient flexibility and intuitive usability. 
We've used that thing all over the place to wrap the sometimes bizarre 
JCE APIs.
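As an illustration of the kind of narrow interface meant here (not the actual "cryptocommon" API, which isn't shown in this thread), a hypothetical `Mac` wrapper in Python over the standard `hmac` and `hashlib` modules: one constructor, `sign` and `verify`, algorithm names borrowed from the JCE's standard names, and a constant-time comparison built in so callers can't get it wrong:

```python
import hashlib
import hmac


class Mac:
    """Hypothetical uniform MAC interface: construct with a key, then sign or verify."""

    # Only a small, sane set of algorithms is exposed.
    _DIGESTS = {"HmacSHA1": hashlib.sha1, "HmacSHA256": hashlib.sha256}

    def __init__(self, algorithm: str, key: bytes):
        if algorithm not in self._DIGESTS:
            raise ValueError(f"unsupported MAC algorithm: {algorithm}")
        self._digest = self._DIGESTS[algorithm]
        self._key = key

    def sign(self, data: bytes) -> bytes:
        return hmac.new(self._key, data, self._digest).digest()

    def verify(self, data: bytes, tag: bytes) -> bool:
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.sign(data), tag)


mac = Mac("HmacSHA256", key=b"pre-provisioned device key")
tag = mac.sign(b"license request")
print(len(tag))                              # 32
print(mac.verify(b"license request", tag))   # True
print(mac.verify(b"tampered request", tag))  # False
```

Adding another HMAC variant is one dictionary entry; anything that doesn't fit the key/sign/verify shape stays out of this interface rather than complicating it.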

We'd like to bring those learnings to bear on the current discussion, 
because allowing a sane collection of MACs (HMAC, ...), public key 
operations (RSA, ECC, maybe DSA), symmetric key encryption (AES, 3DES, 
...), hashing (MD5, SHA1, SHA-256, ...), and even key exchange 
(Diffie-Hellman & AES key unwrapping) is really not that difficult, IF 
you've been forced to think hard about the problem before.
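The key-exchange item is a good example of something that is short to express once the primitives are exposed sensibly. A toy finite-field Diffie-Hellman sketch in Python, assuming a deliberately small prime modulus (2**127 - 1) and generator 3 for readability; the `keypair` and `shared_key` helpers are hypothetical, the hash-and-truncate step is a simplistic stand-in for a proper key derivation function such as HKDF, and none of this is suitable for real use:

```python
import hashlib
import secrets

# Toy parameters for illustration only: 2**127 - 1 is prime, but a real
# deployment would use a standardized group or elliptic-curve Diffie-Hellman.
P = 2**127 - 1
G = 3


def keypair() -> tuple:
    """Return (private exponent, public value G**private mod P)."""
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P-2]
    return private, pow(G, private, P)


def shared_key(my_private: int, their_public: int) -> bytes:
    """Both sides compute the same group element, then hash it to a 16-byte key."""
    secret = pow(their_public, my_private, P)
    # Simplistic key derivation: SHA-256 truncated to an AES-128-sized key.
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()[:16]


client_priv, client_pub = keypair()
server_priv, server_pub = keypair()
client_key = shared_key(client_priv, server_pub)
server_key = shared_key(server_priv, client_pub)
print(client_key == server_key)  # True
print(len(client_key))           # 16
```

Only the public values cross the wire; each side combines its own private exponent with the other's public value and arrives at the same key.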

Apologies for the very long response. If you've made it this far, I 
really appreciate your taking the time to read through this.

Regards,
Mitch Zollinger

On 11/24/2011 5:40 AM, Stephen Farrell wrote:
>
> Saying why would be interesting. Many people have said they can't
> do TLS when it's turned out that they could in fact do TLS so what
> is it that you need that you can't get via TLS with key insertion
> (for e.g. TLS-PSK renegotiation) and key extraction and some
> simple functions to use extracted keys?
>
> I realise a generic crypto API can be used for all sorts of fun,
> but the claim here seems to be that such an API is necessary.
> My claim is that such an API is basically JCE/JCA which is not
> a simple API.
>
> S.
>
> On 11/24/2011 01:32 PM, David Dahl wrote:
>> +1
>>
>> ----- Original Message -----
>>> From: "Mark Watson"<watsonm@netflix.com>
>>> To: "Harry Halpin"<hhalpin@w3.org>
>>> Cc: "Stephen Farrell"<stephen.farrell@cs.tcd.ie>, 
>>> "<public-identity@w3.org>"<public-identity@w3.org>
>>> Sent: Thursday, November 24, 2011 10:48:03 AM
>>> Subject: Re: Drastically cutting primary features [was Re: Last call 
>>> for public comments on Web Crypto charter]
>>> Harry,
>>>
>>> The possibility to develop secure application protocols in Javascript,
>>> without using TLS, is exactly one of the points of this API, at
>>> least for us. The possibility to use pre-provisioned keys is also an
>>> essential component. So I wouldn't be in favor of this change and I'm
>>> not even sure it's a "simplification".
>>>
>>> ...Mark
>>
>>
>
>
Received on Friday, 25 November 2011 13:11:29 UTC