Re: beyond MD5 to security logic -- Re: (Pre-)Intent to Deprecate: <keygen> element and application/x-x509-*-cert MIME handling from Henry Story on 2015-09-09 (public-webid@w3.org from September 2015)

From: Henry Story <henry.story@gmail.com>
Date: Wed, 9 Sep 2015 15:52:49 +0100
To: Carvalho Melvin <melvincarvalho@gmail.com>
Cc: Ryan Sleevi <sleevi@google.com>, blink-dev <blink-dev@chromium.org>, public-webid <public-webid@w3.org>, Chadwick David <D.W.Chadwick@kent.ac.uk>, Tim Berners-Lee <timbl@w3.org>
Message-Id: <D8250F32-B5C5-4ACB-AE8E-E0F98E9376D4@gmail.com>
> On 8 Sep 2015, at 14:02, Melvin Carvalho <melvincarvalho@gmail.com> wrote:
> 
> 
> 
> On 8 September 2015 at 14:38, Henry Story <henry.story@gmail.com> wrote:
> 
>> On 7 Sep 2015, at 22:01, Ryan Sleevi <sleevi@google.com> wrote:
>> 
>> On Sun, Sep 6, 2015 at 12:57 AM, <henry.story@gmail.com> wrote:
>> 
>> MD5 is a message digest algorithm. Whether it is secure depends on what it is used for. For example, if I create an MD5 of a file on my file system to explain the command-line tool md5, this presents no security threat.
>> 
>> Henry,
>> 
>> Thanks for clarifying that you don't understand/appreciate the security consequences of MD5 collisions, particularly as they apply to signature schemes (such as those employed by CAs issuing certificates, *or* by SPKAC certifying keys) or to message digests (allowing multiple messages to result in the same MD5).
>> 
>> I don't believe it's germane to blink-dev@ to explain the security significance of collision resistance, given that there are two decades' worth of research readily available on MD5, and ample security literature on its significance.
> 
> I am catching up quickly on 20 years of literature. Until now I was relying on you folks to improve this,
> so I did not have to spend time learning it in detail. It seemed to me, to Tim Berners-Lee and to others that improving this functionality would come quite naturally. Even sticking to the <keygen> HTML element, one could have imagined quite a lot of ways forward. For example, if MD5 is broken, the server could be given the option of asking the client for other signature algorithms by generating HTML such as:
> 
> <keygen keytype="rsa" signature="sha1 sha2">
> 
> Perhaps, by default, if the client still sent back an MD5-based signature, the server could create a much shorter-lived certificate or refuse to create one.
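The negotiation imagined above can be sketched as a server-side policy. This is a minimal Python sketch under stated assumptions: the `signature` attribute is only a proposal, the server is assumed to learn which digest the client actually used, and `Decision`, `decide` and the digest names are hypothetical illustrations, not part of any <keygen> specification.

```python
from enum import Enum

# Hypothetical sketch: <keygen signature="..."> is a proposal, not a standard
# attribute, and these policy names are illustrative only.
class Decision(Enum):
    ISSUE = "issue"              # normal certificate
    ISSUE_SHORT = "issue-short"  # certificate with a much shorter lifetime
    REFUSE = "refuse"            # no certificate at all

BROKEN_DIGESTS = frozenset({"md5"})

def decide(digest_used: str, offered: frozenset, refuse_broken: bool = False) -> Decision:
    """Decide what to issue, given the digest the client actually used
    and the algorithms the server asked for in the generated form."""
    alg = digest_used.lower()
    if alg in offered and alg not in BROKEN_DIGESTS:
        return Decision.ISSUE
    if alg in BROKEN_DIGESTS and not refuse_broken:
        # The client ignored the request and fell back to MD5.
        return Decision.ISSUE_SHORT
    return Decision.REFUSE
```

For example, a form offering `frozenset({"sha1", "sha256"})` yields `Decision.ISSUE` for a SHA-256 response and `Decision.ISSUE_SHORT` for an MD5 fallback, unless `refuse_broken=True`.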
> 
> If the SPKAC format was the real problem, then <keygen> could have been improved with support for JOSE [1], the JSON-based format, with
> 
> <keygen keytype="rsa" signature="sha1 sha2" signedpk="jose">
> 
> All of these have the advantage that they work in browsers without JavaScript enabled, which removes some JavaScript-related security holes for sites with high security demands.
> 
> But let's look at this as an opportunity to have the discussion we should have had years ago, so that we can learn from each other. How much do you know about the semantic web, logic and modal logic? Those are also key parts of the security story.
> 
>> I'm afraid you've entirely missed the point in concluding that the concerns of "MD5 in SPKAC" are equivalent to "MD5 in issued certificates for keys attested to via SPKAC", since they are as far apart as two things can be.
> 
> That was not my point. And I did not make just one point but a number of related ones that work at different logical levels. Let me move from the top level down.
> 
> Let's assume that MD5 in SPKAC is broken (as all of us in the WebID group have), and that we therefore have a protocol that is currently much simpler than what was initially intended (which is what I described in my previous mail). This simplified protocol goes like this (let's also abstract away from any format arguments):
> 
>   1.a. User Joe clicks on the keygen form presented by his web agent
>      b. the web agent's keystore creates a public/private key pair
>      c. the web agent sends the form data + public key to the server (for all intents and purposes the public key is not signed)
>   2.a. the server receives the form data,
>      b. creates a certificate using the public key and other data,
>      c. and sends it, signed, back to the client
>   3.a. the client receives the signed certificate (let's assume it's a top-quality certificate),
>      b. asks the user if he wishes to add it to the keystore,
>      c. and, if the user agrees, adds it to the keystore and associates it with the existing private key
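The point of the simplified protocol is that step 2 certifies whatever key arrives. A minimal Python sketch of that server step, assuming an HMAC under a made-up `CA_SECRET` as a stand-in for the CA's real public-key signature (it shows what gets signed, not how X.509 signing works); `Certificate`, `issue_certificate` and `certified_triple` are hypothetical names:

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Assumption: an HMAC under a CA secret stands in for the CA's real
# public-key signature; it is enough to show *what* gets signed.
CA_SECRET = b"demo-ca-secret"

@dataclass(frozen=True)
class Certificate:
    subject: str      # e.g. a WebID such as "joe"
    public_key: str   # opaque key name such as "pk"
    issuer: str
    signature: str

def ca_sign(payload: dict) -> str:
    data = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(CA_SECRET, data, hashlib.sha256).hexdigest()

def issue_certificate(subject: str, submitted_key: str, issuer: str) -> Certificate:
    # Step 2: the server certifies whatever key arrived in step 1.c.
    # With the SPKAC proof of possession treated as broken, nothing here
    # checks that the submitter actually holds the matching private key.
    payload = {"subject": subject, "key": submitted_key, "issuer": issuer}
    return Certificate(subject, submitted_key, issuer, ca_sign(payload))

def certified_triple(cert: Certificate) -> tuple:
    # The statement the certificate vouches for: "subject cert:key key ."
    return (cert.subject, "cert:key", cert.public_key)

# Joe submits a key he copied from someone else; it is certified anyway.
joe_cert = issue_certificate("joe", "pk", "https://myfreedombox.example/")
```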
> 
> So the question from a number of us was:
>    what does an attacker gain by sending, in step 1.c, a public key for which he does not have the private key?
> 
> Well, he gets a signed certificate containing a relation of the form 
> 
>   joe cert:key pk .
> 
> where joe is the identifier for Joe and pk is the name of the public key (a long string of numbers that I won't write out here, to keep the text concise).
> 
> And here we could just stop and say that the problem is that the CA (be it my freedom box or a large CA) has signed something false, and that asserting something unverified is wrong. There are grounds for thinking that when someone says or writes something, they are responsible for what they write. (Note, of course, that context matters a lot here: as we know from going to the cinema, an actor who murders someone on screen is not liable for murder in the real world, though the character he plays should be, within the story of the film. We will get to context right in the next paragraph.)
> 
> Still, an example of a misuse would be better, as it will help us work out what the logical inference that can lead someone into error looks like. To do that we need to distinguish between someone saying something and what is said. That requires a quotation mechanism, and that is what a certificate is: a signed statement saying that some agent (often thought of as a CA) certifies something. To be formal about this we can use the N3 graph quotation symbols '{' and '}'. A quoted block { } allows one to say what someone said without taking what is said to be true. For example, you could use it in belief contexts like this:
> 
>      jane believes { jon at Work }
> 
> Jane believing something is no reason for me, you or Jon to believe it. Nor is it a reason to think it false. That is the whole purpose of quotation contexts.
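Quotation can be modelled very simply: a store keeps quoted graphs apart from the triples it asserts, and lifts a quoted graph into its own beliefs only through an explicit trust decision. A Python sketch, where `GraphStore`, `record` and `accept_from` are hypothetical names and triples are plain string tuples:

```python
from dataclasses import dataclass, field

@dataclass
class GraphStore:
    asserted: set = field(default_factory=set)  # what we ourselves take as true
    quoted: list = field(default_factory=list)  # (speaker, relation, frozenset of triples)

    def record(self, speaker: str, relation: str, graph: set) -> None:
        # "speaker relation { graph }": keep the quote, assert nothing inside it.
        self.quoted.append((speaker, relation, frozenset(graph)))

    def accept_from(self, speaker: str, trusted: set) -> None:
        # Lift a speaker's quoted graphs into our beliefs only if we trust them.
        if speaker not in trusted:
            return
        for who, _, graph in self.quoted:
            if who == speaker:
                self.asserted |= graph
```

Recording `jane believes { jon at Work }` leaves `("jon", "at", "Work")` out of `asserted` until someone decides to trust Jane.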
> 
> Using this, we can see that a certificate logically looks something like this:
> 
> JoesCertWithNoPrivKey
>    signature [ cert:signedBy <https://myfreedombox.example/>;
>                cert:signedKey <https://myfreedombox.example/pk#>;
>                cert:signature "2asdefs32asda...." ];
>    cert:certifies {
>        joe cert:key pk .
>    } .
> 
> where JoesCertWithNoPrivKey is the name of the certificate for which Joe does not have a private key. We have a signature composed of a number of fields, including who signed it and with what key,
> and the signature value itself, and we have, in the cert:certifies quoted block, the content of the statement that was signed.
> 
> We can imagine that another CA has published a certificate for Tim Berners-Lee with the same public key:
> 
> timblsCert
>    signature [ cert:signedBy <https://w3.org/#>;
>                cert:signedKey <https://w3.org/pk#>;
>                cert:signature "de238ab73...." ];
>    cert:certifies {
>        timbl cert:key pk .
>    } .
> 
> Now someone trusting both <https://myfreedombox.example/> and <https://w3.org/#>
> could come to the conclusion that
> 
>    joe cert:key pk .
>    timbl cert:key pk .
> 
> If the cert:key relation is inverse functional (owl:InverseFunctionalProperty [2]),
> then from the above two statements it would follow that
> 
>    joe = timbl .
> 
> So someone, or some software agent, would mistakenly conclude that joe was timbl. This could be especially problematic if, say, Joe was an annoying troll. (But you can imagine all kinds of negative consequences here.)
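The inference above is mechanical enough to sketch. Assuming triples are string tuples and that the reasoner has already taken both certificates' contents as true, this Python function applies the owl:InverseFunctionalProperty rule; `inverse_functional_closure` is a hypothetical name, not the API of any OWL reasoner:

```python
from itertools import combinations

def inverse_functional_closure(triples, inverse_functional):
    """Apply the owl:InverseFunctionalProperty rule to a set of triples:
    if (a p o) and (b p o) and p is inverse functional, conclude a = b."""
    subjects = {}
    for s, p, o in triples:
        if p in inverse_functional:
            subjects.setdefault((p, o), set()).add(s)
    same_as = set()
    for group in subjects.values():
        for a, b in combinations(sorted(group), 2):
            same_as.add((a, "owl:sameAs", b))
    return same_as

# The contents of both certificates, believed because both CAs are trusted.
believed = {("joe", "cert:key", "pk"), ("timbl", "cert:key", "pk")}
```

Remove `cert:key` from the inverse-functional set, or decline to assert both certificates' contents, and the `joe = timbl` conclusion disappears.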
> 
> So how do we deal with this? Essentially, this boils down to ways of making sure that
> the CA does not need to say something false. There are three avenues here:
> 
>    A. Fixing signature and challenge
> 
> Add a signature and challenge to the protocol that actually works (i.e. fix the MD5 problem), perhaps with the help of FIDO hardware tricks. This would give CAs some grounds for thinking that what they publish is true.
> 
>   B. Unlinkability
> 
> 
> Do not allow the key to be used across origins, as FIDO requires for unlinkability (see my argument to the TAG https://lists.w3.org/Archives/Public/www-tag/2015Sep/0023.html ).
> 
> Of course, with FIDO the servers still need to save data for each user. So there is a data structure that they need to store, which we could describe like this:
> 
>   _:agent cert:key pk .
> 
> and later they could tie an OpenID, WebID, email address and other identifiers to that agent as they ask the user for more information (all of which FIDO allows), at which point their data structure would look like this:
> 
>   timbl cert:key pk;
>      foaf:mbox <mailto:timbl@w3.org>;
>      foaf:homepage <http://www.w3.org/People/Berners-Lee/> .
> 
> The FIDO philosophy, as it stands now, is to make the clash described earlier impossible, because every key is generated for one web site only. But still: what if someone accidentally creates the same key, or a database is corrupted and two people on two sites end up with the same key, and these two sites are communicating? I suppose other hacks are conceivable.
> 
> Given that FIDO and friends only want the key to be used on one origin, each site could be a bit more careful with its reasoning by creating a structure that does not tie a key directly to an account, but ties a key and the origin to the account. Perhaps like this:
> 
>    timbl authn [ fido:key pk;
>                  origin <https://w3.org/> ].
> 
> and the other key can be written out as:
> 
>    joe authn [ fido:key pk;
>                origin <https://myfreedombox.example/> ].
>   
> The idea is here that even if the two servers exchanged this information behind the scenes they should not be able to conclude that joe = timbl . They could be surprised that the two accounts share the same key but they would not jump to the conclusion that the people are the same.
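Keying records by (key, origin) instead of by key alone can be sketched as follows. The record layout `(agent, key, origin)` and both function names are assumptions for illustration; a key shared across origins is flagged as surprising but never turned into an identity:

```python
from collections import defaultdict

# Each record: (agent, key, origin), as in "agent authn [ fido:key key; origin o ]".
def identified_agents(records):
    """Agents are only identified when they share a key *at the same origin*."""
    by_key_origin = defaultdict(set)
    for agent, key, origin in records:
        by_key_origin[(key, origin)].add(agent)
    return {frozenset(group) for group in by_key_origin.values() if len(group) > 1}

def surprising_shared_keys(records):
    """Keys seen at more than one origin: worth flagging, not proof of sameness."""
    origins = defaultdict(set)
    for _, key, origin in records:
        origins[key].add(origin)
    return {key for key, seen in origins.items() if len(seen) > 1}

records = [
    ("timbl", "pk", "https://w3.org/"),
    ("joe", "pk", "https://myfreedombox.example/"),
]
```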
> 
>   C. Change the meaning of cert:certifies
> 
> Up till now we have understood the cert:certifies relation in the claim
> 
> timblsCert
>    signature [ cert:signedBy <https://w3.org/#>;
>                cert:signedKey <https://w3.org/pk#>;
>                cert:signature "de238ab73...." ];
>    cert:certifies {
>        timbl cert:key pk .
>    } .
> 
> as meaning that this is a simple statement by the W3C that timbl's key is pk. But this is not actually how certificates work; they are more complicated than that. For example, in X.509 the contained statement is not unconditional. It is meant to be valid only:
>  • if the current date is within the certificate's validity range,
>  • and if the certificate has been checked as not revoked, using a CRL or OCSP,
>  • and if there are no need-to-know structures (critical extensions) in the certificate that the verifier does not understand,
>  ...
> 
> So here the extra rule could just be: the content of the certificate should only be believed if a connection is made with someone who can prove that they have the private key of the public key given in the certificate.
> 
> This would remove the ability of someone collecting certificates to jump to the conclusion that, because two certificates signed by trusted CAs (perhaps the same one) have the same public key, they belong to the same person. Rather, the logic has to be that one can only believe the content of a certificate if someone proves they have the private key of the given public key.
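Rule C can be illustrated with a challenge-response check. The sketch uses textbook RSA with the classic tiny parameters p = 61, q = 53 purely so that it runs with the Python standard library; these numbers are completely insecure, and in WebID-TLS the proof of possession would come from the TLS client-authentication handshake instead. `believe_certificate` and the `prove` callback are hypothetical names:

```python
import hashlib
import secrets

# Textbook RSA with the classic tiny parameters p=61, q=53: INSECURE,
# for illustration only. Real deployments would rely on TLS client auth.
PUBLIC_KEY = (3233, 17)     # (n, e)
PRIVATE_KEY = (3233, 2753)  # (n, d)

def digest(msg: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest(msg, n), d, n)

def verify(msg: bytes, sig: int, public_key) -> bool:
    n, e = public_key
    return pow(sig, e, n) == digest(msg, n)

def believe_certificate(certified, public_key, prove, challenge=None):
    """Rule C: take the certified triples as true only after the peer signs
    a fresh challenge with the private key matching the certified public key."""
    if challenge is None:
        challenge = secrets.token_bytes(16)
    if verify(challenge, prove(challenge), public_key):
        return set(certified)
    return set()
```

A holder calling `sign(challenge, PRIVATE_KEY)` gets the certificate's content believed; anyone answering with a different number gets an empty set.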
> 
> This seems a pretty reasonable restriction to make. There is little reason for anyone to be collecting certificates and drawing conclusions from them about the identity of people across certificates. This is not a primary use case for certificates.
> 
> For the WebID protocol this still leaves the issue that there is another place where the public key is published: the WebID Profile Document [3]. An agent crawling the linked data network would still come across two documents and add the following graphs to its graph store:
> 
>    joeProfile log:semantics { joe cert:key pk . }
>    timblProfile log:semantics { timbl cert:key pk . }
> 
> But here again, someone writing something in their profile is no reason to believe it, so taking the union of the two profile documents is not automatic.
> 
> One may go a step further and take an idea from solution B (unlinkability): instead of publishing in the WebID profile document a cert:key relation pointing directly to the public key, link to the certificate.
> The profile document would then look like this:
> 
>    joeProfile log:semantics { joe cert:cert JoesCertWithNoPrivKey . }
>    timblProfile log:semantics { timbl cert:cert timblsCert . }
> 
> Now merging the two graphs would not lead to an identification of timbl with joe.
> But joe could still lie here about his certificate, by claiming he had timblsCert.
> So I am not sure this really helps that much. More work is needed here.
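Why the cert:cert indirection only partly helps can be shown with a toy crawler. Assuming, hypothetically, that the crawler identified subjects sharing a cert:cert object (the text does not say it should), honest profiles yield no identification, while Joe's lie brings the problem back. The function names are illustrative:

```python
def merge(*graphs):
    return set().union(*graphs)

def identified_by(triples, prop):
    """Groups of subjects sharing an object for prop, i.e. the owl:sameAs
    conclusions a crawler would draw *if* it treated prop as inverse functional."""
    subjects = {}
    for s, p, o in triples:
        if p == prop:
            subjects.setdefault(o, set()).add(s)
    return {frozenset(group) for group in subjects.values() if len(group) > 1}

joe_profile = {("joe", "cert:cert", "JoesCertWithNoPrivKey")}
timbl_profile = {("timbl", "cert:cert", "timblsCert")}
lying_joe_profile = {("joe", "cert:cert", "timblsCert")}
```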
> 
> Conclusion
> ---------------
> 
> The main problem with creating certificates for users who don't have the private key for the certified public key is that such a certificate could be used to identify people who are not identical. But this requires a logical rule of inference that is not a primary use case for certificates, and certificates need not be thought of that way. So the easiest way is just to state that such reasoning is invalid. There are many reasons it may be: people may lie about certificates, people may, weirdly enough, try to create certificates for which they don't have the private key, etc. If this logical inference is not allowed, then the fact that MD5 is broken in current SPKACs cannot really lead to any major problems. Rather, it is important that this failure be widely known, as it makes it easier to argue against the logical inference we want to argue against.
> 
> 
> Hope this helps. I welcome good, well-argued feedback based on reasoning and, if possible, logic.
> 
> +1 
> 
> Thanks for going through this. The private key never leaves the browser, so it cannot be compromised. Talk of an attack vector, which has never been described, seems to be mainly FUD, imho.

Things are not that simple. Prof. David Chadwick (CCed in the previous post) pointed out the following to me:

> The only attack I can think of in your given example is that, acting as
> MITM, I substitute the certificate on a signed message with my false
> certificate containing a different subject and then let the message
> continue on its way. In this way the recipient MAY (only may, not will,
> because it depends upon the signed content not containing the real
> details of the sender) believe that the message came from someone other
> than the real sender.
> 
> Here is where it might be useful. You submit a patent application
> electronically and I, acting as a MITM, substitute your certificate with
> one containing me as the subject, and the patent gets registered in my
> name and not yours.

So we can imagine that Tim Berners-Lee, on inventing the web, decided to patent it,
and that he somehow mistakenly used a man-in-the-middle web site https://patants.org/
where he uploaded the specs for the web and signed them with his certificate (though at no
point mentioning his name in the patent). My man-in-the-middle web site could then,
using the fake certificate, connect to the real patent web site and send the patent along
with my certificate that used his public key, and hey presto, I'd potentially be super rich. :-)
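Chadwick's substitution attack can be played out with toy numbers. This Python sketch assumes both certificates were validly issued by trusted CAs (their CA signatures are not modelled) and reuses insecure textbook-RSA parameters only so that it runs with the standard library; the naive patent-office logic, `mallory_cert` and the function names are hypothetical. The signature verifies under either certificate because both carry Tim's public key, so whoever controls the attached certificate controls the attribution:

```python
import hashlib
from dataclasses import dataclass

# Tiny textbook-RSA key (p=61, q=53): INSECURE, illustration only.
N, E, D = 3233, 17, 2753

def digest(doc: bytes) -> int:
    return int.from_bytes(hashlib.sha256(doc).digest(), "big") % N

def sign(doc: bytes) -> int:
    return pow(digest(doc), D, N)  # only Tim holds D

def verify(doc: bytes, sig: int, e: int) -> bool:
    return pow(sig, e, N) == digest(doc)

@dataclass(frozen=True)
class Cert:
    subject: str
    public_exponent: int  # with the shared toy modulus N, this fixes the key

@dataclass(frozen=True)
class Submission:
    document: bytes
    signature: int
    cert: Cert

def registered_author(sub: Submission):
    # Naive patent office: the attached certificate names the author,
    # provided the signature checks out against the key in that certificate.
    if verify(sub.document, sub.signature, sub.cert.public_exponent):
        return sub.cert.subject
    return None

def mitm_swap(sub: Submission, fake: Cert) -> Submission:
    # The MITM keeps the document and signature, replacing only the certificate.
    return Submission(sub.document, sub.signature, fake)

timbl_cert = Cert("timbl", E)
mallory_cert = Cert("mallory", E)  # issued without proof of possession
original = Submission(b"specs for the web", sign(b"specs for the web"), timbl_cert)
```

As in Chadwick's caveat, this only works because the document itself does not name its author.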

Of course Tim could then still prove in a court of law that the real certificate was his, as he,
and not I, had the private key. But such a lawsuit would certainly mark a bad start for the web (apart, of course, from the idea of patenting it).

This is not a problem for WebID-TLS as it is used now, as it only uses the authentication mechanism of TLS, and browsers do not provide a signing mechanism. Still, other software with signing abilities could access the certificate in the keychain and offer to use it to sign some document.

It follows that certificates made without a solid, secure challenge (e.g. in the current MD5 situation) should not be enabled for signing (X.509 allows one to specify what a certificate can be used for).
This type of implication should be clearly documented. We should note it in the WebID-TLS spec.
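Restricting usage can be sketched as an issuance policy. The KeyUsage bit names and the extended key usage OIDs below are the ones defined in RFC 5280; the policy itself, the `UsageProfile` type and `may_sign_documents` are hypothetical, and which bits a real signing application checks varies. Note that TLS client authentication itself signs the handshake, so the `digitalSignature` bit cannot simply be dropped:

```python
from dataclasses import dataclass

# Names from RFC 5280: KeyUsage bits (section 4.2.1.3) and
# extended key usage OIDs (section 4.2.1.12).
ID_KP_CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"
ID_KP_EMAIL_PROTECTION = "1.3.6.1.5.5.7.3.4"

@dataclass(frozen=True)
class UsageProfile:
    key_usage: frozenset
    extended_key_usage: frozenset

def usage_profile(proof_of_possession: bool) -> UsageProfile:
    """Hypothetical issuance policy for keygen-style certificates."""
    if proof_of_possession:
        return UsageProfile(frozenset({"digitalSignature", "nonRepudiation"}),
                            frozenset({ID_KP_CLIENT_AUTH, ID_KP_EMAIL_PROTECTION}))
    # TLS client authentication still needs digitalSignature, so only the
    # document-signing commitments (nonRepudiation, emailProtection) are withheld.
    return UsageProfile(frozenset({"digitalSignature"}),
                        frozenset({ID_KP_CLIENT_AUTH}))

def may_sign_documents(profile: UsageProfile) -> bool:
    # Hypothetical check a signing application could apply before use.
    return ("nonRepudiation" in profile.key_usage
            and ID_KP_EMAIL_PROTECTION in profile.extended_key_usage)
```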

It also gives a good reason for having stronger secure challenge features in <keygen> or whatever replaces it.

Btw. could a FIDO based system - assuming it were extended to allow public keys to be used across origins - improve the secure challenge feature?

I really feel we are making progress here.

Henry


>  
> 
>  Henry Story
> 
> 
> [1] https://datatracker.ietf.org/wg/jose/documents/
> [2] http://www.w3.org/TR/owl-ref/#InverseFunctionalProperty-def
> [3] http://www.w3.org/2005/Incubator/webid/spec/tls/#the-webid-profile-document
> Social Web Architect
> http://bblfish.net/
Social Web Architect
http://bblfish.net/
Received on Wednesday, 9 September 2015 14:53:27 UTC