Re: Certificate Triplify Challenge from Henry Story on 2012-01-11 (public-xg-webid@w3.org from January 2012)

From: Henry Story <henry.story@bblfish.net>
Date: Wed, 11 Jan 2012 20:45:41 +0100
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-xg-webid@w3.org
Message-Id: <AFAEBD28-2253-4349-8C6E-EDB15197D0FB@bblfish.net>
On 11 Jan 2012, at 17:21, Kingsley Idehen wrote:

> On 1/11/12 11:02 AM, Henry Story wrote:
>> On 4 Jan 2012, at 19:25, Peter Williams wrote:
>> 
>>> Changing the encoding of the cert format on the wire makes no difference.
>> Makes no difference to what?
>> 
>> It could make a difference to many of the issues we have discussed here, because it would let us bring some powerful logical tools from the semantic web space into the discussion.
>> 
>> For example, it can help us resolve what lies behind Kingsley's whole issue with ambiguity. The ambiguity he is seeing is very likely not in the URIs, where he is looking for it in vain, but rather in an interpretation of X509.
>> 
>> I put up my result from the triplify challenge on the wiki here:
>> 
>>   http://www.w3.org/2005/Incubator/webid/wiki/X509Semantics
>> 
>> In particular, Kingsley has an issue with the SAN because it is a name. This is in fact not a problem for http URIs, either with # or without (with 303 redirects). But it is a problem for e-mail identifiers, or rather, let us say it depends on how you "GRDDL" your X509.
>> 
>> Let us say you do this:
>> 
>> <http://example.com/cert> a cert:Certificate ;
>>        log:semantics {
>>           <http://example.com/cert> foaf:primaryTopic _:agent .
>>           _:agent cert:distinguishedName [
>>                 a cert:DistinguishedName ;
>>                 x520:countryName "GB" ;
>>                 x520:localityName "London" ;
>>                 x520:organizationName "British Broadcasting Corporation" ;
>>                 x520:organizationalUnitName "Research and Development" ;
>>                 x520:commonName "Test Certificate"
>>           ] ;
>>           cert:key [ a cert:RSAPublicKey ;
>>                 …
>>           ] ;
>>           owl:sameAs <http://example.com/me#person> .
>>    } .
>> 
>> Now imagine we could turn a DN into a URL: let's suppose we have some URL scheme for DNs that works. Then we could equivalently write
>> 
>> <http://example.com/cert> a cert:Certificate ;
>>        log:semantics {
>>        <http://example.com/cert> foaf:primaryTopic <DN:/CN=Test+Certificate/OU=Research+and+Development/...> .
>>      ...
>>     <DN:/CN=Test+Certificate/OU=Research+and+Development/...> owl:sameAs <http://example.com/me#person> .
>> } .
>> 
>> Ok, so that does indeed seem to capture the notion of an AlternativeName and the notion that the DN is the Subject Identifier.
>> 
>> The problem is with e-mail addresses, because there it does not work quite so nicely. You could not simply write
>> 
>> <http://example.com/cert> a cert:Certificate ;
>>        log:semantics {
>>        <http://example.com/cert> foaf:primaryTopic <DN:/CN=Test+Certificate/OU=Research+and+Development/...> .
>>      ...
>>     <DN:/CN=Test+Certificate/OU=Research+and+Development/...>
>>               owl:sameAs <http://example.com/me#person>,
>>                          <mailto:me@example.com> .
>> } .
>> 
>> 
>> Why? Because mailto URLs refer to mailboxes, not to people. So if you put the mailto URL in that position in that way, you have a problem, unless you create a special mapping from mailto URLs to, say, { _:p foaf:mbox <mailto:me@example.com> . } and then use _:p, a blank node, as the identifier.
>> 
>> If you do that, though, you have another problem: if it were <mailto:me@example.com#me>, then what would you do?
>> So Kingsley's problem is that, from the syntax of X509, it is not clear (given our current research) what the solution should be.
>> 
>> It could just be that X509 is ambiguous here: the people who developed it were not thinking that carefully about what they meant by SAN. Or one could say that mapping the relation between the DN and the SAN to owl:sameAs is perhaps too strict. There is room for interpretation here.
>> 
>> But just because something is ambiguous does not mean you cannot clarify it later. By pushing people to use https URLs with a hash we are, I think, squarely within the best interpretation of what a SAN is. When we move to e-mail addresses things get a bit more awkward. But for the moment this is not such a big deal for us, because we are not trying to deal with e-mail addresses.
>> 
>> But it is a problem for Kingsley, because he wants to keep happy the California crowd that is adamant about e-mail addresses.
> 
> Wow!
> 
> I want to make the Internet crowd happy. I want to exercise the ingenuity inherent in URI abstraction.
> 
> What's wrong with the following in SAN?
> 
> URI=http://id.myopenlink.net/dataspace/person/KingsleyUyiIdehen
> RFC822 Name=kidehen@openlinksw.com

Why do you ask? Well, let's look at the spec:

http://tools.ietf.org/html/rfc3280#section-4.2.1.7

----------------------------------------------------------
4.2.1.7  Subject Alternative Name


   The subject alternative names extension allows additional identities
   to be bound to the subject of the certificate.  Defined options
   include an Internet electronic mail address, a DNS name, an IP
   address, and a uniform resource identifier (URI).  Other options
   exist, including completely local definitions.  Multiple name forms,
   and multiple instances of each name form, MAY be included.  Whenever
   such identities are to be bound into a certificate, the subject
   alternative name (or issuer alternative name) extension MUST be used;
   however, a DNS name MAY be represented in the subject field using the
   domainComponent attribute as described in section 4.1.2.4.

   Because the subject alternative name is considered to be definitively
   bound to the public key, all parts of the subject alternative name
   MUST be verified by the CA.

   Further, if the only subject identity included in the certificate is
   an alternative name form (e.g., an electronic mail address), then the
   subject distinguished name MUST be empty (an empty sequence), and the
   subjectAltName extension MUST be present.  If the subject field
   contains an empty sequence, the subjectAltName extension MUST be
   marked critical.

   When the subjectAltName extension contains an Internet mail address,
   the address MUST be included as an rfc822Name.  The format of an
   rfc822Name is an "addr-spec" as defined in RFC 822 [RFC 822].  An
   addr-spec has the form "local-part@domain".  Note that an addr-spec
   has no phrase (such as a common name) before it, has no comment (text
   surrounded in parentheses) after it, and is not surrounded by "<" and
   ">".  Note that while upper and lower case letters are allowed in an
   addr-spec, no significance is attached to the case.

   When the subjectAltName extension contains a iPAddress, the address
   MUST be stored in the octet string in "network byte order," as
   specified in RFC 791 [RFC 791].  The least significant bit (LSB) of
   each octet is the LSB of the corresponding byte in the network
   address.  For IP Version 4, as specified in RFC 791, the octet string
   MUST contain exactly four octets.  For IP Version 6, as specified in
   RFC 1883, the octet string MUST contain exactly sixteen octets [RFC1883].

   When the subjectAltName extension contains a domain name system
   label, the domain name MUST be stored in the dNSName (an IA5String).
   The name MUST be in the "preferred name syntax," as specified by RFC 1034 [RFC 1034].
   Note that while upper and lower case letters are
   allowed in domain names, no signifigance is attached to the case.  In
   addition, while the string " " is a legal domain name, subjectAltName
   extensions with a dNSName of " " MUST NOT be used.  Finally, the use
   of the DNS representation for Internet mail addresses (wpolk.nist.gov
   instead of wpolk@nist.gov) MUST NOT be used; such identities are to
   be encoded as rfc822Name.

   Note: work is currently underway to specify domain names in
   international character sets.  Such names will likely not be
   accommodated by IA5String.  Once this work is complete, this profile
   will be revisited and the appropriate functionality will be added.

   When the subjectAltName extension contains a URI, the name MUST be
   stored in the uniformResourceIdentifier (an IA5String).  The name
   MUST NOT be a relative URL, and it MUST follow the URL syntax and
   encoding rules specified in [RFC 1738].  The name MUST include both a
   scheme (e.g., "http" or "ftp") and a scheme-specific-part.  The
   scheme-specific-part MUST include a fully qualified domain name or IP
   address as the host.
   As specified in [RFC 1738], the scheme name is not case-sensitive
   (e.g., "http" is equivalent to "HTTP").  The host part is also not
   case-sensitive, but other components of the scheme-specific-part may
   be case-sensitive.  When comparing URIs, conforming implementations
   MUST compare the scheme and host without regard to case, but assume
   the remainder of the scheme-specific-part is case sensitive.

   When the subjectAltName extension contains a DN in the directoryName,
   the DN MUST be unique for each subject entity certified by the one CA
   as defined by the issuer name field.  A CA MAY issue more than one
   certificate with the same DN to the same subject entity.

   The subjectAltName MAY carry additional name types through the use of
   the otherName field.  The format and semantics of the name are
   indicated through the OBJECT IDENTIFIER in the type-id field.  The
   name itself is conveyed as value field in otherName.  For example,
   Kerberos [RFC 1510] format names can be encoded into the otherName,
   using using a Kerberos 5 principal name OID and a SEQUENCE of the
   Realm and the PrincipalName.

   Subject alternative names MAY be constrained in the same manner as
   subject distinguished names using the name constraints extension as
   described in section 4.2.1.11.

   If the subjectAltName extension is present, the sequence MUST contain
   at least one entry.  Unlike the subject field, conforming CAs MUST
   NOT issue certificates with subjectAltNames containing empty
   GeneralName fields.  For example, an rfc822Name is represented as an
   IA5String.  While an empty string is a valid IA5String, such an
   rfc822Name is not permitted by this profile.  The behavior of clients
   that encounter such a certificate when processing a certificication
   path is not defined by this profile.

   Finally, the semantics of subject alternative names that include
   wildcard characters (e.g., as a placeholder for a set of names) are
   not addressed by this specification.  Applications with specific
   requirements MAY use such names, but they must define the semantics.

   id-ce-subjectAltName OBJECT IDENTIFIER ::=  { id-ce 17 }

   SubjectAltName ::= GeneralNames

   GeneralNames ::= SEQUENCE SIZE (1..MAX) OF GeneralName
   GeneralName ::= CHOICE {
        otherName                       [0]     OtherName,
        rfc822Name                      [1]     IA5String,
        dNSName                         [2]     IA5String,
        x400Address                     [3]     ORAddress,
        directoryName                   [4]     Name,
        ediPartyName                    [5]     EDIPartyName,
        uniformResourceIdentifier       [6]     IA5String,
        iPAddress                       [7]     OCTET STRING,
        registeredID                    [8]     OBJECT IDENTIFIER }

   OtherName ::= SEQUENCE {
        type-id    OBJECT IDENTIFIER,
        value      [0] EXPLICIT ANY DEFINED BY type-id }

   EDIPartyName ::= SEQUENCE {
        nameAssigner            [0]     DirectoryString OPTIONAL,
        partyName               [1]     DirectoryString }

----------------------------------------------------------
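The case rules in the quoted text (rfc822Name and dNSName compared without regard to case; URIs compared with a case-insensitive scheme and host but a case-sensitive remainder) can be sketched in Python as below. This is a minimal illustration that follows the RFC 3280 wording quoted above, assuming the SAN values have already been decoded from the certificate into strings; the constant and function names are hypothetical, chosen only to mirror the GeneralName CHOICE.

```python
from urllib.parse import urlsplit

# Tags from the GeneralName CHOICE quoted above (hypothetical constant names).
RFC822_NAME = 1
DNS_NAME = 2
URI = 6


def _uri_key(uri: str) -> tuple:
    """Build a comparison key: scheme and host ignore case, the rest does not."""
    parts = urlsplit(uri)
    return (
        parts.scheme.lower(),  # scheme name is not case-sensitive
        parts.username,        # userinfo, path, query, fragment stay case-sensitive
        parts.password,
        parts.hostname,        # urlsplit already lower-cases .hostname
        parts.port,
        parts.path,
        parts.query,
        parts.fragment,
    )


def san_values_equal(tag: int, a: str, b: str) -> bool:
    """Compare two subjectAltName values of the same GeneralName type."""
    if tag in (RFC822_NAME, DNS_NAME):
        # RFC 3280: "no significance is attached to the case"
        return a.lower() == b.lower()
    if tag == URI:
        return _uri_key(a) == _uri_key(b)
    raise ValueError(f"no comparison rule sketched for GeneralName tag {tag}")


print(san_values_equal(URI, "HTTP://Example.COM/me#i", "http://example.com/me#i"))  # True
print(san_values_equal(URI, "http://example.com/Me#i", "http://example.com/me#i"))  # False
```

Note that the type tag, not the string itself, selects which comparison rule applies.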

What is interesting in the above spec is that the X509 RFC allows both
http URLs and RFC 822 e-mail addresses, but types them differently: in the
GeneralName CHOICE an rfc822Name has tag [1] and a uniformResourceIdentifier
has tag [6]. This means

1. that both can be used in the SAN (answering your question above);
2. that one can quite easily distinguish their interpretations semantically, by
   mapping them to the following correct RDF:

<http://example.com/cert>  a cert:Certificate ;
       log:semantics {
   <http://example.com/cert>  foaf:primaryTopic <DN:/CN=Kingsley+Idehen/...>  .

   <DN:/CN=Kingsley+Idehen/...>
              owl:sameAs <http://id.myopenlink.net/dataspace/person/KingsleyUyiIdehen>;
              foaf:mbox <mailto:kidehen@openlinksw.com>  .
} .

(( again making up some DN URL scheme here. ))
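A minimal Python sketch of that mapping, assuming the SAN entries have already been decoded into (tag, value) pairs. The `DN:` URI form is the same made-up scheme as above, `dn_uri` and `san_to_turtle` are hypothetical helper names, and the owl: and foaf: prefixes are assumed to be declared elsewhere.

```python
from urllib.parse import quote_plus

# GeneralName tags from the RFC 3280 ASN.1 quoted above.
RFC822_NAME = 1
URI = 6


def dn_uri(rdns: list[tuple[str, str]]) -> str:
    """Build a URI in the made-up DN: scheme used above (not a registered scheme)."""
    return "DN:/" + "/".join(f"{attr}={quote_plus(value)}" for attr, value in rdns)


def san_to_turtle(subject: str, sans: list[tuple[int, str]]) -> str:
    """Render decoded SAN entries as Turtle statements about `subject`.

    A uniformResourceIdentifier names the subject itself (owl:sameAs);
    an rfc822Name names a mailbox belonging to the subject (foaf:mbox).
    """
    statements = []
    for tag, value in sans:
        if tag == URI:
            statements.append(f"owl:sameAs <{value}>")
        elif tag == RFC822_NAME:
            # Assumes the address needs no percent-encoding inside <mailto:...>.
            statements.append(f"foaf:mbox <mailto:{value}>")
        # Other GeneralName types are skipped in this sketch.
    if not statements:
        return ""
    return f"<{subject}>\n    " + " ;\n    ".join(statements) + " ."


print(san_to_turtle(
    dn_uri([("CN", "Kingsley Idehen")]),
    [(URI, "http://id.myopenlink.net/dataspace/person/KingsleyUyiIdehen"),
     (RFC822_NAME, "kidehen@openlinksw.com")],
))
```

The design choice is the one argued above: the SAN type tag decides whether a value identifies the subject (owl:sameAs) or a mailbox the subject has (foaf:mbox), so the mailto URL never ends up in an owl:sameAs position.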



>> They could use the accnt scheme<mailto:me@example.com>  and that would probably get a bit closer to us, depending on how you think of an account.
> 
> That's a mailto: scheme URI. They could also use acct: of course. Most important of all, it could be any URI. The issue is the resolution mechanism that enables all the action to occur from the SAN.

Yes, I meant the acct: scheme. I thought I had even typed it, but oh well.

> 
> My example above separates the Name Identifier from the Resource Locator (Address) Identifier.
> 
> Hammer Stack covers the matter at hand. You can ignore it, but that's just being unrealistic about reality when the Internet is the domain of focus.
>> 
>> On the other hand one could just say, well X509 was never meant to be that coherent, so we can just be flexible here. And then things should fall in line again.
> 
> That's the point! Be flexible and the URI abstraction will do its thing. The scale is Internet scale, not WWW scale. HTTP is about the WWW, which is an active part of the Internet. No matter how you cut it, no matter how useful it is, etc., it is still part of the Internet.
> 
> URIs solve an Internet scale problem, not a WWW scale problem. Linked Data doesn't have to be WWW scoped, it can work at Internet scale too.

That depends on what your definition of the Web is. Tim Berners-Lee's definition was "a mapping from URIs onto meaning":
http://blogs.oracle.com/bblfish/entry/possible_worlds_and_the_web

> 
> The end destination is inevitable. WebID or NetID, note, I've seen this movie before.
> 
> I say "Check!" so your move next :-)

Well, if the game is a good spec, then my answer is above (but perhaps let's not get sidetracked by the meaning-of-the-Web part).

> 
> 
> Kingsley
>> 
>> Henry
>> 
>>> You can spit it out in long RDF strings if you want. ASN.1 doesn't care whether you use DER, BER, PER, or XML. ISO defined the mappings onto XML, and compilers now spit out bytes, in binary XML or long XML. One can define a spitter for RDF in any of these encodings, if one wants. Any undergrad can do this (it's just 1980s abstract/concrete type theory).
>>> 
>>> But Henry is right that this makes NO difference. It's still an ASN.1 cert, with a particular set of type theory formalisms, that REALLY DO NOT WORK well with RDF/EAV (which is very pure). The semantics are abstract, and are not tied to the encoding.
>> Yes, that is an interesting feature of ASN.1. Of course, the signature does tie the whole thing down to a particular format.
>> 
>>> 
>>> The semantics for the cert AS A TYPE (not a blog) are also very much tied to the art of public key distribution on an Internet scale, whose PRACTICAL security requires a particular way of relying on naming and addressing, and binding, and asserting, and validating, and (all the other things folks discuss here). The cert is just the linchpin of that doctrine set (which is why folks discuss it endlessly, often in rant form). It's why it gets the "evil" label (because it is SO good at actually doing what folks would LIKE to do, when replacing it). Yes, it's getting old, as it's tied to Internet 1 (which is getting on in age).
>>> 
>>> In my R&D work, I put triples and SPARQL expressions in the cert (in the form of text encoded into URIs), and avoid the whole SAN URI semantic wars. I thus describe identity the webby way from the outset, using the cert NOT as above, but as a means to an end (so https libs work). It's then just a signed text stream, retaining some legacy key management controls so actual SSL is not compromised too badly.
>>> 
>>> 
>>> Back in 2007, the topics that looked interesting included:
>>> 
>>> http://yorkporc.wordpress.com/2007/09/30/copy-of-httpdarq-sourceforge-net-federated-queries-with-sparql/
>>> 
>>> http://yorkporc.wordpress.com/2007/09/23/email-post-on-using-deriving-sparql-queries-from-foaf-knows-relations-to-assure-pubkeys/
>>> 
>>> But it takes SO LONG to do anything in semweb land that I hardly remember even knowing what I knew then. Kingsley is reminding me, though.
>> Yes, the idea that since the Internet everything moves 7 times faster is a myth.
>> I myself thought that people would do things if one told them. But it turns out you have to do it yourself.
>>>> From: henry.story@bblfish.net
>>>> Date: Wed, 4 Jan 2012 19:04:48 +0100
>>>> CC: j.jakobitsch@semantic-web.at; public-xg-webid@w3.org
>>>> To: mo.mcroberts@bbc.co.uk
>>>> Subject: Re: Certificate Triplify Challenge
>>>> 
>>>> 
>>>> On 4 Jan 2012, at 16:05, Mo McRoberts wrote:
>>>> 
>>>>> On 4 Jan 2012, at 13:50, Henry Story wrote:
>>>>> 
>>>>>> As soon as you put things this way you realise that it is wrong in fact. Because the above fails to make the point that it is the Certificate that is making the agent claims. What is really needed there is to use N3 to express what is going on:
>>>>> Hmm, are you sure? Is it not that the certificate *carries* the claims made by the issuer?
>>>> A certificate is a document that is signed by an issuer. It is exactly the type of thing that has a semantics. In fact one could even say that a document is defined by its having a semantics. ( Btw. log:semantics is explained in more detail here http://www.w3.org/2000/10/swap/doc/Reach )
>>>> 
>>>> So let's say I speak of a certificate <http://example.com/cert> here in this e-mail; then I can say what type of thing it is, when it was made, etc. I can make statements about that document.
>>>> But I can't speak about the contents of that certificate without asserting those contents themselves here. And anything or anyone reading what I am writing here would not know how to distinguish between what I am saying and what the certificate is saying, unless you use graphs or, if you wanted to make your life really complicated, reification. I.e. one needs a quotation mechanism. In N3 you do this with {...}.
>>>> 
>>>> There is one particularly interesting exception, I think: if the document <http://example.com/cert> were also to return an RDF representation, which would then be written something like
>>>> 
>>>> <>  a cert:Certificate ;
>>>> foaf:primaryTopic _:agent ;
>>>> cert:issuer<http://example.com/ca#cert>  ;
>>>> cert:serialNumber 1 ;
>>>> cert:notBefore "2012-01-01T14:00:00Z"^^xsd:dateTime ;
>>>> cert:notAfter "2012-12-31T13:59:59Z"^^xsd:dateTime ;
>>>> cert:extension [
>>>> a cert:basicConstraints ;
>>>> cert:extensionValue [
>>>> cert:ca "false"^^xsd:boolean ;
>>>> cert:pathLengthConstraint 0 ;
>>>> ] ;
>>>> ] ;
>>>> cert:signatureAlgorithm cert:sha1WithRSAEncryption ;
>>>> cert:signature "00010203040506070809...."^^xsd:hexBinary .
>>>> 
>>>> _:agent cert:distinguishedName [
>>>> a cert:DistinguishedName ;
>>>> x520:countryName "GB" ;
>>>> x520:localityName "London" ;
>>>> x520:organizationName "British Broadcasting Corporation" ;
>>>> x520:organizationalUnitName "Research and Development" ;
>>>> x520:commonName "Test Certificate" ;	
>>>> ] ;
>>>> cert:key [ a cert:RSAPublicKey ;
>>>> …
>>>> ] ;
>>>> owl:sameAs<http://example.com/me#person>  .
>>>> 
>>>> 
>>>> ( Well, the only problem is that the signature would have to be outside the document, presumably in another document, because signing a document with an internal signature is a complicated trick. One would need a signature algorithm that removes certain triples, the signature triples, before signing. And this may have many issues I don't know about. )
>>>> 
>>>> Here the document is speaking about itself and its contained statements, so it is clear what the signature is about, and also what it is that the certificate in ASN.1 is saying. We have essentially a sketch of an RDF view on the X509 document here.
>>>> 
>>>>> If the purpose of the ontology is to allow round-tripping (which it must, IMO, so that you can verify the signature on the content — otherwise you might as well just have a lump of arbitrary signed RDF and forget about bothering with X.509's structure), then you have to be careful about how far you diverge from it, and that includes additional statements (which from a processor's perspective are just unsigned additional junk, like a comment header field in a PEM-formatted blob).
>>>>> 
>>>>> then one realises that the MUST-understand statements are statements about grammar changes: they are saying that you cannot believe anything else about what you see in the document unless you understand one statement: i.e., that statement could possibly change the meaning of the other statements seen up to then.
>>>>> Yes… the criticality aspect of extensions falls into this category, although in X.509-land the rules assume that you do know how to process “an extension” in general and where to find the criticality field at a minimum — so with an RDF equivalent you could work on the same basis (i.e., you recognise cert:critical, and if it's set and you don't understand one of the classes associated with the extension, fail).
>>>>> 
>>>>> M.
>>>>> 
>>>>> -- 
>>>>> Mo McRoberts - Technical Lead - The Space,
>>>>> 0141 422 6036 (Internal: 01-26036) - PGP key CEBCF03E,
>>>>> Project Office: Room 7083, BBC Television Centre, London W12 7RJ
>>>>> 
>>>>> 
>>>> Social Web Architect
>>>> http://bblfish.net/
>>>> 
>>>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
>> 
>> 
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen	
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen

Social Web Architect
http://bblfish.net/
Received on Wednesday, 11 January 2012 22:37:37 UTC