
Re: implementors first impression notes on the spec

From: Henry Story <henry.story@bblfish.net>
Date: Fri, 18 Nov 2011 19:07:15 +0100
To: Peter Williams <home_pw@msn.com>, WebID XG <public-xg-webid@w3.org>
Message-Id: <85B7604F-641B-45A1-892F-2E7A5E7D1EE9@bblfish.net>
Ok, I missed your previous e-mail.


On 17 Nov 2011, at 07:34, Peter Williams wrote:

>  "This URI should be dereference-able" - referring to the URI in the name field of a self-signed cert. To this implementor this phrase means IETF SHOULD, in the absence of any other definition. Perfectly good reasons therefore exist not to make that assumption; that is, "complete" code should not assume that all certs have dereference-able URIs. If something else is meant, it should be stated.


yes, that is also how I read it.

I think we should say something about https URLs there, and make that the core example of the spec. 


> "Using a process made popular by OpenID, we show how one can tie a User Agent to a URI by proving that one has write access to the URI" I recommend this be changed to OpenID v1.0. The set of ideas to which the text is referring were found to be almost 100% absent in my own, recent production adoption - at national scale - of 2 infamous OpenID providers (using v2 and later vintages of the OpenID Auth protocol). Neither IDP showcased the URI heritage of the OpenID movement. Both showcased email identifiers, to be perfectly honest. At W3C level, we should be accurate, and tie the reference to OpenID 1.0. Concerning OpenID and WebID, that is where the common history over URIs ends.

Ah, ok. Perhaps we can now remove the reference to OpenID then. That would just make things simpler. 

>  
> "TODO: cover the case where there are more than one URI entry" is a topic still pending, 12 months in. I'd expect this to be defined by now (since it's a 3-year-old topic). I don't know what to do as an implementor. I'm going to take the text literally. That is: it is not an exception to encounter multiple URIs. I will take one at random from the set, since there is no further requirement. It is not stated what to do in case no URIs are present. I will raise an HTTP 500 code in such a case, since no specification is made. (NB: text later seems to fill in some details here, and a ref needs to be made to it.)

Yes. We need to go through the todos. Just take one at random if you want, or as many as you have time to work on.
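For what it's worth, the literal reading described above can be sketched in a few lines. This is our own illustration, not spec text - the function name, the URI-scheme filter, and the error handling are all invented:

```python
def pick_claimed_webids(san_uris):
    """Select claimed WebID URIs from a certificate's Subject Alternative
    Name entries. The draft leaves multi-URI handling open, so this follows
    the literal reading discussed above: every http(s) URI entry is a
    claimed WebID, and an empty set is an error (the implementor above
    chose to answer with HTTP 500). All names here are hypothetical."""
    candidates = [u for u in san_uris if u.startswith(("http://", "https://"))]
    if not candidates:
        # No URI entries at all: the draft specifies nothing, so signal an error
        raise ValueError("certificate carries no claimed WebID URIs")
    return candidates

# e.g. a cert with one URI entry and one email entry yields one claimed WebID
claims = pick_claimed_webids(
    ["https://bblfish.net/people/henry/card#me", "mailto:henry.story@bblfish.net"])
```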

>  
> "A URI specified via the Subject Alternative Name extension" - Cert extensions don't specify. Standards groups specify. Choose a better verb. I'm sure W3C Editor language is standardized on this kind of topic.
>  
> The terminology section is sloppy. It uses identification credentials, defines identification certificates, and refers to webid credentials. These are all critical terms, and the language model seems ad hoc, at best.

Agree. I propose we use Client, Server, and X.509 Certificate, which is the language used in the TLS spec. On the whole we restrict ourselves to TLS - we don't have time for more generality, and the other groups don't seem to be that interested in participating directly either. We can have other documents later.

So we can call the document "WebID over TLS Authentication" perhaps.


>  "A widely distributed cryptographic key that can be used to verify digital signatures and encrypt data". This is sloppy security speak. A Public Key is very rarely used to encrypt data. A professional would say it encrypts a key (a key being very specifically typed/distinguished from untyped data). A competent review by W3C of commodity crypto in the web will show that there is almost zero use of public key encryption of data. Keeping the text as is will set off alarm bells in crypto policy enforcement circles.

yes, no idea why we have "widely distributed" in there. it's not relevant. Is this better?

public key
A cryptographic key that can be used to verify digital signatures made with the corresponding private key. A public key is always included in an Identification Certificate.
Perhaps we need a definition for private key too then.

Anyway, this is what the TLS spec says (http://www.ietf.org/rfc/rfc2246.txt):
public key cryptography
       A class of cryptographic techniques employing two-key ciphers.
       Messages encrypted with the public key can only be decrypted with
       the associated private key. Conversely, messages signed with the
       private key can be verified with the public key.
Perhaps we should just copy that and have the definitions for both public and private key point to it.
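As an aside, the two-key relationship in that RFC 2246 definition can be illustrated with textbook RSA. The parameters below are toy values chosen purely for illustration - never anything like a real key size:

```python
# Textbook RSA with toy parameters (p=61, q=53), purely to illustrate the
# RFC 2246 definition quoted above. NOT secure; illustration only.
n, e, d = 3233, 17, 2753  # modulus, public exponent, private exponent

msg = 65

# "Messages encrypted with the public key can only be decrypted with
#  the associated private key":
ciphertext = pow(msg, e, n)
assert pow(ciphertext, d, n) == msg

# "Conversely, messages signed with the private key can be verified
#  with the public key":
signature = pow(msg, d, n)
assert pow(signature, e, n) == msg
```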

> "A structured document that contains identification credentials for the Identification Agent expressed" - the term "for" came across where I'd expect "of".

Good, and replace Identification Agent with Client.

> As an implementor, I'm taking the rest of the definition of WebID Profile to mean that it's fine for a Validation Agent to consume one and only one serialization, say the XML serialization of RDF. This means that interworking with my implementation will properly fail should the user not have the XML form of the RDF identification credentials. And, to this implementor, this is as intended by the design, as stated. The implementation will be conforming, in this failure case.

This is what it says:

WebID Profile
A structured document that contains identification credentials for the Identification Agent expressed using the Resource Description Framework [RDF-CONCEPTS]. Either the XHTML+RDFa 1.1 [XHTML-RDFA] serialization format or the RDF/XML [RDF-SYNTAX-GRAMMAR] serialization format must be supported by the mechanism, e.g. a Web Service, providing the WebID Profile document. Alternate RDF serialization formats, such as N3 [N3] or Turtle [TURTLE], may be supported by the mechanism providing the WebID Profile document.
I don't like the word "supported"; it should be "published".

But clearly the spec says here that your implementation must understand RDFa 1.1 or RDF/XML. Now I am not sure whether RDFa 1.1 is a bit too new for most software to support.

>  I feel it is inappropriate to be reading such subtle implementor-centric information in a definition note. Move this to normative text paragraphs. One should not be reading between the lines on such critical matters.

Yes, I agree. The whole of section 2.3 is about the WebID Profile; we should move the representations text to that section.

>  
>  "Public Key" is missing capitalisation.



>  
> "The user agent will create a Identification Certificate with a Subject Alternative Name URI entry." Most User Agents do not "create" certificates. They create certificate signing requests, in a form that is not X.509 standardized, for most cases.

Indeed. This should be fixed. 

I have a diagram for how this works with keygen. Perhaps we should after all put that in there. In any case the User Agent usually does not send the WebID in the certificate request, but usually just the name. The server is usually in the best position to know what the URI for the User is. On the other hand it can be done differently, as I do in the test suite on the read-write-web with SPARQL updates.

So I think one should mention 

 - creating using keygen
 - that there are other ways...

That the result is the client having a certificate with one or more WebIDs in it.
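For reference, the keygen path mentioned above looked roughly like this in HTML at the time (the form action URL is hypothetical): the browser generates the key pair, posts the public key to the server as a signed SPKAC blob, and the server builds and returns the certificate containing the WebID.

```html
<!-- Sketch only: the browser keeps the private key; the server receives
     the public key in the field named "spkac", issues the certificate
     with the user's WebID in the subjectAltName, and returns it. -->
<form method="post" action="https://example.org/cert-request">
  Key strength: <keygen name="spkac" challenge="random-challenge-string">
  <input type="submit" value="Create certificate">
</form>
```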


> Mixing "Create" with certificate in the context of user agent is sending confusing signals to implementors. User Agent should be capitalized. The number of SANs should be more clearly defined, at creation time. The text here implies there is only one.

Yes, that is a very confusing paragraph.

>  
> "This URI must be one..." contradicts earlier text, that says that a URI SHOULD (be de-referencable). Decide between MUST and SHOULD, and capitalize terms properly per IETF guidelines.

I think SHOULD is in order here, because we don't want to say that the network has to be in perfect condition all the time. Also, the server can authenticate without dereferencing the URLs.

>  Remove the sexist "he" from a W3C document. There is more appropriate, standard W3C phraseology.

If someone has an idea...

>  Remove the commentary sections, sometimes in yellow: this or that is under debate. As an implementor, it confuses me. I don't know what to do, and wonder about doing nothing till the debate settles down - since it was evidently SO vital to point it all out. Not good style for an Editor's draft.

Well, we are looking for feedback.

>  
> The example certificate is strange. It has two critical extensions, that Verification Agents MUST enforce. One says it's not a CA; the other says the user of the associated private key has a certificate signing privilege. One who signs certificates is a CA, by definition. A not particularly literal reading of the cert handling standards would require a validation agent to reject that cert, since (a) it MUST enforce the extensions, since they are CRITICAL, and (b) there is a contradiction in what they say, technically. This is just confusing and unnecessary. No reference is made to IETF PKIX, and it's not clear what cert-related conformance standard MUST be applied, by WebID protocol implementations, if any.

Thanks. That is probably also a bug in my code, which goes back to Bruno Harbulot's first implementation.
Let's remove Certificate Sign.
We'll need to generate a new certificate, or else the certificates and their signatures won't match.

I was also told recently that the OID should not be a URL but a number, as that could break some service otherwise.
I can't remember where that conversation took place.

Perhaps we should fix those with a note to regenerate the certificate.

We need the subject alternative name, clearly.

>  
> As an implementor of a Validation Agent, do I or do I not enforce "O=FOAF+SSL, OU=The Community of Self Signers, CN=Not a Certification Authority" in the issuer field? All I have is an open question, and no stricture. Decide one way or the other. If it's mandatory, specify the DN properly. As an implementor I don't know if the order of the DN components is left to right, or otherwise. I should not have to read or run openssl code to find out whether my software is conforming.

That was not determined yet. The idea was to use that as a way to have servers select certificates based on this.

This is now ISSUE 62: http://www.w3.org/2005/Incubator/webid/track/issues/62

I think we should try it out and see how well it works. There is also a conceptual issue with that. I suggest we discuss that on the thread for that issue, because it is perhaps the only real unknown we have left.

>  
> "The above certificate is no longer valid, as I took an valid certificate and change the time and WebID. As a result the Signatiure is now false. A completely valid certificate should be generated to avoid nit-pickers picking nits" Can we fix this, and remove stuff that looks like the spec is half baked? It's giving a false impression of the quality level of the group (and of W3C, I might add). It looks like there is no quality control, peer review, etc. After 12 months of editing, I expect better.

Ok. So we need to generate a new certificate anyway. I suppose we should have text such as "generated on such a date:" to save us having to change the certificate every few years...

>  
> Ok, I'm going to relax a bit, as it's increasingly evident that the spec writers are not all native English speakers, and some of them are struggling to write formal, technical English. It needs some comprehensive editing by someone used to very formal, technical writing.

Yes. It was done by a bunch of people in a very short time frame a year ago. So we need to take this to the next level.


>  
> "The document can publish many more relations than are of interest to the WebID protocol,"... No it cannot, for the purposes of THIS spec, and this WebID protocol. Protocols are hard code; they don't have "interests" in perhaps doing something. Either I code it or I don't. Define what is out of scope, and perhaps imply that non-normative protocol "extensions" might use such information. However, these are not in scope of this document.

Those are called extensions in X.509. So it is possible. RDF is defined to be monotonic in the way relations are added; in that it is an improvement over X.509. Other relationships can be added to the document, as long as they don't contradict what is being said, of course.

>  
> "The WebID provider must publish the graph of relations in one of the well known formats,". No. First, WebID "provider" is undefined. The entity publishing a WebID Profile ought to be defined, and that entity's term used consistently. Second, the publisher MUST produce RDF/XML and/or RDFa, and possibly others. Failure to produce one of RDF/XML or RDFa is non-conforming, according to earlier mandates.

Ok we can live with that restriction for the moment.


>  
> 2.3.1 goes on about Turtle representation. Remove it completely, since it's not even one of the mandatory cases defined as such in the spec.

It is the easiest one for humans to read. I think it is even a standard now. So I'd rather leave it in and add it as an optional representation.

>  
> 2.3.2 is incomplete, missing the doctype. I don't know whether a mislabelled doctype with apparent RDFa is conforming or not. This omission is not helping implementors.

This is one for the RDFa specialists here.
Also I would suggest the namespaces perhaps be moved further down into a <span> so that when blogging engines allow people to add RDFa to their blogs, people don't feel like they need to own the whole page for it to work.

>  
> I don't know if the example HTML document elements in 2.3.2 verify. My own efforts to use something similar from the mailing list made Microsoft's Visual Studio verification system for HTML complain, bitterly. But then, I was confused, not knowing which doctype and which validation procedure to follow. This seems an inappropriate position for W3C to take.
>  
> "If a WebID provider would rather prefer not to mark up his data in RDFa, but just provide a human readable format for users and have the RDF graph appear" phrasing makes it sound like RDFa is not intended to be machine-readable. This is probably unintended misdirection.

I don't know how many people implemented this. I have not implemented this ever myself. Has anyone? It is not a bad idea btw, but I am not sure it is 100% correct.

In any case the whole section should point to this:
http://www.w3.org/TR/swbp-vocab-pub/


>  Step 5 in the picture of 3.1 makes it appear that WebID Profile document SHOULD be published on https endpoints. State clearly what the conformance rules are.

I am for sticking to https just because otherwise everything is too complicated to write out. We can leave it to the next group to generalise.

>  
> being over-pedantic (like a code writer), the request in step 5 is issued by the protected resource, but the response to 5 is handled by the webserver hosting the page, in the pseudo-UMLish diagram notation. This is a signal to me. I don't know how the webserver is supposed to tie up the request to the response, since it was not the initiator. Label the diagram as NON-NORMATIVE.

Yes, the arrows should both go through the boxes. Or we don't have the local cache. This Monday we were thinking of not having the local cache, but I still think it is important in explaining how this can be efficient.

>  
> Step 3 is missing information - that a web request is issued, over https.

The images and the steps don't line up. That is something I'll fix this weekend.

> Distinguish the SSL handshake (signaling for and delivering the cert) from the SSL channel over which an HTTP request MUST be sent, if I interpret 8 correctly.

You mean it should not be a thick double arrow. I agree. I am not sure how one can make it clear that all those connections are
part of the same TLS connection... (i.e., 1, 2, 3, and 8).

> Again, it's not clear if only information retrieval requests post-handshake MUST use https, or not; whether the same ciphersuite MUST be used as negotiated by the cert-bearing handshake, or not; whether one can rehandshake or not... While a summary diagram cannot express all those things, I am confused about what is mandatory owing to the omission of a competent step 3.

Ah yes. Ok, got it. Yes, we don't have a web request in this section. That is because, as we were discussing this Monday, it is not clear that we should restrict ourselves to HTTP requests here. This applies to any TLS connection. I.e., the first part (1, 2, 3, 8) is independent of there being HTTP below.

But I also agree that it would be very useful to show how this interacts with HTTP.


>  
> Steps 6 and 7 are poorly done. They imply, from the activity diagram, that the protected resource will perform the identity and authorization queries (and not the webserver). That's what it says (to an implementor).

Agree, there should be an Authentication/Identification Agent - perhaps an Identification Guard that is outside the resource and inside the server.
>  
> Remove the phrase "the global identity". This is far too grandiose.

ok.

>  
> "The Identification Agent attempts to access a resource using HTTP over TLS [HTTP-TLS] via the Verification Agent." needs rewriting.

yes, I agree that all of this section needs careful rewriting. We were discussing this on Monday in the teleconf. 


> It comes across as saying (with the "via" construct) that the Validation Agent is a web proxy to the Identification Agent (which is the UA, typically).

We need a new word for the agent that simply does the SSL public key verification. Let's call it the SSL Guard.
I think we should draw the SSL Guard in the diagram and say that this SSL Guard does only one thing: verify that the public key matches the private key used in the TLS handshake (we need to be very specific and point to the exact part of TLS meant here).

We need to then have another guard, the identity verifier, that does the WebID protocol (it's kind of drawn in step 4, but we should separate it out).


> Note that typically, in Windows, the kernel intercepts the TLS invocation, performs the SSL handshake, and does NOT know that a given resource is being requested of a webserver. There are strong implementation biases in this formulation that are improper for a W3C spec. Generalize.



>  
> I'm almost 18 years out of date on SSL spec, but I don't recall something called the "TLS client-certificate retrieval protocol." The actual protocols of the SSL state machine are well defined in the SSL3 and followup IETF specs. They have proper names. Be precise. Editor to help those for whom English is a second language. Editor needs to do fact checking, and quality control.

Yes. 

>  
> step 2 of 3.1 is confusing. It makes it sound like if an unsolicited certificate message is sent voluntarily by the client, before a certificate is sought by the server, it MUST be rejected by the server. Is this what is meant? Must any unsolicited presentation be rejected? If so, what MUST happen? Is there an SSL close?

I did not know that there were unsolicited client authentication messages. What happens usually at the TLS layer? Is that even allowed?

>  
> If a TLS session is resumed, and a request received inducing the TLS entity to commence a complete SSL handshake and request a certificate, is it a WebID Protocol violation to use the client certificate from the earlier completed handshake? (This case is common.) MUST certificates be provided by the current handshake? If two handshakes on one TCP connection present different client certificates, what happens?

I think if we separate the role of the TLS Guard from the role of the WebID verifier, then those issues are no longer important. The WebID verifier need only know that the session it has is verified cryptographically. It can then use the public key as the handle for the user. Of course that is often done by using the certificate as the handle.

For those who are interested: in the read-write-web project I keep a Map of certificates and their claims, so that given the same certificate
I can find the verifications done on it for a period of 30 minutes. That means that once the WebID verification is done, I don't need to do it again - unless requested.

    import com.google.common.cache.{CacheLoader, CacheBuilder, Cache}

    val idCache: Cache[X509Certificate, X509Claim] =
       CacheBuilder.newBuilder()
       .expireAfterWrite(30, TimeUnit.MINUTES)
       .build(new CacheLoader[X509Certificate, X509Claim] {
         def load(x509: X509Certificate) = new X509Claim(x509)
       })

>  
> The phrase "claimed WebID URIs" should be removed, unless some doctrine about "claimed"-ness is explained.

It says "URI entries which are considered claimed WebID URIs."

But perhaps we should enter "claimed WebID" in the lexicon at the bottom.

A certificate that has been verified by the TLS Guard (verifier?) contains WebID claims, because they have not yet been asserted by a CA.
So the process of WebID authentication is to verify those claims. That is what the WebID verifier does.
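The claim-verification step described here might be sketched as follows. This is our own sketch of the idea, not spec text: `fetch_profile_keys` is a hypothetical callable standing in for an HTTPS GET of the claimed URI plus parsing of the RDF graph it returns.

```python
def verify_webid_claim(claimed_uri, cert_modulus, cert_exponent, fetch_profile_keys):
    """Verify one claimed WebID: the claim holds if the dereferenced
    WebID Profile publishes the same RSA public key as the certificate.

    fetch_profile_keys(uri) is a hypothetical stand-in for dereferencing
    the URI and extracting (modulus, exponent) pairs from the profile graph.
    """
    for modulus, exponent in fetch_profile_keys(claimed_uri):
        if modulus == cert_modulus and exponent == cert_exponent:
            return True   # the profile vouches for this key: claim verified
    return False          # no published key matches: claim not verified
```

In a real verifier the loop would run over every claimed URI from the certificate until one verifies, per the step-4 text quoted below.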


> 4 says "must attempt to verify the public key information associated with at least one of the claimed WebID URIs." It doesn't say HOW to verify the public key "information" (which I assume to mean "Public Key"). Is this an implementation-specific method?
>  
> It doesn't make much sense, in technical English, to say the public key ...associated with at least one of... How do I tell that this one is, and that one is not? I think the text is trying to say: pick the URIs from the cert one by one, and see if they verify against the graph (not that we have pulled it yet, in the step sequence); one must verify. Much tighter specification language is required, focussed on implementors writing state machines.

ok.

>  
> step 5 is clear, but makes wholly improper use of MUST.

I feel there are often too many MUSTs in this document too.

>  
> remove the common fallacy "verifies that the Identification Agent owns the private key". There is no method known to do this (establish ownership). One can establish "control over", by the method of proving possession of.

indeed. 

>  
> Change "TLS mutual-authentication between the Verification Agent and the Identification Agent." to "TLS mutual-authentication between the Verification Agent and the User Agent". Identification Agents cannot perform TLS, let alone perform a mutual authentication service, let alone authenticate a server or a "Validation Agent". Browsers (i.e. User Agents) can. Recall how the two were distinguished in the definition.

Yes, we were discussing on Monday the changes to this vocabulary. Thanks for bringing this up separately, so I don't sound like I am crazy.

>  
> "If the Verification Agent does not have access to the TLS layer, a digital signature challenge must be provided by the Verification Agent."

> This is a MUST, and is obviously critical. While there is a forward reference, I don't have a clue what this is referring to. This is VERY WORRYING. I also don't know what "have access to the TLS layer" means. Does receiving client certs and information from the SSL server side socket constitute "access to"? Very vague. Needs work. Never found the reference; VERY CONFUSING INDEED. I probably would not hire a person saying this to me, and would quickly wrap up the interview.

Completely confusing, I agree. This came from the BrowserId folks essentially wanting to have a spec that did not work over TLS. Of course the problem is they cannot do that correctly until they have cryptography in the browser - that is very much how it seemed to me at the time. So we can remove this now, and generalise the spec in a couple of years to cover those cases.

>  
> The next section is titled "Authentication Sequence Details", but was referenced as "Additional algorithms" earlier. Don't misuse the term algorithm, or qualify with "additional". From the title, it's just more detailed steps in a sequence description.
>  
> 3.2.1 and 3.2.2 are blank, and are implementation critical. Their absence makes the spec almost unimplementable. It's only by luck that the current 15 implementations do the same thing here, if that is indeed the case.

We're lucky folk here. But I agree: it's because we try to test against each other that we have had working implementations, and also why we are building test suites. So this is test-driven spec writing. We have code, and some test suites, and the spec is evolving with it.

>  
> 3.2.3 A Verification Agent must be able to process documents " has an inappropriate use of MUST. one never uses MUST in a MUST "be able". Learn to use SHOULD, MUST properly. This is elementary spec writing. On the topic, the statement contradicts earlier mandates concerning either/or requirements for validation agents handling RDFa, and/or RDF/XML. Previously, a Validation Agent was NOT required to always "be able to" process RDFa (for example my implementation will not). The text makes it sound that a Validation Agent that fails in its HTTP request to advertise the willingness to accept text/HTML would be non-conforming, for example.

I think the spec writers wanted RDFa and RDF/XML on the whole. Is there a reason you cannot provide RDFa?


>  On formalist grounds, in "Verifying the WebID is identified by that public key", does a public key even identify a WebID? A WebID is not a defined term; only a WebID URI is defined. Since there are multiple public keys that can be associated with a WebID URI, what kind of multi-identification relation is being mandated here, to be then verified? I don't know.

I think we should reserve the term WebID for the URI; it's a bit silly to say WebID Uniform Resource Identifier. Then the WebID TLS protocol is what we are describing here.

>  
> VERY VERY WORRYING is the following: "against the one provided by the WebID Profile or another trusted source,". This text suggests that a Validation Agent implementation may opt out of using a WebID Profile document, as dereferenced using a Claimed URI, and use some other source instead.

I agree, that is something we should not get into. That was some people trying to be too general. Of course if you trust that Joe fetches URIs for you, then you can think of him as a proxy. Perhaps that is what is meant. Perhaps something like:

	against the one provided by the WebID Profile directly or via a trusted proxy



> To be literal, the MUST steps are to be performed, but then an implementation may ignore the result and just use something else. Kind of makes me want to research additional options... if true.

Yes, we all think there are additional ways, but the spec won't be finished if we try to list them all here. So I am for cutting options now.

>  
> "9D ☮ 79 ☮ BF ☮ E2 ☮ F4 ☮ 98 ☮..." has strange symbols (in my rendering). I don't know whether to take it literally or not, speaking as an implementor. Publish in PDF, if the web cannot render literals.

They are peace symbols. Here's a picture in png format:

That's enabled by our cert:hex format, because we just drop any non-hex characters.
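That parsing rule ("drop any non-hex characters") is simple to state in code. This is our own sketch of the idea, not the normative algorithm:

```python
import re

def clean_cert_hex(value):
    """cert:hex parsing as described above: every character outside
    [0-9A-Fa-f] is discarded, so decorations like peace symbols,
    spaces, or colons are simply ignored."""
    return re.sub(r"[^0-9A-Fa-f]", "", value)

clean_cert_hex("9D \u262e 79 \u262e BF \u262e E2")  # -> "9D79BFE2"
```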

>  
> "3.2.6 Secure Communication
> 
> This section will explain how an Identification Agent and a Verification Agent may communicate securely using a set of verified identification credentials.
> If the Verification Agent has verified that the WebID Profile is owned by the Identification Agent, the Verification Agent should use the verified public key contained in the Identification Certificate for all TLS-based communication with the Identification Agent."
>  
> It "will" or it "does"? First, one doesn't "communicate securely" using credentials, verified or otherwise. I have absolutely no idea what I'm supposed to do for the following SHOULD. What is the scope of this SHOULD? The current web session? If the cert expires after 30m of activity on the wire, what am I supposed to do? Keep using it? Close the SSL session? We must use MUST and SHOULD properly, and not make vacuous conformance statements.

thanks.

>  
>  
> 3.3.1 goes on about topics not within the scope of the WebID protocol (e.g. accounts, and foaf attributes). Remove, or label as non-normative.

yes. 

>  
> "The following properties should be used when conveying cryptographic information in WebID Profile documents:"... Surely MUST.

Ok. In the case of the RSA key that's not a must, it's a should.
Let's make the cert:key and the modulus and exponent a must.
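To make that concrete, a profile carrying those properties might look roughly like this in Turtle. This is a sketch only: the exact property names and datatypes in the cert ontology were still under discussion at the time, and the modulus value here is invented and truncated.

```turtle
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<#me> cert:key [
    a cert:RSAPublicKey ;
    cert:modulus "9D79BFE2F498..."^^xsd:hexBinary ;  # illustrative value only
    cert:exponent 65537
] .
```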

> Surely, if someone chooses the path of least resistance in a SHOULD clause, there will be NO interworking?


yes. Later when we have more intelligence and reasoning there will be other options, but for now we make it must.


Ok, thanks,

	 I'll go over this during the weekend and try to work on rewriting as much as I can.

On Monday we have a teleconference where we can go over these in detail.

	Henry


Social Web Architect
http://bblfish.net/
Received on Friday, 18 November 2011 18:08:04 GMT
