RE: implementors first impression notes on the spec from Peter Williams on 2011-11-17 (public-xg-webid@w3.org from November 2011)

From: Peter Williams <home_pw@msn.com>
Date: Thu, 17 Nov 2011 02:29:37 -0800
To: "public-xg-webid@w3.org" <public-xg-webid@w3.org>
Message-ID: <SNT143-W47E4D1940DAE266678EE1092C70@phx.gbl>

I looked back at the code from my first attempts at a foaf+ssl implementation, from nearly 3 years ago. Very little has changed, when one reviews the spec. Most of the ideas beyond the obvious have all fallen by the wayside. My code was a little interceptor written in ASP.NET C# scripts, and I'm struggling to find a reason not to do the same, this time around. Almost all the difficulties last time concerned https/SSL, and the attempt to control the Windows implementation thereof so its sites behaved like example sites, such as foaf.me, programmed by folks close to the hub of semantic web thinking.

A few things are NOT in the spec, and will thus NOT be in my implementation. Only the spec controls, as the incubator comes to the end of its life.

This list got much longer than I thought it would, which has surprised me. Feel free to hit delete, now. They are an implementor's notes on what one NEED not do, when simply doing what the spec says.

----------

The spec does not say that the server must send a particular list of DNs in the SSL handshake, where the spec might have specified the null list. Therefore, any values are conforming. The topic of DNs in the SSL handshake is up to the implementor.

The spec has nothing to say about certificate selectors in phones, PCs, or tablets, or the topic of automatic re-selection of certificates once indicated. The spec says nothing about issuer alternative names, or certificate chains, or population of DNS names in the SAN, or wild card names.

There is nothing in the spec about web applications, login state, logout state, or session cookies. There are no UI design proscriptions, whatsoever. The term REST does not appear, and RESTful web applications are not specifically included or excluded. Nothing in the spec says that once a TLS session comes into being, after a webid-enhanced SSL handshake, that RESTful HTTP verbs are mandatory or that the SSL session must be used to retain security/session state in an otherwise stateless app. One is entitled to use a java cookie, should one wish.

There is nothing in the spec about IDPs, or an architecture that assumes that websites will receive assertions from an IDP that acts as a webid Validation Agent. There is specifically no mention of any special relationship with an OpenID Provider and its assertion mechanisms. The signature format specified in the foaf+ssl project is not a part of the webid spec, and can be considered dead and buried (which means I throw away the bit of script I distributed).

There is nothing in the spec that discusses key management, or certificate enrollment, or other provisioning of client certs in a browser. Though some folks have built websites that issue certs and indeed mixed certificate provisioning with foaf card making, nothing in the protocol requires such an implementation. According to the incubator, there is nothing inappropriate or contrary to the interoperability goals in loading Opera with a .p12 file to provision the client cert. In IE, there is nothing wrong with the cert being provisioned using Windows Mobile credentials in a Windows domain managed world. There is nothing about using enrollment plugins, or HTML5 tags.

There is nothing in the spec that says one must use hashtags, as the default hash tag is quite appropriate. Proper handling of hashtags is implied, though.

There is nothing in the spec about a validating agent using a foaf card as a source of transitive trust statements or using foaf groups, when taking an authorization decision to open the port to the resource and admit the client UA according to the admission policy. A Validation Agent can be talking conventionally to a radius server or voice call agent doing admission control, for all the spec cares.

There is nothing in the spec about national id cards, or smartcards; or FIPS crypto systems; or specialized code loading procedures for webid protocol modules.

There is nothing in the spec about DNS, signed resource records, or DANE. As far as I can tell, I need not consider DNS whatsoever. Nothing in webid requires that a DNS query be issued, and it's quite appropriate to use a host file or an LDAP server cum DNS server (i.e. Active Directory). Nothing requires that webid endpoints be registered in the public DNS, and IP addresses are quite ok as URI authorities. Nothing requires a Validation Agent to use an authoritative DNS responder for the zone, for example.

There is NOTHING in the spec about using https in any particular way - e.g. warnings for mixed content sites. For example, though foaf.me had a double handshake (a step up procedure), nothing in the spec required it or even discusses it. There is NOTHING in the spec about logout, or the use of the SSL close messages tied to a web session, or a logout event handler.

There are no constraints preferring one version of TLS to another, and nothing in webid validation presumes TLS 1.2, for example. There is nothing in the spec about certificate expiry, or long lived TLS (week long) sessions, or TLS sessions that resume with a KDF-only mini-handshake after the 3-message TCP handshake. There was nothing about pipelining or interleaving messages, or otherwise multiplexing fragments of HTTP reqs/resp on the SSL record layer bearer, in some manner special to web validation flows.

There is similarly NOTHING in the spec about common browser mechanisms, such as File->New->session, or control-N - which typically present interesting https edge cases for web developers. For example, access to a resource on a webserver via webid protocol in two browser instances, separated by File->New->session, MAY get 2 separate SSL sessionids, use two different certificates/webid URIs, and both talk to the same webapp. A logout (SSL close) on one will not influence the other.

There is nothing in the spec that requires the protocol run over any particular version of TLS, or prohibits or requires any particular ClientHello extension. Nothing is stated about server name indication, for virtual hosting of SSL endpoints in professional data centers using load balancer arrays, for example for the WebID Profile "provider". Therefore a correct implementation focussed on web farms/gardens ignores the topic, and two vendors get to do different things, quite properly.

Nothing in the spec calls for green address bars, or UI plugins that show the current cert tied to a given visible tab.

Should a validation agent receive a redirect upon GETing a WebID Profile document, the spec says nothing about what to do, or not do, when following the chain of referrals, including those redirects that pass between http and https endpoints. There are no security-specialized semantics concerning different types of HTTP return code. For example, receiving a 301 does not mean the Validation Agent MUST or SHOULD use the permanent redirect value the next time the WebID URI is encountered; and thus no mechanism need exist to remember such bindings.

-----------

I could go on. In short, there is precious little profiling of https in any way, specific to the semantic web. What is stated is a small behaviour that may be implemented in a page handler interceptor that gets a document using a webGET method, given a client certificate presented in a CGI variable.

The incubator has defined at least one additional rule that did not exist in FOAF-SSL days (nearly 3 years ago). It does state what a Validation Agent MUST do when multiple URIs are present in the client cert.

 From: home_pw@msn.com
To: public-xg-webid@w3.org
Date: Wed, 16 Nov 2011 22:34:09 -0800
Subject: implementors first impression notes on the spec

Henry asked me to review the spec, by complaining that I have not done so for some considerable while. Find below an unfiltered review of the official "W3C Editor's Draft of WebID 1.0", downloaded from http://www.w3.org/2005/Incubator/webid/spec/. I note that it has the formal status now, within W3C process, of "W3C Editor's Draft". With such labelling, I'm expecting higher quality in content, finishing and use of language.
 
Some of the comments are quite critical. Do not take the critical tone too literally: I have left them unfiltered, so one sees an expert's mind at work. I am very expert in my security fields - as recently independently tested, certified and re-examined (but whose topic list excludes cryptography, where I just have fun as a total amateur looking at ancient cryptosystems). Remember that all feedback was welcomed. Don't get too defensive, and don't email a response after drinking. I want webid to work and to take off (since every other use of client certs has failed, and it's great that W3C is having a go).
 
On the evidence presented, the method and the spec itself are both quite some way off readiness for wider adoption or wider review, even. This is clearly an incubator-grade deliverable (not that there is anything appropriate about that status).
 
This is long and detailed, so print it out. If you don't agree, don't bother arguing with me. Just ignore the line. It's just peer-to-peer feedback, issued in a friendly forum. You are getting an implementor's view.
 
Peter.
 
------------

"This URI should be dereference-able" - referring to the URI in the name field of a self-signed cert. To this implementor this phrase means IETF SHOULD, in the absence of any other definition. Perfectly good reasons may therefore exist not to make it dereference-able, and "complete" code should not assume that all certs have dereference-able URIs. If something else is meant, it should be stated.

"Using a process made popular by OpenID, we show how one can tie a User Agent to a URI by proving that one has write access to the URI" I recommend this be changed to OpenID v1.0. The set of ideas to which the text is referring were found to be almost 100% absent in my own, recent production adoption - at national scale - of 2 infamous OpenID providers (using v2 and later vintages of the OpenID Auth protocol). Neither IDP showcased the URI heritage of the OpenID movement. Both showcased email identifiers, to be perfectly honest. At W3C level, we should be accurate, and tie the reference to OpenID 1.0. Concerning openid and webid, there the common history over URIs ends.

"TODO: cover the case where there are more than one URI entry" is a topic still pending, 12 months in. I'd expect this defined by now (since it's a 3 year old topic). I don't know what to do as an implementor. I'm going to take the text literally. That is: it is not an exception to encounter multiple URIs. I will take one at random from the set, since there is no further requirement. It is not stated what to do in case no URIs are present. I will raise a 500 HTTP code in such a case, since no specification is made. (NB text later seems to fill in some details here, and a ref needs to be made to it.)
 
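A minimal Python sketch of the literal reading described above, assuming the SAN entries arrive already decoded in the dict shape that Python's ssl.SSLSocket.getpeercert() returns. The function names and the (status, uri) return convention are illustrative assumptions, not anything the draft defines.

```python
import random


def claimed_webid_uris(peer_cert):
    """Collect the URI entries of subjectAltName from a decoded certificate.

    peer_cert is assumed to look like the dict from ssl.SSLSocket.getpeercert(),
    e.g. {"subjectAltName": (("DNS", "example.org"), ("URI", "https://..."))}.
    """
    return [value for kind, value in peer_cert.get("subjectAltName", ())
            if kind == "URI"]


def choose_webid_uri(peer_cert, rng=random):
    """Return (http_status, uri) under the literal reading of the draft.

    Several URIs are not an error: one is picked at random.
    No URI at all is answered with a 500, the reviewer's own choice.
    """
    uris = claimed_webid_uris(peer_cert)
    if not uris:
        return 500, None
    return 200, rng.choice(uris)
```

Passing a seeded random.Random instance as rng makes the pick repeatable in tests.
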
"A URI specified via the Subject Alternative Name extension" - Cert extensions dont specify. Standards groups specify. Choose a better verb. Im sure W3E Editor language is standardized on this kind of topic.
 
The terminology section is sloppy. It uses identification credentials, defines identification certificates, and refers to webid credentials. These are all critical terms, and the language model seems ad hoc, at best.

"A widely distributed cryptographic key that can be used to verify digital signatures and encrypt data". This is sloppy security speak. A Public Key is very rarely used to encrypt data. A professional would say it encrypts key (key being very specifically typed/distinguished from untyped data). A competent review by W3C of commodity crypto in the web will show that there is almost zero use of public key encryption of data. Keeping the text as is will set off alarm bells in crypto policy enforcement circles.

"A structured document that contains identification credentials for the Identification Agent expressed" - the term "for" came across where I'd expect "of". As an implementor, I'm taking the rest of the definition of WebID Profile to mean that it's fine for a Validation Agent to consume 1 and only 1 serialization, say the XML serialization of RDF. This means that interworking with my implementation will properly fail should the user not have the XML form of the RDF identification credentials. And, to this implementor, this is as intended by the design, as stated. The implementation will be conforming, in this failure case.
 
I feel it is inappropriate to be reading such subtle implementor-centric information in a definition, note. Move this to normative text paragraphs. One should not be reading between the lines on such critical matters.
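
The single-serialization reading can be sketched in Python as follows. This is only an illustration: the SUPPORTED set, the function names, and the idea of rejecting on Content-Type are assumptions of the sketch, while application/rdf+xml is the registered media type for RDF/XML.

```python
# Serializations this hypothetical Validation Agent can parse (RDF/XML only).
SUPPORTED = frozenset({"application/rdf+xml"})


def accept_header(supported=SUPPORTED):
    """Build the Accept header value advertising only what can be parsed."""
    return ", ".join(sorted(supported))


def media_type(content_type):
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type.split(";", 1)[0].strip().lower()


def can_consume(content_type, supported=SUPPORTED):
    """True if the profile document came back in a supported serialization."""
    return media_type(content_type) in supported
```

A profile served only as text/html (RDFa) makes can_consume return False, which is the "conforming failure" described above.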
 
 "Public Key" is missing capitalization.
 
"The user agent will create a Identification Certificate with aSubject Alternative Name URI entry." Most User Agents do not "create" certificates. They create certificate signing requests, in a form that is not X.509 standardized, for most cases. Mixing "Create" with certificate in the context of user agent is sending confusing signals to implementors. User Agent should be capitalized. The number of SANs should be more clearly defined, at creation time. The text here implies there is only one.
 
"This URI must be one..." contradicts earlier text, that says that a URI SHOULD (be de-referencable). Decide between MUST and SHOULD, and capitalize terms properly per IETF guidelines.
 
Remove the sexist "he" from a W3C document. There is more appropriate, standard W3C phraseology.
 
Remove the commentary sections, sometimes in yellow: this or that is under debate. As an implementor, it confuses me. I don't know what to do, and wonder about doing nothing till the debate settles down - since it was evidently SO vital to point it all out. Not good style for an Editor's Draft.
 
The example certificate is strange. It has two critical extensions, that Verification Agents MUST enforce. One says it's not a CA. The other says the user of the associated private key has a certificate signing privilege. One who signs certificates is a CA, by definition. A not particularly literal reading of the cert handling standards would require a validation agent to reject that cert, since (a) it MUST enforce the extensions, since they are CRITICAL, and (b) there is a contradiction in what they say, technically. This is just confusing and unnecessary. No reference is made to IETF PKIX, and it's not clear what cert-related conformance standard MUST be applied, by webid protocol implementations, if any.
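
For reference, the PKIX profile (RFC 5280) states that keyCertSign must not be asserted when the cA flag of basic constraints is not asserted, which is the contradiction noted above. A small Python sketch of that check follows; it assumes the two extensions have already been decoded into a boolean and a set of key usage names, and the function name is made up for illustration.

```python
def extensions_conflict(is_ca, key_usage):
    """Report the basicConstraints / keyUsage contradiction.

    is_ca     -- decoded cA flag from basicConstraints (assumed input)
    key_usage -- set of asserted key usage bit names (assumed input)
    """
    return (not is_ca) and ("keyCertSign" in key_usage)


# The example certificate as described: not a CA, yet allowed to sign certs.
example_rejected = extensions_conflict(False, {"digitalSignature", "keyCertSign"})
```

Whether a WebID Validation Agent is expected to apply this rule at all is exactly the open conformance question raised above.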
 
As an implementor of a Validation Agent, do I or do I not enforce "O=FOAF+SSL, OU=The Community of Self Signers, CN=Not a Certification Authority" in the issuer field? All I have is an open question, and no stricture. Decide one way or the other. If it's mandatory, specify the DN properly. As an implementor I don't know if the order of the DN components is left to right, or otherwise. I should not have to read or use openssl code to find out, or to see if my software is conforming.
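
To show why the ordering matters, here is a hedged Python sketch that treats a DN as a sequence of (attribute, value) pairs. The EXPECTED_ISSUER constant simply copies the order printed in the draft, and issuer_matches is a hypothetical helper; nothing here says the draft actually requires the check.

```python
# Issuer DN in the order the draft prints it (an assumption about encoding order).
EXPECTED_ISSUER = (
    ("O", "FOAF+SSL"),
    ("OU", "The Community of Self Signers"),
    ("CN", "Not a Certification Authority"),
)


def issuer_matches(issuer, expected=EXPECTED_ISSUER, order_sensitive=True):
    """Compare an issuer DN given as (attribute, value) pairs.

    With order_sensitive=True the component order must match exactly;
    otherwise only the set of components is compared.
    """
    issuer = tuple(issuer)
    if order_sensitive:
        return issuer == tuple(expected)
    return sorted(issuer) == sorted(expected)
```

A certificate whose components are stored in the reverse order passes only the order-insensitive comparison, so two implementations can disagree until the draft specifies the DN precisely.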
 
"The above certificate is no longer valid, as I took an valid certificate and change the time and WebID. As a result the Signatiure is now false. A completely valid certificate should be generated to avoid nit-pickers picking nits" Can we fix this, and remove stuff that looks like the spec is half baked? its giving a false impression of the quality level of the group (and W3C, I might add). It looks like there is no quality control, peer review, etc. After 12 months of editing, I expect better.
 
OK, I'm going to relax a bit, as it's increasingly evident that the spec writers are not all native English speakers, and some of them are struggling to write formal, technical English. It needs some comprehensive editing by one used to very formal, technical writing.

"The document can publish many more relations than are of interest to the WebID protocol,"... No it cannot, for the purposes of THIS spec, and this WebID protocol. Protocols are hard code, they don't have "interests" in perhaps doing something. Either I code it or I don't. Define what is out of scope, and perhaps imply that non-normative protocol "extensions" might use such information. However, these are not in scope of this document.

"The WebID provider must publish the graph of relations in one of the well known formats,". No. First, WebID "provider" is undefined. The entity publishing a WebID Profile ought to be defined, and that entity's term be used consistently. Second, the publisher MUST produce XML and/or RDFa, and possibly others. Failure to produce one of RDF/XML or RDFa is non-conforming, according to earlier mandates.
 
2.3.1 goes on about Turtle representation. Remove it completely, since it's not even one of the mandatory cases defined as such in the spec.
 
2.3.2 is incomplete, missing the doctype. I don't know whether a mislabelled doctype with apparent RDFa is conforming or not. This omission is not helping implementors.
 
I don't know if the example HTML document elements in 2.3.2 verify. My own efforts to use something similar from the mailing list made Microsoft's Visual Studio's verification system for HTML complain, bitterly. But then, I was confused, not knowing which doc type and which validation procedure to follow. This seems an inappropriate position for W3C to take.
 
"If a WebID provider would rather prefer not to mark up his data in RDFa, but just provide a human readable format for users and have the RDF graph appear" phrasing makes it sound like RDFa is not intended to be machine-readable. This is probably unintended mis-direction.
 
Step 5 in the picture of 3.1 makes it appear that WebID Profile document SHOULD be published on https endpoints. State clearly what the conformance rules are.
 
Being over-pedantic (like a code writer), the request in step 5 is issued by the protected resource, but the response to 5 is handled by the webserver hosting the page, in the pseudo-UMLish diagram notation. This is a signal to me. I don't know how the webserver is supposed to tie up the request to the response, since it was not the initiator. Label the diagram as NON_NORMATIVE.
 
Step 3 is missing information - that a web request is issued, over https. Distinguish the SSL handshake (signaling for and delivering the cert) from the SSL channel over which an HTTP request MUST be sent, if I interpret 8 correctly. Again, it's not clear whether only information retrieval requests, post handshake, MUST use https or not; whether the same ciphersuite MUST be used as negotiated by the cert-bearing handshake or not; whether one can rehandshake or not... While a summary diagram cannot express all those things, I am confused about what is mandatory owing to the omission of a competent step 3.
 
Steps 6 and 7 are poorly done. They imply, from the activity diagram, that the protected resource will perform the identity and authorization queries (and not the webserver). That's what it says (to an implementor).
 
Remove the phrase "the global identity". This is far too grandiose.

"The Identification Agent attempts to access a resource using HTTP over TLS [HTTP-TLS] via the Verification Agent." needs rewriting. It comes across as saying (with the via construct) that the Validation Agent is a web proxy to the Identification Agent (which is the UA, typically). Note that typically, in Windows, the kernel intercepts the TLS invocation, performs the SSL handshake, and does NOT know that a given resource is being requested of a webserver. There are strong implementation biases in this formulation that are improper for a W3C spec. Generalize.
 
I'm almost 18 years out of date on the SSL spec, but I don't recall something called the "TLS client-certificate retrieval protocol." The actual protocols of the SSL state machine are well defined in the SSL3 and followup IETF specs. They have proper names. Be precise. Editor to help those for whom English is a second language. Editor needs to do fact checking, and quality control.
 
Step 2 of 3.1 is confusing. It makes it sound like, if an unsolicited certificate message is sent voluntarily by the client before a certificate is sought by the server, it MUST be rejected by the server. Is this what is meant? Must any unsolicited presentation be rejected? If so, what MUST happen? Is there an SSL close?
 
If a TLS session is resumed, and a request received inducing the TLS entity to commence a complete SSL handshake and requesting a certificate, is it a WebID Protocol violation to use the client certificate from the earlier completed handshake? (This case is common.) MUST certificates be provided by the current handshake? If two handshakes on 1 TCP connection present different client certificates, what happens?
 
The phrase "claimed WebID URIs." should be removed, unless some doctrine about "claimed"ness is explained.
 
4 says "must attempt to verify the public key information associated with at least one of the claimed WebID URIs." It doesn't say HOW to verify the public key "information" (which I assume to mean "Public Key"). Is this an implementation-specific method?
 
It doesn't make much sense, in technical English, to say the public key ...associated with at least one of... How do I tell that this one is, and that one is not? I think the text is trying to say: pick the URIs from the cert one by one, and see if they verify against the graph (not that we have pulled it yet, in the step sequence). One must verify. Much tighter grade specification language is required, focussed on implementors writing state machines.
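
The "one by one" interpretation guessed at above could be sketched in Python as below. Everything named here is an assumption for illustration: fetch_profile_keys stands for whatever dereferences a WebID URI and returns the set of (modulus, exponent) pairs found in the profile graph, and treating a failed fetch as "did not verify" is a choice of the sketch, not a rule from the draft.

```python
def verify_claims(claimed_uris, cert_key, fetch_profile_keys):
    """Return the first claimed URI whose profile lists cert_key, else None.

    claimed_uris       -- URIs taken from the certificate's SAN
    cert_key           -- (modulus, exponent) of the presented certificate
    fetch_profile_keys -- hypothetical callable: uri -> set of (modulus, exponent)
    """
    for uri in claimed_uris:
        try:
            profile_keys = fetch_profile_keys(uri)
        except Exception:
            # Assumption: an unreachable or unparsable profile just fails this URI.
            continue
        if cert_key in profile_keys:
            return uri
    return None
```

Returning at the first match implements "at least one must verify"; a stricter reading that requires every claimed URI to verify would need a different loop.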
 
step 5 is clear, but makes wholly improper use of MUST.
 
Remove the common fallacy "verifies that the Identification Agent owns the private key". There is no method known to do this (establish ownership). One can establish "control over" by the method of proving possession of.
 
Change "TLS mutual-authentication between the Verification Agent and the Identification Agent." to "TLS mutual-authentication between the Verification Agent and the User Agent". Identification Agents cannot perform TLS, let alone perform a mutual authentication service, let alone authenticate a server or a "Validation Agent". Browsers (i.e. User Agents) can. Recall how the two were distinguished in the definition.

"If the Verification Agent does not have access to the TLS layer, a digital signature challenge must be provided by the Verification Agent." This is a MUST, and is obviously critical. While there is a forward reference, I don't have a clue what this is referring to. This is VERY WORRYING. I also don't know what "have access to the TLS layer" means. Does receiving client certs and information from the SSL server side socket constitute "access to"? Very vague. Needs work. Never found the reference, VERY CONFUSING INDEED. I probably would not hire a person saying this to me, note, and would quickly wrap up the interview.
 
The next section is titled "Authentication Sequence Details", but was referenced as "Additional algorithms" earlier. Don't misuse the term algorithm, or qualify with "additional". From the title, it's just more detailed steps in a sequence description.
 
3.2.1 and 3.2.2 are blank, and are implementation critical. Their absence makes the spec almost unimplementable. It's only by luck that the current 15 implementations do the same thing, here, if that is indeed the case.
 
3.2.3 "A Verification Agent must be able to process documents" has an inappropriate use of MUST. One never uses MUST in a MUST "be able". Learn to use SHOULD, MUST properly. This is elementary spec writing. On the topic, the statement contradicts earlier mandates concerning either/or requirements for validation agents handling RDFa, and/or RDF/XML. Previously, a Validation Agent was NOT required to always "be able to" process RDFa (for example my implementation will not). The text makes it sound that a Validation Agent that fails in its HTTP request to advertise the willingness to accept text/html would be non-conforming, for example.
 
On formalist grounds, "Verifying the WebID is identified by that public key": does a public key even identify a WebID? A WebID is not a defined term. Only a WebID URI is defined. Since there are multiple public keys that are associated with a WebID URI, what kind of multi-identification relation is being mandated here, to be then verified? I don't know.
 
VERY VERY WORRYING is the following: "against the one provided by the WebID Profile or another trusted source, ". This text suggests that a Validation Agent implementation may opt out of using a WebID Profile document, as dereferenced using a Claimed URI, and use some other source instead. To be literal, the MUST steps are to be performed, but then an implementation may ignore the result and just use something else. Kind of makes me want to research additional options... if true.

"9D ☮ 79 ☮ BF ☮ E2 ☮ F4 ☮ 98 ☮..." has strange symbols (in my rendering). I don't know whether to take it literally, or not, speaking as an implementor. Publish in PDF, if the web cannot render literals.
 
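If the symbols are meant only as decorative separators between hex byte pairs (an assumption; the draft as quoted here does not say so), a tolerant reader could be sketched in Python like this. The function name is hypothetical.

```python
import re


def hex_literal_to_int(text):
    """Drop every non-hexadecimal character, then parse what is left as base 16.

    Caveat: a separator that itself contains the letters a-f would be
    swallowed as digits, so this is only safe for symbol-style separators.
    """
    digits = re.sub(r"[^0-9A-Fa-f]", "", text)
    if not digits:
        raise ValueError("no hexadecimal digits found")
    return int(digits, 16)
```

Under that assumption, "9D ☮ 79 ☮ BF" and "9d:79:bf" both parse to the same integer.
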
"3.2.6 Secure CommunicationThis section will explain how an Identification Agent and a Verification Agent may communicate securely using a set of verified identification credentials.If the Verification Agent has verified that theWebID Profile is owned by the Identification Agent, the Verification Agent should use the verifiedpublic key contained in the Identification Certificatefor all TLS-based communication with the Identification Agent."
 
It "will" or it "does"? First, one doesn't "communicate securely" using credentials, verified or otherwise. I have absolutely no idea what I'm supposed to do for the following SHOULD. What is the scope of this SHOULD? The current web session? If the cert expires after 30m of activity on the wire, what am I supposed to do? Keep using it? Close the SSL session? We must use MUST and SHOULD properly, and not make vacuous conformance statements.
 
 
3.3.1 goes on about topics not within the scope of the webid protocol (e.g. accounts, and foaf attributes). Remove, or label as non-normative.

"The following properties should be used when conveying cryptographic information in WebID Profile documents:"... Surely MUST. Surely, if someone chooses the path of least resistance in a SHOULD clause, there will be NO interworking?
Received on Thursday, 17 November 2011 10:30:14 UTC