ACTION-156: Review of http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html from Williams, Stuart (HP Labs, Bristol) on 2008-09-02 (www-tag@w3.org from September 2008)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Tue, 2 Sep 2008 18:02:02 +0000
To: "noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <233101CD2D78D64E8C6691E90030E5C818199EE984@GVW1120EXC.americas.hpqcorp.net>
Hello Noah,

At long last I have managed to take a review pass over http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html.

Broadly, I think that the document reads well and is in a pretty mature state; however, I do have a few comments below.

Best regards

Stuart
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
===============================================================================
--
Introduction: Bulleted list: ~4th item.

It might be worth mentioning the revived use of the Link: HTTP header as a means to associate metadata with a resource (and indeed the use of <link> elements and/or http-equiv in HTML to induce HTTP headers in a response).
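
A minimal sketch of how a client might pick metadata links out of a Link: header. The header value, the target URI and the "describedby" relation are illustrative assumptions, and the parser handles only the simple <target>; name=value form, not the full header grammar.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w\-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
PARAM = re.compile(r';\s*([\w\-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs found in a Link: header value."""
    links = []
    for target, raw_params in LINK_VALUE.findall(value):
        params = {}
        for name, quoted, token in PARAM.findall(raw_params):
            # A parameter value is either a quoted string or a bare token.
            params[name.lower()] = quoted if quoted else token
        links.append((target, params))
    return links


# Hypothetical header value, used only for illustration.
example = '<http://example.org/meta.rdf>; rel="describedby"; type="application/rdf+xml"'
links = parse_link_header(example)
```

Here links holds one entry whose target is the metadata document and whose parameters carry the relation and media type.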

--

2 The Web's Standard Retrieval Algorithm: 1st para (editorial)

Suggest changing:

        "Indeed there is a standard algorithm that a user agent can employ to obtain and interpret the representation..."

to
        "Indeed there is a standard algorithm that a user agent can employ to attempt to obtain and interpret the representation..."

Rationale: there is no certainty that application of the algorithm on a particular occasion will in fact obtain a representation or enable its interpretation by the particular client (the latter may still require a small matter of programming).

It would be really helpful if the diagram were of a size that would display/print conveniently.

--
Section 2 (editorial)

"When he clicks it, his browser:

- from the <code>http:</code> at the beginning of the URI, determines that the http scheme has been used - "

Suggest reversing these clauses:

ie.
"When he clicks it, his browser:

- determines that the http scheme has been used from the <code>http:</code> at the beginning of the URI  - "

--

Section 2 (substantive)

" - this tells the browser that a representation retrieved using the HTTP protocol is authoritative "

I don't think that the http: at the start of an HTTP URI does that. A 200 response accompanying a representation does, either with respect to the request URI/Host: combination in the corresponding HTTP request, or with respect to the URI given in a Content-Location: header accompanying the response, or with respect to both. [All modulo a level of trust in the proxy and caching infrastructure not to misrepresent the intent of the origin server, and of course these days modulo DNS cache poisoning attacks.]

--

Section 2 (substantive, minor)

" - looks up DNS name [DNS] example.com..."

Alternatively, the browser may look up the DNS name of a configured proxy. The important point here is that the TCP connection in general may terminate in a different 'place' than that suggested by inspection of the URI.
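
A minimal sketch of the point, assuming a client with an optional configured proxy. The function name connection_endpoint and the proxy address are hypothetical; the sketch shows only which host and port the TCP connection is opened to.

```python
from urllib.parse import urlsplit


def connection_endpoint(uri, proxy=None):
    """Return the (host, port) a client would open its TCP connection to."""
    # With a proxy configured, the proxy's name is what gets resolved in DNS,
    # not the authority component of the URI being dereferenced.
    target = urlsplit(proxy if proxy else uri)
    default_port = 443 if target.scheme == "https" else 80
    return target.hostname, target.port or default_port


direct = connection_endpoint("http://example.com/weather")
via_proxy = connection_endpoint(
    "http://example.com/weather",
    proxy="http://proxy.example.net:3128",  # hypothetical proxy
)
```

In the second call the connection terminates at the proxy even though the URI names example.com.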

--

Section 2 (substantive)

"Neither Bob nor his browser has any advance knowledge of the nature of the resource."

This usage of the word 'nature' recurs and IMO is a little vague. I think that you are really talking about the media type of the representation in all cases, rather than, say, the nature of a weather report as being a weather report, or a news article as a news article, neither of which is particularly evident in the media type when both are served up as HTML pages. Speaking of a lack of prior knowledge of the nature of the resource alludes to something way more sophisticated than a lack of awareness/expectation about the media type of a response, and that is not borne out by the example in the narrative.
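
A small sketch of the distinction, assuming two hypothetical resources that are both served as HTML. All the client learns from the Content-Type header is the media type, which is identical for the two.

```python
from email.message import Message


def media_type(content_type_header):
    """Extract the bare media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


# Hypothetical responses: a weather report and a news article.
responses = {
    "http://example.com/weather/oaxaca": "text/html; charset=utf-8",
    "http://example.com/news/today": "text/html; charset=iso-8859-1",
}
types = {uri: media_type(header) for uri, header in responses.items()}
```

Both entries in types come out as text/html, so nothing in the media type says which one is the weather report.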

--

3 Widely deployed standards and formats: 3rd para (substantive)

In the example I would only take the position that "...there are no outright violations of Web architecture..." in the case where the media type has been properly registered (and preferably documented (openly?)). I think that it would be worth mentioning media-type registration because the follow-your-nose chain breaks in cases where this has not been done.

--

4.2 URIs based Extensibility (anal)

"...and in many cases each markup tag or data value used, is identified by a URI."

Absent SCUDs, is that really the case? Maybe you are referring to the occurrence of a tag in a document marked with an ID, such that the base URI of the document extended by a fragment ID corresponding to the ID value could be taken (via the relevant media type spec) as naming that occurrence of the use of the tag. Anyway, I would quibble that it's not clear what you intended to say. If, for example, you were trying to say that the html root element of an XHTML document has an associated identifying URI, I'd struggle to know what it was, though I would willingly concede that it has a URI-based identifier in the form of an extended name (modulo elements, attributes, substitution groups... being distinct naming partitions).

--

4.2 and subsection (General)

Feels like there ought to be a few GPNs here capturing partial conclusions.

--

4.2.2 Microformats: (Question of information)

"Unlike... . The hCard profile specifies a value for the profile attribute..."

Is this particular idiom for the use of the profile attribute actually grounded in an HTML specification? Some of them? All of them that define the attribute?

I believe that the profile attribute was and maybe still is under threat in HTML5.

--

4.2.3 Self-describing XML documents (editorial) 3rd para:

Mentions the TAG nsDocument-8 finding which has matured beyond the state described in this document.

--

5 RDF and the Self-Describing Semantic-Web: 2nd para:

"Indeed RDF Schema and OWL Ontology technologies together offer a standard, machine-processable means of describing particular uses of RDF"

Hmmmm.... well, they provide the means to describe entailments/inferences that can be drawn from a collection of RDF statements, and to detect when a collection of RDF statements is inconsistent with respect to the axioms of a Schema/Ontology (and indeed when class definitions within an ontology are inconsistent). So... in a very specialised way, I agree, but read as written I think that "...machine processable means of describing use of RDF" suggests a much broader capability.

--
Section 5 3rd para: (anal)

"... to obtain RDF triples that represent or describe the referenced resource."

This is potentially deep in the heart of httpRange territory (or not) depending on how closely one is reading.

Given a URI u (say for the planet Mars) it is not OK by Web architecture to provide a direct 200 response and a descriptive representation of Mars. However, it is OK to redirect to a descriptive resource whose representation contains a description of the resource referenced by u.

You probably didn't intend the 'or' in the quoted fragment to be read that closely.
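
A minimal sketch of the behaviour described above, assuming a server that knows which of its URIs name documents and which name other things. The URIs, the two lookup tables and the respond function are hypothetical.

```python
# Hypothetical URIs, used only for illustration.
# URIs for things that are not documents, mapped to a document describing them.
DESCRIBED_BY = {
    "http://example.org/planets/mars": "http://example.org/doc/mars",
}
# URIs for documents that can be returned directly.
DOCUMENTS = {"http://example.org/doc/mars"}


def respond(uri):
    """Return (status, headers) for a GET on uri."""
    if uri in DOCUMENTS:
        # A document: a direct 200 with a representation is fine.
        return 200, {"Content-Type": "text/html"}
    if uri in DESCRIBED_BY:
        # Not a document: redirect to a resource that describes it.
        return 303, {"Location": DESCRIBED_BY[uri]}
    return 404, {}


mars = respond("http://example.org/planets/mars")
doc = respond("http://example.org/doc/mars")
```

A GET on the URI for Mars yields a 303 pointing at the descriptive document; only that document answers with a 200.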

--
Section 5 RDF source fragment (editorial)

RDF/XML is pretty ugly to read compared to N3 which conveys a much clearer impression of the corresponding RDF graph:

@prefix employeeData:  <http://example.org/EmployeeInformation#> .
@prefix rdf:           <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://example.org/Employees#BobSmith>
      a                  employeeData:employee ;
      employeeData:email <mailto:BobSmith@example.org> ;
      employeeData:name "Bob Smith" .

Unless it is really important to use RDF/XML to make the point I'd suggest replacing with the N3 above.


--
Received on Tuesday, 2 September 2008 18:05:30 UTC