RE: ACTION-156: Review of http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html from Williams, Stuart (HP Labs, Bristol) on 2008-09-03 (www-tag@w3.org from September 2008)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Wed, 3 Sep 2008 11:11:04 +0000
To: "noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <233101CD2D78D64E8C6691E90030E5C818199EECA9@GVW1120EXC.americas.hpqcorp.net>
Hello Noah,

A few responses in line... but you might like to visit the tail first...

> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
> Sent: 03 September 2008 02:27
> To: Williams, Stuart (HP Labs, Bristol)
> Cc: www-tag@w3.org
> Subject: Re: ACTION-156: Review of
> http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html
>
> Stuart Williams writes:
>
> > At long last I have managed to take a review pass over
> > http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html.
> >
> > Broadly I think that the document reads well and is in a pretty
> > mature state
>
> Thank you!
>
> > however I do have a few comments below.
>
> OK.  Individual responses below.  I have put a quick and dirty revision up
> on the Web at
> http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-08-22.html.
> I say quick and dirty because the revision date in the text is 2 Sept, but
> the file name in date space is still the August date of my last editors
> copy, etc.  When I say below that something is fixed or "DONE", you should
> be able to check it in this draft.  After we close on getting the
> revisions done, I will publish a new and more stable copy for review at
> the face to face under a different URI.  I'd love to do that by sometime
> next week, which should give TAG members enough reading time to support a
> decision on publication in KC.
>
> > Introduction: Bulletted list: ~4th item.
> >
> > It might be worth mentioning revived use of the Link: http header as
> > means to associate metadata with a resource (and indeed the use of
> > <link> elements and/or http-equiv to induce http headers in a
> > response in HTML)
>
> Changed to read:
> "For integration with the Semantic Web, self-describing representations
> should convey RDF triples, either directly in the representation, by
> linking to the triples (perhaps using <link> elements in HTML or the link:
> header in HTTP), or by linking to transformations using technologies such
> as GRDDL. "
>
> I think the http-equiv is one step too far for an introduction.  I'm not
> trying to be complete, just suggestive.

Ok.

> --
>
> > 2 The Web's Standard Retrieval Algorithm: 1st para (editorial)
> >
> > Suggest changing:
> >
> >         "Indeed there is a standard algorithm that a user agent can
> > employ to obtain and interpret the representation..."
> >
> > to
> >         "Indeed there is a standard algorithm that a user agent can
> > employ to attempt to obtain and interpret the representation..."
> >
> > Rationale: there is no certainty that application of the algorithm
> > on a particular occasion will in fact obtain a representation or
> > enable its interpretation by the particular client (the latter may
> > still require a small matter of programming).
>
> DONE though I think the original was clear enough and shorter; this isn't
> a specification, it's a finding trying to give people a sense of the
> issues and of good practice.

Thanks...

> > It would be really helpful if the diagram were of a size that would
> > display/print conveniently.
>
> I infer you jumped to the diagram in the appendix.  With help from Norm,
> the sizing should be fixed.

Thanks...

> --
> > Section 2 (editorial)
> >
> > "When he clicks it, his browser:
> >
> > - from the <code>http:</code> at the beginning of the URI,
> > determines that the http scheme has been used - "
> >
> > Suggest reversing these clauses:
> >
> > ie.
> > "When he clicks it, his browser:
> >
> > - determines that the http scheme has been used from the <code>http:
> > </code> at the beginning of the URI  - "
>
> I'm not happy with that because it parses ambiguously on first reading.

FWIW on first reading by this reviewer the flow:

"When he clicks it, his browser: from the http: at the beginning of the URI determines that the http scheme has been used"

failed to parse due I think to the large separation between the subject (his browser) and the verb (determines). It took me three readings to make sense of it. Ok. I'm just one data point.

> In addition to the intended reading, you can try to scan as "the scheme
> has been used from the http:".  Not changed.
>
> --
>
> > Section 2 (substantive)
> >
> > " - this tells the browser that a representation retrieved using the
> > HTTP protocol is authoritative "
> >
> > I don't think that the http: at the start of an HTTP URI does that.
> > A 200 response accompanying a representation does either with
> > respect to the request URI/host: combination in the corresponding
> > HTTP request or wrt the URI given in a Content-Location: header
> > accompanying the response, or wrt to both. [all modulo a level of
> > trust in the proxy and caching infrastructure not to mis-represent
> > the intent of the origin server and of course these days modulo DNS
> > cache poisoning attacks].
>
> Again, I'm a little concerned that if we try to cross all the T's and dot
> I's, as we would in the HTTP RFC itself, we will just cut into the
> readability of the document without substantially improving its accuracy
> or impact.  I honestly think that, in context, the intention is clear.
>
> More to your point:  I do think the http: at the beginning plays a
> significant role in establishing that the retrieved representation, if
> any, is authoritative.  In contrast, if I have a URI using the https
> scheme, and if I make what is generally the mistake of attempting to
> retrieve a representation using ordinary (no SSL) HTTP, then a
> representation that comes back is not authoritative, even if
> the server is willing to provide one.   As I understand it, part of the
> contract for a URI employing the https scheme is that responses are considered
> authoritative only if HTTPS is used.

Interesting... that's coming at it from quite a different angle from what I read into the simple bullet line. I would not have guessed that.

The appeal to the authoritativeness of the representation brought to mind the Authoritative Metadata finding "http://www.w3.org/2001/tag/doc/mime-respect-20060412". I had not considered at all (I'm not sure that the TAG has discussed - hmmm schemeProtocol maybe - access attempts using a protocol different from that implied by the scheme).

The other sensitivity that I have is that I am quite ok with a successfully retrieved representation as being an authoritative representation of something... but of what - the thing they asked for, or something else? Folks sometimes don't see the intervening steps in a redirection (that API logic may hide or obfuscate) as signifying that the response they have is to a different question.

Anyway... now I understand your point as being that a representation (if any) retrieved using the protocol 'associated' [*] with a URI's scheme is (in some sense) an authoritative representation of the referenced resource.

[*] scare quote, because how schemes and protocols are tied together remains a little elusive (at least for me).
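
To make that reading concrete, here is a minimal Python sketch, assuming a deliberately simplified table that ties schemes to protocols. The names SCHEME_PROTOCOL and is_authoritative_retrieval are hypothetical, and the rule encoded is only a paraphrase of the point as I now understand it, not something taken from a specification.

```python
from urllib.parse import urlsplit

# Hypothetical, simplified association of URI schemes with retrieval protocols.
SCHEME_PROTOCOL = {"http": "HTTP", "https": "HTTPS"}


def is_authoritative_retrieval(uri: str, protocol_used: str) -> bool:
    """True when the protocol used is the one associated with the URI's scheme."""
    scheme = urlsplit(uri).scheme.lower()
    expected = SCHEME_PROTOCOL.get(scheme)
    # Unknown schemes give no basis for calling the retrieval authoritative.
    return expected is not None and expected == protocol_used.upper()
```

Under these assumptions, fetching an https URI over plain HTTP would not count as authoritative, even if the server is willing to answer.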

> In any case, I'd rather not get into the complications you list.  I
> certainly don't see a need to go into Content-location, for example, since
> the paragraph above indicates that we're talking about a "typical path"
> through an HTTP retrieval, not an exhaustive exploration of the options.
> This example is intended to get people thinking about "follow your nose"
> in the context of a typical retrieval.  If there are errors in it, we
> should fix them, but I'd rather avoid trying to restate all of RFC 2616
> here.

> --
>
> > Section 2 (substantive, minor)
> >
> > " - looks up DNS name [DNS] example.com..."
> >
> > Alternatively may lookup the DNS name of a configured proxy. The
> > important point here being that the TCP connection in general may
> > terminate in a different 'place' than that suggested by inspection
> > of the URI.
>
> Again, this is advertised as an illustration of a "typical path" through
> the standard retrieval algorithm of the Web.  Proxies are indeed one
> possible complication, but do we need to mention them in our first
> introduction to "follow your nose"?

I guess that depends on who we might expect to play our own words back to us and with what force.


> --
>
> > Section 2 (substantive)
> >
> > "Neither Bob nor his browser has any advance knowledge of the nature
> > of the resource."
> >
> > This usage of the word nature recurs and IMO is a little vague. I
> > think that you are really talking about the media type of the
> > representation in all cases rather than say the nature of a weather
> > report as being a weather report, or new article as a new article
>
> No!  We're potentially talking about all of that.  In fact, the more
> powerful the self-description technology you use, the more you're likely
> to be able to discover by following your nose.  Yes, at very least HTTP
> almost ensures that you will discover the media type, but often you can do
> much more.  The media type might be application/xml, but the
> namespace-qualified root element might tell you quite reliably that you're
> looking at a resume or a work of music or an inventory report.  Similarly
> with RDF.  So, the power of a Self-describing Web is that in many cases
> you can indeed follow a link, with no a priori knowledge of the nature of
> the resource, and discover as you put it the "nature of a weather report
> as being a weather report".

Well ok, I can see that we could be talking about "all of that"... but the example given really doesn't establish the notion of nature as anything more than establishing media-type and thence encoding format - and pragmatically what handler to pass the presentation of the representation off to. It has done little to establish the nature of the resource (as opposed to the 'nature' of the retrieved representation).
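
As an illustration of how little the example establishes, here is a minimal Python sketch of dispatch on media type alone. The handler table and the name choose_handler are hypothetical; the media types are the ones mentioned in this thread.

```python
# Hypothetical handler table: media type -> component that presents it.
HANDLERS = {
    "text/html": "html_renderer",
    "image/jpeg": "image_viewer",
    "application/xml": "xml_viewer",
}


def choose_handler(content_type: str) -> str:
    """Pick a handler from the Content-Type header value alone."""
    # Drop parameters such as charset and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return HANDLERS.get(media_type, "unknown: ask the user")
```

A weather report and a news article, both served as text/html, get the same handler here; nothing in this step distinguishes one from the other.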

> > - neither of which is particularly evident in the media-type when
> > both are served up as HTML pages. Speaking of lack of prior
> > knowledge of the nature of the resource gives an allusion to
> > something way more sophisticated than lack of awareness/expectation
> > about the media-type of a response that is not borne out by the
> > example in the narrative.
>
> You're certainly right that we have not established in this particular
> example that a machine could automatically discover the semantics conveyed
> by an HTML page, but in this example there is a human user.

Ok... the sentence does say "Neither Bob nor his browser" so I will concede that Bob at least has been able to determine the nature of the resource - but I think I would contend that is primarily due to his ability to interpret a rendering of natural language that is largely opaque to the technology in this example.

>  I believe
> that, from a commonsense point of view, we have established that Bob can
> click on pretty much any link and his browser (if it's a good one) will
> show him a page such as a weather report that he as a human has a good
> shot at recognizing, or in the case of an image/jpeg a picture, or failing
> that the browser will reliably say to Bob:  I don't know what to do with
> this one.  In short, I believe that readers will understand that Bob can
> click an arbitrary link and, in practice, realize that what he's got is a
> weather report.

Ok... but I don't think I have seen any artifact of self-description that has genuinely contributed to that determination other than the natural language content of the weather report. It would be no different for a weather report printed in a newspaper or broadcast on TV.

> Note that later parts of the document do discuss the need for more
> application-specific content standards in the case where machines or
> software are supposed to extract semantics automatically.

Aside: I know I'm giving this a skeptical reading. "nature" was a trigger word for me because IMO it suggests much more than the example delivers.

> --
>
> > 3 Widely deployed standards and formats: 3rd para (substantive)
> >
> > In the example I would only take the position that "...there are no
> > outright violations of Web architecture..." in the case where the
> > media-type has been properly registered
>
> Well, the example media type I used was image/x-fancyrawphotoformat, and
> media types starting with x- are experimental and in fact cannot be
> registered.  Their use is certainly discouraged, but I don't think that
> using a media type like this is a violation of Web architecture.  In fact,
> the whole point of this example is that use of such a media type is bad
> practice, but not "an outright violation of the architecture".  You
> haven't convinced me that isn't true.

Well, I'd be wary of the TAG suggesting deployment of such a media-type outside of a prototypical situation was kind-of Ok. But I'll accept your argument that it is not "an outright violation of Web architecture".

> > (and preferably documented
> > (openly?)). I think that it would be worth mentioning media-type
> > registration because the follow-your-nose chain breaks in case where
> > this has not been done.
>
> The very next sentences say:
>
> "No existing Web user agents recognize the image/x-fancyrawphotoformat
> media type, search engine spiders are unlikely to extract useful
> information from pictures in that format, and so on. Unlike Susan's, which
> can be viewed by almost anyone, Mary's photos are at best useful to a few
> people who have the proprietary software needed to decode them. "
>
> So you're right: nobody's advertising follow your nose here; it's a
> counter example.  Use media types that nobody knows (nose?) about, or that
> aren't documented, and the follow your nose story loses a lot of its
> value.  I'm not yet convinced that the story would be improved by changing
> it.
>
> --
>
> > 4.2 URIs based Extensibility (anal)
> >
> > "...and in many cases each markup tag or data value used, is
> > identified by a URI."
> >
>
> > Absent SCUDs is that really the case? Maybe you are referring to the
> > occurrence of a tag in a document marked with an ID such that the base
> > URI of the document extended by a fragment ID corresponding to the
> > ID value could be taken (via relevant media type spec) as naming
> > that occurrence of the use of the tag. Anyway I would quibble that
> > it's not clear what you intended to say, and if for example you were
> > trying to say that for example the html root element of an XHTML
> > document has an associated identifying URI I'd struggle to know what
> > it was - though I would willingly concede that it has a URI based
> > identifier in the form of an extended name (modulo elements,
> > attributes, substitution groups... being distinct naming
> > partitions).
>
> Our own Namespaces Document finding says (
> http://www.w3.org/2001/tag/doc/nsDocuments/#div.fragid) "For many
> applications of namespaces, it's valuable not only to be able to point
> to the namespace as a whole, but also to be able to point to terms
> within that namespace."   I think we all know of cases, such as the Atom
> use case discussed in the self-desc. draft, in which each data value is
> a URI.  It seems to me that the statement you quote is pretty well
> justified.

I did mark the comment as 'anal'. And I accept that in Atom and RDF for example, many if not all values have or are URIs.

But URI identifiers for elements/attributes in a vocabulary or occurrences (tags) in a document using a vocabulary are (or have been) problematic - hmmm... ok XPath maybe gets us there in some cases.
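
For what it is worth, the URI-based name I would concede for the XHTML root element looks like this in practice. This is a minimal Python sketch using the standard library's ElementTree, which reports element names in "{namespace}local" form; the helper split_expanded_name is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = '<html xmlns="http://www.w3.org/1999/xhtml"><head/></html>'
root = ET.fromstring(doc)


def split_expanded_name(tag: str):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


namespace, local = split_expanded_name(root.tag)
```

The result is a pair (namespace URI, local name), which is a URI-based identifier but not itself a single URI for the element.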

> --
>
> > 4.2 and subsection (General)
> >
> > Feels like there ought to be a few GPNs here capturing partial
> > conclusions.
>
> I'm open to suggestions, but I'd rather not hold up publication if we
> can't come up with any.
>
> --
>
> > 4.2.2 Microformats: (Question of information)
> >
> > "Unlike... . The hCard profile specifies a value for the profile
> > attribute..."
> >
> > Is this particular idiom for the use of the profile attribute
> > actually grounded in an HTML specification?
>
> I believe that Dan, among others, has led me to believe that the answer is
> yes.  I don't consider myself an expert on those aspects of HTML.
>
> > Some of them? all of them that define the attribute?
>
> Some of what?

HTML language specifications.

>  Do you mean some of the microformats?  That doesn't make sense because
> a few sentences later I say:
> "Unfortunately, few microformats have such profiles, and even when
> profiles are available, evidence suggests that they are not universally
> applied. "  So, I'm afraid I'm misunderstanding your phrases "some of
> them" and "all of them".
>
> > I believe that the profile attribute was and maybe still is under
> > threat in HTML5.
>
> Yes.  I think there has also recently been consideration of some
> mechanisms that are similar in spirit but different in detail.
>
> Are you suggesting that we should change the draft finding, and if so how
> would you suggest?  I think the approach we've been taking is
> to discuss the pros and cons of having this facility on the merits.  If
> HTML5 decides eventually to ship without a facility that we have deemed
> valuable, then either we're wrong or they've missed an opportunity.  We can
> always revise the finding were that to happen, to warn people that our good advice
> cannot in fact be followed.

I was asking a question because I didn't know the answer. Certainly there is some evidence of the use of this idiom for using the profile attribute. I was simply asking whether this was justified by some, none or all of the relevant HTML specs. That's me trying to anchor the follow your nose chain in a spec, cited from a media type that justifies confidence in using the profile attribute in the way described. I raise the question and will take your response at face value - I haven't checked myself.

> --
>
> > 4.2.3 Self-describing XML documents (editorial) 3rd para:
> >
> > Mentions the TAG nsDocument-8 finding which has matured beyond the
> > state described in this document.
>
> Fixed.  The text now reads:
>
> "The TAG Finding "Associating Resources with Namespaces"
> [NamespaceDocuments], recommends the use of [RDDL] as a
> preferred means of documenting namespaces."
>
> The bibliography has also been updated to point to the
> published finding.

Thanks.

> --
>
> > 5 RDF and the Self-Describing Semantic-Web: 2nd para:
> >
> > "Indeed RDF Schema and OWL Ontology technologies together offer a
> > standard, machine-processable means of describing particular uses of
> > RDF"
> >
> > Hmmmm.... well they provide the means to describe
> > entailments/inferences that can be drawn from a collection of RDF
> > statement and to detect when a collection of RDF statements is
> > inconsistent with respect to the axioms of a Schema/Ontology (and
> > indeed when class defns within an ontology are inconsistent). So...
> > in a very specialised way, I agree, but read as written I think that
> > "...machine processable means of describing use of RDF" suggests a
> > much broader capability.
>
> How about, "offer a machine-processable means of extracting information
> from particular uses of RDF"?  I'm open to better suggestions.  No change
> made for the moment.

I guess what it is that I'm wary of (and this bears on the nature comment) is allusion to things that have magical quality - and to some extent if you think deeply about what is being said, that's what it would take - magic.

I can give a less critical reading and understand the spirit of what I think you're trying to say - and saying it accurately and with precision and simply is very hard.

In the case of RDFS and OWL "...they offer facilities for detecting inconsistencies in what is being said and to make some consistent inferences from what has been said eg. in some cases deducing that two URIs refer to the same thing and thence whatever follows from that deduction, or that two URIs could not possibly be referring (consistently) to the same thing, or that an assertion of equivalence is inconsistent with the facts... and so forth..." but this is not introductory language.
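
To show the modest, mechanical kind of inference I have in mind, here is a minimal Python sketch of a single RDFS entailment rule, the rdfs:domain rule: if a property has a declared domain, any subject using that property is entailed to be of that class. Triples are plain tuples; the rdfs:domain statement below and the ex: prefix are assumptions for illustration and do not come from the draft.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(triples):
    """Return the triples plus every rdf:type triple entailed by rdfs:domain."""
    # Collect (property, class) pairs declared via rdfs:domain.
    domains = {(s, o) for (s, p, o) in triples if p == RDFS_DOMAIN}
    inferred = set(triples)
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred


triples = {
    # Assumed schema statement, hypothetical:
    ("employeeData:email", RDFS_DOMAIN, "employeeData:employee"),
    # Instance data in the spirit of the draft's example:
    ("ex:BobSmith", "employeeData:email", "mailto:BobSmith@example.org"),
}
closure = apply_domain_rule(triples)
```

The only new fact is that ex:BobSmith is an employeeData:employee. Useful, but some way short of describing a use of RDF in any broad sense.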

> --
> > Section 5 3rd para: (anal)
> >
> > "... to obtain RDF triples that represent or describe the
> > referenced resource."
> >
> > This is potentially deep in the heart of httpRange territory (or
> > not) depending on how closely one is reading.
> >
> > Given a URI u (say for the planet mars) it is not ok by Web
> > architecture to provide a direct 200 response and a descriptive
> > representation of Mars. However it is ok to redirect to a
> > descriptive resource whose representation contains a description of
> > the resource referenced by u.
> >
> > You probably didn't intend the 'or' in the quoted fragment to be
> > read that closely.
>
> Yes, I agree with your analysis, but at the end you seem to waffle on
> whether you are asking for a change or making an observation.
> Suggestions?

"...to directly obtain RDF triples that represent or indirectly obtain RDF triples that describe the referenced resource."

...though I can live with the original, even though I can read it two ways, only one of which I think is correct.
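
The distinction between the two readings can be sketched as follows. This is a minimal Python illustration of my own paraphrase of the httpRange position described above, not a normative statement of it; interpret_response and the example URIs are hypothetical.

```python
def interpret_response(request_uri, status, location=None):
    """Classify what a client may conclude about the body it ends up with."""
    if status == 200:
        # Direct answer: the body is taken to represent the requested resource.
        return ("representation-of", request_uri)
    if status == 303 and location:
        # Redirection: a different resource may offer a description.
        return ("description-at", location)
    return ("no-conclusion", request_uri)
```

So for a URI such as http://example.org/id/mars, a 303 pointing at http://example.org/doc/mars yields triples that describe Mars, obtained indirectly, rather than triples that represent it.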

>
> --
> > Section 5 RDF source fragment (editorial)
> >
> > RDF/XML is pretty ugly to read compared to N3 which conveys a much
> > clearer impression of the corresponding RDF graph:
> >
> > @prefix employeeData:  <http://example.org/EmployeeInformation#> .
> > @prefix rdf:           <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >
> > <http://example.org/Employees#BobSmith>
> >       a                  employeeData:employee ;
> >       employeeData:email <mailto:BobSmith@example.org> ;
> >       employeeData:name "Bob Smith" .
> >
> > Unless it is really important to use RDF/XML to make the point I'd
> > suggest replacing with the N3 above.
>
>
> Let's see what others think.  I certainly take your point.  The reason I'm
> a bit hesitant is that I'm among the many readers who already knows XML
> very well, and RDF just a little.  Keeping in mind that the point is not
> to rigorously teach RDF, but to give one a sense of how the retrieved
> ontology might teach you that email could be sent, the XML is easier to
> get through for readers like me.  Unless you know N3, that free
> floating "a" on the line under <http: > is very confusing.  On the
> contrary, readers who come from a Semantic Web world will have no trouble
> at all with the N3, many other readers will guess right if they stare hard
> enough, so there's certainly merit to your suggestion.

Ok... the free floating 'a' could be replaced with 'rdf:type'. I suspect most people struggle with RDF/XML syntax.... I know that I still do.

> Bottom line: I'd like to hear from other TAG members on this one.
>
> Thank you again for the very careful reading and the thoughtful comments.
> What's your feeling about the likelihood that we can resolve
> these issues in time to publish at the F2F?  Thank you.

It's certainly not my intention to obstruct progress and in principle I have no objection to publishing a document. I'm conscious that I gave it a critical read; if some of my more picky comments don't resonate or find sympathy (particularly with you, though other TAG members too)... that's ok.

> Noah
>
> --
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------


Best regards

Stuart
--
Received on Wednesday, 3 September 2008 11:14:30 UTC