- From: <noah_mendelsohn@us.ibm.com>
- Date: Tue, 2 Sep 2008 21:27:01 -0400
- To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
- Cc: "www-tag@w3.org" <www-tag@w3.org>
Stuart Williams writes:

> At long last I have managed to take a review pass over
> http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-05-12.html.
>
> Broadly I think that the document reads well and is in a pretty
> mature state

Thank you!

> however I do have a few comments below.

OK. Individual responses below. I have put a quick and dirty revision
up on the Web at
http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2008-08-22.html.
I say quick and dirty because the revision date in the text is 2 Sept,
but the file name in date space still carries the August date of my
last editor's copy, etc. When I say below that something is fixed or
"DONE", you should be able to check it in this draft. After we close on
getting the revisions done, I will publish a new and more stable copy
for review at the face to face under a different URI. I'd love to do
that by sometime next week, which should give TAG members enough
reading time to support a decision on publication in KC.

> Introduction: Bulleted list: ~4th item.
>
> It might be worth mentioning revived use of the Link: http header as
> means to associate metadata with a resource (and indeed the use of
> <link> elements and/or http-equiv to induce http headers in a
> response in HTML)

Changed to read:

"For integration with the Semantic Web, self-describing
representations should convey RDF triples, either directly in the
representation, by linking to the triples (perhaps using <link>
elements in HTML or the link: header in HTTP), or by linking to
transformations using technologies such as GRDDL."

I think the http-equiv is one step too far for an introduction. I'm
not trying to be complete, just suggestive.

--

> 2 The Web's Standard Retrieval Algorithm: 1st para (editorial)
>
> Suggest changing:
>
> "Indeed there is a standard algorithm that a user agent can
> employ to obtain and interpret the representation..."
>
> to
>
> "Indeed there is a standard algorithm that a user agent can
> employ to attempt to obtain and interpret the representation..."
>
> Rationale: there is no certainty that application of the algorithm
> on a particular occasion will in fact obtain a representation or
> enable its interpretation by the particular client (the latter may
> still require a small matter of programming).

DONE, though I think the original was clear enough and shorter; this
isn't a specification, it's a finding trying to give people a sense of
the issues and of good practice.

> It would be really helpful if the diagram were of a size that would
> display/print conveniently.

I infer you jumped to the diagram in the appendix. With help from
Norm, the sizing should be fixed.

--

> Section 2 (editorial)
>
> "When he clicks it, his browser:
>
> - from the <code>http:</code> at the beginning of the URI,
> determines that the http scheme has been used - "
>
> Suggest reversing these clauses:
>
> ie.
> "When he clicks it, his browser:
>
> - determines that the http scheme has been used from the
> <code>http:</code> at the beginning of the URI - "

I'm not happy with that because it parses ambiguously on first
reading. In addition to the intended reading, you can try to scan it
as "the scheme has been used from the http:". Not changed.

--

> Section 2 (substantive)
>
> " - this tells the browser that a representation retrieved using the
> HTTP protocol is authoritative "
>
> I don't think that the http: at the start of an HTTP URI does that.
> A 200 response accompanying a representation does either with
> respect to the request URI/host: combination in the corresponding
> HTTP request or wrt the URI given in a Content-Location: header
> accompanying the response, or wrt to both. [all modulo a level of
> trust in the proxy and caching infrastructure not to mis-represent
> the intent of the origin server and of course these days modulo DNS
> cache poisoning attacks].
Again, I'm a little concerned that if we try to cross all the T's and
dot all the I's, as we would in the HTTP RFC itself, we will just cut
into the readability of the document without substantially improving
its accuracy or impact. I honestly think that, in context, the
intention is clear.

More to your point: I do think the http: at the beginning plays a
significant role in establishing that the retrieved representation, if
any, is authoritative. In contrast, if I have a URI using the https
scheme, and if I make what is generally the mistake of attempting to
retrieve a representation using ordinary (no SSL) HTTP, then a
representation that comes back is not authoritative, even if the
server is willing to provide one. As I understand it, part of the
contract for a URI employing the https scheme is that responses are
considered authoritative only if HTTPS is used.

In any case, I'd rather not get into the complications you list. I
certainly don't see a need to go into Content-Location, for example,
since the paragraph above indicates that we're talking about a
"typical path" through an HTTP retrieval, not an exhaustive
exploration of the options. This example is intended to get people
thinking about "follow your nose" in the context of a typical
retrieval. If there are errors in it, we should fix them, but I'd
rather avoid trying to restate all of RFC 2616 here.

--

> Section 2 (substantive, minor)
>
> " - looks up DNS name [DNS] example.com..."
>
> Alternatively may look up the DNS name of a configured proxy. The
> important point here being that the TCP connection in general may
> terminate in a different 'place' than that suggested by inspection
> of the URI.

Again, this is advertised as an illustration of a "typical path"
through the standard retrieval algorithm of the Web. Proxies are
indeed one possible complication, but do we need to mention them in
our first introduction to "follow your nose"?
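
For anyone who wants the "typical path" spelled out, here is a rough
sketch in Python. It is only an illustration of the steps under
discussion, not text from the finding: the function names and the
example path are made up, it performs no network I/O, and it
deliberately ignores proxies, caches, redirects and Content-Location.
The second function encodes my understanding of the https point above,
which should be read as an assumption rather than a quotation from any
RFC.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch covers.
DEFAULT_PORTS = {"http": 80, "https": 443}


def typical_path(uri):
    """Return, in order, the steps a user agent might take for an http(s) URI.

    Hypothetical helper: it only describes the steps, it does not perform them.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"this sketch only covers http and https, not {scheme!r}")
    host = parts.hostname
    port = parts.port or DEFAULT_PORTS[scheme]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    transport = "TLS over TCP" if scheme == "https" else "TCP"
    return [
        f"scheme is {scheme}: use that scheme's protocol for the retrieval",
        f"look up DNS name {host}",
        f"open a {transport} connection to port {port}",
        f"send GET {target} with Host: {host}",
        "read Content-Type from the response to learn the media type",
    ]


def is_authoritative(uri, retrieved_over):
    """Assumption: a reply counts as authoritative only if it was
    retrieved with the protocol named by the URI's own scheme."""
    return urlsplit(uri).scheme.lower() == retrieved_over.lower()


if __name__ == "__main__":
    for step in typical_path("http://example.com/weather/boston"):
        print(step)
```

Note that nothing in the sketch needs advance knowledge of the
resource; everything is derived from the URI and, in the last step,
from what the response says about itself.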
--

> Section 2 (substantive)
>
> "Neither Bob nor his browser has any advance knowledge of the nature
> of the resource."
>
> This usage of the word nature recurs and IMO is a little vague. I
> think that you are really talking about the media type of the
> representation in all cases rather than say the nature of a weather
> report as being a weather report, or new article as a new article

No! We're potentially talking about all of that. In fact, the more
powerful the self-description technology you use, the more you're
likely to be able to discover by following your nose. Yes, at the very
least HTTP almost ensures that you will discover the media type, but
often you can do much more. The media type might be application/xml,
but the namespace-qualified root element might tell you quite reliably
that you're looking at a resume or a work of music or an inventory
report. Similarly with RDF. So, the power of a Self-describing Web is
that in many cases you can indeed follow a link, with no a priori
knowledge of the nature of the resource, and discover, as you put it,
the "nature of a weather report as being a weather report".

> - neither of which is particularly evident in the media-type when
> both are served up as HTML pages. Speaking of lack of prior
> knowledge of the nature of the resource gives an allusion to
> something way more sophisticated than lack of awareness/expectation
> about the media-type of a response that is not borne out by the
> example in the narrative.

You're certainly right that we have not established in this particular
example that a machine could automatically discover the semantics
conveyed by an HTML page, but in this example there is a human user.
I believe that, from a commonsense point of view, we have established
that Bob can click on pretty much any link and his browser (if it's a
good one) will show him a page such as a weather report that he as a
human has a good shot at recognizing, or in the case of an image/jpeg
a picture, or failing that the browser will reliably say to Bob: I
don't know what to do with this one. In short, I believe that readers
will understand that Bob can click an arbitrary link and, in practice,
realize that what he's got is a weather report. Note that later parts
of the document do discuss the need for more application-specific
content standards in the case where machines or software are supposed
to extract semantics automatically.

--

> 3 Widely deployed standards and formats: 3rd para (substantive)
>
> In the example I would only take the position that "...there are no
> outright violations of Web architecture..." in the case where the
> media-type has been properly registered

Well, the example media type I used was image/x-fancyrawphotoformat,
and media types starting with x- are experimental and in fact cannot
be registered. Their use is certainly discouraged, but I don't think
that using a media type like this is a violation of Web architecture.
In fact, the whole point of this example is that use of such a media
type is bad practice, but not "an outright violation of the
architecture". You haven't convinced me that isn't true.

> (and preferably documented
> (openly?)). I think that it would be worth mentioning media-type
> registration because the follow-your-nose chain breaks in case where
> this has not been done.

The very next sentences say:

"No existing Web user agents recognize the image/x-fancyrawphotoformat
media type, search engine spiders are unlikely to extract useful
information from pictures in that format, and so on.
Unlike Susan's, which can be viewed by almost anyone, Mary's photos
are at best useful to a few people who have the proprietary software
needed to decode them."

So you're right: nobody's advertising follow your nose here; it's a
counterexample. Use media types that nobody knows (nose?) about, or
that aren't documented, and the follow your nose story loses a lot of
its value. I'm not yet convinced that the story would be improved by
changing it.

--

> 4.2 URIs based Extensibility (anal)
>
> "...and in many cases each markup tag or data value used, is
> identified by a URI."
>
> Absent SCUDs is that really the case? Maybe you are referring to the
> occurrence of a tag in a document marked with an ID such that the
> base URI of the document extended by a fragment ID corresponding to
> the ID value could be taken (via relevant media type spec) as naming
> that occurrence of the use of the tag. Anyway I would quibble that
> it's not clear what you intended to say, and if for example you were
> trying to say that for example the html root element of an XHTML
> document has an associated identifying URI I'd struggle to know what
> it was - though I would willingly concede that it has a URI based
> identifier in the form of an extended name (modulo elements,
> attributes, substitution groups... being distinct naming partitions).

Our own Namespaces Document finding says
(http://www.w3.org/2001/tag/doc/nsDocuments/#div.fragid):

"For many applications of namespaces, it's valuable not only to be
able to point to the namespace as a whole, but also to be able to
point to terms within that namespace."

I think we all know of cases, such as the Atom use case discussed in
the self-desc. draft, in which each data value is a URI. It seems to
me that the statement you quote is pretty well justified.

--

> 4.2 and subsection (General)
>
> Feels like there ought to be a few GPNs here capturing partial
> conclusions.
I'm open to suggestions, but I'd rather not hold up publication if we
can't come up with any.

--

> 4.2.2 Microformats: (Question of information)
>
> "Unlike... . The hCard profile specifies a value for the profile
> attribute..."
>
> Is this particular idiom for the use of the profile attribute
> actually grounded in an HTML specification?

Dan, among others, has led me to believe that the answer is yes. I
don't consider myself an expert on those aspects of HTML.

> Some of them? all of them that define the attribute?

Some of what? Do you mean some of the microformats? That doesn't make
sense because a few sentences later I say:

"Unfortunately, few microformats have such profiles, and even when
profiles are available, evidence suggests that they are not
universally applied."

So, I'm afraid I'm misunderstanding your phrases "some of them" and
"all of them".

> I believe that the profile attribute was and maybe still is under
> threat in HTML5.

Yes. I think there has also recently been consideration of some
mechanisms that are similar in spirit but different in detail. Are you
suggesting that we should change the draft finding, and if so, what
change would you suggest? I think the approach we've been taking is to
discuss the pros and cons of having this facility on the merits. If
HTML5 decides eventually to ship without a facility that we have
deemed valuable, then either we're wrong or they've missed an
opportunity. We can always revise the finding were that to happen, to
warn people that our good advice cannot in fact be followed.

--

> 4.2.3 Self-describing XML documents (editorial) 3rd para:
>
> Mentions the TAG nsDocument-8 finding which has matured beyond the
> state described in this document.

Fixed. The text now reads:

"The TAG Finding "Associating Resources with Namespaces"
[NamespaceDocuments], recommends the use of [RDDL] as a preferred
means of documenting namespaces."

The bibliography has also been updated to point to the published
finding.
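
Returning briefly to the profile-attribute idiom from 4.2.2: the check
a consumer would make is small, and a sketch may help readers judge
the "grounded in an HTML specification" question for themselves. The
Python below is only an illustration under stated assumptions. The
class and function names are invented, and the profile URI is a
placeholder under example.org, not the real hCard profile URI. The one
fact relied on is that HTML 4.01 lets the profile attribute of <head>
carry one or more URIs separated by white space.

```python
from html.parser import HTMLParser


class ProfileFinder(HTMLParser):
    """Collect the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            # The attribute may be absent or empty; treat both as "no profiles".
            value = dict(attrs).get("profile") or ""
            # HTML 4.01 allows several profile URIs separated by white space.
            self.profiles.extend(value.split())


def declared_profiles(html_text):
    """Return the list of profile URIs a page declares (possibly empty)."""
    finder = ProfileFinder()
    finder.feed(html_text)
    return finder.profiles


# Placeholder URI, used only for illustration; not the published hCard profile.
HCARD_PROFILE = "http://example.org/profiles/hcard"

page = (
    f'<html><head profile="{HCARD_PROFILE}"><title>Bob</title></head>'
    "<body></body></html>"
)

if __name__ == "__main__":
    print(declared_profiles(page))
```

A page that omits the attribute yields an empty list, which is exactly
the case the draft worries about: the microformat may be in use, but
nothing in the page lets software follow its nose to the definition.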
--

> 5 RDF and the Self-Describing Semantic-Web: 2nd para:
>
> "Indeed RDF Schema and OWL Ontology technologies together offer a
> standard, machine-processable means of describing particular uses of
> RDF"
>
> Hmmmm.... well they provide the means to describe
> entailments/inferences that can be drawn from a collection of RDF
> statements and to detect when a collection of RDF statements is
> inconsistent with respect to the axioms of a Schema/Ontology (and
> indeed when class defns within an ontology are inconsistent). So...
> in a very specialised way, I agree, but read as written I think that
> "...machine processable means of describing use of RDF" suggests a
> much broader capability.

How about, "offer a machine-processable means of extracting
information from particular uses of RDF"? I'm open to better
suggestions. No change made for the moment.

--

> Section 5 3rd para: (anal)
>
> "... to obtain RDF triples that represent or describe the referenced
> resource."
>
> This is potentially deep in the heart of httpRange territory (or
> not) depending on how closely one is reading.
>
> Given a URI u (say for the planet mars) it is not ok by Web
> architecture to provide a direct 200 response and a descriptive
> representation of Mars. However it is ok to redirect to a
> descriptive resource whose representation contains a description of
> the resource referenced by u.
>
> You probably didn't intend the 'or' in the quoted fragment to be
> read that closely.

Yes, I agree with your analysis, but at the end you seem to waffle on
whether you are asking for a change or making an observation.
Suggestions?

--

> Section 5 RDF source fragment (editorial)
>
> RDF/XML is pretty ugly to read compared to N3 which conveys a much
> clearer impression of the corresponding RDF graph:
>
> @prefix employeeData: <http://example.org/EmployeeInformation#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>
> <http://example.org/Employees#BobSmith>
>     a employeeData:employee ;
>     employeeData:email <mailto:BobSmith@example.org> ;
>     employeeData:name "Bob Smith" .
>
> Unless it is really important to use RDF/XML to make the point I'd
> suggest replacing with the N3 above.

Let's see what others think. I certainly take your point. The reason
I'm a bit hesitant is that I'm among the many readers who already know
XML very well, and RDF just a little. Keeping in mind that the point
is not to rigorously teach RDF, but to give one a sense of how the
retrieved ontology might teach you that email could be sent, the XML
is easier to get through for readers like me. Unless you know N3, that
free floating " a " on the line under <http:...> is very confusing. On
the other hand, readers who come from a Semantic Web world will have
no trouble at all with the N3, and many other readers will guess right
if they stare hard enough, so there's certainly merit to your
suggestion. Bottom line: I'd like to hear from other TAG members on
this one.

Thank you again for the very careful reading and the thoughtful
comments. What's your feeling about the likelihood that we can resolve
these issues in time to publish at the F2F? Thank you.

Noah

--
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Wednesday, 3 September 2008 01:26:18 UTC