- From: David Booth <david@dbooth.org>
- Date: Thu, 29 Mar 2012 13:11:57 -0400
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: public-lod community <public-lod@w3.org>
Hi Jeni,

On Wed, 2012-03-28 at 18:01 +0100, Jeni Tennison wrote:
> > http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
[ . . . ]
> 1. The focus on the *definition* of a URI as opposed to a mere
> description is problematic for me. There are lots of things in the
> world that couldn't be adequately *defined* but can be described to
> more or less detail. I worry that people will get tied up in knots
> trying to work out what a definition looks like for a Person or a
> Book. Although I prefer most of the language in your draft, I prefer
> the looser 'description' used in Jonathan's document.

That sounds like an important concern, but I think it is best to separate the issue of how we educate the public about how this works, from figuring out the engineering of how it works. We first need to deal with the engineering.

If you notice the definition of "URI documentation" in the "baseline" document
http://www.w3.org/2001/tag/doc/uddp-20120229/
it says: "URI documentation is information that documents the intended meaning of a particular probe URI." That's what a definition is. So the terminology in the UDDP proposal
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#2.4_URI_definition.2C_explicit_URI_definition_and_implicit_URI_definition
is merely calling a spade a "spade".

Furthermore, there is an important distinction between a definition and any other documentation or description. A definition *is* documentation (or a description) but not every piece of documentation (or not every description) is a definition. This key difference tends to get blurred when a definition is blandly called "documentation" or "a description".

I suspect that some have been wary of recognizing this distinction out of a concern that if something is called a definition, then a client will be obligated to use that definition, and that would unreasonably constrain the client.
But this concern is unfounded if the specification makes clear that a client is free to do whatever it wishes with a URI definition that it retrieves.

Finally, to give a little more insight about what it means to provide a URI definition for something such as a Person or a Book, in some sense the URI definition does not actually *define* that person or book. Rather, it defines the *binding* of the URI (as a name) to a particular description of that thing, which indirectly (partially) identifies it. And as Pat Hayes (and others) have pointed out many times, there is inherent ambiguity in virtually any description. This means that a URI definition does not *fully* determine the thing that the URI is supposed to identify.

That is both a plus and a minus. It is a minus because it means that in general others can never know *exactly* what that URI owner intended it to identify, and this leads to downstream inconsistencies, as illustrated in Part 2 of "Resource Identity and Semantic Extensions: Making Sense of Ambiguity":
http://dbooth.org/2010/ambiguity/paper.html#inconsistent-merge

On the other hand, ambiguity is also a plus because it means that the URI can be used in a much wider variety of contexts, such as the URIs in a loose vocabulary like SKOS. This does *not* mean that such a vocabulary is universally *better* than one that is very precise, such as a detailed biomedical ontology. It just means that it has different uses.

Of course, the holy grail is to produce ontologies that are both precise and have wide application, but this is exceedingly difficult to achieve. In the meantime we must muddle along in our imperfect world, and the architecture must be designed with this in mind.

>
> 2. While the draft says that it doesn't define the term "information
> resource" it nevertheless uses that term in many places, as if it
> means something.

Right.
That is an artifact of AWWW and the httpRange-14 resolution that I left in there, but as Mike Bergman suggests
http://lists.w3.org/Archives/Public/public-lod/2012Mar/0325.html
it could be eliminated entirely, as it is not needed.

> For example, in 3.2.1 it says that you can tell (if a result is eg a
> 200 OK) that the target URI identifies an information resource. Given
> that 'information resource' isn't defined in the document, what does
> that actually mean in terms of what an application should do?

Nothing. The application may use it or ignore it as it sees fit. You can think of it like a "marker interface"
http://en.wikipedia.org/wiki/Marker_interface_pattern
with no initial semantics. At first glance this may seem pointless, but it does actually have some utility, because it means that applications that *choose* to do so can conveniently hang additional semantics onto that class. For example, an application that *chooses* to treat the class of "information resources" as disjoint with the class of Persons can easily do so. This is a choice rather than a requirement because, as has been pointed out many times, there is no clear distinction between the class of "information resources" and "non-information resources".

>
> 3. I like the section about resolving incompatibilities, but for me it
> isn't strong enough, particularly as it's non-normative. I'd like
> publishers to be able to rely on clients ignoring an implicit URI
> definition when there's an explicit URI definition, for example.

But publishers can *never* control what a client does in the privacy of its own RAM. Nor should they be able to, as that would be unreasonably totalitarian.
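
As a rough illustration of the two points above (a marker class with no built-in semantics, and the client's freedom to decide what to do with it), here is a minimal Python sketch. Every name in it (InformationResource, Person, classes_from_status, StrictClient, LenientClient) is hypothetical and invented for this example; none of them comes from the UDDP proposal. The 200 OK rule is only the reading of section 3.2.1 quoted above, not a definitive implementation of it.

```python
class InformationResource:
    """Marker class: deliberately carries no semantics of its own."""


class Person:
    """Hypothetical application class, used only for illustration."""


def classes_from_status(status_code):
    # Assumption for this sketch: a 200 OK lets a client tag the target
    # URI as an "information resource"; any other status tells it nothing.
    return {InformationResource} if status_code == 200 else set()


class LenientClient:
    """Ignores the marker entirely, so no combination is a conflict."""

    def is_consistent(self, classes):
        return True


class StrictClient:
    """*Chooses* to treat InformationResource and Person as disjoint."""

    def is_consistent(self, classes):
        return not {InformationResource, Person} <= set(classes)


# A URI that returned 200 OK and that some statement also types as a Person.
observed = classes_from_status(200) | {Person}

lenient_ok = LenientClient().is_consistent(observed)
strict_ok = StrictClient().is_consistent(observed)
```

The publisher's response is identical in both cases; only the client's own policy differs. That is the sense in which the marker class is a hook that applications may use or ignore.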
On the other hand, the specification should encourage statement *authors* to use each URI in a manner that is consistent with the URI owner's URI definition, and that's what the UDDP proposal does in the section 4.1 Good Practice note:
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#4.1_Transactional_inconsistency
[[
GOOD PRACTICE (Non-normative): Before using a target URI in a statement, a statement author should obtain fresh versions of the transitive closure of a target URI's URI definition and the definitions of any URIs used in that URI definition, and should only use the target URI in a manner that is consistent with those URI definitions.
]]

BTW, there's an important reason why that is only a "should" and not a "must", and this is explained under "Community expropriation of a URI" in "The URI Lifecycle in Semantic Web Architecture":
http://dbooth.org/2009/lifecycle/#expropriation
That document also provides more explanation of the roles and responsibilities of the statement author and consumer, if you're interested.

> Without that, I think the draft is just a reworded version of
> Jonathan's draft: publishers who 200 OK on URIs that are supposed to
> identify People are still Wrong.

But it does explicitly say that it is okay to do so. And this may either be because the URI owner believes that People can have representations or because the URI owner found it too burdensome to make the distinction:
[[
GOOD PRACTICE (Non-normative): If a URI owner must choose between publishing URI definitions and following the Good Practice notes of this specification, it is normally better to publish. "The perfect is the enemy of the good."
]]

An opt-out mechanism may be okay to add, but the question really is: What would be most beneficial to the community? Any opt-out mechanism shifts the burden from publishers to clients. And in the long run, do we expect to have more clients or more publishers?

--
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Received on Thursday, 29 March 2012 17:12:30 UTC