Re: Middle ground change proposal for httpRange-14 from David Booth on 2012-03-29 (public-lod@w3.org from March 2012)

From: David Booth <david@dbooth.org>
Date: Thu, 29 Mar 2012 13:11:57 -0400
To: Jeni Tennison <jeni@jenitennison.com>
Cc: public-lod community <public-lod@w3.org>
Message-ID: <1333041117.2181.77846.camel@dbooth-laptop>

Hi Jeni,

On Wed, 2012-03-28 at 18:01 +0100, Jeni Tennison wrote:

> > http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
[ . . . ]
> 1. The focus on the *definition* of a URI as opposed to a mere
> description is problematic for me. There are lots of things in the
> world that couldn't be adequately *defined* but can be described to
> more or less detail. I worry that people will get tied up in knots
> trying to work out what a definition looks like for a Person or a
> Book. Although I prefer most of the language in your draft, I prefer
> the looser 'description' used in Jonathan's document.

That sounds like an important concern, but I think it is best to
separate the issue of how we educate the public about how this works,
from figuring out the engineering of how it works.  We first need to
deal with the engineering.

If you look at the definition of "URI documentation" in the "baseline"
document
http://www.w3.org/2001/tag/doc/uddp-20120229/ 
it says: "URI documentation is information that documents the intended
meaning of a particular probe URI."  That's what a definition is.  So
the terminology in the UDDP proposal
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#2.4_URI_definition.2C_explicit_URI_definition_and_implicit_URI_definition
is merely calling a spade a "spade".

Furthermore, there is an important distinction between a definition and
any other documentation or description.  A definition *is* documentation
(or a description) but not every piece of documentation (or not every
description) is a definition.  This key difference tends to get blurred
when a definition is blandly called "documentation" or "a description".
I suspect that some have been wary of recognizing this distinction out
of a concern that if something is called a definition, then a client
will be obligated to use that definition, and that would unreasonably
constrain the client.  But this concern is unfounded if the
specification makes clear that a client is free to do whatever it wishes
with a URI definition that it retrieves.

Finally, to give a little more insight about what it means to provide a
URI definition for something such as a Person or a Book, in some sense
the URI definition does not actually *define* that person or book.
Rather, it defines the *binding* of the URI (as a name) to a particular
description of that thing, which indirectly (partially) identifies it.
And as Pat Hayes (and others) have pointed out many times, there is
inherent ambiguity in virtually any description.  This means that a URI
definition does not *fully* determine the thing that the URI is supposed
to identify.  That is both a plus and a minus.  It is a minus because it
means that in general others can never know *exactly* what that URI
owner intended it to identify, and this leads to downstream
inconsistencies, as illustrated in Part 2 of "Resource Identity and
Semantic Extensions: Making Sense of Ambiguity":
http://dbooth.org/2010/ambiguity/paper.html#inconsistent-merge

On the other hand ambiguity is also a plus because it means that the URI
can be used in a much wider variety of contexts, such as the URIs in a
loose vocabulary like SKOS.  This does *not* mean that such a vocabulary
is universally *better* than one that is very precise, such as a
detailed biomedical ontology.  It just means that it has different uses.
Of course, the holy grail is to produce ontologies that are both precise
and have wide application, but this is exceedingly difficult to achieve.
In the meantime we must muddle along in our imperfect world, and the
architecture must be designed with this in mind.
> 
> 2. While the draft says that it doesn't define the term "information
> resource" it nevertheless uses that term in many places, as if it
> means something. 

Right.  That is an artifact of AWWW and the httpRange-14 resolution that
I left in there, but as Mike Bergman suggests
http://lists.w3.org/Archives/Public/public-lod/2012Mar/0325.html
it could be eliminated entirely, as it is not needed.

> For example, in 3.2.1 it says that you can tell (if a result is eg a
> 200 OK) that the target URI identifies an information resource. Given
> that 'information resource' isn't defined in the document, what does
> that actually mean in terms of what an application should do?

Nothing.  The application may use it or ignore it as it sees fit.  You
can think of it like a "marker interface"
http://en.wikipedia.org/wiki/Marker_interface_pattern
with no initial semantics.  At first glance this may seem pointless, but
it does actually have some utility, because it means that applications
that *choose* to do so can conveniently hang additional semantics onto
that class.

For example, an application that *chooses* to treat the class of
"information resources" as disjoint with the class of Persons can easily
do so.  This is a choice rather than a requirement because, as has been
pointed out many times, there is no clear distinction between the class
of "information resources" and "non-information resources".
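
To make the "marker" idea concrete, here is a minimal sketch in Python.
It is only an illustration and not part of the UDDP proposal: the names
INFORMATION_RESOURCE, PERSON and violates() are hypothetical, and the
set of types stands in for whatever a client happens to have recorded
about a URI. The marker itself carries no semantics; each application
decides for itself whether to attach a disjointness rule to it.

```python
# Hypothetical class labels; the marker has no built-in meaning.
INFORMATION_RESOURCE = "InformationResource"
PERSON = "Person"


def violates(types, disjoint_pairs):
    """Return True if `types` contains both members of any disjoint pair."""
    return any(a in types and b in types for a, b in disjoint_pairs)


# Assumption for illustration: the client got a 200 OK for some URI (so it
# recorded the marker) and also saw a statement typing that URI as a Person.
types = {INFORMATION_RESOURCE, PERSON}

# An application that ignores the marker declares no disjointness rules.
lenient_rules = set()

# An application that *chooses* to treat the two classes as disjoint.
strict_rules = {(INFORMATION_RESOURCE, PERSON)}

lenient_conflict = violates(types, lenient_rules)  # False
strict_conflict = violates(types, strict_rules)    # True
```

The same data is acceptable to the first application and inconsistent
for the second, which is the sense in which the extra semantics are a
choice rather than a requirement.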
> 
> 3. I like the section about resolving incompatibilities, but for me it
> isn't strong enough, particularly as it's non-normative. I'd like
> publishers to be able to rely on clients ignoring an implicit URI
> definition when there's an explicit URI definition, for example.

But publishers can *never* control what a client does in the privacy of
its own RAM.  Nor should they be able to, as that would be unreasonably
totalitarian.

On the other hand, the specification should encourage statement
*authors* to use each URI in a manner that is consistent with the URI
owner's URI definition, and that's what the UDDP proposal does in a
Good Practice note in section 4.1:
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#4.1_Transactional_inconsistency
[[
GOOD PRACTICE (Non-normative): Before using a target URI in a statement,
a statement author should obtain fresh versions of the transitive
closure of a target URI's URI definition and the definitions of any URIs
used in that URI definition, and should only use the target URI in a
manner that is consistent with those URI definitions.
]]
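
As a rough sketch of what "transitive closure" means here (an
illustration under stated assumptions, not an implementation of UDDP):
start from the target URI, look up the URIs used in its definition,
then repeat for each of those until nothing new turns up. The function
definition_closure(), the fetch callback, and the example.org URIs are
all hypothetical; FAKE_WEB is an in-memory stand-in for freshly
dereferencing each URI.

```python
from collections import deque


def definition_closure(target_uri, fetch_definition):
    """Map each reachable URI to the set of URIs used in its definition.

    `fetch_definition(uri)` is assumed to return the URIs mentioned in
    the URI definition of `uri` (an empty list if there are none).
    """
    closure = {}
    queue = deque([target_uri])
    while queue:
        uri = queue.popleft()
        if uri in closure:
            continue  # already visited; this also guards against cycles
        used = set(fetch_definition(uri))
        closure[uri] = used
        queue.extend(u for u in used if u not in closure)
    return closure


# Hypothetical data standing in for the web.
FAKE_WEB = {
    "http://example.org/alice": [
        "http://example.org/vocab#Person",
        "http://example.org/vocab#name",
    ],
    "http://example.org/vocab#Person": ["http://example.org/vocab#Agent"],
    "http://example.org/vocab#name": [],
    # Deliberate cycle back to Person.
    "http://example.org/vocab#Agent": ["http://example.org/vocab#Person"],
}


def fake_fetch(uri):
    return FAKE_WEB.get(uri, [])


closure = definition_closure("http://example.org/alice", fake_fetch)
```

Checking that a statement is *consistent* with the definitions gathered
this way is a separate and harder step that this sketch does not attempt.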

BTW, there's an important reason why that is only a "should" and not a
"must", and this is explained under "Community expropriation of a URI"
in "The URI Lifecycle in Semantic Web Architecture":
http://dbooth.org/2009/lifecycle/#expropriation 
That document also provides more explanation of the roles and
responsibilities of the statement author and consumer, if you're
interested.

>  Without that, I think the draft is just a reworded version of
> Jonathan's draft: publishers who 200 OK on URIs that are supposed to
> identify People are still Wrong.

But it does explicitly say that it is okay to do so.  And this may
either be because the URI owner believes that People can have
representations or because the URI owner found it too burdensome to make
the distinction:
[[
GOOD PRACTICE (Non-normative): If a URI owner must choose between
publishing URI definitions and following the Good Practice notes of this
specification, it is normally better to publish. "The perfect is the
enemy of the good."
]]

An opt-out mechanism may be okay to add, but the question really is:
What would be most beneficial to the community?  Any opt-out mechanism
shifts the burden from publishers to clients.  And in the long run, do
we expect to have more clients or more publishers?


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Thursday, 29 March 2012 17:12:30 UTC