Comments on "Providing and Discovering Definitions of URIs" from David Booth on 2012-02-14 (www-tag@w3.org from February 2012)

From: David Booth <david@dbooth.org>
Date: Tue, 14 Feb 2012 11:45:37 -0500
To: Jonathan Rees <rees@mumble.net>
Cc: Leigh Dodds <leigh.dodds@talis.com>, www-tag@w3.org
Message-ID: <1329237937.2250.142033.camel@dbooth-laptop>

On Wed, 2012-02-01 at 12:10 -0500, Jonathan Rees wrote:
> Latest ongoing: http://www.w3.org/2001/tag/awwsw/issue57/latest/

This is definitely getting clearer and better.  Thanks for your
efforts on this.  Suggestions:

1. Though not stated as such, in essence this document tries to
describe competing *protocols* for establishing, indicating and
determining the definition of a URI's referent resource.  The
process is a protocol because it involves multiple interacting
parties who must perform their roles appropriately (according to
the protocol) in order to achieve the overall intended effect.
A protocol definition needs to clearly specify who does what,
in terms of the roles that are relevant to that protocol,
so it is very helpful to give consistent role names to the
most important parties in that protocol.  (E.g., the HTTP
protocol defines roles like "client", "server" and "proxy".)
However, at present the document tends to use the term "agent"
for all roles.

As described in "The URI Lifecycle in Semantic Web Architecture"
http://dbooth.org/2009/lifecycle/#roles 
I believe the three most important roles in a protocol for
establishing and determining URI definitions are:
[[
    *URI owner*.  This is the person or social entity that
    has the authority to establish an association between
    a URI and a resource, as defined in AWWW.  Normally it
    is the owner of the domain from which the URI is minted,
    however, the owner may delegate minting authority for all
    or portions of a URI space.

    *Statement author*.  This is a person or agent that decides
    to use the URI in an RDF statement to denote a resource.

    *Consumer*.  This is a person or application that reads
    an RDF statement and wishes to know what resource the URI
    was intended to denote.
]]

Use of specific terms like this would add more clarity to
the protocols that are described.

2. The term "URI documentation" is used in the title and
throughout, but this is unhelpfully vague and does not
adequately convey the authoritative nature of the URI's
definition.  This is not about providing and locating any
old "documentation" that might be lying around on the web;
it is about providing and locating the *intended* definition
of the resource identified by that URI.  I suggest changing
the document back to using the term "URI definition" as it
previously had used.

3. The term "probe URI" is defined and used throughout.
It would be helpful if a corresponding term such as "definition
URI" were defined and used throughout, to refer to the URI
location where the URI's definition is published (if it is
published).

4. The success criteria or "Desiderata" need to be framed
in the context of the proposed URI definition protocol as a
whole -- not some other unstated context.  In other words,
the "Desiderata" should address the question: what parties
(in the protocol) should obtain what benefits as a result of
using this protocol?

For example, this desideratum vaguely talks about the need
for a URI to "make sense" independent of its "community of use":
[[
Uniform
    The URI, considered as a reference to something, should
    make sense on its own, independent of context or community
    of use. Its meaning or "identification" should be uniform
    regardless whether it's used as a protocol element,
    hyperlink, or name. This property cannot necessarily be
    enforced through technical design, but a discovery solution
    should not depend on non-uniform meaning.
]]
But what does "make sense" mean?  As stated, it cannot be
evaluated as an objective engineering criterion.  For one
thing, it does not say what party/parties in the URI definition
protocol are supposed to obtain this benefit.  For another
thing, it talks about "meaning" -- which is not defined --
instead of talking only about the URI definition.

I suggest restating this desideratum as:
[[
Uniform
    Two consumers following a URI definition protocol should
    obtain the same or sufficiently similar resource definitions
    for any URI.
]]
If needed, "sufficiently similar" can be further defined in
terms of the expectations of the consumer and the task to
be performed.


5. This desideratum needs to be substantially clarified:
[[
Compatible with inference
    URIs should participate gracefully in deployed frameworks
    for ontologies and logical inference, specifically RDF
    and OWL.
]]
What does "participate gracefully" mean?  RDF semantics
doesn't care at all about the URIs that are used.  They are
just opaque strings as far as the semantics goes.

6. Regarding this statement:
[[
As any overall discovery solution will combine of a number of
methods, avoiding conflict between adopted methods is also a
goal for any solution.
]]
This is confusing.  It seems to me that the overall objective
of documenting these competing URI definition protocols is so
that they can be analyzed and discussed, and the community
(presumably via W3C process) can eventually sanction *one*
of them (which could well have conditional branches and/or
delegate portions of the protocol to others).

In other words, it would be better to present a series of
complete competing URI definition protocols, rather than
listing a bucket of parts that might be used to construct a
complete protocol.  Sometimes the document seems like it is
attempting to describe a complete protocol, and other times
it lapses into protocol fragments.

For example, I note that sections "3.1 Colocate URI
documentation and use" and "3.2 Specifically point (link)
to the URI documentation" essentially define complete URI
definition protocols.  But section "3.3 Use non-http: URIs and
a non-HTTP protocol" -- out of the blue -- starts talking about
non-http URIs, without saying how they are intended to be used
in a complete URI definition protocol.  What are the URI owner,
statement author and consumer intended to do and expect in a
URI definition protocol involving non-http URIs, and what impact
does this have with respect to the given desiderata?  It would
be helpful if the competing URI definition protocols were
presented in a consistent way, as it would facilitate analysis.

7. Similarly, section 3.4 'Hash URI' discusses one URI syntactic
convention that can be used in conjunction with a complete
URI definition protocol.  It would be helpful if the document
were to say explicitly what the parties in a URI definition
protocol that uses this syntactic convention are expected to do.
For example:
[[
The URI owner should mint a probe URI containing a fragment
identifier, and should publish the probe URI's definition in
a document whose URI (the "definition URI") is the stem of
the probe URI, i.e., the part without the fragment identifier.

If a statement author wishes to use the probe URI in a
statement, and the probe URI contains a fragment identifier,
the statement author should strip the fragment identifier from
the probe URI to produce the definition URI, and dereference the
definition URI to obtain the URI's definition.  The statement
author should only use the probe URI in a statement in a manner
that is consistent with the URI's definition.  
]]

After stating explicitly what the URI definition protocol
expects each party to do, the existing observations about the
pros/cons of the protocol will make more sense.
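
To make the statement author's step concrete, here is a minimal
sketch in Python of deriving the definition URI from a hash probe
URI.  The function name and the example.org URIs are hypothetical,
chosen only for illustration; the fragment stripping uses the
standard library's urllib.parse.urldefrag.

```python
from typing import Optional
from urllib.parse import urldefrag


def definition_uri(probe_uri: str) -> Optional[str]:
    """Return the stem of a hash probe URI, or None if it has no fragment.

    Hypothetical helper: under the hash URI convention sketched above,
    the stem (the part without the fragment identifier) is where the
    URI owner publishes the definition.
    """
    stem, fragment = urldefrag(probe_uri)
    if not fragment:
        # Hashless URI: this convention does not say where to look.
        return None
    return stem


if __name__ == "__main__":
    print(definition_uri("http://example.org/vocab#Person"))
    print(definition_uri("http://example.org/vocab"))
```

Dereferencing the returned URI is left out on purpose, since that
part of the protocol is just an ordinary retrieval.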

8. I suggest dropping use case 2.2 "Using a document as URI
documentation by reference to its primary topic", as I don't
think it is important enough.  We have enough work just focusing
on the most important use cases.

9. However, I suggest adding the CC license use case that you
described elsewhere, as that provides an excellent example of
what can go wrong in real, practical terms if the statement
author and consumer unknowingly follow different URI definition
protocols.  This is much more important than the existing use
case 2.2.

10. An additional criticism to add to section 3.1 "Colocate
URI documentation and use": "Furthermore, this method does
not scale well, as it requires each document to contain the
transitive closure of all URI definitions that it uses."
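
As a rough illustration of the scaling concern, the following
Python sketch computes which definitions a self-contained document
would have to carry.  The function name, the dependency table and
the "ex:" URIs are all made up for the example; the assumption is
simply that each definition may itself use other URIs.

```python
def required_definitions(used, depends_on):
    """Return every URI whose definition must be colocated.

    used       -- URIs that appear directly in the document
    depends_on -- hypothetical map from a URI to the URIs that its
                  definition uses in turn
    """
    needed = set()
    stack = list(used)
    while stack:
        uri = stack.pop()
        if uri in needed:
            continue  # already collected; also guards against cycles
        needed.add(uri)
        stack.extend(depends_on.get(uri, ()))
    return needed


if __name__ == "__main__":
    deps = {
        "ex:Employee": {"ex:Person", "ex:Organization"},
        "ex:Person": {"ex:Agent"},
        "ex:Organization": {"ex:Agent"},
    }
    # One URI used directly pulls in the definitions of all four.
    print(sorted(required_definitions({"ex:Employee"}, deps)))
```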

11. Convention 1 states:
http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
[[
  A retrieval-enabled hashless URI refers to the resource on
  the Web at that URI (see [generic]), independent of anything
  that the retrieval results (representations) say about what
  the URI means.
]]
This is essentially a restatement of httpRange-14 rule (a):
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039
[[
   a) If an "http" resource responds to a GET request with a
      2xx response, then the resource identified by that URI
      is an information resource;
]]
and yet there is no mention of httpRange-14 in this section.
It would be good to explicitly acknowledge the httpRange-14
rule.

12. Regarding section 3.5 "Retrieval as equivalent to instance
relationship":
http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
Since the goal of this paper is to describe the various
competing protocols for establishing and determining
URI definitions, I think this section should be clearer
about exactly what URI definition (or "documentation") is
implied by a successful retrieval response.  For example,
the draft n3 rule for an HTTP 200 response on lines 150-159 at
http://www.w3.org/wiki/AwwswDboothsRules#rules.n3:_Classes.2C_Properties_and_Rules
is very specific (though minimal), only indicating that the
URI identifies an information resource.  (Perhaps it should
have also indicated that the response is a representation of
the resource identified by the URI, but it didn't.)

Section 3.5 also says: "In effect, a response to a retrieval
request is equivalent, according to Convention 1, to URI
documentation that says that the response is an instance of
the thing named by the URI."  That doesn't seem correct.
The response is not an "instance" of the thing named by
the URI, it is a *representation* of the thing named by
the URI.  Furthermore, that fact comes from RFC2616 -- not
the httpRange-14 rule or Convention 1.

In other words, the successful retrieval response indicates
two things:

 - the URI identifies an information resource; and

 - the response is a representation of that information
 resource.

Perhaps these two facts should be what the implied URI
definition states.
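
A small Python sketch of what such an implied definition might
look like, written as plain tuples rather than real RDF.  The
function name and the "is-a", "InformationResource" and
"is-representation-of" labels are placeholders of mine, not terms
from any published vocabulary, and treating every 2xx status alike
is an assumption taken from the wording of httpRange-14 rule (a).

```python
def implied_definition(uri, status, response_id):
    """Return the statements implied by a retrieval on `uri`.

    All names are hypothetical placeholders.  A 2xx response is
    taken to imply two facts; any other status implies nothing here.
    """
    if 200 <= status <= 299:
        return [
            # Fact 1: the URI identifies an information resource.
            (uri, "is-a", "InformationResource"),
            # Fact 2: the response is a representation of it.
            (response_id, "is-representation-of", uri),
        ]
    return []


if __name__ == "__main__":
    for statement in implied_definition("http://example.org/doc", 200, "_:resp1"):
        print(statement)
```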

13. Regarding this: "Also not of concern here are the many
ways in which meaning can fail".  It is not clear what is
meant by "meaning can fail".  Do you mean the URI definition
is insufficient in some way for a consumer?  This should be
clarified.

14. Please change "so that there is agreement on how each
URI is to be understood" to "so that there is *sufficient*
agreement on how each URI is to be understood", since, as I've
pointed out on other occasions, there is no need for parties
to be in 100% agreement, nor is it even possible.

15. Please change "it is OK for the URI to have distinct senses"
to "it is OK for the URI to have distinct definitions", since
this document is about communicating the *definition* of a URI,
not the "sense" of a URI.

16. The reference to "Convention 1" in the desiderata needs
to indicate the section number and/or a link, since at present
there is no section called "Convention 1".


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Tuesday, 14 February 2012 16:46:06 UTC