Re: Comments on "Providing and Discovering Definitions of URIs" from Jonathan A Rees on 2012-02-14 (www-tag@w3.org from February 2012)

From: Jonathan A Rees <rees@mumble.net>
Date: Tue, 14 Feb 2012 13:58:41 -0500
To: David Booth <david@dbooth.org>
Cc: Leigh Dodds <leigh.dodds@talis.com>, www-tag@w3.org
Message-ID: <CAGnGFMKwg8ViRWPJq-_haZnt7Sfeh5vJiyVnxS2Z1XAqnRf_HA@mail.gmail.com>

On Tue, Feb 14, 2012 at 11:45 AM, David Booth <david@dbooth.org> wrote:
> On Wed, 2012-02-01 at 12:10 -0500, Jonathan Rees wrote:
>> Latest ongoing: http://www.w3.org/2001/tag/awwsw/issue57/latest/
>
> This is definitely getting clearer and better.  Thanks for your
> efforts on this.  Suggestions:

We're reaching a point of diminishing returns here. The only purpose
of this document is to spur and help guide change proposals relative
to the baseline document. It will be forgotten about once consensus is
reached. So I might not do another round of editing.

> 1. Though not stated as such, in essence this document tries to
> describe competing *protocols* for establishing, indicating and
> determining the definition of a URI's referent resource.

Not all pairs of methods compete with one another. In fact the only
competition I know of, other than for attention, has to do with how
retrievals are interpreted. I really do mean to present it as a
collection of techniques that can be modified, combined, and
arbitrated according to some as-yet-unspecified larger design that I
hope someone will come up with.

And it's not *the* definition, it's just *some* documentation. Use of
any URI documentation coordination convention is, like use of any
standard or recommendation, voluntary. Circumstances can always
override any convention that might otherwise be chosen.

> The
> process is a protocol because it involves multiple interacting
> parties who must perform their roles appropriately (according to
> the protocol) in order to achieve the overall intended effect.
> A protocol definition needs to clearly specify who does what,
> in terms of the roles that are relevant to that protocol,
> so it is very helpful to give consistent role names to the
> most important parties in that protocol.  (E.g., the HTTP
> protocol defines roles like "client", "server" and "proxy".)
> However, at present the document tends to use the term "agent"
> for all roles.

If you could point out places where confusion is likely to happen that
would help me figure out whether anything needs to be changed. I felt
that overengineering the terminology was off-putting.
...
> Use of specific terms like this would add more clarity to
> the protocols that are described.
>
> 2. The term "URI documentation" is used in the title and
> throughout, but this is unhelpfully vague and does not
> adequately convey the authoritative nature of the URI's
> definition.

I got strong pushback on "definition" and strong affirmation on
"documentation" so I think I will stick with it. And I do not agree
with you that any of this stuff is "authoritative". The point is
coordination for those who want to coordinate, not "authority." This
whole "URI owner" business is fraught with technical and philosophical
peril and it is best to stay as far from it as possible.

> 3. The term "probe URI" is defined and used throughout.
> It would be helpful if a corresponding term such as "definition
> URI" were defined and used throughout, to refer to the URI
> location where the URI's definition is published (if it is
> published).

Will consider. The baseline document says "documentation URI" I think.

> 4. The success criteria or "Desiderata" need to be framed
> in the context of the proposed URI definition protocol as a
> whole -- not some other unstated context.  In other words,
> the "Desiderata" should address the question: what parties
> (in the protocol) should obtain what benefits as a result of
> using this protocol?
>
> For example, this desideratum vaguely talks about the need
> for a URI to "make sense" independent of its "community of use":
> [[
> Uniform
>    The URI, considered as a reference to something, should
>    make sense on its own, independent of context or community
>    of use. Its meaning or "identification" should be uniform
>    regardless whether it's used as a protocol element,
>    hyperlink, or name. This property cannot necessarily be
>    enforced through technical design, but a discovery solution
>    should not depend on non-uniform meaning.
> ]]
> But what does "make sense" mean?  As stated, it cannot be
> evaluated as an objective engineering criterion.

That is because it cannot be.

I think you and I will just have to agree to disagree on this, as
we've been over this question unproductively in the past.

>  For one
> thing, it does not say what party/parties in the URI definition
> protocol are supposed to obtain this benefit.  For another
> thing, it talks about "meaning" -- which is not defined --
> instead of talking only about the URI definition.

The 'baseline' document makes this clearer I hope.

> I suggest restating this desideratum as:
> [[
> Uniform
>    Two consumers following a URI definition protocol should
>    obtain the same or sufficiently similar resource definitions
>    for any URI.
> ]]
> If needed, "sufficiently similar" can be further defined in
> terms of the expectations of the consumer and the task to
> be performed.
>
> 5. This desideratum needs to be substantially clarified:
> [[
> Compatible with inference
>    URIs should participate gracefully in deployed frameworks
>    for ontologies and logical inference, specifically RDF
>    and OWL.
> ]]
> What does "participate gracefully" mean?  RDF semantics
> doesn't care at all about the URIs that are used.  They are
> just opaque strings as far as the semantics goes.

We disagree here. The formal semantics is not the semantics. This
stuff actually gets used to get real work done.

> 6. Regarding this statement:
> [[
> As any overall discovery solution will combine a number of
> methods, avoiding conflict between adopted methods is also a
> goal for any solution.
> ]]
> This is confusing.  It seems to me that the overall objective
> of documenting these competing URI definition protocols is so
> that they can be analyzed and discussed, and the community
> (presumably via W3C process) can eventually sanction *one*
> of them (which could well have conditional branches and/or
> delegate portions of the protocol to others).

No, they can pick a portfolio if they want. People already have both
303 and Link:, for example.
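
To make the "portfolio" point concrete, here is a minimal Python sketch of
a consumer that accepts either pointer from a single HTTP response. It is
an illustration only, not something taken from the issue-57 document. The
function name documentation_pointers is hypothetical, the headers are
assumed to arrive as a plain dict with exactly these key spellings, and
rel="describedby" is an assumed choice of link relation. The Link parsing
is deliberately simplistic and is not a complete header parser.

```python
import re

def documentation_pointers(status, headers):
    """List (method, URI) pairs that a response offers as pointers to
    URI documentation. Hypothetical helper for illustration only."""
    pointers = []
    # 303 See Other: the Location header names another resource to consult.
    if status == 303 and "Location" in headers:
        pointers.append(("303", headers["Location"]))
    # Link: header. Each link value looks like <target>; param; param ...
    # Assumption: no commas appear inside the parameters.
    for target, params in re.findall(r'<([^>]*)>([^,]*)', headers.get("Link", "")):
        # Assumption: rel="describedby" marks a pointer to documentation.
        if re.search(r'rel\s*=\s*"?[^";]*\bdescribedby\b', params):
            pointers.append(("Link", target))
    return pointers
```

Both methods can fire on the same response, in which case the caller gets
two pointers and some larger design has to decide what to do with them.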

> In other words, it would be better to present a series of
> complete competing URI definition protocols, rather than
> listing a bucket of parts that might be used to construct a
> complete protocol.  Sometimes the document seems like it is
> attempting to describe a complete protocol, and other times
> it lapses into protocol fragments.

Yes, this is a bug I tried to fix mid-way. I think the baseline
document is much clearer on this distinction.

> For example, I note that sections "3.1 Colocate URI
> documentation and use" and "3.2 Specifically point (link)
> to the URI documentation" essentially define complete URI
> definition protocols.  But section "3.3 Use non-http: URIs and
> a non-HTTP protocol" -- out of the blue -- starts talking about
> non-http URIs, without saying how they are intended to be used
> in a complete URI definition protocol.  What are the URI owner,
> statement author and consumer intended to do and expect in a
> URI definition protocol involving non-http URIs, and what impact
> does this have with respect to the given desiderata?  It would
> be helpful if the competing URI definition protocols were
> presented in a consistent way, as it would facilitate analysis.

The point is to make the information available, not make up too many
new things. I don't want to get into detailed design in this document.
That's for the change proposals to worry about.

> 7. Similarly, section 3.4 'Hash URI' discusses one URI syntactic
> convention that can be used in conjunction with a complete
> URI definition protocol.  It would be helpful if the document
> were to say explicitly what the parties in a URI definition
> protocol that uses this syntactic convention are expected to do.
> For example:
> [[
> The URI owner should mint a probe URI containing a fragment
> identifier, and should publish the probe URI's definition in
> a document whose URI (the "definition URI") is the stem of
> the probe URI, i.e., the part without the fragment identifier.
>
> If a statement author wishes to use the probe URI in a
> statement, and the probe URI contains a fragment identifier,
> the statement author should strip the fragment identifier from
> the probe URI to produce the definition URI, and dereference the
> definition URI to obtain the URI's definition.  The statement
> author should only use the probe URI in a statement in a manner
> that is consistent with the URI's definition.
> ]]
>
> After stating explicitly what the URI definition protocol
> expects each party to do, the existing observations about the
> pros/cons of the protocol will make more sense.

I'm not sure this level of pedantry is needed in this document. I'll
take advice from others though.
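
For readers who want the stripping step in David's suggested text spelled
out, a minimal Python sketch follows. It covers only the mechanical part:
taking a probe URI with a fragment identifier and producing its stem.
The function name definition_uri is hypothetical, and "definition URI" is
David's proposed term, not one the document uses. Dereferencing the stem
and checking that a statement is consistent with what comes back are not
shown.

```python
from urllib.parse import urldefrag

def definition_uri(probe_uri):
    """Return the stem of a hash URI (the part without the fragment
    identifier), or None when the probe URI has no fragment and the
    hash URI convention therefore does not apply."""
    stem, fragment = urldefrag(probe_uri)
    return stem if fragment else None
```

For example, definition_uri("http://example.org/vocab#Person") gives
"http://example.org/vocab", where example.org is a placeholder host.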

> 8. I suggest dropping use case 2.2 "Using a document as URI
> documentation by reference to its primary topic", as I don't
> think it is important enough.  We have enough work just focusing
> on the most important use cases.

It's important for contrast. People get confused about this, and
whether and how to treat this case affects the details of any overall
design. For example, it seems to be the only use case supported by
tdb: .

> 9. However, I suggest adding the CC license use case that you
> described elsewhere, as that provides an excellent example of
> what can go wrong in real, practical terms if the statement
> author and consumer unknowingly follow different URI definition
> protocols.  This is much more important than the existing use
> case 2.2.
>
> 10. An additional criticism to add to section "3.1 Colocate
> URI documentation and use": "Furthermore, this method does
> not scale well, as it requires each document to contain the
> transitive closure of all URI definitions that it uses."

That would depend on the details, such as how it combines with other
methods, which I don't want to get into.

> 11. Convention 1 states:
> http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
> [[
>  A retrieval-enabled hashless URI refers to the resource on
>  the Web at that URI (see [generic]), independent of anything
>  that the retrieval results (representations) say about what
>  the URI means.
> ]]
> This is essentially a restatement of httpRange-14 rule (a):
> http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039
> [[
>   a) If an "http" resource responds to a GET request with a
>      2xx response, then the resource identified by that URI
>      is an information resource;
> ]]
> and yet there is no mention of httpRange-14 in this section.
> It would be good to explicitly acknowledge the httpRange-14
> rule.

It's actually quite a bit stronger than the httpRange-14 rule, which
says nothing about generic resources, but you're right that a
reference would be useful here.
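
The difference in strength is easier to see with rule (a) written out as
code. The Python sketch below encodes only what the quoted rule says: a
2xx response to GET lets a consumer conclude "information resource", and
nothing more. The function name rule_a_conclusion is hypothetical.
Convention 1's further claim, that the URI refers to the resource on the
Web at that URI regardless of what the representations say, is not
something a status code check can capture, and the sketch does not try.

```python
def rule_a_conclusion(status):
    """Apply httpRange-14 rule (a) to the status code of a GET response.

    Returns "information resource" for any 2xx status. Returns None
    otherwise, because rule (a) draws no conclusion for other statuses.
    """
    if 200 <= status < 300:
        return "information resource"
    return None
```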

> 12. Section 3.5 "Retrieval as
> equivalent to instance relationship"
> http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
> Since the goal of this paper is to describe the various
> competing protocols for establishing and determining
> URI definitions, I think this section should be clearer
> about exactly what URI definition (or "documentation") is
> implied by a successful retrieval response.  For example,
> the draft n3 rule for an HTTP 200 response on lines 150-159 at
> http://www.w3.org/wiki/AwwswDboothsRules#rules.n3:_Classes.2C_Properties_and_Rules
> is very specific (though minimal), only indicating that the
> URI identifies an information resource.  (Perhaps it should
> have also indicated that the response is a representation of
> the resource identified by the URI, but it didn't.)

This is covered in the baseline and could go in a change proposal; too
prescriptive here I think.

> Section 3.5 also says: "In effect, a response to a retrieval
> request is equivalent, according to Convention 1, to URI
> documentation that says that the response is an instance of
> the thing named by the URI."  That doesn't seem correct.
> The response is not an "instance" of the thing named by
> the URI, it is a *representation* of the thing named by
> the URI.  Furthermore, that fact comes from RFC2616 -- not
> the httpRange-14 rule or Convention 1.

You are right according to HR14(a) and this is fixed in the baseline.
But I would argue strenuously against any thought that leaving out the
generic resource idea would lead to anything useful. Personally I
would rather see HR14(a) withdrawn than left as it is, although I
would like the generic resource reading better still (which I think
is similar to TimBL's).

> In other words, the successful retrieval response indicates
> two things:
>
>  - the URI identifies an information resource; and
>
>  - the response is a representation of that information
>  resource.

See the baseline and my blog post
http://odontomachus.wordpress.com/2012/02/09/when-identification-and-representation-fight-who-wins/

> Perhaps these two facts should be what the implied URI
> definition states.

They are not facts, only opinions or suggestions, as far as I can
tell. (I have asked Roy for clarification, though; we'll see what he
says is intended by RFC 2616 and HTTPbis.)

> 13. Regarding this: "Also not of concern here are the many
> ways in which meaning can fail".  It is not clear what is
> meant by "meaning can fail".  Do you mean the URI definition
> is insufficient in some way to a consumer?  This should be
> clarified.

Does anyone else reading this exchange between David and me think that
"meaning can fail" fails to be meaningful?

> 14. Please change "so that there is agreement on how each
> URI is to be understood" to "so that there is *sufficient*
> agreement on how each URI is to be understood", since as I've
> pointed out on other occasions, there is no need for parties
> to be in 100% agreement, nor is it even possible.

I'm not sure how a reader could get confused here.

> 15. Please change "it is OK for the URI to have distinct senses"
> to "it is OK for the URI to have distinct definitions", since
> this document is about communicating the *definition* of a URI,
> not the "sense" of a URI.

The result of different definitions might or might not be different
senses, and it is the senses that are important - the definition is
only a vehicle.

> 16. The reference to "Convention 1" in the desiderata needs
> to indicate the section number and/or a link, since at present
> there is no section called "Convention 1".

OK.

Jonathan

> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.
>
Received on Tuesday, 14 February 2012 18:59:10 UTC