Re: Comments on "Providing and Discovering Definitions of URIs" from David Booth on 2012-02-16 (www-tag@w3.org from February 2012)

From: David Booth <david@dbooth.org>
Date: Wed, 15 Feb 2012 22:23:40 -0500
To: Jonathan A Rees <rees@mumble.net>
Cc: www-tag <www-tag@w3.org>
Message-ID: <1329362620.2250.163345.camel@dbooth-laptop>

On Tue, 2012-02-14 at 13:58 -0500, Jonathan A Rees wrote:
> On Tue, Feb 14, 2012 at 11:45 AM, David Booth <David@dbooth.org> wrote:
> > On Wed, 2012-02-01 at 12:10 -0500, Jonathan Rees wrote:
> >> Latest ongoing: http://www.w3.org/2001/tag/awwsw/issue57/latest/
> >
> > This is definitely getting clearer and better.  Thanks for your
> > efforts on this.  Suggestions:
> 
> We're reaching a point of diminishing returns here. The only purpose
> of this document is to spur and help guide change proposals relative
> to the baseline document. It will be forgotten about once consensus is
> reached. So I might not do another round of editing.

Well, that's disappointing.  Based on your recent responses to Leigh
Dodds and Dave Reynolds, it sounded like you welcomed comments and were
actively working on the document.

I also note that the change proposal call that you pre-announced today 
http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html
points to the "Providing and Discovering" document as providing criteria
on which the change proposals will be judged:
[[
Change proposals should address relevant questions and issues presented
in "Providing and discovering definitions of URIs" which is a review of
the design space. Proposals may be based on material in this document if
desired. Proposals that in the editor's or TAG's judgment do not address
points in this document when appropriate will be returned for revision.
]]

Hence, although the stylistic aspects of the document may be excused,
the "Desiderata" need to be right.  And at the moment, they are not.
There are two key points that really need to be fixed. (More on them
below.)

I would be happy to help with the editing if you do not have time to do
it yourself.  But one way or another, at least the Desiderata should be
fixed, even if nothing else in the document is re-written.

> 
> > 1. Though not stated as such, in essence this document tries to
> > describe competing *protocols* for establishing, indicating and
> > determining the definition of a URI's referent resource.
> 
> Not all pairs of methods compete with one another. In fact the only
> competition I know of, other than for attention, has to do with how
> retrievals are interpreted. I really do mean to give it as a
> collection of techniques that can be modified, combined, and
> arbitrated according to some yet unspecified larger design, that I
> hope someone will come up with.
> 
> And it's not *the* definition, it's just *some* documentation. Use of
> any URI documentation coordination convention is, like use of any
> standard or recommendation, voluntary. Circumstances can always
> override any convention that might otherwise be chosen.

I think that is misleading.  The whole point is to get the community to
agree on a *particular* protocol (or "coordination convention") for
providing and obtaining a *particular* URI definition -- not to tell the
community to use whatever conventions they feel like using for obtaining
whatever random URI definition they find.  Or, as the TAG Product
document states (emphasis mine):
http://www.w3.org/2001/tag/products/defininguris.html 
"The goal is to develop *a* URI documentation provision and discovery
architecture that is simple, general, performant, consistent with Web
architecture, and accepted by the linked data community".  

(FWIW, I think that quote should have called it a "discovery protocol"
or "discovery convention" instead of "discovery architecture", since we
already have an architecture by which URI definitions can be discovered.
What we need is to agree on one particular protocol/convention.)

> 
> > The
> > process is a protocol because it involves multiple interacting
> > parties who must perform their roles appropriately (according to
> > the protocol) in order to achieve the overall intended effect.
> > A protocol definition needs to clearly specify who does what,
> > in terms of the roles that are relevant to that protocol,
> > so it is very helpful to give consistent role names to the
> > most important parties in that protocol.  (E.g., the HTTP
> > protocol defines roles like "client", "server" and "proxy".)
> > However, at present the document tends to use the term "agent"
> > for all roles.
> 
> If you could point out places where confusion is likely to happen that
> would help me figure out whether anything needs to be changed. I felt
> that overengineering the terminology was off-putting.

I beg to differ.  Using a different term for each of the major roles is
standard practice in descriptions of protocols or other conventions that
involve multiple parties, because it is far more helpful than
off-putting.

> ...
> > Use of specific terms like this would add more clarity to
> > the protocols that are described.
> >
> > 2. The term "URI documentation" is used in the title and
> > throughout, but this is unhelpfully vague and does not
> > adequately convey the authoritative nature of the URI's
> > definition.
> 
> I got strong pushback on "definition" and strong affirmation on
> "documentation" so I think I will stick with it. 

Really?  Where?  Can you please provide a pointer?

> And I do not agree
> with you that any of this stuff is "authoritative". The point is
> coordination for those who want to coordinate, not "authority." 

Uh . . . I meant the word "authoritative" in the protocol sense -- the
same sense as it is used in the HTTP 1.1 specification -- not in a legal
sense.  We're talking systems engineering here, not law.  A URI
definition can be "authoritative" according to a particular convention
or protocol, just as an entity-header can be "authoritative" according
to the HTTP 1.1 protocol.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.4

> This
> whole "URI owner" business is fraught with technical and philosophical
> peril and it is best to stay as far from it as possible.

That just happens to be the term chosen by AWWW:
http://www.w3.org/TR/webarch/#uri-ownership 
The point is that it is helpful to give names to the important roles in
the protocol.  Calling a role "URI owner" does not suddenly give that
role technical and philosophical peril any more than my naming my dog
"Queen Elizabeth" suddenly makes her the queen of England.  It is just a
name.

> 
> > 3. The term "probe URI" is defined and used throughout.
> > It would be helpful if a corresponding term such as "definition
> > URI" were defined and used throughout, to refer to the URI
> > location where the URI's definition is published (if it is
> > published).
> 
> Will consider. The baseline document says "documentation URI" I think.
> 
> > 4. The success criteria or "Desiderata" need to be framed
> > in the context of the proposed URI definition protocol as a
> > whole -- not some other unstated context.  In other words,
> > the "Desiderata" should address the question: what parties
> > (in the protocol) should obtain what benefits as a result of
> > using this protocol?
> >
> > For example, this desideratum vaguely talks about the need
> > for a URI to "make sense" independent of its "community of use":
> > [[
> > Uniform
> >    The URI, considered as a reference to something, should
> >    make sense on its own, independent of context or community
> >    of use. Its meaning or "identification" should be uniform
> >    regardless whether it's used as a protocol element,
> >    hyperlink, or name. This property cannot necessarily be
> >    enforced through technical design, but a discovery solution
> >    should not depend on non-uniform meaning.
> > ]]
> > But what does "make sense" mean?  As stated, it cannot be
> > evaluated as an objective engineering criterion.
> 
> That is because it cannot be.

Okay, so maybe nothing can be stated in a way that is a *completely*
objective engineering criterion.  But we can do a lot better than using
hopelessly ambiguous terms like "make sense" and "meaning" that are open
to nearly any interpretation.  

The point is that this tar pit about "meaning" can be completely avoided
simply by focusing on the intended URI definition, rather than trying to
say anything about what that definition means.  

Please see again the suggested rewording that I provided below, which
talks *only* about obtaining URI definitions -- not meaning.  

> 
> I think you and I will just have to agree to disagree on this, as
> we've been over this question unproductively in the past.

No, we have not.  Please do not claim that we have.  This is a specific
suggestion for improving the text of this document.  Please do not try
to read more into it than that.

> 
> >  For one
> > thing, it does not say what party/parties in the URI definition
> > protocol are supposed to obtain this benefit.  For another
> > thing, it talks about "meaning" -- which is not defined --
> > instead of talking only about the URI definition.
> 
> The 'baseline' document makes this clearer I hope.

Perhaps this document should reference the relevant portion of the
'baseline' document, then?

> 
> > I suggest restating this desideratum as:
> > [[
> > Uniform
> >    Two consumers following a URI definition protocol should
> >    obtain the same or sufficiently similar resource definitions
> >    for any URI.
> > ]]
> > If needed, "sufficiently similar" can be further defined in
> > terms of the expectations of the consumer and the task to
> > be performed.
> >
> > 5. This desideratum needs to be substantially clarified:
> > [[
> > Compatible with inference
> >    URIs should participate gracefully in deployed frameworks
> >    for ontologies and logical inference, specifically RDF
> >    and OWL.
> > ]]
> > What does "participate gracefully" mean?  RDF semantics
> > doesn't care at all about the URIs that are used.  They are
> > just opaque strings as far as the semantics goes.
> 
> We disagree here. The formal semantics is not the semantics. This
> stuff actually gets used to get real work done.

Please clarify.  Do you mean that URIs are *not* opaque strings as far
as the semantics goes?  And if you're not referring to the formal
semantics, then what exactly do you mean by "the semantics"?  

Some counterexamples might help.  How might a URI *not* be "compatible
with inference"?  Can you give some examples of URIs that are
"compatible with inference" and some others that are not?

I wish I could suggest some other wording, but as yet I have no idea
what you mean in this item.

> 
> > 6. Regarding this statement:
> > [[
> > As any overall discovery solution will combine of a number of
> > methods, avoiding conflict between adopted methods is also a
> > goal for any solution.
> > ]]
> > This is confusing.  It seems to me that the overall objective
> > of documenting these competing URI definition protocols is so
> > that they can be analyzed and discussed, and the community
> > (presumably via W3C process) can eventually sanction *one*
> > of them (which could well have conditional branches and/or
> > delegate portions of the protocol to others).
> 
> No, they can pick a portfolio if they want. People already have both
> 303 and Link:, for example.

It sounds like you misunderstood what I meant.  I was talking about the
single, overarching protocol that would be defined by the TAG's Product
on URI definition discovery -- not the various options under that
protocol, such as 303, Link:, etc.  The objective is to sanction *one*
overall protocol or convention for URI provision and discovery.  

> 
> > In other words, it would be better to present a series of
> > complete competing URI definition protocols, rather than
> > listing a bucket of parts that might be used to construct a
> > complete protocol.  Sometimes the document seems like it is
> > attempting to describe a complete protocol, and other times
> > it lapses into protocol fragments.
> 
> Yes, this is a bug I tried to fix mid-way. I think the baseline
> document is much clearer on this distinction.

I haven't reviewed that document yet.

> 
> > For example, I note that sections "3.1 Colocate URI
> > documentation and use" and "3.2 Specifically point (link)
> > to the URI documentation" essentially define complete URI
> > definition protocols.  But section "3.3 Use non-http: URIs and
> > a non-HTTP protocol" -- out of the blue -- starts talking about
> > non-http URIs, without saying how they are intended to be used
> > in a complete URI definition protocol.  What are the URI owner,
> > statement author and consumer intended to do and expect in a
> > URI definition protocol involving non-http URIs, and what impact
> > does this have with respect to the given desiderata?  It would
> > be helpful if the competing URI definition protocols were
> > presented in a consistent way, as it would facilitate analysis.
> 
> The point is to make the information available, not make up too many
> new things. I don't want to get into detailed design in this document.
> That's for the change proposals to worry about.

I was not suggesting making up any more things.  I was merely suggesting
that the document would be clearer if it spelled out how you intended
those parts to fit into an overall URI definition discovery
protocol.  The current state is that there are multiple URI definition
discovery protocols in use, and they are defined more by folklore than
by clear specifications.  To decide on one (over others) it is helpful to
clearly document the alternatives side by side.

> 
> > 7. Similarly, section 3.4 'Hash URI' discusses one URI syntactic
> > convention that can be used in conjunction with a complete
> > URI definition protocol.  It would be helpful if the document
> > were to say explicitly what the parties in a URI definition
> > protocol that uses this syntactic convention are expected to do.
> > For example:
> > [[
> > The URI owner should mint a probe URI containing a fragment
> > identifier, and should publish the probe URI's definition in
> > a document whose URI (the "definition URI") is the stem of
> > the probe URI, i.e., the part without the fragment identifier.
> >
> > If a statement author wishes to use the probe URI in a
> > statement, and the probe URI contains a fragment identifier,
> > the statement author should strip the fragment identifier from
> > the probe URI to produce the definition URI, and dereference the
> > definition URI to obtain the URI's definition.  The statement
> > author should only use the probe URI in a statement in a manner
> > that is consistent with the URI's definition.
> > ]]
> >
> > After stating explicitly what the URI definition protocol
> > expects each party to do, the existing observations about the
> > pros/cons of the protocol will make more sense.
> 
> I'm not sure this level of pedantry is needed in this document. I'll
> take advice from others though.
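
For illustration only, the statement author's step described above
(strip the fragment identifier from the probe URI to obtain the
definition URI) might be sketched as follows.  The function name
definition_uri and the example.org URIs are hypothetical, and the
sketch deliberately stops short of dereferencing anything:

```python
from urllib.parse import urldefrag


def definition_uri(probe_uri: str) -> str:
    """Return the stem of a probe URI, i.e. the part without the
    fragment identifier.  Under the hash-URI convention quoted above,
    this stem is where the URI owner publishes the definition.

    Hypothetical helper for illustration; dereferencing the result
    (e.g. with an HTTP GET) is left out of this sketch.
    """
    stem, _fragment = urldefrag(probe_uri)
    return stem


# A probe URI with a fragment identifier maps to its stem.
print(definition_uri("http://example.org/vocab#Dog"))

# A URI without a fragment identifier is returned unchanged, so this
# convention by itself says nothing about where its definition lives.
print(definition_uri("http://example.org/vocab"))
```
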
> 
> > 8. I suggest dropping use case 2.2 "Using a document as URI
> > documentation by reference to its primary topic", as I don't
> > think it is important enough.  We have enough work just focusing
> > on the most important use cases.
> 
> It's important for contrast. People get confused about this, and
> whether and how to treat this case affects the details of any overall
> design. For example, it seems to be the only use case supported by
> tdb: .
> 
> > 9. However, I suggest adding the CC license use case that you
> > described elsewhere, as that provides an excellent example of
> > what can go wrong in real, practical terms if the statement
> > author and consumer unknowingly follow different URI definition
> > protocols.  This is much more important than the existing use
> > case 2.2.

Please do add this use case, as it is the best one that I have seen that
demonstrates the problem caused when different parties assume different
protocols.  (In particular, the problem is caused when the URI owner
assumes a discovery protocol in which the httpRange-14 rule is ignored,
but the consumer assumes a discovery protocol in which the httpRange-14
rule is used.)

> >
> > 10. An additional criticism to add to section "3.1 Colocate
> > URI documentation and use": "Furthermore, this method does
> > not scale well, as it requires each document to contain the
> > transitive closure of all URI definitions that it uses."
> 
> That would depend on the details, such as how it combines with other
> methods, which I don't want to get into.
> 
> > 11. Convention 1 states:
> > http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
> > [[
> >  A retrieval-enabled hashless URI refers to the resource on
> >  the Web at that URI (see [generic]), independent of anything
> >  that the retrieval results (representations) say about what
> >  the URI means.
> > ]]
> > This is essentially a restatement of httpRange-14 rule (a):
> > http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039
> > [[
> >   a) If an "http" resource responds to a GET request with a
> >      2xx response, then the resource identified by that URI
> >      is an information resource;
> > ]]
> > and yet there is no mention of httpRange-14 in this section.
> > It would be good to explicitly acknowledge the httpRange-14
> > rule.
> 
> It's actually quite a bit stronger than the httpRange-14 rule, which
> says nothing about generic resources, but you're right that a
> reference would be useful here.
> 
> > 12. Section 3.5 "Retrieval as
> > equivalent to instance relationship"
> > http://www.w3.org/2001/tag/awwsw/issue57/latest/#convention1
> > Since the goal of this paper is to describe the various
> > competing protocols for establishing and determining
> > URI definitions, I think this section should be clearer
> > about exactly what URI definition (or "documentation") is
> > implied by a successful retrieval response.  For example,
> > the draft n3 rule for an HTTP 200 response on lines 150-159 at
> > http://www.w3.org/wiki/AwwswDboothsRules#rules.n3:_Classes.2C_Properties_and_Rules
> > is very specific (though minimal), only indicating that the
> > URI identifies an information resource.  (Perhaps it should
> > have also indicated that the response is a representation of
> > the resource identified by the URI, but it didn't.)
> 
> This is covered in the baseline and could go in a change proposal; too
> prescriptive here I think.
> 
> > Section 3.5 also says: "In effect, a response to a retrieval
> > request is equivalent, according to Convention 1, to URI
> > documentation that says that the response is an instance of
> > the thing named by the URI."  That doesn't seem correct.
> > The response is not an "instance" of the thing named by
> > the URI, it is a *representation* of the thing named by
> > the URI.  Furthermore, that fact comes from RFC2616 -- not
> > the httpRange-14 rule or Convention 1.
> 
> You are right according to HR14(a) and this is fixed in the baseline.
> But I would argue strenuously against any thought that leaving out the
> generic resource idea would lead to anything useful. Personally I
> would argue for HR14(a) being withdrawn over leaving it alone,
> although I would prefer the generic resource reading even better
> (which I think is similar to TimBL's).
> 
> > In other words, the successful retrieval response indicates
> > two things:
> >
> >  - the URI identifies an information resource; and
> >
> >  - the response is a representation of that information
> >  resource.
> 
> See the baseline and my blog post
> http://odontomachus.wordpress.com/2012/02/09/when-identification-and-representation-fight-who-wins/
> 
> > Perhaps these two facts should be what the implied URI
> > definition states.

s/facts/assertions/
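
As a rough sketch only, the two assertions listed above could be
written down as a function from a retrieval result to a set of
triples.  The predicate and class names (ex:InformationResource,
ex:representationOf, ex:response) are hypothetical placeholders, not
terms from any published vocabulary, and whether a 2xx response
should imply these assertions at all is exactly what is under
discussion:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def implied_assertions(probe_uri: str, status_code: int) -> List[Triple]:
    """Assertions that a retrieval response would imply under the
    reading quoted above (httpRange-14 rule (a) plus RFC 2616).

    All names prefixed with "ex:" are made up for this illustration.
    """
    if 200 <= status_code < 300:
        return [
            # The URI identifies an information resource.
            (probe_uri, "rdf:type", "ex:InformationResource"),
            # The response is a representation of that resource.
            ("ex:response", "ex:representationOf", probe_uri),
        ]
    # Other status codes (e.g. 303) imply nothing in this sketch.
    return []


print(implied_assertions("http://example.org/doc", 200))
print(implied_assertions("http://example.org/thing", 303))
```
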

> 
> They are not facts... only opinions / suggestions, as far as I can
> tell (I have asked Roy for clarification though, we'll see what he
> says is intended by 2616 and HTTPbis)
> 
> > 13. Regarding this: "Also not of concern here are the many
> > ways in which meaning can fail".  It is not clear what is
> > meant by "meaning can fail".  Do you mean the URI definition
> > is insufficient in some way to a consumer?  This should be
> > clarified.
> 
> Does anyone else reading this exchange between David and me think that
> "meaning can fail" fails to be meaningful?

That is offensive and unproductive.  My request for clarification was
entirely in good faith, as was my guess about what you may have meant.
Please treat it as such.

> 
> > 14. Please change "so that there is agreement on how each
> > URI is to be understood" to "so that there is *sufficient*
> > agreement on how each URI is to be understood", since as I've
> > pointed out on other occasions, there is no need for parties
> > to be in 100% agreement, nor is it even possible.
> 
> I'm not sure how a reader could get confused here.

The reader could be misled into thinking that the goal is complete
agreement on how each URI is to be understood.  That is the confusion
that adding the word "sufficient" would help to avoid.

> 
> > 15. Please change "it is OK for the URI to have distinct senses"
> > to "it is OK for the URI to have distinct definitions", since
> > this document is about communicating the *definition* of a URI,
> > not the "sense" of a URI.
> 
> The result of different definitions might or might not be different
> senses, and it is the senses that are important - the definition is
> only a vehicle.

Again, saying anything at all about "senses" or "meaning" or "making
sense" merely plunges this effort into a hopeless tar pit, and it is
completely unnecessary.  The goal is to agree on a protocol for
providing and discovering URI *definitions*.  It is *not* to agree on
what those definitions may mean.

David


> 
> > 16. The reference to "Convention 1" in the desiderata needs
> > to indicate the section number and/or a link, since at present
> > there is no section called "Convention 1".
> 
> ok.
> Jonathan
> 
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not necessarily
> > reflect those of his employer.
> >
> 
> 



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Thursday, 16 February 2012 03:24:08 UTC