Re: New draft of section 5.5 from David Booth on 2011-04-11 (public-awwsw@w3.org from April 2011)

From: David Booth <david@dbooth.org>
Date: Mon, 11 Apr 2011 12:37:51 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: AWWSW TF <public-awwsw@w3.org>
Message-ID: <1302539871.1983.50045.camel@dbooth-laptop>

General comments on
http://www.w3.org/2001/tag/awwsw/issue57/latest/

1. The scope of the document looks considerably more focused now, such
that I think I have a much clearer understanding now of your intent for
this document.  Kudos!  If I am correctly understanding your intent,
there is no need for the document to get into the issue of a URI's
meaning, because the document is only about the conventions for
publishing and locating the URI owner's intended *definition* of the URI
-- not how that definition is interpreted or what is done with it.  Does
that sound correct?

2. The abstract looks clearer now.  But I think it is important that the
abstract set the context as being primarily RDF, since that is the
context that is primarily motivating the issue.  The question does not
arise in the same way or to the same degree on the conventional web.

3. 
[[
Two contexts of special interest to this report are in natural language
(e.g. "The W3C home page is 'http://www.w3.org/'"), and in declarative
languages such as RDF and OWL.
]]
I don't think we should try to address the natural language case.  It is
the RDF case that is motivating this, and it will be difficult enough
addressing that without adding a new can of worms by trying to address
the natural language case.

4. Malformed sentence:
[[
Definition discovery is not the same as Web dereference, however, since
dereferencing a URI gives you information - i.e. the document, image,
etc. specified by the URI - not necessarily related to defining
anything.
]]

5. I think this needs to be reworded:
[[
In theory dereference could play a role in explaining the meaning of a
dereferenceable URI (see 5.6 'Hashless' URI dereferences to its
definition (incompatibly)), but this is not generally done at present,
]]
because dereference *does* currently play a role at present.  In
particular, people use dereference to conclude that something is an IR.

6. Very good:
[[
The reason we define definition discovery methods is interoperability -
so that everyone gets the same definition of each URI.
]]
HOWEVER, the idea of everyone using the same definition is *not*
universally accepted in the SW community.  I particularly remember
Michel Dumontier using URIs that specifically lacked universal
definitions, such that the user could choose his/her desired definition.
Perhaps we should acknowledge this as a dissenting view.


7. If we're preparing this for a wider audience, maybe we should revert
to the term "representation" instead of "version" throughout, since
that's what AWWW uses.

8. In the definition of "dereferenceable", I don't understand why this
clause is included:
[[
or to perform some other action on an associated resource ([rfc3986]
section 1.2.2)
]]
I find it confusing, because it would seem to make a hash URI
dereferenceable, as the HTTP GET would be performed on the associated
resource -- the hashless part.
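
A minimal sketch of the source of the confusion, assuming Python's standard
urllib.parse and a hypothetical URI http://example/doc#thing: a client splits
off the fragment before sending the request, so the GET acts on the resource
named by the hashless part.

```python
from urllib.parse import urldefrag

def request_target(uri: str) -> str:
    """Return the URI an HTTP client would actually send a GET to."""
    # The fragment is never sent to the server; it is split off client-side.
    hashless, _fragment = urldefrag(uri)
    return hashless

# Hypothetical hash URI, used only for illustration.
print(request_target("http://example/doc#thing"))
```

Under the quoted clause, that GET on the hashless part could be read as "some
other action on an associated resource", which is why the clause seems to
make the hash URI itself dereferenceable.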

9. Is the notion of "fixed information resource" needed in this
document?  Similarly, are definitions of "metadata", "refer" and "term"
needed?

10. s/in which the occurs/in which the URI occurs/

11. Re "This is the approach taken by OWL": I think it would be more
accurate to say that OWL *supports* this approach.

12. I notice that sec 3.4 discusses some pros/cons of its approach, but
sections 3.1 and 3.2 do not.  In general, critiques of the approaches
are in section 4 -- separate from the descriptions of the approaches.
Section 5 purports to describe proposed new approaches, but most of them
are actually just refinements of existing approaches.  Furthermore, each
refinement needs to have pros/cons discussed, just as the basic
approaches do.  Hence, I would suggest restructuring the document as
follows, to keep the pros/cons with the approach descriptions, and to
keep the refinements with their respective base descriptions.  I have
kept the original section numbers so that you can more easily see how
they are being moved:
[[
3 Current definition methods 
    3.5 'Hash URI'
        Con: 4.1 Fragment identifiers are fragile
        Refinement: 5.2 'Hash URI' with fixed suffix
            Pros/cons
        Con: 4.2 The common 'hash URI' pattern fails with large namespaces
        Refinement: Use multiple base URIs
            Pros/cons
        Con: 4.3 Hash URIs don't support REST architecture
    3.6 'Hashless URI' with HTTP 303 See Other redirect
        Con: 4.4 303 is difficult, sometimes impossible, to deploy
        Con: 4.5 303 leads to too many round trips
        Refinement: Use 303-redirect service
            Pros/cons
        Refinement: Define optimization pattern in .well-known
            Pros/cons
        Con: 4.6 303 makes the URI difficult to bookmark
    5.5 'Hashless' URI dereferences to its definition
        Pros/Cons
    3.3 Register a URI scheme or URN namespace
        Pros/Cons
    3.4 Use the LSID getMetadata() method
        Pros/Cons
    3.2 Link to documents containing definitions
        Pros/Cons
    3.1 Colocate definition and use
        Pros/Cons
4 Possible new definition methods
    5.3 'Hashless URI' with site-specific discovery rules
        Pros/cons
    5.4 'Hashless URI' with new HTTP request or response
        Pros/cons
]]


13. Sec 3.3: I think the scheme registration example needs to be
modified to describe *delegated* URI definitions, since it is not
realistic to think that the IANA registration itself would directly
define the meanings of all of the URIs in the scheme.   For example, the
scheme "mountain:" may be for mountains, but it would have to say how an
individual would define a new mountain URI under that scheme.

14. s/refers to a what/refers to what/

15. I think it would be helpful if the use case in sec 2.2 also included
the specific questions that need to be answered by each proposed
solution.  On the other hand, I am uncertain that this use case is
really important enough to include.   [Added later:]  On further
reflection, I think use case 2.2 should be dropped, as it does not add
enough to be worthwhile.  

16. Similarly, I think it would be helpful if each proposed solution
explicitly stated what Alice, Bob and Carol should do, according to that
solution: "According to this approach, in scenario 2.1, Alice
should . . . Bob should  . . . Carol should . . . ".  I first noticed
the need for this in sec 3.4 (LSID), perhaps because I don't know the
details of how LSID works.

17. s/since otherwise it would refer to/since otherwise under this
approach it would refer to/

18. s/the URI does not refer IR('http://example/p16')/the URI does not
refer to IR('http://example/p16')/

19. The diagrams are nice!  One suggestion: s/specifies/defines/, since
"definition" is the term that is used elsewhere in the document.

20. Sec 3.4: I'm a little surprised to see LSIDs singled out, since it
feels like there have been a zillion identifier techniques proposed,
including DOI, ARK, XRI, "info:", "tag:", "tbd:", etc.  I'm not sure how
we should address them all, as the proponents of each thought theirs to
be uniquely important in some particular way.  

21. I think it would be helpful to list the approaches in (approximate)
descending order of popularity in the document, which I guess would be:
hash URIs, 303-redirect, 'Hashless' URI dereferences to its definition,
link to documents containing definitions, LSID/other schemes, new URI
scheme, colocate definition and use.

22. It occurs to me: Doesn't sec 3.3 "Register a URI scheme or URN
namespace" belong in the proposed new approaches section, rather than in
the existing approaches section?

23. Sec 4: Each criticism should also include rebuttals or mitigating
techniques.  For example, in "sec 4.2 The common 'hash URI' pattern
fails with large namespaces", it would be good to point out that large
namespaces can be subdivided into multiple hashless base URIs, although
this may make them harder to use (because multiple @prefixes may need to
be declared).
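
A rough sketch of that mitigation and its cost, assuming a hypothetical
namespace root http://example/ns/ and an arbitrary split by first letter
(neither is taken from the document under review): each term lands in a
smaller document, but every base URI in use needs its own prefix.

```python
from typing import Dict, Iterable

ROOT = "http://example/ns/"  # hypothetical namespace root

def term_uri(term: str) -> str:
    """Place a term under a hashless base URI chosen by its first letter."""
    shard = term[0].lower()
    return f"{ROOT}{shard}#{term}"

def prefixes_needed(terms: Iterable[str]) -> Dict[str, str]:
    """Map each prefix label to the base URI that must be declared for it."""
    return {f"ns_{t[0].lower()}": f"{ROOT}{t[0].lower()}#" for t in terms}

print(term_uri("Canoe"))
print(prefixes_needed(["Canoe", "Cat", "Dog"]))
```

The number of prefixes grows with the number of base URIs actually used,
which is the usability cost mentioned above.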

24. Sec 4.3 "Hash URIs don't work with HTTP PUT, POST, or DELETE
methods": I am not familiar with this criticism.  Pointer please?

25. Sec 4.4 "303 is difficult, sometimes impossible, to deploy": Again,
this can be mitigated by use of a 303-redirect server, such as
http://thing-described-by.org/ , or an equivalent distributed technique
based on .well-known RFC5785:
http://tools.ietf.org/html/rfc5785
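
As a rough sketch of how such a service mitigates the deployment problem,
assuming a hypothetical service at http://redirect.example/ that takes the
describing document's URI as its query string (the URL layout is
illustrative and is not claimed to match any particular service): the URI
owner mints names under the service and needs no 303 configuration on their
own server.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

SERVICE = "http://redirect.example/"  # hypothetical service, illustration only

def mint(description_uri: str) -> str:
    """Mint a hashless URI for the thing described by description_uri."""
    return SERVICE + "?" + description_uri

def respond(request_uri: str) -> Tuple[int, Dict[str, str]]:
    """What the service would answer: a 303 pointing at the description."""
    target = urlsplit(request_uri).query
    if not target:
        # No description URI supplied, so there is nothing to redirect to.
        return 400, {}
    return 303, {"Location": target}

uri = mint("http://example/about-p16")
print(uri)
print(respond(uri))
```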

26. Ditto for sec 4.5 "303 leads to too many round trips"

27. Sec 4.7 "The normative specifications are incomplete": Which
approach is this criticizing?

28. I think sec 5.1 "Use something other than a URI" can be deleted,
since the value of using a URI is well established and quite fundamental
to web architecture.

29. The "'Hashless' URI dereferences to its definition" approach is an
*existing* approach that some use, so I think it belongs in section 3.

30. Sections 5.5 "'Hashless' URI dereferences to its definition
(compatibly)" and 5.6 start talking about how a definition is
*interpreted*, which (by my new reading of the rest of the document)
seems out of scope with the document's current intent of focusing only
on the *mechanics* of providing and obtaining a URI definition.
Shouldn't these sections be merged into one, that merely states the
mechanism and points out the pros/cons?  The potential conflict of
meaning between what the URI definition says and the information derived
from the httpRange-14 rule does represent a "con" for this approach.  

This "con" could be framed as a problem of two (competing) URI
definitions having been provided: one implicitly by the httpRange-14
rule (indicating that the resource is an IR) and the other explicitly by
the retrieved document content.  In framing the problem this way, the
question would be which definition or combination of definitions to
believe.  Note that we don't currently ask this question of other
approaches, even though different definitions could be provided by
multiple approaches.
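
To make this framing concrete, here is a minimal sketch. It assumes the
httpRange-14 rule is read simply as "a 2xx response to GET implies an
information resource", which is a simplification (elsewhere in this thread
the rule is said to be about a *particular* information resource, not about
typing), and the class labels are hypothetical.

```python
from typing import Optional

def implicit_definition(status: int) -> Optional[str]:
    """Class implied by the httpRange-14 rule, or None if it says nothing."""
    # Assumed reading of the rule: a 2xx response means the URI names an
    # information resource; a 303 or an error licenses no conclusion.
    if 200 <= status < 300:
        return "InformationResource"
    return None

def competing(status: int, explicit_class: str) -> bool:
    """True when the implicit and explicit definitions disagree."""
    implied = implicit_definition(status)
    return implied is not None and implied != explicit_class

# Hypothetical case: http://example/p16 answers 200, yet the retrieved
# document defines the URI as naming a canoe.
print(competing(200, "Canoe"))
print(competing(303, "Canoe"))
```

The sketch only detects that two definitions compete; which one to believe
is exactly the open question.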

It seems to me that we should either entirely steer clear of getting
into the "meaning" of the URI, or we will have to get in much deeper,
which is what I previously thought your intent was, and which was why I
was trying to elicit what you meant by "meaning" and insisting on
discussing only *observable* characteristics.

Other responses inline below . . .


On Sat, 2011-04-09 at 19:12 -0400, Jonathan Rees wrote:
> On Mon, Apr 4, 2011 at 9:34 PM, David Booth <david@dbooth.org> wrote:
> > Attached is an updated version.  Inline responses . . .
> >
> > On Mon, 2011-04-04 at 16:37 -0400, Jonathan Rees wrote:
> >> "its meaning should be obtained from that definition instead of from
> >> the httpRange-14 rule regarding information resources."
> >> - I invoke the "IR reference rule" in the document, and it can be hyperlinked.
> >>   (Actually the httpRange-14 rule as we know is wrong in all sorts of
> >> ways, so referring to it directly is very risky.  E.g. we know the
> >> purpose isn't to say that the URI refers to *any* information
> >> resource, i.e. it has nothing to do with typing; it really means to
> >> say - and I think most people have understood it to say - that it
> >> refers to a *particular* information resource.)
> >
> > Yes, I think I agree, but I'm not sure what you are suggesting.  My note
> > "[TODO: Say somewhere what the httpRange-14 rule is]" was meant as an
> > editorial reminder that we should say more explicitly what inference
> > rule we mean, when we refer to the "httpRange-14 rule".  In that draft
> > document I have assumed that the consequence of the rule consists of the
> > two assertions in graph gh, but we really should say explicitly (e.g.,
> > in n3) what rule we're assuming.
> >
> >>
> >> "Because of the 200 status code, Bob applies the httpRange-14 rule and
> >> concludes the following:"
> >>
> >> It doesn't matter how Bob concludes that metadata, but it would be
> >> harmful to say that a single HTTP response is adequate to justify it;
> >> for the metadata to be useful it has to be true of what someone who
> >> reads Bob's metadata will get. I think it is better to be vague since
> >> this has nothing to do with this section.
> >
> > But if we don't provide any justification, then we could just as well
> > say that Bob concludes that <http://example/p16> refers to an elephant.
> 
> No we couldn't - the scenario would be different then.
> 
> > The point is that the 200 status code *justifies* Bob's statements about
> > <http://example/p16> as a web-accessible thing.
> 
> It's not a question of justification, but of convention. Most people
> have adopted the IR reference rule. That is why Bob uses it - because
> he wants to be understood.

Right, that's exactly the rationale I meant.  I was suggesting that we
be clear about *why* Bob was treating <http://example/p16> as a
web-accessible thing.  

However, I think this issue may be moot if the document is only focusing
on the mechanics of providing and obtaining an authoritative URI
definition.

> 
> I believe I've corrected the scenario in a few ways since you made
> these comments, so perhaps your objections here are moot. In this
> case, Alice and Bob and Carol will all know that some new protocol is
> in effect, and the question is just what that protocol needs to be.
> 
> >>
> >> "web:hasUri"  -- the document already defines the predicate (if it's
> >> the one I think you mean) and it's called :accessibleVia.  There is no
> >> reason to say that the subject has an information resource type and
> >> doing so weakens the document.
> >
> > Okay, I've changed web:hasUri to :accessibleVia throughout.  I also
> > removed class web:IR from the RDF, as it is not needed for the
> > inferencing, and in the prose changed it to "web-accessible thing".
> >
> >>
> >> Bob actually concludes that the URI refers to the IR at that URI. It
> >> is better to say this in English since in the example he really does
> >> conclude this. If written in RDF it will have to be translated for the
> >> benefit of readers, and that's redundant.
> >
> > I think it is important to focus on *observable* facts.  What Bob
> > privately believes in his own mind is irrelevant.  The point is that
> > graph gh is what gives Bob license to make assertions about
> > <http://example/p16> as a web-accessible thing.
> 
> I disagree. Sometimes arguments on general principle are easier to
> understand than those containing distracting and extraneous detail.
> And logical arguments are often not in terms of observables, but
> rather in terms of ... logic.

If we stick to the level of discussing only the mechanics of how URI
definitions are provided and obtained, then I think we'll be fine.  But
if we start trying to talk about the *meaning* of a URI, then I firmly
believe we must clearly state what we mean by "meaning" and it must be
stated in terms of *observable* outcomes.  Otherwise, I think we will be
wasting everyone's time.  If it looks like we need to get into this
area, and we cannot agree whether we should focus on observable outcomes,
then we should ask others for input.

> 
> >>
> >> I don't see any reason to go into such detail on what Carol wants to
> >> do. Most of the detail you've provided is unnecessary and distracting.
> >> She really just needs to figure out what was meant by each use of the
> >> URI.
> >
> > The point is to nail down more explicitly what we mean by "what was
> > meant by each use of the URI".
> 
> This seems perfectly clear to me. What confusion do you think someone
> reading this might have?

I think much of this is moot now, as the document focus now seems to be
on the *mechanics* of providing and obtaining an authoritative URI
definition.

> 
> > If we're going to make progress on this,
> > we cannot be hand-waving about what we mean by "meaning".
> 
> I'm not hand-waving at all and I resent your describing it so.

Please do not take offense, as no offense is intended.  We're all trying
to figure this stuff out, and we may bring different styles, experiences
and assumptions to the table, but I think we are all doing so with
sincere intent to work collaboratively toward figuring it out.  We may
disagree on some things, and if so, I think it is helpful if we identify
what they are.  In this case, I was trying to stress what I believe is
the importance of making our assumptions explicit.

> 
> > We need to
> > be very explicit, and that's what I'm trying to do.  We need to make
> > *all* relevant assumptions explicit -- such as Carol's implicit rules ri
> > -- and we need to be talking about *observable* facts -- not what is
> > hidden in Carol's head.
> 
> I don't agree. People are very good at reasoning about what is in
> heads, and about correctness of applications and protocols.
> 
> >>
> >> Carol's problem is *not* caused by combining the graphs
> >
> > Huh?  But the problem does not exist if those graphs are not combined.
> > There is no contradiction if those graphs are not combined.
> 
> The question is not consistency, but correctness. There are lots of
> ways to be wrong without anyone detecting a contradiction.
> 
> If I tell you one day that mercury is the closest planet to the sun,
> and then the next that it is liquid at room temperature, you need to
> figure out what each occurrence of "mercury" means, even if you've
> forgotten the second day what I said to you the first day. It has
> nothing whatsoever to do with graph combination. It has to do with
> correct interpretation.

I disagree.  The whole notion of "correct interpretation" depends on
what I wish to *do* with the information.  What goes on inside my brain
-- whether I experience the color red the way you experience the color
green -- is irrelevant as long as we both stop our cars when the light
turns red and go when it's green.  "Correct interpretation" is
irrelevant if it cannot be stated in terms of *observable* outcomes.
Perhaps we need to agree to disagree about this.

> 
> >>  - it is caused
> >> by Alice and Bob using the same URI in different ways. She would have
> >> to figure out what they mean
> >
> > Please be more explicit about what you mean by "what they mean".  What
> > RDF assertions will be made?  What *observable* action will occur?
> 
> The scenario has little to do with RDF. The question is what is being
> communicated and what knowledge the receiver will have after the
> communication. It doesn't matter that the sender and receiver are
> automata since they will have their own correctness criteria - with
> respect to *real* semantics, not hobbled RDF quasi-semantics - and
> will need to be subject to audit from agents who are not automata.
> 
> >> even if she didn't do any graph
> >> combining, if she processed the two graphs separately. In particular
> >> she'd be confused about whether to apply the IR reference rule or not,
> >> in either case.
> >
> > Carol's mental confusion seems irrelevant to me, because it is not
> > observable.  If her application produces the wrong output, then that is
> > observable, and we should talk about how and why that happens.
> 
> Carol writes lots of applications and says lots of things. If she's
> confused we may very well hear about it. This does not seem like a
> confusing point to me. It is just common sense.
> 
> > Can you translate Carol's confusion into her application producing
> > incorrect output, so that it is observable?
> 
> I probably could but I don't see the point. If the example included
> "likes" and the "application" had to make a list of information
> resources that were liked - something like that. But the main thing is
> that Carol (and the artifacts she's responsible for) mustn't say
> things that are not licensed by Alice or Bob, like that there exists
> an entity (either a canoe or a version or other IR) that has both the
> given title and the given mass.
> 
> >>
> >> The rest seems at best unnecessary to me; and as you know I find your
> >> "application" idea to be wrong and harmful as meaning is not a
> >> function of application.
> >
> > The "application" idea is merely a device to enable us to talk about
> > observable facts.  There may be a better way to do that, and if so we
> > could switch.  But I don't believe we can make headway on these topics
> > if we just make claims about unobservable beliefs that are in people's
> > heads.  We need to make the discussion absolutely explicit, concrete and
> > observable -- no "then a miracle occurs" steps.
> 
> I think we can make headway in exactly this way, and do.  This is how
> social processes work - you reason about what other people know. This
> is not miraculous or mysterious (any more than any other everyday
> human process).
> 
> >> Cases in which there is no problem due to
> >> some coincidence are uninteresting and don't need to be presented.
> >
> > Which cases do you mean?
> 
> The last three examples in your document

I guess you are referring to sec 5.5.3 Erin, 5.5.4 Frank and 5.5.5 Gail
in the attached.  I think it is very misleading to say that there is no
problem due to coincidence.  Their applications are engineered to work
correctly within a certain range of capability.  It isn't *coincidence*
that they do not step outside of that range of capability.

Imagine instead that their applications functioned correctly when given
data that happened to include some unneeded map data that modeled the
earth as flat.  Clearly the earth isn't flat.  But the fact that their
applications still function correctly even when given some obviously
wrong (but ignored) data is not a *coincidence*, it is by design.

> 
> >>
> >> I had been focusing on how to construct an RDF satisfying
> >> interpretation (i.e. proof of soundness) in this case, but I think
> >> this is a secondary problem. The first thing is to figure out how
> >> Carol would reconstruct the intent, in the best of circumstances.
> >
> > What intent?  Can you state the problem in terms of observable output
> > that is incorrect?
> 
> I think this is silly. The receiver needs to distinguish between
> possible states of affairs known by the sender. It is obvious that if
> you have a properly functioning communication scenario involving a
> language with symbols {a, b} (among others), and then replace all b's
> with a's in each message, then there is a potential problem in that
> states of affairs that are distinguishable in the richer language
> might not be in the less rich language. Maybe there is no problem, but
> if there isn't that would have to be proved - you'd have to show that
> the replacement is harmless. It doesn't matter whether anything is
> observable or not - they could be talking about angels on pinheads,
> and the communication problem would be the same. It's a problem of
> information, not application behavior.

I disagree.  People have already accused us of pointlessly arguing how
many angels can dance on the head of a pin.  If we cannot state our
assumptions explicitly and frame the problem in terms of *observable*
outcomes then I think we are wasting everyone's time.

> 
> >> If
> >> she can do this then I'm sure there'd be some clever formal
> >> construction leading to an interpretation. If there weren't, well, that
> >> would make the case against this approach quite a bit stronger, but
> >> saying so doesn't help in presenting this option, and the first
> >> responsibility here is to give it a fair shake - we're not obligated
> >> to analyze it in detail, and doing so might even hurt socially.
> >>
> >> Remember this document is meant to bring people into conversation
> >> about issue 57. By going on and on we'd only scare people away. For
> >> this section, the people to be engaged would be Harry and Ed Summers
> >> and others who think this way. They are not formalists and already
> >> have little patience with careful analysis. They should not be
> >> bombarded with details.
> >>
> >> The presentation has to be as brief as possible - just long enough to
> >> enable them to recognize that this is the solution that they're
> >> proposing, while allowing us to describe the solution in terms used
> >> elsewhere in the document to make comparisons possible.
> >
> > Yes, I agree that when it comes time to present this, we need to make it
> > as succinct as possible.  But I don't think we're anywhere near being
> > able to do that yet.  I think *we* first need to come to agreement on
> > what the problem is.  Once we have done that in enough detail to be
> > sure we have captured it, we can then try to simplify the presentation.
> > But thus far, I keep seeing too much hand waving and claims about things
> > that are not observable.
> 
> I don't know where I'm being unclear. As far as I can tell I have
> described the problem pretty well, and since you and I have been
> arguing along these lines for years, I have little confidence that
> *we* will ever agree. But if you can point to particular points that
> you think are unclear or are open to misinterpretation, that might be
> helpful as it may give me a chance to sharpen the prose.

I'll do my best.

thanks,
David


> 
> Jonathan
> 
> >> As I said I rewrote 5.5 last week. I just now fixed a couple of
> >> problems with it and have tried to fix up a couple of things that might
> >> have confused you.
> >
> > I'll take a look at that also.  I haven't gone through your re-write
> > yet.
> >
> > thanks,
> >
> >
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not necessarily
> > reflect those of his employer.
> >
> 
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Attachments

text/html attachment: meaning-of-a-URI.html
Received on Monday, 11 April 2011 16:38:18 UTC