about proposal 27 from Jonathan A Rees on 2012-10-10 (www-archive@w3.org from October 2012)

From: Jonathan A Rees <rees@mumble.net>
Date: Wed, 10 Oct 2012 18:17:40 -0400
To: www-archive@w3.org
Message-ID: <CAGnGFM+FxvBZ+zkybUV8x68dyx4JJn=D8KiPQELVzXEeDEvJyA@mail.gmail.com>
To whom it may concern:

This is yet another assault on describing the "proposal 27" design,
which I believe to be consistent with what Henry and Jeni and I have
been working on; responding mainly to Alan R's criticisms of
Jeni's draft
http://www.w3.org/2001/tag/doc/uri-usage-primer-2012-10-03/
, see
http://lists.w3.org/Archives/Public/www-tag/2012Oct/0068.html
and following.

I'm getting kind of sick of this.  I expect it shows.  I assert that
proposal 27 (which Jeni's document valiantly attempts to capture in a
way that I, JAR, have a hard time doing) can be formulated rigorously
and that the result has a chance of being useful and is therefore
worth testing.  Formalism is a way to fight back against the
ungenerous.  I do not want to use it, but be warned I will if backed
into a corner.

Let's make sure the mechanism is understood, before further
criticizing the description of it.

Since terminology is the bugaboo (see nearly all discussions on this
topic), and precipitates such boring fights, I must proceed, until
sensible consensus prevails, without using any of the nasty usual
words, the ones with distracting existing connotations.  The labels
don't matter in the end, and we can easily change them before
publication.  What's important is the machinery.  I'll introduce
suggestive but - I hope - connotation-free words for the purpose of
explaining the logic.  I am the authority on the new words introduced
here, so if you have a question about what they mean, ask me.  Do not
assume anything I wouldn't.  Don't ask me to use different words like
"denote" or "resource".  JUST STOP BUGGING ME ABOUT THE LABELS.  THEY
DON'T MATTER UNTIL WE GO TO LAST CALL.

We have URIs.  I introduce two transitive verbs "wdentify" and
"tdentify".  Sometimes URIs wdentify, and sometimes they tdentify, and
sometimes they do both or neither.  So: wdentify and tdentify are
relationships between URIs and other things.  They will be explained
further as we go along.

Each URI wdentifies at most one thing, and tdentifies at most one
thing.

There is no reason to stop at wdentification and tdentification; there
could be zdentification and so on.  But nobody's asking for those yet.

== Semantics of wdentification ==

Short version: A URI wdentifies its webpaige.  A webpaige hath zero or
more rxpresentations, specifically the ones you GET when you GET using
a URI that wdentifies the webpaige.  SKIP THIS WHOLE SECTION IF THAT'S
GOOD ENOUGH FOR YOU.

There are rxpresentations.  They can be encoded in bits.  They can be
carried by HTTP messages.  If you look at an HTTP message (request or
response), you can pick out the rxpresentation it carries: it is
pretty much what RFC 2616 calls the "entity".  Rxpresentations have
content (perhaps null) and headers (perhaps none).  We can make the
ontology as detailed or vague as you like.  (Do I need to go further?)

Rxpresentations have properties such as: Does the string "frog" occur
in it?  What RDF graph does it serialize, if any?  (modulo blank node
identity that is.)  We could pretty easily express information of this
sort in RDF, and might have to, using a standard vocabulary, if
issue-57 is to have a complete solution that can satisfy all critics.

At any given time it may or may not be the case that a given webpaige
"hath" a given rxpresentation.  (The domain of "hath" is webpaiges and
the range is rxpresentations.)  If the rxpresentation is the outcome
of a successful retrieval (see RFC 3986) using a URI that wdentifies
the webpaige, and the retrieval is "authoritative" per HTTPbis, and
the Expires: time in the rxpresentation has not passed, then the
webpaige hath the rxpresentation.  Basically, the webpaige hath a
rxpresentation if a cache could correctly deliver the rxpresentation
in response to a retrieval request.

(An HTTP GET request is a retrieval request.  Other requests are not.
An HTTP 2xx response is a successful response.  4xx and 5xx responses
are not.  I do not want to talk about 3xx responses yet.  (Do I need
to?))
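
For those who want it spelled out, here is a rough Python sketch of the sufficient condition just given. The Retrieval record and its field names are made up for illustration, and whether a retrieval is "authoritative" is assumed to have been decided elsewhere; this is not a rendering of the HTTP caching rules.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Retrieval:
    """One observed HTTP exchange (all field names are illustrative)."""
    method: str          # request method, e.g. "GET"
    status: int          # response status code
    authoritative: bool  # assumed to be judged elsewhere, per HTTPbis
    expires: datetime    # the Expires: time carried by the rxpresentation


def witnesses_hath(r: Retrieval, now: datetime) -> bool:
    """True if this exchange suffices to conclude 'webpaige hath rxpresentation'."""
    is_retrieval = r.method == "GET"      # only GET counts as a retrieval request
    is_success = 200 <= r.status < 300    # only 2xx counts as successful
    is_fresh = now < r.expires            # Expires: time has not passed
    return is_retrieval and is_success and r.authoritative and is_fresh
```

A False result only says that this particular exchange is not a witness; it does not say that the webpaige does not hath the rxpresentation.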

(This can be made more rigorous, by going into HTTP's caching rules in
more detail, diving into "correctness" and authority and speech acts
and deontic logic.  Dan Connolloy has made a good first cut.  Please
don't make me do it!  It's very tiresome.)

There may be other situations in which a webpaige hath a
rxpresentation, in addition to the cases where the rxpresentation has
been delivered in response to a retrieval request, and where it's
cachable; informally this would be if the server has the disposition,
at the current time, to deliver it in response to some retrieval
request.  (Do I need to go into this?)

It is not clear when the relationship "hath" does *not* hold between a
given webpaige and a rxpresentation.  That is, we need to rule out
interpretations in which "hath" is the top property.  There are cases
in which a person would say that displaying a particular
rxpresentation where the request URI

There is no way to communicate "x does not hath" in the HTTP protocol,
or otherwise to test for it objectively, although cache invalidation
rules come close.  This is one of the tragedies of the architecture.

And how do we know, given an arbitrary URI U and an arbitrary webpaige
W, whether or not U wdentifies W?  We simply don't.  That is a matter
of interpretation.  Another tragedy of the architecture.

We can discuss properties of webpaiges by reducing them to the
properties of their rxpresentations.  That is: some rxpresentation of
the webpaige has property P, all of them do, some will, all past ones
have, etc.  For example, the class of webpaiges W with the property
that W "hath" some rxpresentation R that serializes an RDF graph
containing the URI U, is fairly well defined (i.e. objective) given U.
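
A small sketch of this reduction, in Python. As a simplifying assumption each rxpresentation is modeled by nothing but the RDF graph it serializes (None if it serializes none), and a graph is a set of string triples; all the names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# Simplification: an rxpresentation is modeled only by the graph it serializes.
Rxp = Optional[Graph]


def graph_mentions(rxp: Rxp, uri: str) -> bool:
    """True if the rxpresentation serializes a graph in which the URI occurs."""
    return rxp is not None and any(uri in triple for triple in rxp)


def some(rxps: Iterable[Rxp], prop: Callable[[Rxp], bool]) -> bool:
    """'Some rxpresentation of the webpaige has property P.'"""
    return any(prop(r) for r in rxps)


def every(rxps: Iterable[Rxp], prop: Callable[[Rxp], bool]) -> bool:
    """'All rxpresentations of the webpaige have property P.'"""
    return all(prop(r) for r in rxps)
```

Membership in the example class is then some(rxps, lambda r: graph_mentions(r, U)), where rxps are the rxpresentations the webpaige hath.  Note that every() is vacuously true for a webpaige with no rxpresentations.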

OK, I hope this gives enough on these relationships and their
pragmatics to proceed.

If you like the httpRange-14 resolution (not all of you do), you can
take "wdentification" to be "identification".  But please don't do
that at least until you've read the whole thing through.

== Semantics of tdentification ==

Put concisely: a URI tdentifies what the webpaige it wdentifies says
that it tdentifies.  SKIP THIS SECTION IF THAT'S GOOD ENOUGH FOR YOU.

We can be more pedantic about this, as follows:

Let the recommendation in question fix a URI G*; for example, G* might
be the URI http://www.w3.org/2001/tag/2012/09/issue57/infra#Gstar .

Suppose there exist
  U, a URI
  W, a webpaige
  R, a rxpresentation
  G and G', RDF graphs
  P, a URI
  y, an RDF term
satisfying the following:
  U wdentifies W
  W hath R
  R serializes RDF graph G
  G' is an "adequate supergraph" of G (that is, it pulls in enough
    additional axioms via owl:imports, follow your nose, or some other
    axiom source to meet the purposes at hand, especially in regard to
    axioms for properties)
  A statement <U> <G*> y. is entailed by G'
  I is a satisfying interpretation of G'
  I is "acceptable in context"
  I maps y to X

Then U tdentifies X.
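
The conditions above can be sketched in Python, under heavy assumptions that I want to be loud about: entailment is approximated by plain triple membership in G', the extension from G to G' defaults to the identity, the interpretation I is handed in as a ready-made mapping from terms to things, and "acceptable in context" is not checked at all. The function and parameter names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Example URI for G* from the text; the recommendation would fix the real one.
G_STAR = "http://www.w3.org/2001/tag/2012/09/issue57/infra#Gstar"


def identity(g: Graph) -> Graph:
    """Assumed G -> G' extension: pull in no additional axioms."""
    return g


def tdentified_by(
    u: str,
    hath_graphs: Iterable[Graph],    # graphs serialized by rxpresentations W hath
    interpretation: Dict[str, str],  # I, assumed acceptable; maps terms to things
    extend: Callable[[Graph], Graph] = identity,
) -> Set[str]:
    """Collect every X such that '<U> <G*> y' is in some G' and I maps y to X."""
    found: Set[str] = set()
    for g in hath_graphs:
        for s, p, y in extend(g):
            if s == u and p == G_STAR and y in interpretation:
                found.add(interpretation[y])
    return found
```

Since each URI tdentifies at most one thing, a result with more than one element would signal that the inputs (most likely the choice of I) are not what was intended.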

What does "acceptable in context" mean?  Basically it means the
rdfs:comment and rdfs:label properties are respected; or that I is the
"intended interpretation".  In other words the choice of I is not
completely arbitrary - we want any given URI to tdentify only one
thing.

Do I need to elaborate?  Is the lack of objective criteria for
acceptability fatal?  Does the algorithm for extending G to G' need to
be nailed down?  (It could be the identity.)  I confess I find this
confusing and unsuitable for normativity.  It is related to the
general problem of what vocabulary conformance is.  But in practice
people don't seem troubled by this.

There might be ways out of the difficulty, basically by combining G'
with some other RDF graph found in the immediate vicinity of the
entity whose conformance to recommendation is under consideration.

I am willing to work with others on making this better.  If you fight
me and say it doesn't make sense, I will fight back, but you will have
to start engaging me on requirements and so on.

If you hate the httpRange-14 resolution (not all of you do), you can
take "tdentification" to be "identification" or some variant on it
(e.g. what do you do if there's no rxpresentation that serializes
a graph that contains U?).  But please don't do that at least until
you've read the whole thing through.

== Identification ==

So far all we have is definitions and constraints, nothing normative.
Now for the normative part.

It is a premise of the proposal 27 exercise that there will never be
agreement on what hashless http: URIs identify (or refer to, or
denote, or name, or anything else of that sort).  If you don't like
that premise go prepare your own $%#& proposal!!  However, we can
recommend that the community observe *constraints* on identification
as follows:

  If U identifies x, and U wdentifies w, then x and w are related by
  F* (or I should say, by what F* identifies - give me a break already).

  If U identifies x, and U tdentifies t, then x and t are related by
  G*.

  F* and G* identify functional object properties.
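
A sketch of what checking these constraints could look like, in Python. The assumptions: both clauses of each constraint are read as being about the same URI U; every relation is a dict, which makes it functional by construction; and "x and w are related by F*" is read with x on the left, i.e. F* maps x to w (likewise G* maps x to t). The direction is my reading for illustration, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def constraint_violations(
    identifies: Dict[str, str],  # URI -> x (whatever the community says)
    wdentifies: Dict[str, str],  # URI -> w
    tdentifies: Dict[str, str],  # URI -> t
    f_star: Dict[str, str],      # x -> w, assumed direction of F*
    g_star: Dict[str, str],      # x -> t, assumed direction of G*
) -> List[Tuple[str, str]]:
    """Return (URI, relation name) pairs where a constraint fails."""
    problems: List[Tuple[str, str]] = []
    for u, x in identifies.items():
        w = wdentifies.get(u)
        if w is not None and f_star.get(x) != w:
            problems.append((u, "F*"))
        t = tdentifies.get(u)
        if t is not None and g_star.get(x) != t:
            problems.append((u, "G*"))
    return problems
```

A URI that identifies something but neither wdentifies nor tdentifies anything is unconstrained and never reported.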

It would be possible to formalize this, modulo squishiness of
"wdentifies" and "tdentifies" as above, and the formalization hurdle
can be overcome if those who complain about the squishiness will
engage with me and not take potshots.

A specification that is normative on interpretations of a vocabulary
MAY reference this specification normatively, in which case
interpretations of the vocabulary are further constrained by the
above.

(Note that there are to date no common practices regarding conformance
of artifacts to vocabulary specifications.  But I don't think this
will hold anyone up - will it?)

In writing documentation it will be useful to have a way to express
the relations F* and G* both in prose and in RDF.  For RDF we can just
pick URIs.  But the terminology is a bugaboo.  The way to make the
documentation easy to write and read would be to have a role-noun
form, see http://www.w3.org/wiki/RoleNoun.  Suppose that "handypaige"
and "handything" do not have distracting connotations.  Then, using
the widespread role-noun pattern, which I personally find unpleasant
but use in order to flow with the way most people seem to do things in
RDF, we have:

  F* rdfs:label "handypaige".
  G* rdfs:label "handything".
  Q:creator rdfs:label "handypaige's creator".
  R:creator rdfs:label "handything's creator".

- This proposal does not threaten "unique identification" in the RDF
  semantics sense.

- We do not have "unique identification" in the sense of global
  interoperability per AWWW, so this proposal doesn't make matters
  any worse than they already are.

- Having agreed on this proposal, we could in theory agree on even
  more constraints in order to approach the AWWW golden unachievable
  ideal of global interoperable unique so-called "identification".

== Giving it more teeth ==

Define a "disputed URI" to be a hashless http: URI.

A conforming document MUST NOT use a disputed URI in the subject
position of a statement unless it is entailed, either formally or
informally, that the relation in the statement factors on the left
through either the "handypaige" or "handything" relation.

A conforming document MUST NOT use a disputed URI in the object
position of a statement unless it is entailed, either formally or
informally, that the relation in the statement factors on the right
through either the "handypaige" or "handything" relation.

"Formally entailed" means entailed per applicable formal semantics
(RDF or OWL).  "Informally entailed" means per common sense after
reading all the applicable documentation.
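
A minimal Python sketch of the mechanical part, covering only the subject-position rule. The assumptions: "hashless http: URI" is read literally (scheme http, no "#" anywhere, so https: is not covered), and the set of predicates that factor through "handypaige" or "handything" is supplied by the caller, since deciding entailment, formal or informal, is outside what code like this can do. The names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit

Triple = Tuple[str, str, str]


def is_disputed(uri: str) -> bool:
    """A disputed URI is a hashless http: URI (literal reading, an assumption)."""
    return urlsplit(uri).scheme == "http" and "#" not in uri


def subject_offenders(
    triples: Iterable[Triple],
    factoring_predicates: Set[str],  # predicates known (somehow) to factor through
) -> List[Triple]:
    """Statements with a disputed URI as subject and a non-factoring predicate."""
    return [
        t for t in triples
        if is_disputed(t[0]) and t[1] not in factoring_predicates
    ]
```

An empty result is necessary for conformance to the subject-position rule, but only as good as the caller's list of factoring predicates.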

Please don't make me go into the gory details about factoring through.
I just mean what I talked about above.  I'm really tired of this.


== JAR's desire ==

I seek the following additional normative constraint, but JT and HT do
not yet understand it:

A conforming document with a consistent interpretation MUST NOT be
inconsistent with the assumption that identification is
wdentification.

(From which it would follow that F* and owl:sameAs denote equivalent
properties.  We're not forcing this on everyone; just permitting it
for those who want it.)

A symmetric requirement for tdentification is not possible, since that
would lead to contradictions.

This has nontrivial consequences for identity (i.e. owl:sameAs and
owl:differentFrom), but for the audience we're concerned about the
requirement won't make much of a difference (they're already
hopelessly confused about owl:sameAs, and mired in inconsistencies).

It doesn't even matter that I am now confident that identification =
wdentification is the only plausible interpretation of what RFCs 2616
and 3986 say. This is why Roy signed his name to the httpRange-14
resolution. NOBODY WILL EVER BELIEVE ME.

== Unfinished business ==

We would *like* for the following axioms to hold:

  If U is not disputed (as defined above), then what U wdentifies =
  what U tdentifies = what U identifies.  (This covers the case of
  interoperability of hash URIs with properties that factor on the
  left or right through handypaige or handything.)

  If U wdentifies a webpaige that has no rxpresentations (as described
  above), then what U wdentifies = what U tdentifies = what U
  identifies.  (This covers the case of interoperability of 303 URIs
  with properties that factor on the left or right through
  handypaige or handything.)

Before we can say this, we need to convince ourselves that these
axioms are consistent with the kinds of interpretations we'd like to
have.  This is not obvious, but neither is it obvious that there is a
problem.
Received on Wednesday, 10 October 2012 22:18:07 UTC