Re: Editor's Draft of ISSUE-57 URI Usage Primer from Alan Ruttenberg on 2012-10-10 (www-tag@w3.org from October 2012)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Wed, 10 Oct 2012 13:51:48 -0400
To: Jonathan A Rees <rees@mumble.net>
Cc: Noah Mendelsohn <nrm@arcanedomain.com>, David Booth <david@dbooth.org>, Jeni Tennison <jeni@jenitennison.com>, www-tag <www-tag@w3.org>
Message-ID: <CAFKQJ8mFzhB+xFR7MSk+3RtFm1mHjYLFLK-je=ehctHet1HrXA@mail.gmail.com>
On Wed, Oct 10, 2012 at 1:16 PM, Jonathan A Rees <rees@mumble.net> wrote:
> On Wed, Oct 10, 2012 at 12:24 PM, Alan Ruttenberg
> <alanruttenberg@gmail.com> wrote:
>> On Tue, Oct 9, 2012 at 3:15 AM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
>>> I read the primer as providing useful advice for the many situations in
>>> which, for good or bad reasons, such separate URIs are not created.
>>
>> It doesn't read like this. A good way to present that would be to make
>> clear from the start that there are normative specifications about the
>> use of URIs for making assertions (such as the assertion of a property
>> value) and then give practical advice on how someone implementing
>> incorrect behavior can adjust their system to be within conformance.
>
> This is a good idea, but at present there *are* no such normative
> specifications. Someone would have to write one, in order to be sure
> that there was one. Is this what you had in mind? Maybe the 'URIs in
> data' document could provide one or two; but its current message is to
> ask others to go forth and write such normative specifications.

Where is that message expressed?

The document is labeled a "Primer", not a "request for proposals". It
starts with "This document describes the processing that applications
should perform when they encounter URIs in data. It describes how to
define data formats and publish information at URIs to enable
applications to understand how URIs within data should be
interpreted."

That doesn't look like a call for proposals, it looks like a tutorial
on how to do things.

> It would seem appropriate to provide the general format-independent
> advice separately from format-specific advice; the TAG is willing to
> take on the first, but not the second.

The format-independent advice contradicts the specifications of the formats
that have been defined in the W3C context. The document doesn't
suggest that a web page and a person are one and the same resource,
yet it suggests a method by which a URI can identify both. That
contradicts what AWWW, RDF, and OWL say.

> For RDF, one could write a specification that said, for example, that
> it is correct to write
>   <http://example/x> :recent-representation-contained-word "frog".
> (or the same with a semantically indistinguishable URI) if and only if
> some HTTP request of the form GET http://example/x yielded a 200
> response where the content contained (after character decoding) the
> string "frog". This isn't easy to test, since it requires knowledge of
> the past, but at least it is objective.
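Jonathan's proposed property is objective in the sense that, given a log of past HTTP exchanges, its truth can be checked mechanically. A minimal Python sketch, assuming such a log is available; `RecordedResponse` and the function name are hypothetical, and URIs are compared as plain strings (no notion of "semantically indistinguishable" URIs is modelled).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RecordedResponse:
    """One observed exchange: GET <uri> answered with <status> and raw body bytes."""
    uri: str
    status: int
    body: bytes
    charset: str = "utf-8"


def recent_representation_contained_word(
    uri: str, word: str, history: Iterable[RecordedResponse]
) -> bool:
    """True iff some recorded GET of exactly `uri` got a 200 response whose
    content, after character decoding, contains the string `word`."""
    for resp in history:
        if resp.uri == uri and resp.status == 200:
            if word in resp.body.decode(resp.charset, errors="replace"):
                return True
    return False


# Hypothetical log: the 404 body mentions "frog" but must not count.
history = [
    RecordedResponse("http://example/x", 404, b"no frog here"),
    RecordedResponse("http://example/x", 200, b"a frog sat on a log"),
]
```

The need for a log is exactly the "knowledge of the past" Jonathan mentions: without it, the assertion cannot be tested.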

The document isn't talking at that level. It isn't identifying
problems that need to be solved. It isn't instructing users to read
relevant specifications. It presents itself as tutorial instruction,
a "primer", while giving untested advice that does not conform to
existing specifications when it explains how a URI can be used to
identify two things.

The httpRange-14 ruling, despite the noise before and after, at least
trod lightly. I'll quote:

"
That we provide advice to the community that they may mint
"http" URIs for any resource provided that they follow this
simple rule for the sake of removing ambiguity:

   a) If an "http" resource responds to a GET request with a
      2xx response, then the resource identified by that URI
      is an information resource;

   b) If an "http" resource responds to a GET request with a
      303 (See Other) response, then the resource identified
      by that URI could be any resource;

   c) If an "http" resource responds to a GET request with a
      4xx (error) response, then the nature of the resource
      is unknown.
"

The only unclear bit is what exactly an information
resource is. It's pretty clear that a car isn't one. Nonetheless,
using a URI to refer to something other than a document was relatively
new, with little practice or precedent, so the ruling didn't disturb
what existed then very much. It gave a workable procedure for saying what
could be inferred from an HTTP response code (either very little, or
nothing). Now, it would be nice to extend that finding with a formal
specification of further useful behavior. But the proposal in this
document, rather than elaborating existing precedent and specification,
moves backward by suggesting a new, incompatible mechanism.
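Read operationally, the quoted rule maps the status code of a GET to what may be concluded about the identified resource. A minimal Python sketch; the enum and function names are hypothetical, and status codes the rule does not mention (for example 301, 302, 5xx) deliberately yield no conclusion.

```python
from enum import Enum
from typing import Optional


class Nature(Enum):
    INFORMATION_RESOURCE = "information resource"  # rule (a): any 2xx
    ANY_RESOURCE = "could be any resource"         # rule (b): 303 See Other
    UNKNOWN = "nature unknown"                     # rule (c): any 4xx


def classify_get_response(status: int) -> Optional[Nature]:
    """Apply the quoted ruling to the status code of a GET on an "http" URI."""
    if 200 <= status <= 299:
        return Nature.INFORMATION_RESOURCE
    if status == 303:
        return Nature.ANY_RESOURCE
    if 400 <= status <= 499:
        return Nature.UNKNOWN
    return None  # the quoted rule draws no conclusion for other codes
```

Note how little each branch licenses: only a 2xx supports a positive inference about the resource.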


> I agree that something normative and concrete (objective, testable)
> would be helpful. Without an operational semantics developers will
> continue to be puzzled forever as to how to create conformant
> artifacts.

Huh? It is pretty easy to implement conformant artifacts. Do they tell
you all you want to know? No. But the bar for conformance is
rather low. The complaints about 303 and extra round trips seem like
nonsense to me. The *only* issue I recall that should be
addressed is the case where a user's web hosting doesn't
allow control over response codes. Many of the people who complain
about 303 are not those people.

> I would be interested in suggestions regarding how to do
> this. Maybe something along the lines of
> http://www.w3.org/2000/10/swap/doc/Reach, which is pretty concrete.
> You can talk about "a graph serialized by a representation of the
> entity identified by URI x" and assert (normatively) that this means
> "a graph serialized by a representation retrieved [or retrievable,
> choice here] using the URI u" without agreeing on what u identifies or
> what "is a representation of" means.

This is behavior that goes beyond specification and it would indeed be
nice to have. This document doesn't go there, instead catering to a
retrograde issue that is the result of will, not lack of means.

Conforming behavior would be, for example, to always return 303,
redirecting to some RDF that asserts, at least, what type the
resource is, and to have browsers create their display via a
stylesheet. There are other conforming behaviors. The issue isn't that
it is puzzling, but that people loudly insist that this is
distasteful, unnecessary for their application, and an effort to
implement. To that I have little answer other than "grow up". And join
a working group to define something else that works.
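One such conforming setup, sketched in Python as a pure request handler rather than a real server. Everything here is assumed for illustration: the `http://example.org` host, the `/id/` and `/data/` path layout, the `TYPES` registry, the `ns#Car` class and the `/render.xsl` stylesheet. Thing URIs never answer 200; they answer 303 pointing at an RDF/XML document that asserts at least the thing's `rdf:type`, and an `xml-stylesheet` processing instruction lets a browser with XSLT support render that document for humans.

```python
from typing import Dict, Tuple
from xml.sax.saxutils import quoteattr

BASE = "http://example.org"  # hypothetical host

# Hypothetical registry: local name -> rdf:type of the thing it names.
TYPES: Dict[str, str] = {
    "someone": "http://xmlns.com/foaf/0.1/Person",
    "car42": "http://example.org/ns#Car",
}

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def handle_get(path: str) -> Response:
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        if name in TYPES:
            # A thing URI: always 303 to a document about the thing.
            return 303, {"Location": f"{BASE}/data/{name}"}, ""
    elif path.startswith("/data/"):
        name = path[len("/data/"):]
        if name in TYPES:
            # The document: RDF asserting the type, plus a stylesheet for browsers.
            body = (
                '<?xml version="1.0"?>\n'
                '<?xml-stylesheet type="text/xsl" href="/render.xsl"?>\n'
                '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
                f'  <rdf:Description rdf:about={quoteattr(BASE + "/id/" + name)}>\n'
                f'    <rdf:type rdf:resource={quoteattr(TYPES[name])}/>\n'
                '  </rdf:Description>\n'
                '</rdf:RDF>\n'
            )
            return 200, {"Content-Type": "application/rdf+xml"}, body
    return 404, {"Content-Type": "text/plain"}, "not found"
```

The cost is one extra round trip per thing URI, which is the overhead the complaints about 303 refer to.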

> I think the hope is that providing (normative) definitions of
> "shorthand" and "immediate" (or whatever adjectives we end up
> choosing) will be enough to help out specifications to come. This
> would be similar to the way RFC 2119 defines "MUST" and so on for use
> in other specs. I don't know whether this approach will work; we need
> to probe this.

It doesn't work because it denies a premise of existing
specifications. At least come out and say so in the documentation, if
that is what is intended. I include below what I wrote to you,
privately, earlier:

---

Documentation, in the form of text, is useless for semweb purposes of
this sort unless it is backed by something in the model theory. For example,
the property "creator" is equally applicable as a "direct" property
and as a "shorthand" property. There are two possibilities, then. The first is
that you now need two "creator" properties, one for each use. In this case you
will not detect a contradiction unless the semantics of "shorthand"
and "direct" are made explicit. This is not addressed in the document.

The second is that you have one property, in which case you can't even use
globally applicable documentation, as is proposed in the document. Not
to mention that it contradicts the dictum that a URI denotes one
resource.

How do you say that a landing page is that of some entity if they
have the same IRI? How do you say that they are not? How do you make
this work for sane people who correctly consider web pages disjoint
from people? At the recent event I spoke at
(http://ncorwiki.buffalo.edu/index.php/Information_Infrastructure_Collaboration_Meeting)
virtually every speaker noted that the confusion between
information entities and what they are about was the root of serious
trouble in representing their (uniformly complex) domains.
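To make the disjointness point concrete: once web pages and people are declared disjoint, a single IRI that picks up both types (here via property domains) is flagged as a contradiction, while separate IRIs linked by an "about" property are not. A minimal Python sketch of RDFS-style domain inference plus a disjointness check, not a full OWL reasoner; the `http://example/` namespace, class names and property names are all hypothetical, and literals are treated as plain strings.

```python
from typing import Dict, Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example/"  # hypothetical namespace

Triple = Tuple[str, str, str]


def inferred_types(triples: Iterable[Triple], domains: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collect explicit rdf:type assertions plus types implied by property domains."""
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p in domains:
            types.setdefault(s, set()).add(domains[p])
    return types


def disjointness_clashes(
    triples: Iterable[Triple],
    domains: Dict[str, str],
    disjoint: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (node, classA, classB) for each node that lands in two disjoint classes."""
    types = inferred_types(triples, domains)
    pairs = list(disjoint)
    clashes = []
    for node, ts in sorted(types.items()):
        for a, b in pairs:
            if a in ts and b in ts:
                clashes.append((node, a, b))
    return clashes


DOMAINS = {EX + "pageTitle": EX + "WebPage", EX + "birthDate": EX + "Person"}
DISJOINT = [(EX + "WebPage", EX + "Person")]

# One IRI used for both the landing page and the person it is about.
shared = [
    (EX + "x", EX + "pageTitle", "Home page of X"),
    (EX + "x", EX + "birthDate", "1970-01-01"),
]
# Two IRIs, with an explicit link from the page to its subject.
separate = [
    (EX + "x-page", EX + "pageTitle", "Home page of X"),
    (EX + "x-page", EX + "about", EX + "x"),
    (EX + "x", EX + "birthDate", "1970-01-01"),
]
```

Without the disjointness declaration the shared-IRI data passes silently, which is why text documentation alone does not help.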

It is possible to have an entity that is defined as a composite of
both a web page and a person. But it needs a specific type (which is
neither web page nor person), you need to understand that there are two
underlying entities, and you need a documented way of relating them to the
composite and semantic constraints that propagate values on the
composite onto the specific entity.

----
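The composite described in the note above can be sketched as a data structure with the three ingredients it lists: its own type (neither web page nor person), explicit links to the two underlying entities, and a documented rule that propagates each value asserted on the composite onto the right underlying entity. A minimal Python sketch; the namespace, the `PagePersonComposite` type, the `hasPagePart`/`hasPersonPart` links and the routing table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List, Tuple

EX = "http://example/"  # hypothetical namespace


@dataclass
class Entity:
    iri: str
    rdf_type: str
    values: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, prop: str, value: str) -> None:
        self.values.setdefault(prop, []).append(value)


@dataclass
class PagePersonComposite:
    """Typed as neither WebPage nor Person; explicitly linked to both parts."""
    iri: str
    page: Entity
    person: Entity
    rdf_type: str = EX + "PagePersonComposite"

    # Documented constraint: which part receives each property's value.
    ROUTING: ClassVar[Dict[str, str]] = {
        EX + "lastModified": "page",
        EX + "birthDate": "person",
    }

    def link_triples(self) -> List[Tuple[str, str, str]]:
        """Triples relating the composite to its two underlying entities."""
        return [
            (self.iri, EX + "hasPagePart", self.page.iri),
            (self.iri, EX + "hasPersonPart", self.person.iri),
        ]

    def assert_value(self, prop: str, value: str) -> None:
        """Propagate a value asserted on the composite onto the specific part."""
        part = self.ROUTING.get(prop)
        if part == "page":
            self.page.add(prop, value)
        elif part == "person":
            self.person.add(prop, value)
        else:
            raise ValueError(f"no documented propagation for {prop}")


c = PagePersonComposite(
    EX + "x",
    Entity(EX + "x-page", EX + "WebPage"),
    Entity(EX + "x-person", EX + "Person"),
)
c.assert_value(EX + "birthDate", "1970-01-01")
```

A property with no documented routing (such as an ambiguous "creator") is rejected rather than silently attached to the wrong entity.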

-Alan

>> It would present other proposals (such as the property punning) as
>> non-conformant and therefore damaging to applications that depend on
>> specifications. It would not propose new techniques that are at
>> variance with normative specification. It might explain some of the
>> consequences of specific non-conformant uses of URIs, such as when
>> they are intended to be ambiguous by sometimes referring to one
>> resource and sometimes to another.
>>
>> A primer doesn't show incorrect usage except to point it out as such.
>>
>> -Alan
>>
Received on Wednesday, 10 October 2012 17:52:45 UTC