Re: Worry from Jonathan Rees on 2011-02-01 (public-awwsw@w3.org from February 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Tue, 1 Feb 2011 17:46:06 +0000
To: nathan@webr3.org
Cc: AWWSW TF <public-awwsw@w3.org>
Message-ID: <AANLkTikad3xMrrnn=K3oqXWP3yG6p__ut5vTjtQPYWJQ@mail.gmail.com>

This is all good material to work through.  Dissent is good. If ever
I'm not clear please challenge me to explain. The subject is hard and
keeping track of definitions and goals is hard.

Certainly it would be difficult to convince a skeptical party (e.g. a
server programmer, or Ian H) that when a URI is used in the HTTP
protocol it is being used referentially, much less get them to answer
if asked what it refers to.  I'm not sure I believe this story myself,
in spite of the HTTP spec and webarch being written as if it were
true. Fans of webarch will insist that the URI does refer to (or
rather "identify", whatever that means) something, but that does not
seem a very useful thing to say.

This is why I've been careful to treat the URI's referential and
operational behavior quite independently - so that we can formulate
statements about whether or not any particular referential theory is
compatible with actual HTTP exchanges (i.e. whether the RDF author is
mistaken), according to a variety of theories.

If these URIs were not used in RDF at all there would be no problem.
But they are.

If you use a URI in RDF you are advised, by all the rhetoric
surrounding RDF, to use it referentially (although I don't know how
one would check that you did; maybe Pat can explain). A
natural-language explanation of what you meant by the RDF would
probably corroborate this, at least by its syntactic structure: you
would probably translate the occurrence of the URI as a referring noun
phrase.

I have no trouble saying that "resource" and "information resource"
are parts of a mere theory - that they do not "exist" (are not
independently observable?) but rather are only "useful".  What are
they useful for?  First, it's possible the overall web/HTTP/RDF
architecture might actually make sense (I'm not sure), and they might
be part of some theory of how web arch, and so on, work or could work
better. It would be nice to know that. Second, empirically we have a
vast number of extant metadata assertions that, if interpreted the way
RDF is usually interpreted, are "about" something (the "data" of which
the metadata is "meta"). If we are to explain what all these
metadata composers are doing, and to what they're being held
accountable, we'll need a theory of their domain (indirectly a theory
of them). If calling the domain elements "information resources" is
confusing to you, great, let's pick a different word.

It's a fact that metadata stated in RDF is consequential. I would like
to know how and why. This is first a reverse engineering project,
second a best practices one.

It sounds like you want to say that URIs are not assumed to refer (in
RDF) to anything that affects more than one HTTP exchange. Under this
interpretation metadata has no consequences because the exchange is
unrecorded history - nobody has watched it, so you might be fibbing.
This seems unlikely to me because if metadata weren't falsifiable
(predictive) nobody would bother to write it. My view is that most
people (e.g. Google CC search) take metadata as predictive of future
exchanges, based on what they think are reasonable assumptions. These
two theories sound testable to me. Let's get to it.
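
One way the test could be framed, as a rough sketch only: record
exchanges for a URI, then measure how often a metadata claim made at
some moment agrees with exchanges observed after that moment. The
record format, the `Exchange` and `predictive_accuracy` names, the
example URI, and the license labels are all assumptions made up for
illustration; none of them come from this thread or from any spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """One recorded HTTP exchange (hypothetical record format)."""
    uri: str
    timestamp: int      # seconds since some arbitrary epoch
    license_seen: str   # license observed with the response


def predictive_accuracy(claim_uri, claimed_license, claim_time, exchanges):
    """Fraction of exchanges after claim_time that agree with the claim.

    Returns None when there are no later exchanges to test against.
    """
    later = [e for e in exchanges
             if e.uri == claim_uri and e.timestamp > claim_time]
    if not later:
        return None
    hits = sum(1 for e in later if e.license_seen == claimed_license)
    return hits / len(later)


# Made-up observations, for illustration only.
log = [
    Exchange("http://example.org/photo", 100, "CC-BY"),
    Exchange("http://example.org/photo", 200, "CC-BY"),
    Exchange("http://example.org/photo", 300, "CC-BY"),
    Exchange("http://example.org/photo", 400, "ARR"),
]

# A claim made at time 150 is checked only against later exchanges.
accuracy = predictive_accuracy("http://example.org/photo", "CC-BY", 150, log)
print(accuracy)
```

If metadata is read as predictive, a low score counts against the
claim. If it is read as describing a single past exchange, no score
computed from later exchanges can count against it.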

At some point, of course, my appetite for empirical studies wanes and
I want to figure out how to fix things. If we discover that current
practice is so hopelessly inconsistent that there is no hope for
predictability or interoperability, or that it's heavily at variance
with published specifications, that's interesting. Maybe we have to
just stop using these URIs in RDF completely, trash webarch, and
surgically remove the web from RDF's namespace.

Jonathan

On Tue, Feb 1, 2011 at 3:30 PM, Nathan <nathan@webr3.org> wrote:
> Hi all,
>
> I feel like I may be a bit of a dissenter saying this, but I've got a
> horrible feeling that we're all focussed on, and looking at, the wrong
> things - and that relates to everything httpRange-14 related, accountability,
> resource vs representation, what does a uri name and so forth.
>
> I have to suggest that we are often focussed on what is the relation between
> a "resource" and a "representation" signified by an XXX status code on Y
> protocol - and that there is no such relation. Any XXX status code in a
> response relates entirely to the request, and not only that, but it's the
> relation between the response from Y and a message sent by A to Y.

Yes, but in a larger context the response is interpreted to mean more
than that. That's like saying that when I say "hello" to you, you only
know that I said hello. In some sense that's true, but the "only" is
setting a pretty high bar for knowledge. You are likely to infer all
sorts of things from my saying that - that I know who you are, that
I'm not angry at you, that I don't have laryngitis, that I speak
English, etc.

> URIs are just names, we stick them in boxes on our computers and they hit a
> bunch of processes, possibly network intermediaries, origin servers,
> processes and databases on the other side and then back again. The relation
> between a URI and where a response comes from is pretty much unknown without
> looking at the messages, it could come from malware on the machine, a user
> agent cache, an intermediary, tampered with along the way and so on.
> Additionally the name can be dereferenced in virtually any way one can
> conceive, it's certainly not tied to a protocol indicated by a scheme (stick
> an http URI in a sparql query, or in wayback machine for an example of
> this).
>
> You can't get accountability from a uri to representation link, you can't
> apply a license in those terms; a license can only apply to the message sent
> back, and only be worth the paper it's written on if it can be proven that
> the message was sent from the correct place - if an image needs a license,
> stick it in the header field of the response message and have it applied to
> the representation contained.

This may be true in principle, because the specs don't enable this
accountability, but in practice, the accountability, however
ill-founded, exists. So I disagree as a matter of fact.
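
For concreteness, the "stick it in the header field" suggestion quoted
above could look roughly like the sketch below. It assumes the field
is the HTTP Link header with the registered "license" relation type;
the thread does not name a field, so that choice is an assumption, as
are the `license_from_headers` helper and the plain-dict input shape.
The parsing is deliberately simplified and does not cover the full
Link header grammar.

```python
import re


def license_from_headers(headers):
    """Return the target URI of the first Link entry with rel="license".

    `headers` is a plain dict of header name -> value (hypothetical
    input shape). Header names are matched case-insensitively.
    Returns None if no such link is present.
    """
    link_value = None
    for name, value in headers.items():
        if name.lower() == "link":
            link_value = value
            break
    if link_value is None:
        return None
    # Split on commas that begin a new "<target>" entry.
    for entry in re.split(r",\s*(?=<)", link_value):
        match = re.match(r"\s*<([^>]*)>(.*)", entry)
        if not match:
            continue
        target, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        # A rel value may list several space-separated relation types.
        if rel and "license" in rel.group(1).lower().split():
            return target
    return None


# Made-up response headers, for illustration only.
response_headers = {
    "Content-Type": "image/jpeg",
    "Link": '<http://creativecommons.org/licenses/by/3.0/>; rel="license"',
}
found = license_from_headers(response_headers)
print(found)
```

A license found this way is scoped to the one response message that
carried it, which is the scope the quoted suggestion argues for. The
disagreement here is over whether practice stops at that scope.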

> Similarly, we just /can't/ define what a name refers to, it refers to
> different things for different people, example: "john". All names are an
> example of this, http uris are just the same, given a uri <x>, for one
> person that names "the representation they got back", for another it's the
> view of that representation as presented by a user agent, for another it's
> the concept over time "my paper" and for another it's the topic of that
> paper. The only things we can say, are that things have names, it's good to
> always use the same name to refer to the same thing, and if you're sharing the
> use of a name with another party then it's good to agree on what you are
> referring to - we can't make that decision at web scale, it happens on a
> name by name basis.

We do this on web scale all the time.
http://www.w3.org/1999/02/22-rdf-syntax-ns#type is an example. In fact
this is the whole point of the web.

If the global or web-based meaning of a URI isn't clear enough, or
none has been established, then sure, you need to say more in order to
communicate clearly. But in RDF at least, in order to be clear you
have to ultimately use global names. It has to ground out somewhere.

Jonathan

> Give different things different names, if you need to consider time,
> provenance, authority or accountability then look at the messages and those
> involved in passing the messages along.
>
> Apologies, it could well be out of scope, or maybe not so - but I had to get
> it off my chest.
>
> Best,
>
> Nathan
Received on Tuesday, 1 February 2011 17:46:39 UTC