Re: sketch of an exposition from David Booth on 2010-05-22 (public-awwsw@w3.org from May 2010)

From: David Booth <david@dbooth.org>
Date: Fri, 21 May 2010 23:13:48 -0400
To: Jonathan Rees <jar@creativecommons.org>
Cc: AWWSW TF <public-awwsw@w3.org>
Message-ID: <1274498028.2440.6967.camel@dbooth-laptop>

Hi Jonathan,

Nice write-up!  My answers below.

On Mon, 2010-05-17 at 17:40 -0400, Jonathan Rees wrote:
> Apologies up front:
>   - sorry it's rough and unformatted.  I'm trying out expository ideas
> and terminology & wanted to get this out to you all for critique
>   - topic not covered: metadata subjects (DC, FRBR, etc.); redirections
>   - tell me which statements you disagree with! we thrive on
> statements that are interesting enough that one can argue over them.
>   - idle question: does every IR have a REST-representation?
> 
> -Jonathan
> 
> -------------------
> 
> Axiomatic method = don't take anything for granted - if some result
> can't be proved from axioms already stated, do not assume that it is
> true.

Okay, I guess.  We'll see.

> 
> Assume a universe of discourse, which I'll call Thing.

Yes.

> 
> In formal treatments one needs a way to refer to (or name or
> designate) things.  For this purpose we may use URIs, although other
> notations may be useful too.

Yes.

> 
> Reference is not objective; when a URI refers to a Thing it's
> because someone has chosen to have it do so.

In the large picture, that certainly is true.  But one key purpose of
semantic web architecture is to give guidance about what one *should*
choose to have a URI denote.

> 
> Reference does not imply any special knowledge of a Thing.  

Yes!

> I can
> talk about a thing without knowing exactly which thing I'm talking
> about - for example, I might be communicating partial knowledge
> (properties) that I received from someone else.  Reference is not
> "identification".

Yes!

> 
> We'll suppose that (in any given conversation or context) a URI refers
> to at most one Thing.  

No.  :(   

There are three important myths that need to be dispelled:

  Myth 1: There is only one giant RDF graph.

  Reality: There are *many* graphs.  RDF semantics only applies to a
*given* graph.  A URI may denote a *different* thing in graph G1 than it
denotes in graph G2.  (See myths #2 and #3.)  However, it *is* possible
to talk about the *range* of things that a URI might denote in *any*
graph.  That range is constrained by the assertions that are made in the
definition of the resource that the URI denotes, i.e., in what I've been
calling the "core assertions" of its URI declaration:
http://dbooth.org/2009/denotation/ 

  Myth 2: There is only one interpretation of a given graph.

  Reality: In general there are many, and this means that a given URI in
a given graph may denote one thing in *one* interpretation, but a
different thing in *another* interpretation.  See
http://www.w3.org/TR/rdf-mt/#interp 

  Myth 3: The identity of a resource denoted by a URI can be uniquely
defined.

  Reality: Perhaps it can in some *very* limited cases, but in general
the identity can only be constrained by providing some assertions that
constrain the identity.  This means that in general a given URI in a
given graph may *ambiguously* denote several things.  It is the
interpretation you choose that determines which thing you get, as
illustrated in figure 2:
http://dbooth.org/2009/denotation/#figure2 
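
To make myths 2 and 3 a little more concrete, here is a rough sketch in
Python.  It is only an illustration, not RDF semantics proper: the tiny
universe of things, the example.org URI and the single "it is a dog"
constraint are all made up, and an "interpretation" is reduced to a plain
URI-to-thing mapping.

```python
# Hypothetical universe of things and their properties (illustration only).
THINGS = {
    "rex_the_dog": {"is_dog": True, "is_web_page": False},
    "fido_the_dog": {"is_dog": True, "is_web_page": False},
    "page_about_rex": {"is_dog": False, "is_web_page": True},
}

URI = "http://example.org/rex"  # hypothetical URI


def core_assertions(thing):
    """Stand-in for the core assertions of a URI declaration: 'it is a dog'."""
    return THINGS[thing]["is_dog"]


def satisfying_interpretations(uri, constraint):
    """Return every URI -> thing mapping that satisfies the constraint."""
    return [{uri: thing} for thing in sorted(THINGS) if constraint(thing)]


# The assertions narrow the range of possible referents but do not pick one.
candidates = satisfying_interpretations(URI, core_assertions)
for interpretation in candidates:
    print(interpretation)
```

The assertions rule out the web page, but two candidate referents remain;
which one you get depends on the interpretation you choose.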


> An agent may take a URI to refer to no Thing at
> all, or refer to a Thing by multiple URIs, or not take any URI to
> refer to some Thing.

Okay.

> 
> If a URI U refers to some thing T then <U> is another name for T.

Yes.

> 
> Some Things will be what we call 'REST-representations'.
> 
>    For now think of them as being similar to HTTP 'entities' - they
>    consist of content and a few headers such as media type.
>    But we'll figure out the details later.
> 
>    We don't assume that these REST-representations are 'on the wire'
>    or associated with particular events or messages.
>    We reserve the right to refer to them using URIs, but generally
>    this will be unnecessary.

Yes.

> 
> Posit a relationship, which I'll call 'W', between some Things and
> some 'REST-representations' e.g. W(T,R).
> 
>    The intent is for W to capture what gets written variously
>      R is "an entity corresponding to" T (RFC 2616 10.2.1)
>      T "corresponds to" R (RFC 2616 10.3.1)
>      R is a representation of the state of T (Fielding and Taylor)
>      R "encodes information about state" of T (AWWW glossary)
>      R "is a representation of" T (AWWW 2.4)

Yes, sounds good.

> 
>    We permit the same REST-representation to be W-related to multiple
>    Things, i.e. W(T,R) and W(T',R) is consistent with T != T'.

Yes.

> 
>    We permit one Thing to be W-related to more than one
>    REST-representation, i.e. W(T,R) and W(T,R') is consistent with
>    R != R'.

Yes.

> 
>    If you don't accept web architecture as expressed in RFC 2616 in
>    its rudiments, you should stop reading here.

:)

> 
> Let us stipulate that a GET/200 HTTP exchange expresses a
> W-relationship between a Thing and a REST-representation.  That is:
>   1. If a URI U refers to a Thing <U>, and
>   2. an HTTP request GET U results in a 200 response carrying
>      REST-representation R, then
>   3. we will interpret the exchange as communicating W(<U>, R).

Yes.  However, under the httpRange-14 rule, it also says a bit more, but
we'll come to that later.
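
As a minimal sketch of the stipulation (Python, with made-up class names
and example.org URIs, and no real network access), one can read a log of
exchanges and keep only the W-statements that the GET/200 ones
communicate.  Whether to believe them is, as you say, a separate question
that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """Hypothetical REST-representation: content plus a media type."""
    media_type: str
    content: bytes


@dataclass(frozen=True)
class Exchange:
    """Hypothetical record of one HTTP exchange."""
    method: str
    uri: str
    status: int
    body: Representation


def w_statements(exchanges):
    """Collect (uri, R) pairs, read as W(<uri>, R), from GET/200 exchanges."""
    return {
        (e.uri, e.body)
        for e in exchanges
        if e.method == "GET" and e.status == 200
    }


page = Representation("text/html", b"<html>hello</html>")
error = Representation("text/plain", b"not found")
log = [
    Exchange("GET", "http://example.org/a", 200, page),
    Exchange("GET", "http://example.org/b", 404, error),
]
known = w_statements(log)
```

Note that the 404 contributes nothing either way: the sketch never
records that some W(<U>, R) is false, only that some are asserted.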

> 
>    WHETHER WE CHOOSE TO BELIEVE W(<U>, R) IS ANOTHER STORY.
>    (Consider a buggy or malicious proxy.  HTTPbis starts to address
>    believability by trying to specify a notion of 'authority'.)
>    ISSUES OF TRUST AND AUTHORITY WILL BE TREATED SEPARATELY (if we get
>    around to it).
> 
>    We might fudge this by speaking of "credible" HTTP exchanges without
>    saying exactly what that means (as indeed one cannot say).

Good.

> 
> The implication goes in only one direction: a credible GET U/200 R
> exchange implies W(<U>, R), but the absence of such an exchange does
> not imply that W(<U>, R) is not the case.

Right.

> 
> In fact there may be other ways to communicate or infer W(<U>, R) -
> by consulting a cache, for example.

Right.  Or someone you believe may have told you, for example.

> 
> A consequence (or precondition) of this stipulation is that for each
> URI U for which there is a GET/200 exchange, there exists a Thing <U>
> that U refers to.  Roughly speaking, all web URIs refer to
> *something*.
> 
>    This is the way in which the web is "grandfathered" into the
>    semantic web.
> 
>    Although it's not falsifiable, this seems to be the idea that IH
>    denies (there are no resources).

Yes.  But who/what is "IH"?

> 
> This is a powerful constraint.  Since servers are "authoritative",
> they can produce whatever 200 responses they like for a URI that they
> control, and not violate protocol.  That is, for an *arbitrary* set of
> REST-representations concoctable by a server, we've committed to
> allowing the existence of a Thing that has those REST-representations.

Sounds fine to me.

> 
> Note on what is NOT provable at this point
> 
>    We haven't created a way to falsify any W-statement.  That is,
>    there is no way to infer that W(T,R) does not hold.  Therefore this
>    theory is satisfiable by having a single Thing T, that all URIs
>    refer to, having the property that W(T,R) for all
>    REST-representations R.

Right.  You don't have any functional relationship between requests,
time and REST-representations yet.
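
The degenerate model you describe can be written down directly.  This is
just a sketch with invented URIs and representation names, showing that a
single Thing satisfies any set of observed W-statements when nothing can
refute one.

```python
# Hypothetical W-statements learned from credible GET/200 exchanges.
observed = {
    ("http://example.org/a", "R1"),
    ("http://example.org/b", "R2"),
}
all_representations = {"R1", "R2", "R3"}

THE_ONE_THING = "T"  # the single Thing in the degenerate model


def refers_to(uri):
    """Every URI refers to the same Thing."""
    return THE_ONE_THING


def W(thing, representation):
    """W holds between the one Thing and every REST-representation."""
    return thing == THE_ONE_THING


def model_satisfies(statements):
    """Check that each observed W(<uri>, R) is true in the model."""
    return all(W(refers_to(uri), rep) for uri, rep in statements)


satisfied = model_satisfies(observed)
never_refuted = all(W(THE_ONE_THING, r) for r in all_representations)
```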

> 
> Note on time
> 
>    Although W is time-sensitive, we'll ignore time as it is not
>    helpful to account for it right now.  Later we'll redo the
>    treatment to take time into account.
> 
>    So W is OK as a binary relation for now.  Later it might be
>    W(X,R,t).

Okay.  But the other thing you'll need is the request.  Once you
consider *both* time *and* request, the relationship becomes functional.
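
Here is a small sketch of what I mean, assuming (for illustration only)
that a request is characterized by its Accept header and that time is an
integer tick.  The table of responses is invented.

```python
from typing import NamedTuple


class Request(NamedTuple):
    """Hypothetical request, reduced to its Accept header."""
    accept: str


# Invented data: (thing, request, time) -> REST-representation.
responses = {
    ("home", Request("text/html"), 1): "<html>v1</html>",
    ("home", Request("text/html"), 2): "<html>v2</html>",
    ("home", Request("text/plain"), 1): "v1",
}


def binary_w(thing):
    """The binary relation W(T, R): one Thing, many representations."""
    return {rep for (t, _, _), rep in responses.items() if t == thing}


def functional_w(thing, request, time):
    """With request and time fixed, at most one representation results."""
    return responses.get((thing, request, time))


many = binary_w("home")
one = functional_w("home", Request("text/html"), 2)
```

As a binary relation, "home" is W-related to three representations; once
the request and the time are both given, there is exactly one.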

> 
> Note on RDF
> 
>    RDF is just a vector for FOL, and FOL is a lot easier to read and
>    think about, so better to start with FOL and then render it in RDF
>    (and perhaps other notations) later on.

I suppose that's a personal preference.  I happen to be more comfortable
with n3 notation, but hopefully it won't matter.

> 
> No number of GET/200 exchanges can tell you what a resource is.

Right.  Not unless you get lucky and happen to receive some content that
tells you, *and* you choose to believe it.

> There are several reasons for this.
>   1. The absence of a GET/200 giving W(T,R) does not mean that W(T,R)
>      isn't true.

Right.

>   2. Two Things T,T' could have W(T,R) but not W(T',R) for some
>      REST-representation R not hitherto obtained by a GET/200 exchange.

Right.

>   3. T and T' could agree on the truth or falsehood of *every*
>      W-statement and *still* be different

Yes!

> 
> Information distinguishing such Things, if it were available, would
> have to come through a different channel (e.g. RDF).

Right.

> 
> httpRange-14
> ------------
> 
> Let IR be a proper subclass of Thing containing the domain of W,
> i.e. suppose W(T,R) implies that T is in IR.

Okay, but IR need not be a *proper* subclass.  In fact, the domain of W
should be defined as T, as explained here:
http://dbooth.org/2007/splitting/#httpRange-14

> 
> Properties of IR:
>    Grandfathering: "web resources" (those for which we get 200s) are in IR
>      - this is a consequence of the above stipulation.

Yes.

>    TimBL: "generic resources" are in IR (genont)
>    TimBL: literary works are in IR  (Pat Hayes disagrees)
> 
>    TimBL: dogs and people are disjoint with IR
>      (by extension: anything physical)
>    TimBL: strings and numbers are disjoint with IR
>      (by extension: anything mathematical)
>    TimBL: REST-representation is disjoint with IR
>      (JAR doesn't see the point)
>    Pat: RDF graphs are not in IR
> 
>    TimBL: members of IR are not determined by their W-relations
>      i.e. one might have W(T,R) = W(T',R) for all REST-representations
>      R, yet T != T'   [time sheet example]

Again, I think you're on the wrong track by trying to nail down the
boundaries of what is or isn't an IR.   As explained here
http://lists.w3.org/Archives/Public/public-awwsw/2010May/0016.html
whether a URI denotes a resource that is an IR (in addition to whatever
else it might be) is a matter of *choice*.  The statements in AWWW that
suggest that IR *is* disjoint with dogs and people should not have been
written quite that way.  

> 
> We have three theories of IR in the works now: Dan's speaks-for
> theory, Alan's what-is-on-the-web theory, and JAR's property-transfer
> theory.

Plus DBooth's: An IR is a *role* in the architecture.  As a class, IR
should not be defined as being disjoint with *anything*.  *Any*
URI-denoted resource can act in that role.  

If you configure your server to yield a 200 response to a GET on URI U,
then your server has said that <U> is an IR, but this does not prevent
<U> from also being something else, such as an RDF graph, a number, a
person or a dog.  However, to avoid likely URI collisions
http://www.w3.org/TR/webarch/#URI-collision
one *should* avoid using the same URI for things that people are likely
to wish to differentiate, such as distinguishing between a dog and a web
page *about* the dog.  Or perhaps a better example would be a book
versus a web page *about* the book, because one may well wish to make
similar-sounding statements about them (hasAuthor, dateWritten, etc.),
so minting one URI for both may well cause annoyance when others try to
use your URI.  But ultimately it is up to the URI owner to make this
choice, and the right choice will depend on the application.
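
A rough sketch of the role view, with hypothetical class names and an
example.org URI: a 200 response adds the IR role to whatever <U> already
is, and takes nothing away.

```python
class Resource:
    """Hypothetical URI-denoted resource with a set of class names."""

    def __init__(self, uri, classes):
        self.uri = uri
        self.classes = set(classes)


def record_get_200(resource):
    """Under the role view, a 200 response adds the IR role and nothing else."""
    resource.classes.add("InformationResource")
    return resource


# The URI owner chose to use one URI for the book itself.
book = Resource("http://example.org/moby-dick", {"Book"})
record_get_200(book)

# Nothing is contradicted, but a hasAuthor statement about this URI is now
# ambiguous: the author of the book, or of the page that was served?
is_both = {"Book", "InformationResource"} <= book.classes
```

Nothing here is inconsistent, which is the point.  The cost of sharing
the URI shows up only when someone needs to tell the two apart.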


-- 
David Booth, Ph.D.
Cleveland Clinic (contractor)

Opinions expressed herein are those of the author and do not necessarily
reflect those of Cleveland Clinic.
Received on Saturday, 22 May 2010 03:14:17 UTC