Re: "RDF Knowledge" (Uniform HTTP Protocol for Managing RDF Graphs) from Gregg Reynolds on 2011-02-07 (public-rdf-dawg-comments@w3.org from February 2011)

From: Gregg Reynolds <dev@mobileink.com>
Date: Mon, 7 Feb 2011 08:35:27 -0600
To: Chimezie Ogbuji <chimezie@gmail.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <AANLkTimnQKsJO2vhDLic+edMSmughwjJdS6efC7Se+GZ@mail.gmail.com>
On Wed, Feb 2, 2011 at 10:12 AM, Chimezie Ogbuji <chimezie@gmail.com> wrote:

> Hello, Greg. Thanks for your comments, see the response(s) below (in
> context).  Note that I no longer have access to the email account
> where I originally received the comment and so I wasn't able to
> compose a reply with your original email quoted but will try to do so
> by hand where necessary.
>

Hi Chimezie,

Thanks for your response; comments below.  For reference my original post is
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jan/0001.html


...

> On Wed, Jan 5, 2011, Gregg Reynolds wrote:
> > In summary: - In my view this document is unnecessary and possibly
> harmful.
>
> Can you elaborate how this document is harmful beyond the issues you
>
My text read "possibly harmful".

> have raised (presumably with the intent that they be addressed by
> modification to the text)? Are you
> saying that the text is not salvageable given its original aim, which
> is to define how the HTTP protocol can be used natively (i.e., in a
> manner consistent with the constraints of
> REST) to modify the graphs in a graph store?
>

If it only restates what is already defined elsewhere (e.g. RFC2616,
RFC3986, WEBARCH, etc), then it seems likely that it will only end up adding
to the confusion.  I've read it pretty carefully and I believe that is the
case.


>
> > I don't see what this document adds; or rather, I don't see what problem
> it addresses that is not already addressed elsewhere.
>
> There is no existing specification of how HTTP methods can be used to
> manage RDF graphs in a manner that takes immediate advantage of the
> semantics of the underlying
> protocol.


Sorry, I don't understand this.  What underlying protocol?  What semantics?
I can't tell what you're talking about.  If you mean that HTTP is not
sufficient to support a RESTful architecture for RDF resource services, then
I respectfully but very strongly disagree.

But the more fundamental problem is that this kind of language - "manage rdf
graphs", "managing a graph store" (section 3) - is incompatible with REST.
REST is about resources, not stores, not RDF, not graphs.  The document in
various places makes unmerited assumptions about resources; for example,
section 4: "In this way, an HTTP request can route operations towards a
named graph in an [sic] Graph Store via its Graph IRI."  In my view this
language is deeply unRESTful.  Requests do not route; IRIs denote resources,
not Graph Stores; there is no such thing as a "Graph IRI"; the kind of
server process standing behind an IRI is beyond the scope of the RESTful
interface.  The quoted sentence seems to be saying nothing more than that
IRIs denote resources, and it is up to the server to decide what exactly
that means.  But standard language based on the RFCs is perfectly adequate
for this.

Look at it from another perspective:  why do we not standardize a RESTful
protocol for managing JPEGs?  Or Word docs?  Or any other kind of data?  A
protocol for "managing graphs" is no different in principle than one for
"managing JPEG images".  We don't standardize the latter; why do the former?
Or: it makes no difference if there is a massive distributed database farm
or a simple filesystem behind an IRI, and it doesn't matter what format is
used to store the data behind the IRI; from the client perspective, it's
just a resource, for which the server dispenses a token
(serialization/representation).  At the very least, none of these standards
docs should make reference to "Graph store", data store, data base, etc.
They should stick religiously to "resource" or "graph resource".

Speaking only for myself, obviously, I've gone over the draft pretty
carefully and I see nothing there that is not a restatement of existing
standards.  The one possible exception is the definition of a query syntax -
?graph= and ?default - but I'm not even convinced these need to be
standardized.  Providers are free to define whatever query syntax they
please for their services; that just means different IRIs.  You say ?graph=,
I say ?g= - no big deal.  No different than controlling the path components
of your IRIs.  Standardizing a query syntax is no different than
standardizing a path for SPARQL endpoints; I don't think anybody would
advocate declaring that SPARQL endpoints must be at /sparql.  Plus we
don't standardize a query syntax for SQL queries or requests for JPEGs, etc.
What makes RDF special?
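
A minimal Python sketch of the point about query syntax.  The service
and graph IRIs are hypothetical, as is the helper resource_iri; nothing
here is taken from the draft.  It only shows that a different query key
yields a different IRI, which the client dereferences like any other.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical service and graph IRIs, chosen only for illustration.
SERVICE = "http://example.org/rdf-graphs/service"
GRAPH = "http://example.org/graphs/employees"

def resource_iri(param_name: str) -> str:
    """Build the IRI of a graph resource using a provider-chosen query key."""
    return SERVICE + "?" + urlencode({param_name: GRAPH})

iri_a = resource_iri("graph")  # one provider's convention
iri_b = resource_iri("g")      # another provider's convention

# The two IRIs differ, yet each carries the same graph IRI in its query
# component; a client just uses whichever IRI the provider hands out.
print(iri_a)
print(iri_b)
print(parse_qs(urlsplit(iri_a).query)["graph"] ==
      parse_qs(urlsplit(iri_b).query)["g"])
```

From the client side the key name is opaque: both strings are simply
IRIs of resources.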



> The existing protocols (in a manner similar to SOAP
> interfaces) use HTTP POST to dispatch operations where the actions
> taken are defined by the content of the message rather than the
> semantics of the protocol (which specifies how resources are
> manipulated via the various methods: DELETE, GET, POST, etc.).


Again, I don't know what you're referring to.  What "existing protocols"?
Can you provide a specific example illustrating failure of HTTP (including
standardized extensions) to support a RESTful API for RDF resources?  I
can't think of one.


> This
> protocol is meant to address this by defining a protocol that uses the
> constraints of REST to define how RDF graphs can be manipulated
> directly and natively in HTTP.
>

I strongly oppose this.  HTTP is already defined; so is REST.  Anybody who
wants to implement a REST architecture over HTTP to serve up RDF resources
just has to do a little research to figure out how.  Furthermore, whether
somebody wants to use a RESTful architecture or RPC or any other design to
implement RDF services is none of W3C's business.  Architectural patterns
are not an appropriate area for standardization in my opinion.  Would
anybody suggest publishing an MVC "standard"?

Overstandardization is not good.


> > "RDF knowledge". Please don't do this.
>
> After some discussion this term will not be used.


A grateful world breathes a sigh of relief.  Thank you.
...

> > In my view the source of the problem is probably the notion that RDF
> somehow represents or encodes or [pick your verb]s "knowledge", which in
> turn is probably
> > traceable to the notion that IRIs "identify resources". Both ideas are
> fundamentally wrong in my view; RDF is about graphs, not knowledge,
>
> The term 'RDF Graph content' (although it doesn't use the word
> 'knowledge' which many found not helpful) does distinguish between the
> syntax (or structure) of an RDF graph and its meaning (or content).
>

Huh?  Not to my eye it doesn't.  First off, this sentence suggests an
equivalence between syntax and structure.  Syntax may have structure, and
graphs may have structure, but these are clearly different things.
Sentences and other expressions have syntax; graphs do not.  If this isn't
clear, consider sets.  An expression like {1, 2, 3} has syntax; the set it
denotes has structure, but not syntax.  Second, to be honest I don't know
what "Graph content" is supposed to mean; it looks redundant to me.  A graph
is a graph; it has no "content".  Similarly, a set is a set; it has no
"content".  This is not the same as saying it has no members.  The point is
that you cannot postulate a third entity "content" that is distinct from the
set and its members.  Well, you can, but it would come as a surprise to
mathematics.
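
The set example can be sketched in Python, treating strings as
expressions and Python sets as the things denoted.  The variable names
are illustrative only.

```python
import ast

# Two different expressions (syntax): they differ as strings.
expr_a = "{1, 2, 3}"
expr_b = "{3, 1, 2, 2}"

# The sets those expressions denote.
set_a = ast.literal_eval(expr_a)
set_b = ast.literal_eval(expr_b)

print(expr_a == expr_b)  # comparing the expressions
print(set_a == set_b)    # comparing what they denote
```

The expressions have order and repetition; the set has neither, and
there is no further "content" object to compare beyond the set and its
members.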

Maybe you can clarify what is intended by distinguishing between "graph" and
"graph content".  To me it looks just like "3" and "3 content" or "integer"
and "integer content".  In other words, adding "content" obfuscates the
issue.  I've done quite a bit of research into various ways of looking at
semantics and I can't recall ever seeing anything corresponding to this
usage of the term "content".  Just to be clear, I do not take "graph" to
refer to syntax.


> This follows from the model theory of RDF, which provides a way for
> RDF graphs to be interpreted and there is an understood (as with other
>

Not quite.  It provides a way for expressions in an RDF language to be
interpreted.  RDF graphs are the things that are denoted.  Pretty major
difference.


> such formal languages such as in OWL for instance) distinction between
> the statements or sentences of a language and the meaning that they
> convey. Whether or not this distinction is problematic for RDF as a
> whole is not in scope for what this protocol intends to do and is
> probably better directed at the new RDF working group, perhaps.
>

One problem is that the language of the document is imprecise and therefore
confusing, as the above sentences illustrate.
...

> > Now suppose I declare "let x be the integer 3". Will anyone think it
> important to draw a distinction between "the integer and what x identifies"?
>
> This distinction is in fact very important to the use of model theory
> to specify (in a mathematical way) how the meaning of a formal
> language can be determined in order to facilitate machine
> understandability. For example, the RDF model theory does indeed
> distinguish between a URI reference and what it 'denotes':
>

I think you may have misread my example.  More explicitly, if I declare "let
x be the integer 3" (or "let x = 3"), I mean let the symbol x denote the
same value as the symbol 3, namely the third integer.  Then it is clearly
absurd to draw a distinction between "the third integer" and "what the
symbol 'x' identifies".  I brought this up in my original post because
language similar to this came up in the archives somewhere as "difference
between the graph and what the IRI identifies".  If we're talking
denotational semantics then it must be that an IRI that identifies a graph,
identifies a graph.  Your language suggests that you want to make a
distinction between graph and graph content.  Is that correct?  If so, as
argued above, I don't think such a distinction is valid.  Denotational
semantics is a binary affair: we have notation (syntax) and denotation
(semantics).  Introducing a difference between graph (which is the denotatum
of a graph expression) and graph content (which is ?) introduces a third
element that has no basis in accepted theory as far as I know.

>  The semantics treats all RDF names as expressions which denote. The
> things denoted are called 'resources',
> It is not clear from my reading your comment if your issue is with the
> RDF model theory or if it is with some liberties that have been taken
> in the document you are commenting on. Can you clarify?
>

My beef with the MT stuff is not with the theory or the MT doc (although I
think it has major problems), but with the obscurity it introduces into the
exposition of RDF.  Model theory is arcane; 99% of the people who might be
interested in RDF will have no idea what it's all about.  I would much
prefer a set of SPARQL standards docs that make no mention at all of MT.  It
just isn't necessary.


>
> > Compare: "in using x to refer to the integer 3 in this way, I am not
> directly identifying the integer but rather the mathematical knowledge it
> represents."
> > You'd be laughed out of town.
>
> The distinction you are quoting before your statement above (between
> the what the IRI of a graph in a dataset identifies and the graph
> itself) is part of the SPARQL 1.0 specification (see the end of 8.2.2
> Specifying Named Graphs). Do you take issue with that part of the
> SPARQL 1.0 specification or with something unique to the specification
> you are commenting on?
>

I do take issue with that part of the SPARQL 1.0 spec, and anything else
that uses this kind of language.  Details below.


>
> > An IRI that is used to name an RDF graph refers to that graph,
>
> This is not the case. Please refer to the section of the SPARQL 1.0
> referred to above, which states:
>
>  The FROM NAMED syntax suggests that the IRI identifies the
> corresponding graph, but the relationship between an IRI
>  and a graph in an RDF dataset is indirect.  The IRI identifies a
> resource, and the resource is represented by a graph
>  (or, more precisely: by a document that serializes a graph). For
> further details see [WEBARCH].
>

Indeed, in my reading this passage is incoherent, or at least irredeemably
vague.  Also wrong.  An IRI identifies a resource; if that resource happens
to be a graph, then it identifies the graph.  "Graph" here means
mathematical object; it most definitely does not mean graph expression or
syntax or representation or serialization of a resource.

Frankly I'm at a loss to explain where language like this comes from.  Is
"graph in an RDF dataset" supposed to be special in some way?  I can't see
how.

The basic idea behind denotational semantics is pretty simple,
understandable by anybody capable of distinguishing between signifier and
signified.   If you have a collection of graphs and you name them, then the
names, well they name the graphs.  Why obfuscate such a simple concept?
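
As a rough sketch in Python, assuming a graph is modelled as a frozenset
of triples and a name is an IRI string (all IRIs and the function
denotation are hypothetical):

```python
# A graph is modelled as a frozenset of (subject, predicate, object) triples.
g1 = frozenset({("ex:alice", "ex:knows", "ex:bob")})
g2 = frozenset({("ex:bob", "ex:knows", "ex:carol")})

# Naming the graphs: each name maps directly to the graph itself.
names = {
    "http://example.org/g1": g1,
    "http://example.org/g2": g2,
}

def denotation(iri: str) -> frozenset:
    """Return the graph that the given name denotes."""
    return names[iri]

# The name denotes the graph, not some intermediate object.
print(denotation("http://example.org/g1") is g1)
```

Nothing sits between the name and the graph in this model; a
serialization would be a separate thing produced from the graph on
request.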


> > To be honest I think the problem goes back to the use of model theory to
> provide semantics.
>
> This suggests that your issues have more to do with the (semantic)
> foundations of RDF rather than this particular specification. Is this
> the case and if not can you elaborate on the distinction?
>

Not so much the foundations as the language - I mean the language (or
metalanguage) of the standards texts.  It needs to be tightened up
considerably, and language from MT does not help.


> > I hasten to add that your text does touch on the critical issue, albeit
> only in passing. That is the issue of open-world semantics.
>
> This issue is beyond the scope of the protocol which only attempts to
> define how HTTP can be used to manipulate RDF graphs. Unfortunately, I
> had some difficulty following your description of an 'open graph' or
> how it is relevant for the intention of this specification.
>

I think one of the reasons texts about RDF tend to be hard for newcomers (I
make that assumption; it was certainly my experience) is precisely that they
don't make the meaning and implications of open world semantics clear and
explicit.  So in my view not only is it in scope, it is in a way central to
the whole endeavor.

A simple set-theoretic example will illustrate the point.  Consider a
statement like "S = {1, 2, 3}".  Then the symbol 'S' denotes a set of three
elements; the symbol '{1, 2, 3}' denotes the same set.  Under closed
semantics, we know that much about our symbols and their meanings.  But we
also have negative knowledge; we know, for example, that 4 is not an element
of (the set denoted by) S.  We know this because we get to use the Law of
the Excluded Middle:  4 either is or is not a member; since we do not have
positive knowledge that it is a member (no assertion), we get to infer the
negative, that it is not a member.

However, under open-world semantics, we are not licensed to make such
inferences.  If asked, "Is 4 a member of the set S?", the best we can say is
"I don't know".  We don't get to infer the negative (not a member) from the
absence of the positive (assertion of membership), or vice-versa.
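
The contrast can be sketched in Python.  The function names are
hypothetical, and using None for "I don't know" is just one possible
encoding.

```python
from typing import Optional

asserted = {1, 2, 3}  # everything we have been told about S

def member_closed(x: int) -> bool:
    """Closed world: absence of an assertion counts as a negative."""
    return x in asserted

def member_open(x: int) -> Optional[bool]:
    """Open world: absence of an assertion only means 'unknown' (None)."""
    return True if x in asserted else None

print(member_closed(4))  # False
print(member_open(4))    # None
```

Both functions agree on asserted members; they differ only in what
they conclude from silence.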

Consequently, the term "denotation" is in a sense inappropriate for
open-world semantics, since it must either have different meanings in open
and closed semantics, or it must be unable to fully account for meaning in
open-world semantics.  If the symbol G denotes a graph, then what exactly
does it denote?  Well, we obviously have the asserted and inferred graphs,
but we also have a third graph.  I don't know what to call it, but it
includes the asserted and inferred graphs, plus other triples that may have
been asserted or inferred elsewhere as elements of the graph.

Note that the distinction between asserted and inferred is not the same as
the distinction between extension and intension.  It might be tempting to
think of an inferred graph as the intensional sense of the asserted graph;
but unlike intensional senses, inferred graphs are constructable.  They can
be computed from the asserted graph plus a set of rules.  That is not the
case with intensions, which are not computable.  (I can't say nobody has
ever come up with a notion of intension that involves computability, but
I've never seen such a beast.)
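
A small Python sketch of "asserted graph plus a set of rules", using a
single illustrative rule (transitivity of rdfs:subClassOf) and made-up
class names.  It is a naive fixed-point loop, not a real reasoner.

```python
def infer(asserted: frozenset) -> frozenset:
    """Close a graph under one rule: rdfs:subClassOf is transitive."""
    graph = set(asserted)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(graph):
            for (c, p2, d) in list(graph):
                if p1 == p2 == "rdfs:subClassOf" and b == c:
                    new = (a, "rdfs:subClassOf", d)
                    if new not in graph:
                        graph.add(new)
                        changed = True
    return frozenset(graph)

asserted = frozenset({
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
})
inferred = infer(asserted)

# The inferred graph contains the asserted one plus the derived triple.
print(("ex:Dog", "rdfs:subClassOf", "ex:Animal") in inferred)
print(asserted <= inferred)
```

The inferred graph is fully determined by the asserted graph and the
rule, which is what makes it constructable in the sense used above.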

Needless to say this complicates the picture of RDF semantics, and it's
pretty hard to come up with clear language explaining this stuff.  But I
think any standard that addresses the meanings of RDF (graph) expressions
should address it explicitly.  Unfortunately it's quite a thorny problem; in
fact I think it might be the same problem as providing a functional
semantics for IO.  Think of log files or database tables, whose semantics
must be open, since they vary over time.  With RDF by contrast, the problem
is not variance over time but incomplete knowledge - this is the one place
where the concept of "knowledge" (but not "RDF knowledge") is appropriate.

To summarize regarding the Uniform HTTP Protocol for Managing RDF Graphs, my
suggestions are

   - Withdraw it on grounds that it just restates existing standards and
   thus amounts to more of a Guide than a standard;
   - If it isn't withdrawn, tighten up the language to clearly and
   consistently distinguish between references to syntax and semantics, and
   eliminate language suggesting a third component to denotational semantics
   (e.g. eliminate the "graph" v. "graph content" distinction).


Sincerely,

Gregg
Received on Monday, 7 February 2011 14:36:00 UTC