"RDF Knowledge" (Uniform HTTP Protocol for Managing RDF Graphs) from Gregg Reynolds on 2011-01-05 (public-rdf-dawg-comments@w3.org from January 2011)

From: Gregg Reynolds <dev@mobileink.com>
Date: Wed, 5 Jan 2011 06:59:11 -0600
To: public-rdf-dawg-comments@w3.org
Message-ID: <AANLkTikmnVFEng6MpxyvJkQtQ63QyB3U7MzZ7R5MVqHk@mail.gmail.com>

Hello,

Here is some feedback on SPARQL 1.1 Uniform HTTP Protocol for Managing RDF
Graphs <http://www.w3.org/TR/2010/WD-sparql11-http-rdf-update-20100126/>.  I
was on a WG once so I know it's a difficult and largely thankless task; I
appreciate your work, even though I don't always agree with the results.
Also, please don't take it personally if I use plain language; I presume
that you value honest feedback over delicacy of expression.

I basically agree entirely with Ian Davis'
comments <http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2010Dec/0005.html>,
but I would go a good bit further.  In summary:

   - In my view this document is unnecessary and possibly harmful.
   - "RDF knowledge".  Please don't do this.

"This specification applies the HTTP protocol semantics in managing and
modifying RDF graphs."  HTTP, RDF, and SPARQL are already (relatively)
well-defined.  I don't see what this document adds; or rather, I don't see
what problem it addresses that is not already addressed elsewhere.  Indeed I
think it likely to increase confusion.  More on this in another post.

Regarding "RDF knowledge": in my view introducing this phrase as a technical
term is a very bad idea.  "Knowledge" is not a technical term; I trust that
much is obvious (show me one widely accepted and clear definition if you
disagree), and I think it's also obvious that trying to impose a constrained
technical sense on a common natural language term is a bad idea.  It doesn't
matter how well you "explain" your intended meaning; the fact that you are
trying to appropriate an ordinary term for a specialized purpose will just
annoy your readers, who will come to the text with well-established (read:
stubborn) if informal notions about what the word "knowledge" means and how
it should be used.  Referring them to obscure papers in a minor engineering
field will really annoy them, take my word for it. ;)  The W3C is not an AI
lab or a philosophy department; any standards it publishes should stick to
minimal, pragmatic definitions using well-established and widely-accepted
concepts and terminology.  They should definitely not require research into
the arcana of AI.

The problem is deeper than just finding the right terminology.  I've read
some of the correspondence on the mailing list and I understand the
motivation behind the term.  But I think the fact that "RDF knowledge" is
the best you've come up with is a pretty good indication that the idea you
want to convey is not well thought out.  I gather the idea is that you want
to draw some kind of distinction between graph and IRI - "the difference
between the graph and what the graph IRI identifies"
<http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0011.html>.
I suggest you rethink this - again, the plain obscurity of this language is
a strong indicator that something is wrong.

In my view the source of the problem is probably the notion that RDF somehow
represents or encodes or [pick your verb]s "knowledge", which in turn is
probably traceable to the notion that IRIs "identify resources".  Both ideas
are fundamentally wrong in my view; RDF is about graphs, not knowledge, and
the IRI may or may not identify something (just because TBL et al. used
"identifier" in the name does not make it an identifier *of* something).
"Resource" is just a way of saying "we can't think of a better term".

I expect this argument will be viewed with considerable skepticism in the WG
and the RDF community in general.  I've posted a few blog notes
<http://blog.mobileink.com/> that might help clarify what I'm trying to get
at, if you're interested.  An example might help.  Consider the following
analog to the kind
of language we're discussing.  Whatever else it may be, RDF is a formal
language, just like any other formal language, including (loosely)
mathematical notation.  Now suppose I declare "let x be the integer 3".
Will anyone think it important to draw a distinction between "the integer
and what x identifies"?  Or take this sentence from the draft (the same one
that Ian Davis found troublesome):  "In this way, an HTTP request can route
operations towards a named graph in an RDF dataset via its URI(s). However,
in using URIs in this way, we are not directly identifying the RDF graphs
but rather the networked RDF knowledge they represent."  Compare:  "in using
x to refer to the integer 3 in this way, I am not directly identifying the
integer but rather the mathematical knowledge it represents."  You'd be
laughed out of town.

In other words, the distinction between the formalism (i.e. mathematical
object) and the "knowledge it represents" is, frankly, not only pointless
but harmful insofar as it implies some kind of special meaning that is just
not there.  There is nothing special about RDF.  RDF is about graphs, not
knowledge; and a graph is a mathematical object, just like an integer or an
algebra.  Introducing an additional level of semantics - the meaning of the
mathematical object - just adds confusion and complexity, without any
benefits that I can see.  An IRI that is used to name an RDF graph refers to
that graph, just like the symbol "3" refers to the integer so-designated
(and not the "knowledge" that is represented by the integer that is
serialized by the symbol etc. etc.)
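
The comparison can be sketched in a few lines of Python.  This is only an
illustration of the reading argued for here, not of how the draft or any RDF
library models named graphs; the dictionary-based dataset, the helper
graph_named, and the example.org IRIs are all assumptions made up for the
sketch.

```python
from typing import Dict, FrozenSet, Tuple

# Assumed model: a graph is just a finite set of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Hypothetical example data; the IRIs are placeholders.
g1: Graph = frozenset({
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
})

# A dataset modeled as a plain mapping from graph name (IRI) to graph.
dataset: Dict[str, Graph] = {"http://example.org/g1": g1}


def graph_named(iri: str) -> Graph:
    """Return the graph that the IRI names in this dataset."""
    return dataset[iri]


x = 3  # "let x be the integer 3"

# x refers to the integer itself; the IRI refers to the graph itself.
assert x == 3
assert graph_named("http://example.org/g1") is g1
```

In this model there is no intermediate "knowledge" object: looking up the
name yields the graph, just as evaluating x yields 3.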

To be honest I think the problem goes back to the use of model theory to
provide semantics.  Just because one can construct a model-theoretic
semantics for a language does not mean one should.  In fact virtually all
formal languages - including in particular the notations of working
mathematicians - get along just fine without formal semantics.  Gödel's
theorems did not bring mathematics to a halt.  The only programming language
I can think of with a formal semantics is Standard ML, and my guess is that
nobody bothers to read the definition.  It just isn't necessary and it's
hard to read.  The Z Specification language provides a formal semantics for
ordinary ZF notation, and it's very well done, but it hasn't exactly taken
the world by storm.

So the fact that we can provide a model-theoretic semantics for graph theory
is not very useful, at least not for this kind of document.  I have several
introductory texts on graph theory on my bookshelf; not one of them contains
even a hint of formal semantics, but they're all perfectly understandable.
It's enough to know (or be convinced) that one *could* construct a formal
semantics.  The point being that introducing the machinery of model theory
in documents about how the language works is not helpful for most readers.

Having poured out part of my complaint about "RDF knowledge" (I could go on)
I hasten to add that your text does touch on the critical issue, albeit only
in passing.  That is the issue of open-world semantics.  This is an area
where I think a minor terminological innovation may make sense.  I propose
the term "open graph" as a means of capturing the nature of RDF graphs.

The motivation comes from ordinary mathematics; e.g. the open interval
(0..1) (alternative notation:  ]0,1[ ) defined as 0 < x < 1, and similar
uses of the terms "open" and "closed" in topology etc.  The key idea here is
that you cannot write down a finite extensional representation of such
mathematical objects; you can only write down approximations.  Or think of
irrationals defined in terms of bounded open sets; you can only approximate
the square root of 2.  This suggests an innovative way of looking at
open-world RDF graphs.  Given a suitable definition of "open graph" on the
analogy of open interval or open ball, every RDF graph is open; that is,
infinite.  The graph you retrieve from a web server is only a finite
approximation.  (I omit the graph v. representation distinction on grounds
that it is obvious.)  One could go further, and declare that every concrete
RDF graph is an approximation of the One True Graph, in which everything is
connected to everything.  Or one could say that the graph represented in
your (finite) computer, now, is a closed (finite) RDF graph that is an
approximation of (or "embedded in") an open RDF graph.  This approach
provides a simple, concise set of concepts and terminology with clear
mathematical underpinnings; with a little work it could be used to provide
an RDF semantics that is clear to everybody and shorn of the complexities of
model theory and the hand-wavery of "RDF knowledge".
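
A rough sketch of the open/closed distinction, assuming an open graph can be
modeled as an unbounded stream of triples and a closed graph as a finite set
cut from it.  The class OpenGraph, its approximate method, and the
successor_chain generator are hypothetical names invented for this
illustration; they are not part of RDF or of any existing library.

```python
from itertools import islice
from typing import Callable, FrozenSet, Iterator, Tuple

Triple = Tuple[str, str, str]


class OpenGraph:
    """An unbounded graph, described by a rule that can always yield more triples."""

    def __init__(self, triples: Callable[[], Iterator[Triple]]) -> None:
        self._triples = triples

    def approximate(self, n: int) -> FrozenSet[Triple]:
        """Return a closed (finite) graph: the first n triples of the open graph."""
        return frozenset(islice(self._triples(), n))


def successor_chain() -> Iterator[Triple]:
    """Hypothetical infinite graph: n0 -> n1 -> n2 -> ... without end."""
    i = 0
    while True:
        yield (f"ex:n{i}", "ex:next", f"ex:n{i + 1}")
        i += 1


open_graph = OpenGraph(successor_chain)
small = open_graph.approximate(3)   # what a server might return today
large = open_graph.approximate(10)  # a better approximation of the same graph

# Every closed approximation is embedded in every larger one.
assert small <= large
```

The open graph can never be written down in full; any graph actually held in
memory is one of its finite approximations, and a larger approximation always
contains the smaller one.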

Sincerely,

Gregg Reynolds
Received on Wednesday, 5 January 2011 14:16:26 UTC