Named Graphs: statement of the problem from Gregg Reynolds on 2013-09-27 (www-archive@w3.org from September 2013)

From: Gregg Reynolds <dev@mobileink.com>
Date: Fri, 27 Sep 2013 12:23:08 -0500
To: www-archive <www-archive@w3.org>
Message-ID: <CAO40Mi=0tXYASay_J3LjnqhTO86i8_E8RbggqMELirH3fcUZpw@mail.gmail.com>
Hi folks,

Here's one observer's take on the "Named Graph Problem".  Probably won't
help answer Jeremy's request by Oct 2, but it's my attempt to contribute
some clarity.  In my view the language commonly used to discuss the problem
is often confused, so here's a try at clarity and simplicity.  FYI I posted
a list of some relevant references at
http://blog.mobileink.com/2013/09/rdf-named-graphs.html

The problem as I see it boils down to the fact that RDF has no variables.
 Under any particular interpretation, every symbol in the language is a
constant.

The nature of the problem emerges very clearly if you contrast it with
standard mathematical notation.  Standard notation provides a stock of
variables, so that we can say things like "let s = {1,2,3}".  Note that
this is a kind of meta-expression; it means "locally bind the symbol on the
left-hand side of '=' to the value denoted by the symbol on the right-hand
side."  So it is very different from formally similar expressions like
"4=2+2".  The former makes a fact; the latter states a fact.

Now strip the variables from the standard notation, leaving only constants.
 Let's assume we have only the digits 0..9, the set extension operator { },
the equality symbol '=', and the cardinality function '#'.  Throw in some
punctuation for disambiguation, '()', ',', '.'. Then we can write things
like

    (1).   #2 = 2
    (2).  #{1,2,3} = 3

Now RDFize this notation to support the sort of thing we want to do in
RDF with named graphs:

    (3).  let 2 = {1,2,3}

Obviously this is disallowed in standard notation, since the binding of the
symbol '2' to the integer two is fixed by definition.  But it isn't
incoherent or irrational to want to allow (3).  One way to handle it would
be to just declare that the local binding replaces the default binding.
 Another way is to do what programming languages do and say that local
bindings "shadow" global bindings.  This effectively adds a second binding
to the symbol and obeys a "most local binding governs" rule of
interpretation.  Some languages provide some kind of syntax that allows us
to refer to a global binding even when a local binding shadows it, perhaps
by supporting some kind of reification of the global namespace so you could
write e.g. 'global.x'.  But this usually only involves variables, not
constants.
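
A minimal Python sketch of the shadowing option, for illustration only.
The names global_bindings, local_bindings and scope are assumptions made
here, not part of the notation above.  ChainMap from the standard library
looks a key up in the first mapping that contains it, which is one way to
model "most local binding governs".

```python
from collections import ChainMap

# Default (global) binding: the symbol '2' denotes the integer two.
global_bindings = {"2": 2}

# Local binding created by the hypothetical expression "let 2 = {1,2,3}".
local_bindings = {"2": frozenset({1, 2, 3})}

# Lookups try local_bindings first, then fall back to global_bindings,
# so the local binding shadows the global one without destroying it.
scope = ChainMap(local_bindings, global_bindings)

shadowed = scope["2"]          # the local value, the set {1,2,3}
original = scope.parents["2"]  # rough analogue of writing 'global.x'
```

Here scope.parents plays the role of the reified global namespace: it
skips the innermost mapping and so reaches the binding that was shadowed.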

Another way to handle this is to observe that since the denotation relation
is a function, we can treat (3) as merely defining an additional functional
binding rather than a replacement or shadowing of the standard binding.
 This gives the symbol '2' two distinct denotations, which makes it
ambiguous.  So we run into problems when we try to say things using '2';
for example, #2 = *2* and #2 = *3* would both be true (taking *n* as
symbols that always only denote n, e.g. *2* always denotes the integer two).

To support use of the symbol '2', we could define disambiguation operators.
 For example, suppose we define c as the default constant denotation
function, and v as the local binding function.  Then we could write
unambiguously  #c(2) = *2* and #v(2) = *3*.  Of course, the meaning of v
would be determined by where it is used; in the absence of any  "let"
expression adding a binding to '2', we would have c(2) = v(2).  And we
could declare that the default interpretation of '2' is c(2), so we could
write #2 = 2 and #v(2) = 3.
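
The c and v operators can be sketched in Python as follows.  This is only
an illustration under stated assumptions: DEFAULT, LOCAL and card are
names invented here, and card reads a natural number n as a set with n
elements so that "#2 = 2" in (1) comes out true.

```python
# c: the default constant denotation of each digit symbol.
DEFAULT = {str(n): n for n in range(10)}

# Bindings added by "let" expressions; here only "let 2 = {1,2,3}".
LOCAL = {"2": frozenset({1, 2, 3})}

def c(symbol):
    """Default constant denotation of a symbol."""
    return DEFAULT[symbol]

def v(symbol):
    """Local binding of a symbol; equals c(symbol) when no 'let' applies."""
    return LOCAL.get(symbol, c(symbol))

def card(value):
    """The '#' function: size of a set, or n itself for a natural number n
    (assumption: n is read as a set with n elements)."""
    if isinstance(value, frozenset):
        return len(value)
    return value
```

With these definitions card(c("2")) is 2 and card(v("2")) is 3, while for
a symbol with no "let" binding, such as "3", v and c agree.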

In summary, the problem is that we want to use constant symbols in the same
way we customarily use variable symbols.  There are (at least) three ways
of supporting this:  make the constant bindings volatile and allow them to
be changed; support shadowing; or support multiple bindings and provide a
means of picking out the one you want.

As far as I can see this is exactly analogous to the named graph problem.

    (4).  GRAPH :g { :a :b :c }

is exactly like "let 2 = {1,2,3}".  The reason is simple: IRIs are constants
(i.e. have fixed immutable denotations under any interpretation), so such
GRAPH expressions cannot be interpreted as expressing a local binding of
the symbol :g to the object denoted by the graph expression { :a :b :c }.
 If we use :g in a triple, e.g. :g :ownedBy :Acme, then :g refers to some
object in IR and not to the graph denoted by the expression '{ :a :b :c }'.
 Yet we also want to be able to use :g to refer to {:a :b :c}.

At least, that's what it looks like to me under the official RDF definition.
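
A rough Python model of the situation, assuming an interpretation is just
a mapping from IRIs to resources and a dataset is a separate mapping from
graph names to sets of triples.  The resource labels "resource-g" and
"resource-Acme" are placeholders invented for this sketch.

```python
# Interpretation: every IRI is a constant with exactly one denotation in IR.
I = {":g": "resource-g", ":Acme": "resource-Acme"}

# Dataset: GRAPH :g { :a :b :c } pairs the name :g with a graph,
# modelled here as a set of (subject, predicate, object) triples.
dataset = {":g": frozenset({(":a", ":b", ":c")})}

# A triple that uses :g as its subject.
triple = (":g", ":ownedBy", ":Acme")

# What the triple is about: the resource that I assigns to :g.
subject_denotation = I[triple[0]]

# What GRAPH paired with :g: a graph, reached through a different mapping.
graph_paired_with_name = dataset[triple[0]]
```

The two lookups go through unrelated mappings, so nothing in the triple
itself connects :ownedBy to the graph {:a :b :c}.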

Simply binding :g to the graph value won't work, since we want to be able
to continue to refer to its original bound IR value.  Shadowing won't work,
because scope isn't clearly defined.  The upshot is that, since the problem
is the ambiguity introduced by adding a second denotation, ANY solution to
this problem must necessarily involve some function like v above that
serves to pick out the local binding established by GRAPH :g {:a :b :c}.

One possibility that, alas, turns out bad, is to define a property that
corresponds to the v function, so we could write

    GRAPH :g {:a :b :c}
    :g rdf:v :h    // intended meaning:  :h = v(:g) = {:a :b :c}

and then :h would denote {:a :b :c}.  But :h, like :g, is a constant, so
all this does is declare that both :g and :h are locally bound to {:a :b
:c} and thus have two denotations; it does not resolve the ambiguity.
 Similar considerations apply to any solution that depends on a "special"
property or class.  There is simply no way that I can see, given the
current definition of RDF, to pick out one of several denotations in the
use of a symbol.

The only solution I can see at the moment is to define a function symbol,
say rdf:v, and function syntax, say '()' so we could write

    rdf:v(:g) :createdBy :Acme.

The symbol rdf:v would remain a normal IRI with fixed and immutable
denotation, but in combination with '()' and an arg would be defined to
have functional application semantics so rdf:v(:g) means "the value locally
bound to :g".
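
One way to picture the proposed function syntax is a small term evaluator
in Python.  This is a sketch under assumptions, not an implementation of
anything in RDF: rdf:v is the hypothetical symbol proposed above, a tuple
stands in for the '()' application syntax, and the resource labels are
placeholders.

```python
# Fixed denotations of IRIs, including rdf:v itself as an ordinary IRI.
I = {":g": "resource-g", ":Acme": "resource-Acme", "rdf:v": "resource-v"}

# Local bindings established by GRAPH :g {:a :b :c}.
dataset = {":g": frozenset({(":a", ":b", ":c")})}

def denote(term):
    """Denotation of a term.

    A plain string is an IRI and gets its fixed denotation from I.
    A tuple ("rdf:v", iri) models the application rdf:v(iri) and yields
    the value locally bound to iri.
    """
    if isinstance(term, tuple):
        function_symbol, argument = term
        if function_symbol != "rdf:v":
            raise ValueError(f"unknown function symbol: {function_symbol}")
        return dataset[argument]
    return I[term]
```

Under this reading denote(":g") is still the resource in IR, while
denote(("rdf:v", ":g")) is the graph, so the two denotations no longer
compete for the same expression.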

A separate question is whether naming a graph in this sense can be taken as
equivalent to "attaching" the name to each triple in the graph.  It makes
very little sense mathematically (to me, at least) to treat "let s =
{1,2,3}" as equivalent to "{(1,s), (2,s), (3,s)}".  On the other hand it's
obvious why one might want to do something like this in software, to help
keep track of where stuff came from or what has been said about it.
 Unfortunately this seems to be a case of implementational contingencies
driving language design.

What about the quasi-official definition of Named Graph as a (_name_,
graph) pair, where _name_ denotes the pair of which it is the first element?

It's immediately obvious that this is a recursive definition, which lands
us in an infinite regress. Given (4) above this definition says that :g
denotes (:g, {:a :b :c}), which means

    (:g, {:a :b :c})
    ((:g, {:a :b :c}), {:a :b :c})
    (((:g, {:a :b :c}), {:a :b :c}), {:a :b :c})
    ... ad infinitum ...
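
The unfolding can be reproduced mechanically.  A small Python sketch,
where unfold and GRAPH are names chosen here for illustration: each step
replaces the name :g by the pair it is said to denote.

```python
GRAPH = "{:a :b :c}"

def unfold(depth):
    """Substitute the pair (name, graph) for the name, `depth` times."""
    term = ":g"
    for _ in range(depth):
        # The name sits in the first position of the pair it denotes.
        term = f"({term}, {GRAPH})"
    return term

# First three steps of the regress.
steps = [unfold(n) for n in range(1, 4)]
```

Every step still contains a name to be replaced in its innermost first
position, so no finite depth yields a term free of the name.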

It's only slightly less obvious that even if we agree to overlook this, it
does not give us what we want.  If (4) above makes :g refer to (:g, {:a :b
:c}), we are still left with no means of using :g to refer to {:a :b :c}.
 A triple like

    :g :createdBy  :Acme .

would *not* mean that Acme created {:a :b :c}; it would mean that Acme
created the pair (:g, {:a :b :c}), which is a very different idea.  It's
like treating the title of a book as the name of a (title, book) pair, in
which case a statement like "Tolstoy wrote 'War and Peace'" would only mean
that Tolstoy attached the title to the book, or wrote the title on the book
or something like that; it would not mean he wrote the book.

What about g-boxes, surfaces, etc.?

The "Named Graph Problem" is a particular instance of the general problem
of how to deal with multiple denotations in a language, and this problem is
wholly internal to the language (taking 'language' to include both syntax
and formal semantic domain).  It has absolutely nothing to do with the
relation between the language and the world.  In other words, it has
nothing to do with the question of whether an IRI can or should be taken to
refer to a real-world entity such as the Eiffel Tower.  That's an
empirical, practical matter, not something for the syntax and semantics of
the language to decide or even notice.  By the same token, the fact that
real world entities such as triplestores (or whatever you want to call
them) change over time has no relevance to the problem of disambiguating
reference within a language.  The meaning of an expression like "GRAPH :g
{:a :b :c}" is local to the SYNTACTIC scope in which it appears, and bears
no relation to any real-world graph store.

Talk of "RDF spaces", g-boxes, surfaces, etc. routinely conflates the
theoretical and the empirical.  Real world "devices", no matter what you
call them, do not and cannot contain mathematical objects; what they
contain is physical, syntactic, symbolic structures, which denote
mathematical objects under the rules of the language.  The presence of
(physical) tokens of an RDF language in a device does not make the device
an "RDF space" (surface, g-box, etc.) any more than a book on vector spaces
turns a library into a vector space.  There is only one RDF space, and it
is the one that is internal to the language, so to speak.  So while the
question of what sort of language we can devise to talk about real-world
devices such as triplestores or SPARQL endpoints and the syntactic objects
they contain, and how such a language relates to the language of RDF, is an
important one, it is entirely distinct from the question of how to handle
multiple denotations within a language, i.e. the Named Graph Problem.

Cheers,

Gregg
Received on Friday, 27 September 2013 17:23:36 UTC