- From: Gregg Reynolds <dev@mobileink.com>
- Date: Fri, 27 Sep 2013 12:23:08 -0500
- To: www-archive <www-archive@w3.org>
- Message-ID: <CAO40Mi=0tXYASay_J3LjnqhTO86i8_E8RbggqMELirH3fcUZpw@mail.gmail.com>
Hi folks,

Here's one observer's take on the "Named Graph Problem". It probably won't help answer Jeremy's request by Oct 2, but it's my attempt to contribute some clarity. In my view the language commonly used to discuss the problem is often confused, so here's a try at clarity and simplicity. FYI I posted a list of some relevant references at http://blog.mobileink.com/2013/09/rdf-named-graphs.html

The problem as I see it boils down to the fact that RDF has no variables. Under any particular interpretation, every symbol in the language is a constant.

The nature of the problem emerges very clearly if you contrast it with the case of standard mathematical notation. Standard notation provides a stock of variables, so that we can say things like "let s = {1,2,3}". Note that this is a kind of meta-expression; it means "locally bind the symbol on the left-hand side of '=' to the value denoted by the symbol on the right-hand side." So it is very different from formally similar expressions like "4=2+2". The former makes a fact; the latter states a fact.

Now strip the variables from the standard notation, leaving only constants. Let's assume we have only the digits 0..9, the set extension operator { }, the equality symbol '=', and the cardinality function '#'. Throw in some punctuation for disambiguation, '()', ',', '.'. Then we can write things like

(1). #2 = 2
(2). #{1,2,3} = 3

Now RDFize this notation to support the sort of thing we want to do in RDF with named graphs:

(3). let 2 = {1,2,3}

Obviously this is disallowed in standard notation, since the binding of the symbol '2' to the integer two is fixed by definition. But it isn't incoherent or irrational to want to allow (3). One way to handle it would be to just declare that the local binding replaces the default binding. Another way is to do what programming languages do and say that local bindings "shadow" global bindings.
This effectively adds a second binding to the symbol and obeys a "most local binding governs" rule of interpretation. Some languages provide some kind of syntax that allows us to refer to a global binding even when a local binding shadows it, perhaps by supporting some kind of reification of the global namespace so you could write e.g. 'global.x'. But this usually only involves variables, not constants.

Another way to handle this is to observe that since the denotation relation is a function, we can treat (3) as merely defining an additional functional binding rather than a replacement or shadowing of the standard binding. This gives the symbol '2' two distinct denotations, which makes it ambiguous. So we run into problems when we try to say things using '2'; for example, #2 = *2* and #2 = *3* would both be true (taking *n* as symbols that always only denote n, e.g. *2* always denotes the integer two).

To support use of the symbol '2', we could define disambiguation operators. For example, suppose we define c as the default constant denotation function, and v as the local binding function. Then we could write unambiguously #c(2) = *2* and #v(2) = *3*. Of course, the meaning of v would be determined by where it is used; in the absence of any "let" expression adding a binding to '2', we would have c(2) = v(2). And we could declare that the default interpretation of '2' is c(2), so we could write #2 = 2 and #v(2) = 3.

In summary, the problem is that we want to use constant symbols in the same way we customarily use variable symbols. There are (at least) three ways of supporting this: make the constant bindings volatile and allow them to be changed; support shadowing; or support multiple bindings and provide a means of picking out the one you want.

As far as I can see this is exactly analogous to the named graph problem.

(4). GRAPH :g { :a :b :c }

is exactly like "let 2 = {1,2,3}". The reason is simple: IRIs are constants (i.e.
have fixed immutable denotations under any interpretation), so such GRAPH expressions cannot be interpreted as expressing a local binding of the symbol :g to the object denoted by the graph expression { :a :b :c }. If we use :g in a triple, e.g. :g :ownedBy :Acme, then :g refers to some object in IR and not to the graph denoted by the expression '{ :a :b :c }'. Yet we also want to be able to use :g to refer to {:a :b :c}. At least, that's what it looks like to me under the official RDF definition.

Simply binding :g to the graph value won't work, since we want to be able to continue to refer to its original bound IR value. Shadowing won't work, because scope isn't clearly defined.

The upshot is that, since the problem is the ambiguity introduced by adding a second denotation, ANY solution to this problem must necessarily involve some function like v above that serves to pick out the local binding established by GRAPH :g {:a :b :c}.

One possibility that, alas, turns out bad, is to define a property that corresponds to the v function, so we could write

GRAPH :g {:a :b :c}
:g rdf:v :h    // intended meaning: :h = v(:g) = {:a :b :c}

and then :h would denote {:a :b :c}. But :h, like :g, is a constant, so all this does is declare that both :g and :h are locally bound to {:a :b :c} and thus have two denotations; it does not resolve the ambiguity. Similar considerations apply to any solution that depends on a "special" property or class. There is simply no way that I can see, given the current definition of RDF, to pick out one of several denotations in the use of a symbol.

The only solution I can see at the moment is to define a function symbol, say rdf:v, and function syntax, say '()', so we could write

rdf:v(:g) :createdBy :Acme.

The symbol rdf:v would remain a normal IRI with fixed and immutable denotation, but in combination with '()' and an arg would be defined to have functional application semantics, so rdf:v(:g) means "the value locally bound to :g".
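
To make the c/v idea concrete, here is a minimal Python sketch of the digit example (not of RDF itself). Everything in it is an assumption made for illustration: the names Scope, let, c, v and card are hypothetical and do not correspond to any RDF specification or library. It only shows that a symbol can keep its fixed denotation while a function application picks out the locally bound one.

```python
from typing import Dict, FrozenSet, Union

Value = Union[int, FrozenSet[int]]

# c: the default denotation of each constant symbol, fixed by definition.
DEFAULT_BINDINGS: Dict[str, Value] = {str(n): n for n in range(10)}


class Scope:
    """Hypothetical scope: holds bindings added by "let", leaving defaults intact."""

    def __init__(self) -> None:
        self.local: Dict[str, Value] = {}

    def let(self, symbol: str, value: Value) -> None:
        # Adds a second binding; the default binding is neither replaced nor shadowed.
        self.local[symbol] = value

    def c(self, symbol: str) -> Value:
        # Always the default constant denotation.
        return DEFAULT_BINDINGS[symbol]

    def v(self, symbol: str) -> Value:
        # The local binding if one exists; otherwise v agrees with c.
        return self.local.get(symbol, DEFAULT_BINDINGS[symbol])


def card(value: Value) -> int:
    """'#': the size of a set; for an integer n, n itself, as in "#2 = 2"."""
    return value if isinstance(value, int) else len(value)


scope = Scope()
agree_before_let = scope.v("2") == scope.c("2")  # no "let" yet, so c(2) = v(2)
scope.let("2", frozenset({1, 2, 3}))             # (3). let 2 = {1,2,3}
c_card = card(scope.c("2"))                      # #c(2)
v_card = card(scope.v("2"))                      # #v(2)
```

In this sketch v plays the role suggested for rdf:v(:g): the symbol itself stays a constant, and only the explicit application of v selects the local binding.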

A separate question is whether naming a graph in this sense can be taken as equivalent to "attaching" the name to each triple in the graph. It makes very little sense mathematically (to me, at least) to treat "let s = {1,2,3}" as equivalent to "{(1,s), (2,s), (3,s)}". On the other hand it's obvious why one might want to do something like this in software, to help keep track of where stuff came from or what has been said about it. Unfortunately this seems to be a case of implementational contingencies driving language design.

What about the quasi-official definition of Named Graph as a (_name_, graph) pair, where _name_ denotes the pair of which it is the first element? It's immediately obvious that this is a recursive definition, which lands us in an infinite regress. Given (4) above this definition says that :g denotes (:g, {:a :b :c}), which means

(:g, {:a :b :c})
((:g, {:a :b :c}), {:a :b :c})
(((:g, {:a :b :c}), {:a :b :c}), {:a :b :c})
... ad infinitum ...

It's only slightly less obvious that even if we agree to overlook this, it does not give us what we want. If (4) above makes :g refer to (:g, {:a :b :c}), we are still left with no means of using :g to refer to {:a :b :c}. A triple like

:g :createdBy :Acme .

would *not* mean that Acme created {:a :b :c}; it would mean that Acme created the pair (:g, {:a :b :c}), which is a very different idea. It's like treating the title of a book as the name of a (title, book) pair, in which case a statement like "Tolstoy wrote 'War and Peace'" would only mean that Tolstoy attached the title to the book, or wrote the title on the book, or something like that; it would not mean he wrote the book.

What about g-boxes, surfaces, etc.? The "Named Graph Problem" is a particular instance of the general problem of how to deal with multiple denotations in a language, and this problem is wholly internal to the language (taking 'language' to include both syntax and formal semantic domain).
It has absolutely nothing to do with the relation between the language and the world. In other words, it has nothing to do with the question of whether an IRI can or should be taken to refer to a real-world entity such as the Eiffel Tower. That's an empirical, practical matter, not something for the syntax and semantics of the language to decide or even notice.

By the same token, the fact that real-world entities such as triplestores (or whatever you want to call them) change over time has no relevance to the problem of disambiguating reference within a language. The meaning of an expression like "GRAPH :g {:a :b :c}" is local to the SYNTACTIC scope in which it appears, and bears no relation to any real-world graph store.

Talk of "RDF spaces", g-boxes, surfaces, etc. routinely conflates the theoretical and the empirical. Real-world "devices", no matter what you call them, do not and cannot contain mathematical objects; what they contain is physical, syntactic, symbolic structures, which denote mathematical objects under the rules of the language. The presence of (physical) tokens of an RDF language in a device does not make the device an "RDF space" (surface, g-box, etc.) any more than a book on vector spaces turns a library into a vector space. There is only one RDF space, and it is the one that is internal to the language, so to speak.

So while the question of what sort of language we can devise to talk about real-world devices such as triplestores or SPARQL endpoints and the syntactic objects they contain, and how such a language relates to the language of RDF, is an important one, it is entirely distinct from the question of how to handle multiple denotations within a language, i.e. the Named Graph Problem.

Cheers,

Gregg
Received on Friday, 27 September 2013 17:23:36 UTC