Re: Graphs and Being and Time from Antoine Zimmermann on 2011-02-24 (public-rdf-wg@w3.org from February 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Thu, 24 Feb 2011 16:06:24 +0100
CC: public-rdf-wg@w3.org
Message-ID: <4D6673F0.8060604@insa-lyon.fr>
Pat, all,


I agree that this distinction has to be made clearer, but isn't a graph 
supposed to be a mathematical object, something which is inherently 
abstract? To me, "RDF abstract graph" sounds redundant.
I would rather suggest keeping the definition of RDF graph (which, by 
the way, is /consistently/ used in [RDF-CONCEPTS] at least) and improving 
on the notion of what you call "token", which I would rather call "RDF 
document". The phrase "RDF document" is actually used in [RDF-CONCEPTS] 
and seems to roughly correspond to this idea of token, although it is 
not formally defined. An informal description of this concept can be 
found in Section 7 (Fragment Identifiers), which I found somewhat puzzling:

"""[...]
     * we assume that the URI part (i.e. excluding fragment identifier) 
identifies a resource, which is presumed to have an RDF representation. 
So when eg:someurl#frag is used in an RDF document, eg:someurl is taken 
to designate some RDF document (even when no such document can be 
retrieved).
     * eg:someurl#frag means the thing that is indicated, according to 
the rules of the application/rdf+xml MIME content-type as a "fragment" 
or "view" of the RDF document at eg:someurl. If the document does not 
exist, or cannot be retrieved, or is available only in formats other 
than application/rdf+xml, then exactly what that view may be is somewhat 
undetermined, but that does not prevent use of RDF to say things about it.
     * the RDF treatment of a fragment identifier allows it to indicate 
a thing that is entirely external to the document, or even to the 
"shared information space" known as the Web. That is, it can be a more 
general idea, like some particular car or a mythical Unicorn.
     * in this way, an application/rdf+xml document acts as an 
intermediary between some Web retrievable documents (itself, at least, 
also any other Web retrievable URIs that it may use, possibly including 
schema URIs and references to other RDF documents), and some set of 
possibly abstract or non-Web entities that the RDF may describe.

This provides a handling of URI references and their denotation that is 
consistent with the RDF model theory and usage, and also with 
conventional Web behavior. Note that nothing here requires that an RDF 
application be able to retrieve any representation of resources 
identified by the URIs in an RDF graph."""

It seems to imply that an RDF document does not necessarily correspond 
to a file and that it may or may not be retrievable but it is supposed 
to designate a certain /representation/ of an RDF graph.
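
The split that the quoted section relies on, between the document part of a URI reference and its fragment, can be sketched as follows. This is only an illustration: http://example.org/someurl is a hypothetical stand-in for eg:someurl, the helper name split_reference is made up for this example, and nothing is retrieved, which matches the remark that the document need not be retrievable.

```python
from urllib.parse import urldefrag


def split_reference(uri_ref: str):
    """Split a URI reference into (document URI, fragment identifier).

    Purely syntactic: no network access is attempted.
    """
    doc_uri, fragment = urldefrag(uri_ref)
    return doc_uri, fragment


# Hypothetical example URI standing in for eg:someurl#frag.
doc, frag = split_reference("http://example.org/someurl#frag")
print(doc)   # the part taken to designate some RDF document
print(frag)  # the part interpreted relative to that document
```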

Of course, Section 7 is going to be revised since the new spec should 
refer to IRIs rather than URI references.


Besides, I also think that changing the terminology between versions of 
RDF is dangerous. If the definition of RDF graph changes, we will have 
to explain that RDF graphs used to be sets of triples in the previous 
specs but that now it's different: sets of triples are called RDF 
abstract graphs, while RDF graphs still exist but denote something else, 
and yaddiyadda... confusing.


Also, to relate the discussion to the idea of "graph literals", I like 
to see RDF graphs (sets of RDF triples) as elements of a value space 
[1], and RDF documents (say, a serialisation of a graph in Turtle or 
RDF/XML) as elements of a lexical space [2]. There is a lexical-to-value 
mapping L2V [3] which maps a serialisation to its corresponding RDF 
graph (apparently still to be properly defined for Turtle). So we have a 
datatype, in the sense of [3], which could be used, for instance, as follows:

  ex:dataset  ex:contain  """:s1 :p1 :o1 .
                             :s2 :p2 :o2 ."""^^ex:graphLiteral .

or

  ex:me  ex:believes  "ex:graphLiteral a ex:CoolThing"^^ex:graphLiteral .


There is even a partial order [4] on ex:graphLiteral such that:

  gl1 > gl2  iff  L2V(gl1) entails L2V(gl2)
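
A minimal sketch of this datatype view, under loud assumptions: the function names l2v, entails and geq are hypothetical, the "parser" accepts only whitespace-separated ground triples each terminated by " ." (it is not a Turtle parser), and entailment is restricted to simple entailment between ground graphs, where it reduces to set inclusion.

```python
def l2v(lexical: str) -> frozenset:
    """Lexical-to-value mapping: a serialisation -> a set of triples.

    Toy syntax only: every triple is three whitespace-separated terms
    followed by a standalone "." token.
    """
    tokens = lexical.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of: subject predicate object .")
    triples = set()
    for i in range(0, len(tokens), 4):
        s, p, o, dot = tokens[i:i + 4]
        if dot != ".":
            raise ValueError(f"missing '.' after triple {(s, p, o)!r}")
        triples.add((s, p, o))
    return frozenset(triples)


def entails(g1: frozenset, g2: frozenset) -> bool:
    """Simple entailment for ground graphs (no blank nodes): g2 is a subset of g1."""
    return g2 <= g1


def geq(gl1: str, gl2: str) -> bool:
    """Order on lexical forms induced by entailment of their values."""
    return entails(l2v(gl1), l2v(gl2))


# Two different lexical forms (documents) with the same value (graph).
doc_a = ":s1 :p1 :o1 .\n:s2 :p2 :o2 ."
doc_b = ":s2 :p2 :o2 . :s1 :p1 :o1 ."
print(l2v(doc_a) == l2v(doc_b))
print(geq(doc_a, ":s1 :p1 :o1 ."))
```

In this sketch the relation is reflexive, so it reads as "greater than or equal", and two distinct lexical forms can be related in both directions when they serialise the same graph.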


It would be nice to relate two graph literals but unfortunately, 
literals are not allowed in subject position :(


I hope this helps a bit.


[1] 2.2 Value Space. In XML Schema Part 2: Datatypes Second Edition. 
http://www.w3.org/TR/xmlschema-2/#value-space
[2] 2.3 Lexical space. In XML Schema Part 2: Datatypes Second Edition. 
http://www.w3.org/TR/xmlschema-2/#lexical-space
[3] 5. Datatypes (Normative). In Resource Description Framework (RDF): 
Concepts and Abstract Syntax. 
http://www.w3.org/TR/rdf-concepts/#section-Datatypes
[4] 4.2.2 ordered. In XML Schema Part 2: Datatypes Second Edition. 
http://www.w3.org/TR/xmlschema-2/#rf-ordered


On 23/02/2011 21:13, Pat Hayes wrote:
> I would like the WG to rectify a basic flaw in the RDF conceptual model, or more exactly, how the basic RDF conceptual model is described in the RDF specs. This will bring the specs more in line with the way that RDF is actually used in practice. It will also not change the RDF semantics, though it will simplify their description a little. It is along the lines I suggested in my ISWC invited talk a while ago. As this will have consequences for the way we talk about RDF graphs generally, I would like to raise it now even though it does not fall into any of the assigned task topics.
>
> Here is the problem: the RDF specs define an RDF graph to be a **set** of triples. But a set is a pure-mathematical Platonic abstraction, like a number or an Abelian group. It is not a data structure or a document or a text, and it cannot be transmitted by HTTP or FTP or any other XXTP. And, worst of all, it has no state, so it can't be 'changed'. It simply does not make sense, given the definitions in the current RDF spec, to speak of a 'temporal graph' or of a graph being 'changed'. If I 'add' an item to a set - say, add C to {A, B} to get {A, B, C} - I have not "changed" anything: I simply have a new set, different from the previous one: {A, B} =/= {A, B, C}. Sets belong in the world of mathematics, not the world of computing.
>
> What we need is the notion of a 'graph token' (or some other terminology: see below for more on terminology), meaning an actual representation of an RDF graph. This would be an information resource, a thing with representations that can be copied and sent from place to place using a TP. Put another way, this would have the same kind of relationship to an RDF graph that a particular copy of Moby Dick has to the literary work with the same name, or that a particular token of the letter 'A' has to the first letter of the English alphabet; and just as with these cases, there can be many tokens of the same RDF graph. I might have my copy and you might have yours: same graph, different tokens. And we can make our own rules for token identity, so it can make perfect sense for tokens to have a state, and a single token to be a token of different RDF graphs as changes are made to it, which is what we are actually currently talking about when we use the impossible terminology of "changes to a graph". To emphasize: we already have these things. Every RDF/XML document on the Web, every piece of RDFa, is actually a graph token rather than an actual RDF graph (where I am using "RDF graph" here strictly according to the RDF specs, of course.) You cannot put an actual RDF graph into a digital memory, any more than you can put the number three into one. You have to use a numeral to represent a number, and you have to use a graph document or a graph data structure or some such token-like thing to represent the RDF graph. The issue is only that the RDF specs currently don't acknowledge this simple fact: they represent a kind of idealized fiction that refuses to acknowledge the distinction between a work and a book, or between a number and a numeral, or between a graph and an encoding of it. If one reads the various specs which mention RDF, they vary in their use of the term "RDF graph". Some of them use it to mean a graph token, others to mean the Platonic abstraction; and still others seem to be kind of muddled. We saw some of this muddle in the IRC log of today's telecon, in fact.
>
> In the ISWC talk I invented a completely new terminology of a "surface" (think of a piece of paper on which the graph is conceptually 'drawn'), which I rather like, but we don't have to go that far. In fact, I would propose that we keep the terminology "RDF graph" to refer to the tokens (which is already now a common usage) but alter the specs so that the current RDF graph - the set of triples - is called something like an "RDF abstract graph". Then an RDF graph is not a set, but rather something like an RDF "resource" (in the REST sense), i.e. an entity which emits a representation of an RDF abstract graph when poked. This allows an RDF graph to have a state that can change, and it brings the whole business of naming a graph with a URI into line with all other kinds of Web naming and identification. This is in fact the way the world actually is, of course: the change is simply bringing the terminology into line with actual practice.
>
> The nice thing is, if we do modify the specs (actually, the RDF conceptual model) to be more realistic in this way, by making an explicit distinction between the abstract graph and a particular graph token (think of a document), then several things get simpler and some "issues" go away.
>
> 1. An RDF graph (new sense, ie a graph token) is now something that can have an identity over time (corresponding to continuing to be identified by a cool IRI, in the usual Web-sanctioned way), so this whole way of talking now makes sense.
>
> 2.  It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph. That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)? And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.
>
> 3. There is now a very nice way to handle blank nodes: we simply stipulate, as a part of the underlying RDF conceptual model, that every blank node can occur in at most one RDF graph token. Blank nodes are unique to tokens.  Intuitively, we think of the blank nodes in any graph token as belonging to the token itself rather than to the abstract graph. This at one stroke fixes the 'scope' of blank node identifiers in any RDF surface syntax or notation (whatever is the boundary of the RDF graph token according to the rules of that syntax, that is where the existential quantifiers are that bind the bnode identifiers) and it also eliminates the need to define 'graph merging' as opposed to 'graph unioning' in the specifications. (If you don't follow this, just believe me. It makes the specs a lot easier and quite a bit shorter.)
>
> Anyway, I offer this as an item for the WG to consider. I don't have any particular brief for the choice of terminology, but I do think it is important for us to agree on the basic conceptual distinction (between the 'abstract' idea of an RDF graph as a mathematical set, and some more concrete notion of a graph as a data object with a state that can be identified by a URL) and agree to use it ourselves. If we do, then this will impact at least the language that we use in the Graphs TF, and perhaps the way that we actually think about the issues.
>
> I hereby volunteer to write drafts, as necessary, of the relevant changes to the RDF Concepts and RDF Semantics documents to accommodate the necessary changes, if we decide to make them. I think that, with care, no changes would be needed to the SPARQL draft documents.
>
> Pat
>
>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 24 February 2011 15:07:00 UTC