Re: Graphs and Being and Time from Alex Hall on 2011-02-24 (public-rdf-wg@w3.org from February 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Thu, 24 Feb 2011 11:14:15 -0500
To: public-rdf-wg@w3.org
Message-ID: <AANLkTim0gM_dDa+9cWzQZQbQjynF_68OxmzE+ANACc-C@mail.gmail.com>
On Thu, Feb 24, 2011 at 5:19 AM, Guus Schreiber <guus.schreiber@vu.nl> wrote:

> Pat,
>
> I'm trying to translate this into a simple story for outsiders:
>
>  RDF abstract graph: triples
>  RDF graph: quads
>
> with a rationale along the following lines: the revised RDF spec does not
> *replace* triples with quads; it just *adds* the quad notion, to handle the
> established practice of how abstract graphs materialize on the Web.
>
> I hope this makes some sense...
>
> Guus


I would avoid making mention of quads in the spec -- this strikes me as more
of an implementation detail.  But, if the notion of quads helps outsiders
make sense of the concept then I have no issues with its inclusion in the
primer.
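A minimal sketch of why quads can stay an implementation detail, assuming quads are plain 4-tuples of strings `(subject, predicate, object, graph_name)` and using illustrative names such as `ex:g1`: a store that keeps quads can always hand back ordinary sets of triples, one per graph name, so the abstract graphs themselves are still sets of triples.

```python
from collections import defaultdict

def graphs_from_quads(quads):
    """Group (s, p, o, g) quads by graph name g; return one frozenset of triples per name."""
    grouped = defaultdict(set)
    for s, p, o, g in quads:
        grouped[g].add((s, p, o))
    return {g: frozenset(triples) for g, triples in grouped.items()}

# Illustrative data only; the ex: and foaf: prefixes are not expanded here.
quads = [
    ("ex:alice", "foaf:knows", "ex:bob", "ex:g1"),
    ("ex:bob", "foaf:name", '"Bob"', "ex:g1"),
    ("ex:alice", "foaf:knows", "ex:bob", "ex:g2"),
]

graphs = graphs_from_quads(quads)
for name in sorted(graphs):
    print(name, len(graphs[name]))
```

The same triple appears under both `ex:g1` and `ex:g2`; the fourth position only records which named graph it was associated with.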

For that matter, I agree with others who have said that we should avoid use
of the term "document" in describing named graphs.  Setting aside whatever
formal definition it may have, it carries implications that don't
necessarily align with what I think we're trying to express.  In my mind,
"document" carries with it the implication of serialized bytes on a stream.
Also, when we talk about adding support in Turtle for named graphs, that
opens the door for a Turtle "document" to contain multiple RDF "documents"
which just sounds way too confusing.

I agree with the general gist of this discussion -- an RDF graph is an
abstract mathematical concept that captures some facts about the world.
When we "name" a graph, we are (a) asserting the existence of a graph and
(b) identifying the graph so that we can make further assertions about it.
When we associate some materialized RDF content with that named graph in a
graph store, or answer queries about that graph in a query service, we are
making assertions as to the actual content of the graph.  This being the
open world, we can never fully know all the actual contents of the graph,
only what somebody has claimed to be true about that graph.  Based on these
claims, an application can infer for itself what it believes to be true
about the content of the graph, just as it can infer for itself the
properties of a resource based on the RDF statements asserted about that
resource.
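A minimal sketch of that last step, assuming each claim is recorded as a hypothetical `(source, graph_name, triple)` tuple (for example, gathered from a graph store or a query service): the application chooses which sources it trusts, believes the union of their claims about the graph's content, and treats anything unclaimed as unknown rather than false.

```python
# Hypothetical claims: (source, graph_name, triple). The source names and this
# tuple layout are illustrative assumptions, not taken from any spec.
claims = [
    ("store:A", "ex:g1", ("ex:alice", "foaf:knows", "ex:bob")),
    ("sparql:B", "ex:g1", ("ex:bob", "foaf:name", '"Bob"')),
    ("blog:C", "ex:g1", ("ex:bob", "foaf:name", '"Robert"')),
]

def believed_content(claims, graph_name, trusted_sources):
    """Triples the application chooses to believe are in graph_name."""
    return frozenset(
        triple
        for source, g, triple in claims
        if g == graph_name and source in trusted_sources
    )

def status(triple, believed):
    # Open world: a triple no trusted source claimed is unknown, not false.
    return "believed" if triple in believed else "unknown"

believed = believed_content(claims, "ex:g1", {"store:A", "sparql:B"})
print(status(("ex:bob", "foaf:name", '"Bob"'), believed))
print(status(("ex:bob", "foaf:name", '"Robert"'), believed))
```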

So, while I agree with the general thrust of this conversation, I'm more
comfortable in treating a graph as just another web resource, and describing
named graphs/quads/queries in terms of assertions about that resource (as
opposed to introducing the notion of a "graph token").  That leaves open the
mechanism by which content is asserted for a graph -- the contents could be
explicitly enumerated in a graph store or in an RDF document (which would
correspond to my understanding of a "graph literal").  Or they could
be computed based on some other application data (think RDB-to-RDF mapping),
physical properties of the world (a graph describing geographic locations),
or abstract mathematical concepts (a graph describing the natural numbers).
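To make the "enumerated versus computed" distinction concrete, here is a minimal sketch in which every named graph is simply an object with a hypothetical `triples()` method; the class names, the row layout, and the `ex:`/`foaf:` terms are assumptions for illustration. Only the first class stores its contents; the other two compute them, and the last could never be fully enumerated.

```python
from itertools import count, islice

class EnumeratedGraph:
    """Contents listed explicitly, as in a graph store or an RDF document."""
    def __init__(self, name, triples):
        self.name = name
        self._triples = list(triples)

    def triples(self):
        return iter(self._triples)

class RowMappedGraph:
    """Contents computed on demand from relational rows (an RDB-to-RDF style mapping)."""
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows  # e.g. dicts with "id" and "name" keys from some SQL query

    def triples(self):
        for row in self._rows:
            yield (f"ex:person/{row['id']}", "foaf:name", f'"{row["name"]}"')

class NaturalNumbersGraph:
    """Unbounded contents: one 'n ex:successor n+1' triple per natural number n."""
    def __init__(self, name):
        self.name = name

    def triples(self):
        for n in count():
            yield (f'"{n}"', "ex:successor", f'"{n + 1}"')

people = RowMappedGraph("ex:people", [{"id": 1, "name": "Alice"}])
print(list(people.triples()))
print(list(islice(NaturalNumbersGraph("ex:nat").triples(), 2)))
```

Callers see the same interface in all three cases, which is the sense in which the mechanism for asserting a graph's content is left open.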

-Alex

> On 23/02/2011 21:13, Pat Hayes wrote:
>
>> I would like the WG to rectify a basic flaw in the RDF conceptual model,
>> or more exactly, how the basic RDF conceptual model is described in the RDF
>> specs. This will bring the specs more in line with the way that RDF is
>> actually used in practice. It will also not change the RDF semantics, though
>> it will simplify their description a little. It is along the lines I
>> suggested in my ISWC invited talk a while ago. As this will have
>> consequences for the way we talk about RDF graphs generally, I would like to
>> raise it now even though it does not fall into any of the assigned task
>> topics.
>>
>> Here is the problem: the RDF specs define an RDF graph to be a **set** of
>> triples. But a set is a pure-mathematical Platonic abstraction, like a
>> number or an Abelian group. It is not a data structure or a document or a
>> text, and it cannot be transmitted by HTTP or FTP or any other XXTP. And,
>> worst of all, it has no state, so it can't be 'changed'. It simply does not
>> make sense, given the definitions in the current RDF spec, to speak of a
>> 'temporal graph' or of a graph being 'changed'. If I 'add' an item to a set
>> - say, add C to {A, B} to get {A, B, C} - I have not "changed" anything: I
>> simply have a new set, different from the previous one: {A, B} =/= {A, B,
>> C}. Sets belong in the world of mathematics, not the world of computing.
>>
>> What we need is the notion of a 'graph token' (or some other terminology:
>> see below for more on terminology), meaning an actual representation of an
>> RDF graph. This would be an information resource, a thing with
>> representations that can be copied and sent from place to place using a TP.
>> Put another way, this would have the same kind of relationship to an RDF
>> graph that a particular copy of Moby Dick has to the literary work with the
>> same name, or that a particular token of the letter 'A' has to the first
>> letter of the English alphabet; and just as with these cases, there can be
>> many tokens of the same RDF graph. I might have my copy and you might have
>> yours: same graph, different tokens. And we can make our own rules for token
>> identity, so it can make perfect sense for tokens to have a state, and a
>> single token to be a token of different RDF graphs as changes are made to
>> it, which is what we are actually currently talking about when we use the
>> impossible terminology of "changes to a graph". To emphasize: we already
>> have these things. Every RDF/XML document on the Web, every piece of RDFa,
>> is actually a graph token rather than an actual RDF graph (where I am using
>> "RDF graph" here strictly according to the RDF specs, of course.) You cannot
>> put an actual RDF graph into a digital memory, any more than you can put the
>> number three into one. You have to use a numeral to represent a number, and
>> you have to use a graph document or a graph data structure or some such
>> token-like thing to represent the RDF graph. The issue is only that the RDF
>> specs currently don't acknowledge this simple fact: they represent a kind of
>> idealized fiction that refuses to acknowledge the distinction between a work
>> and a book, or between a number and a numeral, or between a graph and an
>> encoding of it. If one reads the various specs which mention RDF, they vary
>> in their use of the term "RDF graph". Some of them use it to mean a graph
>> token, others to mean the Platonic abstraction; and still others seem to be
>> kind of muddled. We saw some of this muddle in the IRC log of today's
>> telecon, in fact.
>>
>> In the ISWC talk I invented a completely new terminology of a "surface"
>> (think of a piece of paper on which the graph is conceptually 'drawn'),
>> which I rather like, but we don't have to go that far. In fact, I would
>> propose that we keep the terminology "RDF graph" to refer to the tokens
>> (which is already now a common usage) but alter the specs so that the
>> current RDF graph - the set of triples - is called something like an "RDF
>> abstract graph". Then an RDF graph is not a set, but rather something like
>> an RDF "resource" (in the REST sense), i.e. an entity which emits a
>> representation of an RDF abstract graph when poked. This allows an RDF graph
>> to have a state that can change, and it brings the whole business of naming
>> a graph with a URI into line with all other kinds of Web naming and
>> identification. This is in fact the way the world actually is, of course:
>> the change is simply bringing the terminology into line with actual
>> practice.
>>
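The token/abstract-graph distinction can be sketched in a few lines, assuming a hypothetical `GraphToken` class and using a Python `frozenset` of triples for the abstract graph. A frozenset cannot change, just as a set in this sense has no state; the token can, and each call to `abstract_graph()` returns the abstract graph it currently represents. The sketch uses ground triples only: with blank nodes, "same abstract graph" would need graph isomorphism rather than plain set equality.

```python
class GraphToken:
    """Hypothetical mutable graph token identified by an IRI.

    At any moment its state is a token of exactly one abstract graph,
    modelled here as an immutable frozenset of triples.
    """
    def __init__(self, iri, triples=()):
        self.iri = iri
        self._triples = set(triples)

    def add(self, triple):
        # Changes the token's state; no abstract graph (frozenset) is modified.
        self._triples.add(triple)

    def abstract_graph(self):
        # The representation the token emits "when poked": an immutable snapshot.
        return frozenset(self._triples)

t = ("ex:a", "ex:p", "ex:b")
mine = GraphToken("http://example.org/graphs/mine", [t])
yours = GraphToken("http://example.org/graphs/yours", [t])

# Two tokens with different names, one abstract graph.
same_before = mine.abstract_graph() == yours.abstract_graph()

before = mine.abstract_graph()
mine.add(("ex:b", "ex:p", "ex:c"))
after = mine.abstract_graph()  # same token, now a token of a different abstract graph
```

Note that `before` is untouched by the `add` call: the old abstract graph still exists, and the token simply represents a new one.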
>> The nice thing is, if we do modify the specs (actually, the RDF conceptual
>> model) to be more realistic in this way, by making an explicit distinction
>> between the abstract graph and a particular graph token (think of a
>> document), then several things get simpler and some "issues" go away.
>>
>> 1. An RDF graph (new sense, i.e. a graph token) is now something that can
>> have an identity over time (corresponding to continuing to be identified by
>> a cool IRI, in the usual Web-sanctioned way), so this whole way of talking
>> now makes sense.
>>
>> 2.  It is quite sensible to have two RDF graphs (tokens) with different
>> names which are the same RDF (abstract) graph. That is, two graph tokens
>> which look like (i.e., when poked emit representations of) the same RDF
>> abstract graph. This has always been an issue for the idea of 'named
>> graphs': how can a name be attached to a particular RDF **abstract** graph
>> (as opposed to some document or representation of that abstract graph)? And
>> OK, the answer is: it can't, and this does not matter, because all we are
>> ever needing to identify are graph tokens, not abstract graphs. You name a
>> graph by identifying a token of it. But that only gives you power over the
>> token, not over the abstraction itself.
>>
>> 3. There is now a very nice way to handle blank nodes: we simply
>> stipulate, as a part of the underlying RDF conceptual model, that every
>> blank node can occur in at most one RDF graph token. Blank nodes are unique
>> to tokens.  Intuitively, we think of the blank nodes in any graph token as
>> belonging to the token itself rather than to the abstract graph. This at one
>> stroke fixes the 'scope' of blank node identifiers in any RDF surface syntax
>> or notation (whatever is the boundary of the RDF graph token according to
>> the rules of that syntax, that is where the existential quantifiers are that
>> bind the bnode identifiers) and it also eliminates the need to define 'graph
>> merging' as opposed to 'graph unioning' in the specifications. (If you don't
>> follow this, just believe me. It makes the specs a lot easier and quite a
>> bit shorter.)
>>
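The blank-node proposal in point 3 can be illustrated with a minimal sketch, assuming a hypothetical `BNode` type that carries the identifier of the token it belongs to, and treating any term written `_:label` as a blank node. When two documents both use the label `_:b1`, a plain set union of label-based triples fuses the two nodes, which is why the current specs need a separate "merge" that renames blank nodes apart; once every blank node is owned by its token, ordinary union already keeps them distinct.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BNode:
    """Hypothetical blank node scoped to the graph token it occurs in."""
    token: str  # identifier of the token that owns this blank node
    label: str  # local label, e.g. "b1" from "_:b1"

def load_token(token_id, triples):
    """Convert label-based triples so their blank nodes belong to token_id.

    Terms written "_:label" are treated as blank nodes; other terms are kept as-is.
    """
    def term(t):
        if isinstance(t, str) and t.startswith("_:"):
            return BNode(token_id, t[2:])
        return t
    return frozenset((term(s), term(p), term(o)) for s, p, o in triples)

doc_a = [("_:b1", "ex:name", '"Alice"')]
doc_b = [("_:b1", "ex:name", '"Bob"')]

# Naive: blank nodes identified by label alone; union fuses the two nodes.
naive = frozenset(doc_a) | frozenset(doc_b)
naive_subjects = {s for s, _, _ in naive}  # {"_:b1"}: one subject with two names

# Token-scoped: plain union already behaves like a merge.
scoped = load_token("tokenA", doc_a) | load_token("tokenB", doc_b)
scoped_subjects = {s for s, _, _ in scoped}  # two distinct BNode subjects
```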
>> Anyway, I offer this as an item for the WG to consider. I don't have any
>> particular brief for the choice of terminology, but I do think it is
>> important for us to agree on the basic conceptual distinction (between the
>> 'abstract' idea of an RDF graph as a mathematical set, and some more
>> concrete notion of a graph as a data object with a state that can be
>> identified by a URL) and agree to use it ourselves. If we do, then this will
>> impact at least the language that we use in the Graphs TF, and perhaps the
>> way that we actually think about the issues.
>>
>> I hereby volunteer to write drafts, as necessary, of the relevant changes
>> to the RDF Concepts and RDF Semantics documents to accommodate the necessary
>> changes, if we decide to make them. I think that, with care, no changes
>> would be needed to the SPARQL draft documents.
>>
>> Pat
>>
>>
>>
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>
> --
> Prof. Guus Schreiber
> Web & Media, Computer Science
> VU University Amsterdam
> http://www.cs.vu.nl/~guus
Received on Thursday, 24 February 2011 16:14:50 UTC