Re: [GRAPHS] g-box, g-snap, and g-text from Pierre-Antoine Champin on 2011-02-25 (public-rdf-wg@w3.org from February 2011)

From: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
Date: Fri, 25 Feb 2011 09:46:16 +0100
To: Sandro Hawke <sandro@w3.org>
CC: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <4D676C58.1020100@liris.cnrs.fr>
+1 to this; and I must say I kind of like the quirkiness of those terms. 
I have the feeling I'll miss them if they don't make it to the spec ;)

Some comments below about blank nodes, which indeed raise some issues.

On 02/25/2011 04:25 AM, Sandro Hawke wrote:
> I'm still having trouble following the discussion due to ambiguity of
> terms.  But I don't want us to argue about terms at this stage.  So I'd
> like to propose some temporary terms.  They are intentionally a little
> quirky and not suitable for use in our final specs.  Instead, they are
> meant to be short and unambiguous and relatively memorable.  At the end
> of this email, I try to connect them to other people's terms.
>
> Here they are:
>
> 1.  A "g-box" is a container, like a "set" data structure in
> programming.  It holds some RDF arcs, with their nodes. (Alternatively,
> it holds some RDF triples.).  G-boxes can overlap, sharing some of the
> same nodes and arcs.  Two g-boxes can happen to have the same contents
> (right now) while being distinct g-boxes. G-box contents can change:
> today a particular g-box might contain the triples { my:a my:b _:x.
> my:a my:c _:x }, and tomorrow it might instead contain { my:a my:b _:x.
> my:a my:c2 _:x }.
>
> 2.  A "g-snap" is an idealized snapshot of a g-box; it's a mathematical
> set of RDF arcs, with their nodes.  (Alternatively, a mathematical set
> of RDF triples.) Like g-boxes, g-snaps can overlap, sharing nodes and
> arcs.  Unlike g-boxes, it makes no sense to talk about g-snaps
> changing: they are defined to be exactly the collection of their
> elements.  If a g-snap were to "change" it would simply be a different
> g-snap.  If two g-snaps have the same nodes/arcs, they are really the
> same g-snap.  The contents of a g-box at any point in time are a
> g-snap.
>
> 3. A "g-text" is a particular sequence of characters or bytes which
> conveys a particular g-snap in some language (eg turtle or rdf/xml). If
> you can parse a g-text, you know what is in the g-snap it conveys
> (except blank nodes, as discussed below).  You can tell someone exactly
> what is in a particular g-box at some instant by sending them a
> g-text.  (You send them the g-text which conveys the g-snap which is
> the current state/contents of that g-box.)
>
> Are those terms and descriptions clear enough?  Are there edge cases
> they are missing?
>
> Now, about URIs:
>
> * A g-box can exist without any name or persistent way of referring to
>    it; it can exist as a data structure in a running program, or I
>    suppose it can exist in someone's mind.  Long-lived g-boxes
>    probably SHOULD be given a preferred single working URL, but there
>    might be times when you don't want to give it any, or when you
>    want to give it several URLs.
>
> * You can convey a g-snap with a g-text, but I don't think you usually
>    want to name them with URIs.  Sometimes you want to put a g-snap
>    into a URI, but that's rare, since in many cases g-snaps are too
>    long for most URI-handling software.  For constrained applications,
>    though, where overrun is unlikely or okay, you can embed a g-text
>    somewhere in an http URI (eg, as a query parameter), or maybe use a
>    "data:" URI.
>
> And blank nodes?   I think it works like this:
>
> * Two g-snaps can contain the same blank node.  A simple example of
>    this is to take a g-snap containing at least one blank node, then
>    construct another by adding the triple { my:a my:b my:c }.  The
>    original g-snap and the one resulting from the union both contain
>    the same blank nodes.

As g-snaps are mathematical sets, I agree they can contain the same blank
nodes. Your constructive proof makes much sense to me.
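
To make that constructive argument concrete, here is a minimal Python sketch. It assumes a g-snap can be modelled as a frozenset of triples and a blank node as a label-free object compared by identity; the names BlankNode and blank_nodes are made up for illustration and do not come from any RDF API.

```python
class BlankNode:
    """Hypothetical blank node: no label, equality is object identity."""
    __slots__ = ()


# A g-snap is modelled (assumption) as a frozenset of (s, p, o) triples.
x = BlankNode()
snap_a = frozenset({("my:a", "my:b", x), ("my:a", "my:c", x)})

# Construct a second g-snap by adding one ground triple to the first.
snap_b = snap_a | {("my:a", "my:b", "my:c")}


def blank_nodes(snap):
    """Return the set of blank nodes occurring anywhere in a g-snap."""
    return {t for triple in snap for t in triple if isinstance(t, BlankNode)}


# Both g-snaps contain the very same blank node object.
shared = blank_nodes(snap_a) & blank_nodes(snap_b)
```

Because frozensets compare by their elements, two g-snaps built from the same triples are equal here, which matches the "same nodes/arcs, same g-snap" reading.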

> * By a similar argument, I believe two g-boxes can also contain the
>    same blank node, although not all software will support this.  Given
>    a g-box A, I could construct A' to contain whatever A contains and
>    also { my:a my:b my:c }.  This happens sometimes in real programs;
>    I'd be curious to know which RDF APIs disallow sharing blank nodes
>    between their graph-storage instances; my experience is they allow
>    it when it's not a problem (eg they are both in memory right now).

Here, I would be more cautious. From the definitions you gave above, it 
is clear that two g-boxes can contain the two g-snaps described above, 
thus sharing blank nodes.

However I think this should not be allowed by the RDF conceptual model. 
On the contrary, we should force every blank node to appear in at most 
one g-box, defined as the *scope* of the blank node. This is the way I 
read Pat's mail:

>> we simply stipulate, as a part of the underlying RDF conceptual
>> model, that every blank node can occur in at most one RDF graph
>> token.

where I think "graph token" means "g-box" (although I agree with you 
that the proposed definition of "graph token" makes it look like g-text 
sometimes).
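
A rough Python sketch of what such a scope rule could look like, assuming a g-box is a mutable container of triples. GBox, BlankNode and the scope attribute are hypothetical names for illustration only, not part of any existing RDF library or of Pat's proposal.

```python
class BlankNode:
    """Hypothetical blank node that remembers the one g-box scoping it."""

    def __init__(self):
        self.scope = None  # set when first added to a g-box


class GBox:
    """Illustrative g-box: a mutable container of triples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        # Blank nodes may appear as subject or object; check before mutating.
        bnodes = [t for t in (s, o) if isinstance(t, BlankNode)]
        for b in bnodes:
            if b.scope is not None and b.scope is not self:
                raise ValueError("blank node is scoped to another g-box")
        for b in bnodes:
            b.scope = self
        self._triples.add((s, p, o))

    def snapshot(self):
        """Return the current contents as an immutable g-snap."""
        return frozenset(self._triples)


box_a, box_b = GBox(), GBox()
x = BlankNode()
box_a.add("my:a", "my:b", x)
box_a.add("my:a", "my:c", x)      # same g-box: allowed

try:
    box_b.add("my:a", "my:b", x)  # second g-box: rejected by the scope rule
    rejected = False
except ValueError:
    rejected = True
```

Note that the restriction sits on g-boxes only; snapshots taken from box_a at different times may still share x, as g-snaps are allowed to.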

> * In general, while g-texts do convey g-snaps, they do not identify
>    the blank nodes in them.  So, in fact, if you go
>
>        g-snap A -->  g-text -->  g-snap A'
>
>    A=A' only if it does not contain blank nodes, because parsing a
>    g-snap results in all-new blank nodes.

... which is a way to enforce the scope-limitation of blank nodes.
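
The round trip can be illustrated with a toy serializer and parser in Python. This is only a sketch: the line-based format below is not real Turtle, terms are assumed to contain no whitespace, and serialize, parse and BlankNode are invented names.

```python
class BlankNode:
    """Hypothetical blank node; equality is object identity."""


def serialize(snap):
    """Write a g-snap as a toy g-text, one 's p o .' line per triple."""
    labels = {}

    def term(t):
        if isinstance(t, BlankNode):
            # Labels are local to this g-text and carry no identity.
            return labels.setdefault(t, f"_:b{len(labels)}")
        return t

    lines = [" ".join(term(t) for t in triple) + " ." for triple in snap]
    return "\n".join(sorted(lines))


def parse(text):
    """Read a toy g-text into a g-snap, minting fresh blank nodes."""
    fresh = {}  # label -> new BlankNode, valid for this parse only
    triples = set()
    for line in text.splitlines():
        s, p, o = line.split()[:3]  # drop the trailing "."
        triples.add(tuple(
            fresh.setdefault(t, BlankNode()) if t.startswith("_:") else t
            for t in (s, p, o)
        ))
    return frozenset(triples)


ground = frozenset({("my:a", "my:b", "my:c")})
x = BlankNode()
with_bnode = frozenset({("my:a", "my:b", x), ("my:a", "my:c", x)})

ground_again = parse(serialize(ground))      # equal to the original
bnode_again = parse(serialize(with_bnode))   # same shape, new blank node
```

The ground g-snap survives the round trip unchanged, while the one with a blank node comes back with the same structure but a different node, so it is a different g-snap.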

>    We might define new RDF syntaxes which allow for several g-texts to
>    be grouped in such a way that blank nodes can be shared between them.
>    This is an issue for our work item, "Either [the turtle] syntax or a
>    related syntax should also support multiple graphs and graph stores."

What would be the use of sharing a blank node across several g-texts if, 
as you stated above, "parsing [a g-text into] a g-snap results in 
all-new blank nodes"? It seems to me that the parsing would therefore 
lose the identity of the blank node...

I guess I need some concrete use cases to understand your point.

> How's that sound?    Make sense?

much sense indeed. Thanks for that.

Shall we write a wiki page where we keep track of those different
terminologies and how they align? I can do that...

   pa

>
> Okay, relating to other people's terms...
>
> "Tokens", as I read today's email, seem to mostly be g-texts but
> sometimes be something that can change over time, and thus be a
> container for a g-text, something we might call a "g-text-box".  I
> think this latter meaning conflates things in a way which will cause
> problems, eg for understanding content-negotiation.
>
> "Graphs" in the RDF Semantics are g-snaps.
>
> "Named Graphs", as in SPARQL 1.0, are g-boxes which happen to each
> be assigned a URI.
>
> "Graph Literals", as suggested by N3 (and disagreeing with Nathan,
> sorry), are a feature of an RDF syntax that allows you to denote a
> g-snap by a special kind of term (a "graph literal"). In n3, it looks
> like:
>
>      { _:x my:says { _:x foaf:name "Sandro Hawke" } }.
>
> One can approximate this with every RDF syntax by using a
> suitably-defined URI scheme or datatype, such as:
>
>      { _:x my:says "_:x <http://xmlns.com/foaf/0.1/name> \"Sandro Hawke\""^^my:turtleCode }
>
> This isn't as convenient as the N3 approach, and doesn't allow
> blank nodes to be shared (in the second example, the _:x's are not
> connected), but it does work in existing RDF syntaxes.
>
> I'd better stop now.
>
>     -- Sandro
Received on Friday, 25 February 2011 08:46:53 UTC