[GRAPHS] g-box, g-snap, and g-text

I'm still having trouble following the discussion due to ambiguity of
terms.  But I don't want us to argue about terms at this stage.  So I'd
like to propose some temporary terms.  They are intentionally a little
quirky and not suitable for use in our final specs.  Instead, they are
meant to be short and unambiguous and relatively memorable.  At the end
of this email, I try to connect them to other people's terms.

Here they are:

1.  A "g-box" is a container, like a "set" data structure in
programming.  It holds some RDF arcs, with their nodes. (Alternatively,
it holds some RDF triples.)  G-boxes can overlap, sharing some of the
same nodes and arcs.  Two g-boxes can happen to have the same contents
(right now) while being distinct g-boxes.  A g-box's contents can change:
today a particular g-box might contain the triples { my:a my:b _:x.
my:a my:c _:x }, and tomorrow it might instead contain { my:a my:b _:x.
my:a my:c2 _:x }.

2.  A "g-snap" is an idealized snapshot of a g-box; it's a mathematical
set of RDF arcs, with their nodes.  (Alternatively, a mathematical set
of RDF triples.) Like g-boxes, g-snaps can overlap, sharing nodes and
arcs.  Unlike g-boxes, it makes no sense to talk about g-snaps
changing: they are defined to be exactly the collection of their
elements.  If a g-snap were to "change" it would simply be a different
g-snap.  If two g-snaps have the same nodes/arcs, they are really the
same g-snap.  The contents of a g-box at any point in time are a
g-snap. 

3. A "g-text" is a particular sequence of characters or bytes which
conveys a particular g-snap in some language (eg turtle or rdf/xml). If
you can parse a g-text, you know what is in the g-snap it conveys
(except blank nodes, as discussed below).  You can tell someone exactly
what is in a particular g-box at some instant by sending them a
g-text.  (You send them the g-text which conveys the g-snap which is
the current state/contents of that g-box.)
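To make the three terms concrete, here is a toy sketch in Python.  It
assumes triples are plain tuples of strings, and the names GBox,
snapshot, and to_gtext are made up for illustration; they come from no
RDF API.  Blank node labels are treated as ordinary strings here, which
glosses over blank node identity, and the g-text format is only a rough
line-per-triple stand-in for a real syntax.

```python
# Toy model of the three terms. Triples are (subject, predicate, object)
# tuples of strings; every name here is hypothetical.

class GBox:
    """A mutable container of triples; identity is independent of contents."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def add(self, triple):
        self._triples.add(triple)

    def remove(self, triple):
        self._triples.discard(triple)

    def snapshot(self):
        """Return the g-snap: an immutable set equal to the current contents."""
        return frozenset(self._triples)


def to_gtext(gsnap):
    """Convey a g-snap as a g-text: one 'S P O .' line per triple, sorted."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(gsnap))


box1 = GBox([("my:a", "my:b", "_:x"), ("my:a", "my:c", "_:x")])
box2 = GBox([("my:a", "my:b", "_:x"), ("my:a", "my:c", "_:x")])

today = box1.snapshot()
same_snap = today == box2.snapshot()   # same contents, so the same g-snap
same_box = box1 is box2                # but still two distinct g-boxes

# Tomorrow the g-box changes; the g-snap taken earlier does not.
box1.remove(("my:a", "my:c", "_:x"))
box1.add(("my:a", "my:c2", "_:x"))
tomorrow = box1.snapshot()

gtext = to_gtext(today)
```

The mutable class plays the g-box, the frozenset plays the g-snap, and
the string plays the g-text.  The choice of frozenset is deliberate: it
compares by contents and cannot be modified, which is the property that
separates a g-snap from a g-box.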

Are those terms and descriptions clear enough?  Are there edge cases
they are missing?  

Now, about URIs:

* A g-box can exist without any name or persistent way of referring to
  it; it can exist as a data structure in a running program, or I
  suppose it can exist in someone's mind.  Long-lived g-boxes
  probably SHOULD be given a preferred single working URL, but there
  might be times when you don't want to give it any, or when you
  want to give it several URLs.

* You can convey a g-snap with a g-text, but I don't think you usually
  want to name them with URIs.  Sometimes you want to put a g-snap
  into a URI, but that's rare, since in many cases g-snaps are too
  long for most URI-handling software.  For constrained applications,
  though, where overrun is unlikely or okay, you can embed a g-text 
  somewhere in an http URI (eg, as a query parameter), or maybe use a
  "data:" URI.

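As a rough illustration of embedding a g-text in a URI, here is a
sketch using only Python's standard urllib.parse.  The host
example.org and the parameter name "gtext" are invented for the
example, and the g-text itself is a fragment (the my: prefix is left
undeclared), so treat this as one possible encoding rather than a
proposal.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

# A short g-text; small enough that URI length limits are not a concern.
gtext = 'my:a my:b "hello" .'

# Option 1: a query parameter on an http URI (host and name are made up).
http_uri = "http://example.org/graph?" + urlencode({"gtext": gtext})

# Option 2: a data: URI with the Turtle media type and a percent-encoded body.
data_uri = "data:text/turtle;charset=utf-8," + quote(gtext, safe="")

# Recover the g-text from each URI.
from_http = parse_qs(urlsplit(http_uri).query)["gtext"][0]
from_data = unquote(data_uri.split(",", 1)[1])
```

Both forms round-trip the exact characters of the g-text, so what the
URI carries is a g-text that conveys the g-snap, not the g-snap itself.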
And blank nodes?   I think it works like this:

* Two g-snaps can contain the same blank node.  A simple example of
  this is to take a g-snap containing at least one blank node, then
  construct another by adding the triple { my:a my:b my:c }.  The
  original g-snap and the one resulting from the union both contain
  the same blank nodes.

* By a similar argument, I believe two g-boxes can also contain the
  same blank node, although not all software will support this.  Given
  a g-box A, I could construct A' to contain whatever A contains and
  also { my:a my:b my:c }.  This happens sometimes in real programs;
  I'd be curious to know which RDF APIs disallow sharing blank nodes
  between their graph-storage instances; my experience is they allow
  it when it's not a problem (eg they are both in memory right now).

* In general, while g-texts do convey g-snaps, they do not identify
  the blank nodes in them.  So, in fact, if you go 

      g-snap A --> g-text --> g-snap A'

  A=A' only if A contains no blank nodes, because parsing a
  g-text results in all-new blank nodes.

  We might define new RDF syntaxes which allow for several g-texts to
  be grouped in such a way that blank nodes can be shared between them.
  This is an issue for our work item, "Either [the turtle] syntax or a 
  related syntax should also support multiple graphs and graph stores."
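The blank node behaviour above can be sketched in a few lines of
Python.  This is a toy model under stated assumptions, not a real RDF
library: BNode, serialize, parse, and bnodes are hypothetical names,
the g-text format is a simplified line-per-triple syntax, and g-snap
equality here is plain set equality, not graph isomorphism.

```python
class BNode:
    """A blank node: it has no name, only object identity."""


def bnodes(gsnap):
    """Collect the blank nodes appearing anywhere in a g-snap."""
    return {t for triple in gsnap for t in triple if isinstance(t, BNode)}


def serialize(gsnap):
    """Write a g-text, inventing a local label for each blank node."""
    labels = {}

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:b{len(labels)}"
            return labels[t]
        return t

    return "\n".join(sorted(" ".join(term(t) for t in triple) + " ."
                            for triple in gsnap))


def parse(gtext):
    """Read a g-text; every blank node label becomes a brand-new BNode."""
    fresh = {}

    def term(token):
        if token.startswith("_:"):
            if token not in fresh:
                fresh[token] = BNode()
            return fresh[token]
        return token

    triples = set()
    for line in gtext.splitlines():
        s, p, o = line.split()[:3]   # the fourth token is the closing "."
        triples.add((term(s), term(p), term(o)))
    return frozenset(triples)


x = BNode()
a = frozenset({("my:a", "my:b", x), ("my:a", "my:c", x)})

# Two g-snaps sharing a blank node: the union still contains x itself.
a_union = a | {("my:a", "my:b", "my:c")}

# g-snap A --> g-text --> g-snap A': the blank node comes back as a new one.
a_prime = parse(serialize(a))

# Without blank nodes the round trip gives back the same g-snap.
ground = frozenset({("my:a", "my:b", "my:c")})
ground_prime = parse(serialize(ground))
```

In this model a_prime has the same shape as a but is not equal to it,
because its single blank node is not x, while ground_prime equals
ground.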

How's that sound?    Make sense?

Okay, relating to other people's terms...

"Tokens", as I read today's email, seem to mostly be g-texts but
sometimes be something that can change over time, and thus be a
container for a g-text, something we might call a "g-text-box".  I
think this latter meaning conflates things in a way which will cause
problems, eg for understanding content-negotiation.

"Graphs" in the RDF Semantics are g-snaps.

"Named Graphs", as in SPARQL 1.0, are g-boxes which happen to each
be assigned a URI.

"Graph Literals", as suggested by N3 (and disagreeing with Nathan,
sorry), are a feature of an RDF syntax that allows you to denote a
g-snap by a special kind of term (a "graph literal"). In N3, it looks
like: 

    { _:x my:says { _:x foaf:name "Sandro Hawke" } }.

One can approximate this in any RDF syntax by using a
suitably-defined URI scheme or datatype, such as:

    { _:x my:says "_:x <http://xmlns.com/foaf/0.1/name> \"Sandro Hawke\""^^my:turtleCode }

This isn't as convenient as the N3 approach, and doesn't allow
blank nodes to be shared (in the second example, the _:x's are not
connected), but it does work in existing RDF syntaxes.

I'd better stop now.

   -- Sandro

Received on Friday, 25 February 2011 03:26:06 UTC