[Fwd: [GRAPHS] g-box, g-snap, and g-text]

Following on from a chain of emails on the RDF-WG, Sandro just wrote up 
the following:

At the foot of the email I've added some notes on how I personally 
see this in relation to what we discuss on this list.

-------- Original Message --------
Subject: [GRAPHS] g-box, g-snap, and g-text
Resent-Date: Fri, 25 Feb 2011 03:27:09 +0000
Resent-From: public-rdf-wg@w3.org
Date: Thu, 24 Feb 2011 22:25:56 -0500
From: Sandro Hawke <sandro@w3.org>
Organisation: World Wide Web Consortium (W3C)
To: public-rdf-wg <public-rdf-wg@w3.org>
References: <5AE83D03-F381-446A-BE1B-F0C520B8A887@ihmc.us>  
<4D6630C0.70808@vu.nl> <3E151CCF-6378-403E-BA23-D25C2D610127@ihmc.us>  
<4D66EDB5.5090409@webr3.org>

I'm still having trouble following the discussion due to ambiguity of
terms.  But I don't want us to argue about terms at this stage.  So I'd
like to propose some temporary terms.  They are intentionally a little
quirky and not suitable for use in our final specs.  Instead, they are
meant to be short and unambiguous and relatively memorable.  At the end
of this email, I try to connect them to other people's terms.

Here they are:

1.  A "g-box" is a container, like a "set" data structure in
programming.  It holds some RDF arcs, with their nodes. (Alternatively,
it holds some RDF triples.)  G-boxes can overlap, sharing some of the
same nodes and arcs.  Two g-boxes can happen to have the same contents
(right now) while being distinct g-boxes.  A g-box's contents can change:
today a particular g-box might contain the triples { my:a my:b _:x.
my:a my:c _:x }, and tomorrow it might instead contain { my:a my:b _:x.
my:a my:c2 _:x }.

2.  A "g-snap" is an idealized snapshot of a g-box; it's a mathematical
set of RDF arcs, with their nodes.  (Alternatively, a mathematical set
of RDF triples.) Like g-boxes, g-snaps can overlap, sharing nodes and
arcs.  Unlike g-boxes, it makes no sense to talk about g-snaps
changing: they are defined to be exactly the collection of their
elements.  If a g-snap were to "change" it would simply be a different
g-snap.  If two g-snaps have the same nodes/arcs, they are really the
same g-snap.  The contents of a g-box at any point in time are a
g-snap.

3. A "g-text" is a particular sequence of characters or bytes which
conveys a particular g-snap in some language (eg Turtle or RDF/XML). If
you can parse a g-text, you know what is in the g-snap it conveys
(except blank nodes, as discussed below).  You can tell someone exactly
what is in a particular g-box at some instant by sending them a
g-text.  (You send them the g-text which conveys the g-snap which is
the current state/contents of that g-box.)
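To make the three terms concrete, here is a minimal Python sketch. It is only an illustration under assumptions: triples are tuples, prefixed names like my:a are plain strings, BNode is a hypothetical class whose instances compare by identity, a g-box is modelled as a mutable set, a g-snap as a frozenset, and to_gtext/parse_gtext are hypothetical helpers for an N-Triples-like format that handle IRIs only.

```python
class BNode:
    """Hypothetical blank node: equal only to itself (object identity)."""


def to_gtext(gsnap, reverse=False):
    """Write an IRI-only g-snap as N-Triples-style lines (a g-text)."""
    ordered = sorted(gsnap, reverse=reverse)
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in ordered)


def parse_gtext(text):
    """Parse lines written by to_gtext back into a g-snap (a frozenset)."""
    triples = set()
    for line in text.splitlines():
        s, p, o, dot = line.split()
        if dot != ".":
            raise ValueError(f"malformed line: {line!r}")
        triples.add((s[1:-1], p[1:-1], o[1:-1]))
    return frozenset(triples)


# 1. g-box: a mutable container (a Python set) with its own identity.
x = BNode()
box1 = {("my:a", "my:b", x), ("my:a", "my:c", x)}
box2 = set(box1)
assert box1 is not box2 and box1 == box2  # distinct g-boxes, same contents now

# 2. g-snap: an immutable set (a frozenset), identified by its elements.
today = frozenset(box1)
box1.discard(("my:a", "my:c", x))
box1.add(("my:a", "my:c2", x))
tomorrow = frozenset(box1)
assert today != tomorrow             # the g-box changed, so its g-snap differs
assert today == frozenset(box2)      # same elements means the same g-snap

# 3. g-text: two different texts can convey one g-snap (IRIs only here).
EX = "http://example.org/"
snap = frozenset({(EX + "a", EX + "b", EX + "c"),
                  (EX + "a", EX + "d", EX + "e")})
text1 = to_gtext(snap)
text2 = to_gtext(snap, reverse=True)
assert text1 != text2
assert parse_gtext(text1) == parse_gtext(text2) == snap
```

Using a frozenset for the g-snap is the natural fit here: Python already defines its equality purely by content and forbids mutation, which matches "if a g-snap were to change it would simply be a different g-snap".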

Are those terms and descriptions clear enough?  Are there edge cases
they are missing?

Now, about URIs:

* A g-box can exist without any name or persistent way of referring to
   it; it can exist as a data structure in a running program, or I
   suppose it can exist in someone's mind.  Long-lived g-boxes
   probably SHOULD be given a preferred single working URL, but there
   might be times when you don't want to give it any, or when you
   want to give it several URLs.

* You can convey a g-snap with a g-text, but I don't think you usually
   want to name them with URIs.  Sometimes you want to put a g-snap
   into a URI, but that's rare, since in many cases g-snaps are too
   long for most URI-handling software.  For constrained applications,
   though, where overrun is unlikely or okay, you can embed a g-text
   somewhere in an http URI (eg, as a query parameter), or maybe use a
   "data:" URI.
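For the constrained case, Python's standard urllib.parse is enough to embed a short g-text either in a "data:" URI (RFC 2397) or as a query parameter of an http URI. The endpoint http://example.org/snap, the parameter name g, and the 2000-character limit are all assumptions chosen for illustration; real URI-handling software varies.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

gtext = "<http://example.org/a> <http://example.org/b> <http://example.org/c> .\n"

# Option 1: a "data:" URI whose payload is the percent-encoded g-text.
data_uri = "data:text/turtle," + quote(gtext)

# Option 2: the g-text as a query parameter of a (hypothetical) http URI.
http_uri = "http://example.org/snap?" + urlencode({"g": gtext})

MAX_URI_LENGTH = 2000  # assumed limit for illustration only
for uri in (data_uri, http_uri):
    if len(uri) > MAX_URI_LENGTH:
        raise ValueError("g-text too long to embed in a URI")

# Either way, the receiver can recover the g-text exactly.
assert unquote(data_uri.split(",", 1)[1]) == gtext
assert parse_qs(urlsplit(http_uri).query)["g"][0] == gtext
```

The length check is the "overrun is unlikely or okay" condition made explicit: once a g-snap grows past what the receiving software tolerates, embedding stops being an option.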

And blank nodes?   I think it works like this:

* Two g-snaps can contain the same blank node.  A simple example of
   this is to take a g-snap containing at least one blank node, then
   construct another by adding the triple { my:a my:b my:c }.  The
   original g-snap and the one resulting from the union both contain
   the same blank nodes.

* By a similar argument, I believe two g-boxes can also contain the
   same blank node, although not all software will support this.  Given
   a g-box A, I could construct A' to contain whatever A contains and
   also { my:a my:b my:c }.  This happens sometimes in real programs;
   I'd be curious to know which RDF APIs disallow sharing blank nodes
   between their graph-storage instances; my experience is they allow
   it when it's not a problem (eg they are both in memory right now).

* In general, while g-texts do convey g-snaps, they do not identify
   the blank nodes in them.  So, in fact, if you go

       g-snap A --> g-text --> g-snap A'

   A=A' only if A does not contain blank nodes, because parsing a
   g-text results in all-new blank nodes.

   We might define new RDF syntaxes which allow for several g-texts to
   be grouped in such a way that blank nodes can be shared between them.
   This is an issue for our work item, "Either [the turtle] syntax or a
   related syntax should also support multiple graphs and graph stores."
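The three blank-node points can be exercised together in a small Python sketch. Everything here is hypothetical: g-snaps are frozensets of tuples, BNode instances compare by identity, and write/read implement a toy one-quad-per-line text format (loosely N-Quads-like, not a conforming implementation) in which blank node labels are scoped to the whole text. Writing one graph per text reproduces the A != A' result; writing two graphs into one text is the kind of grouping the last point suggests.

```python
import itertools


class BNode:
    """Hypothetical blank node: equal only to itself (object identity)."""


def write(graphs):
    """Write named g-snaps as one text; bnode labels span the whole text."""
    labels, counter, lines = {}, itertools.count(), []

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:b{next(counter)}"
            return labels[t]
        return f"<{t}>"

    for name, gsnap in graphs.items():
        for triple in gsnap:
            lines.append(" ".join(map(term, (*triple, name))) + " .")
    return "".join(line + "\n" for line in lines)


def read(text):
    """Parse write() output; each distinct label gets a brand-new BNode."""
    fresh, graphs = {}, {}

    def term(token):
        if token.startswith("_:"):
            if token not in fresh:
                fresh[token] = BNode()
            return fresh[token]
        return token[1:-1]

    for line in text.splitlines():
        s, p, o, g, _dot = line.split()
        graphs.setdefault(term(g), set()).add((term(s), term(p), term(o)))
    return {name: frozenset(ts) for name, ts in graphs.items()}


EX = "http://example.org/"

# Without blank nodes, the round trip gives back the same g-snap: A = A'.
ground = frozenset({(EX + "a", EX + "b", EX + "c")})
assert read(write({EX + "g": ground}))[EX + "g"] == ground

# Union with { my:a my:b my:c }: two g-snaps sharing the blank node x.
x = BNode()
a = frozenset({(EX + "a", EX + "p", x)})
a_plus = a | {(EX + "a", EX + "b", EX + "c")}

# One graph per g-text: parsing mints a new blank node, so A != A'.
a_again = read(write({EX + "g1": a}))[EX + "g1"]
assert a_again != a

# Both graphs in one grouped g-text: the shared label keeps them connected.
both = read(write({EX + "g1": a, EX + "g2": a_plus}))
(_, _, x1), = both[EX + "g1"]
assert any(x1 in triple for triple in both[EX + "g2"])
```

A and A' in the A != A' check are still isomorphic (identical up to a renaming of blank nodes), which is the sense in which a g-text still conveys its g-snap even though it cannot identify the blank nodes in it.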

How's that sound?    Make sense?

Okay, relating to other people's terms...

"Tokens", as I read today's email, seem to mostly be g-texts but
sometimes be something that can change over time, and thus be a
container for a g-text, something we might call a "g-text-box".  I
think this latter meaning conflates things in a way which will cause
problems, eg for understanding content-negotiation.

"Graphs" in the RDF Semantics are g-snaps.

"Named Graphs", as in SPARQL 1.0, are g-boxes which happen to each
be assigned a URI.

"Graph Literals", as suggested by N3 (and disagreeing with Nathan,
sorry), are a feature of an RDF syntax that allows you to denote a
g-snap by a special kind of term (a "graph literal"). In N3, it looks
like:

     { _:x my:says { _:x foaf:name "Sandro Hawke" } }.

One can approximate this with every RDF syntax by using a
suitably-defined URI scheme or datatype, such as:

     { _:x my:says "_:x <http://xmlns.com/foaf/0.1/name> \"Sandro Hawke\""^^my:turtleCode }

This isn't as convenient as the N3 approach, and doesn't allow
blank nodes to be shared (in the second example, the _:x's are not
connected), but it does work in existing RDF syntaxes.
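Why the _:x's stay unconnected in the typed-literal approach can be sketched in a few lines of Python: the embedded g-text is just a string, and parsing it later yields its own blank nodes. TypedLiteral, BNode, and parse_one are hypothetical names, and my:turtleCode is the datatype from the example above, used here as a plain string.

```python
from dataclasses import dataclass


class BNode:
    """Hypothetical blank node: equal only to itself (object identity)."""


@dataclass(frozen=True)
class TypedLiteral:
    """A literal: a lexical form plus a datatype (here my:turtleCode)."""
    lexical: str
    datatype: str


def parse_one(lexical):
    """Parse a one-triple g-text; every blank node label gets a fresh BNode."""
    subject, predicate, obj = lexical.split(" ", 2)
    fresh = {}

    def term(token):
        if token.startswith("_:"):
            if token not in fresh:
                fresh[token] = BNode()
            return fresh[token]
        return token

    return (term(subject), term(predicate), term(obj))


speaker = BNode()  # the outer _:x
code = '_:x <http://xmlns.com/foaf/0.1/name> "Sandro Hawke"'
outer = (speaker, "my:says", TypedLiteral(code, "my:turtleCode"))

inner = parse_one(outer[2].lexical)
assert inner[2] == '"Sandro Hawke"'
assert inner[0] is not speaker  # the two _:x's are not connected
```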

I'd better stop now.

================

nathan's notes:

In linked data terms, I very much see an "information resource" as being 
a g-box, identified by a URL (an absolute-IRI with an http scheme), which, 
when poked (when you GET it), returns a g-snap (an RDF graph) representing 
the current state of the g-box/IR, encoded in a lexical data format as a 
g-text (a representation).
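Read this way, a GET is roughly "take a g-snap of the g-box, then serialize it as a g-text". A minimal Python sketch, assuming a hypothetical InformationResource class (no real HTTP is performed), example URLs, and an N-Triples-like output format:

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


class InformationResource:
    """Hypothetical model: a g-box named by an http URL (no real HTTP)."""

    def __init__(self, url):
        self.url = url
        self.triples = set()  # current, mutable contents of the g-box

    def get(self):
        """Model a GET: snapshot the state as a g-snap, return it as a g-text."""
        gsnap = frozenset(self.triples)
        return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(gsnap))


ir = InformationResource("http://example.org/people")
ir.triples.add((ir.url + "#alice", FOAF_KNOWS, ir.url + "#bob"))
first = ir.get()
ir.triples.add((ir.url + "#bob", FOAF_KNOWS, ir.url + "#alice"))
second = ir.get()

assert first != second  # same g-box (same URL), a later state
assert set(first.splitlines()) < set(second.splitlines())
```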

The things described by the statements in a g-box can be given a name 
(a fragment), which, when concatenated to the g-box name (URL), creates an 
IRI with a fragment that can be used to refer to those things. (This was 
what I was getting at with WebNames: the g-box/IR acts as a namespace.) 
Where I believe the "error" (if we can call it that) lies in web-arch 
is that the following may be wrong: "the significance of the URI with 
the hash on is a function of the language of document you get when you 
dereference the thing before the hash." [1] Rather, it could be: 
"the hash is a local name scoped within the namespace referred to by the 
absolute-IRI before the hash; these local names can be used by 
applications operating within the scope of the namespace, or in 
representations retrieved from the namespace, where the namespace 
corresponds to a dereferenceable resource".
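The "namespace" reading can be illustrated with Python's urllib.parse: urldefrag splits an IRI into the part before the hash (the name of the g-box, which is what a client actually dereferences, since fragments are not sent in HTTP requests) and the local name, and urljoin puts them back together. The example IRIs are made up.

```python
from urllib.parse import urldefrag, urljoin

iri = "http://example.org/people#alice"

# Split into the namespace (the g-box's URL) and the local name.
namespace, local_name = urldefrag(iri)
assert namespace == "http://example.org/people"  # what a GET would fetch
assert local_name == "alice"                     # scoped within that g-box

# Concatenating the local name back onto the g-box name gives the IRI again.
assert urljoin(namespace, "#" + local_name) == iri
```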

Anyhow, to focus more on what we cover and the current state of affairs: 
the toucan problem can be seen as using an absolute-IRI to refer to both 
the g-box and a thing primarily described by statements in the g-box. 
In other cases we can see people using the absolute-IRI to refer to the 
g-box (IR), a g-snap (its current state) and a g-text (a representation 
of its current state); this could be what one could term a "fixed 
resource". In the conneg world we can sometimes see a "resource" 
(we can't say it's an IR) as being a mapping to several different g-boxes 
(for instance when you conneg over language), or as being mapped to a 
single g-snap at a specific point in time, which can be transferred 
using conneg'd g-texts (the same value at time t, with different 
representations of that value available).
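A sketch of the "same value at time t, different representations" case: one g-snap, two g-texts chosen by a deliberately simplified Accept check (substring matching, no q-values). The negotiate function is hypothetical, and its JSON layout is an ad hoc assumption, not JSON-LD.

```python
import json


def negotiate(gsnap, accept):
    """Return (media type, g-text) for one g-snap, per a simplified Accept."""
    triples = sorted(gsnap)
    if "application/n-triples" in accept:
        body = "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)
        return "application/n-triples", body
    if "application/json" in accept:
        return "application/json", json.dumps([list(t) for t in triples])
    raise ValueError("406 Not Acceptable")


EX = "http://example.org/"
snap = frozenset({(EX + "a", EX + "b", EX + "c")})

nt_type, nt_text = negotiate(snap, "application/n-triples")
js_type, js_text = negotiate(snap, "application/json")

assert nt_text != js_text  # different g-texts ...
recovered = {tuple(t) for t in json.loads(js_text)}
assert recovered == snap   # ... of the same g-snap
```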

I'd suggest that the ambiguity of IRs in the RDF world could be cleared 
up by saying that absolute-IRIs of the http variety always refer to 
g-boxes and never to the things described by the statements within a g-box; 
that g-snaps are never identified by a URI by default (unless it's a 
data: URI encoding a graph literal); and likewise that g-texts are always 
unnamed by default.
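The proposal amounts to a simple default rule for what an IRI denotes. The function below is a hypothetical encoding of it using urllib.parse.urlsplit; it considers only the http scheme, as the text does, treats a data: URI as a graph literal denoting a g-snap, and leaves every other scheme unclassified.

```python
from urllib.parse import urlsplit


def proposed_referent(iri):
    """Classify what an IRI denotes by default under the proposal above."""
    parts = urlsplit(iri)
    if parts.scheme == "http":
        # Absolute-IRI (no fragment): the g-box. With a fragment: a local
        # name for a thing described by the statements in that g-box.
        return "described thing" if parts.fragment else "g-box"
    if parts.scheme == "data":
        return "g-snap"       # a data: URI encoding a graph literal
    return "unspecified"      # other schemes: not covered by this sketch


assert proposed_referent("http://example.org/people") == "g-box"
assert proposed_referent("http://example.org/people#alice") == "described thing"
assert proposed_referent("data:text/turtle,%3Chttp%3A%2F%2Fexample.org%2Fa%3E") == "g-snap"
```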

To scope out from sem-web land, the notion of a "box" ties in nicely 
with REST imho, a "rest:resource" is a "box" which has a set of values 
(states) over time, at a single instant the value of that box is a 
"snap" (snapshot of its state) and this is encoded in "text" (a 
representation) to be transferred over the wire. The box can be seen as 
a magic box, because it may hold a single thing for as long as the box 
exists (a static image for instance) or it may hold something that does 
things (a web application for instance) - so the absolute-IRI names the 
"box" and the fragments are for use by things in that box (like a web 
application with recomposable states, a document with named sections, a 
video with cues, or a description which refers to concepts).

To quote Sandro, "I'd better stop now."! Is that making sense, though?

[1] http://lists.w3.org/Archives/Public/www-tag/2002Mar/0150.html

Best,

Nathan

Received on Friday, 25 February 2011 06:02:12 UTC