"layers" (was Re: the term "named graphs") from Sandro Hawke on 2012-04-28 (public-rdf-wg@w3.org from April 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Sat, 28 Apr 2012 09:04:38 -0400
To: Dan Brickley <danbri@danbri.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <1335618278.9663.854.camel@waldron>
On Sat, 2012-04-28 at 12:57 +0200, Dan Brickley wrote:
> On 28 April 2012 11:58, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
> > On 28/04/12 05:49, Sandro Hawke wrote:
> >> My concern is with how people
> >> use the term in practice, and whether that usage conflicts with the
> >> formal definition.
> >
> > General usage is sloppy, imprecise and changes as convenient. Ambiguity in
> > spoken language is normal.  We all manage.  But we are not all managers.
> 
> Yes, we have same issue with 'property', 'triple', 'statement' and
> others; each might (if we're lucky) have a precise W3C RDF meaning,
> but they shade into other related uses that there can't be such strict
> standardised control over.
> 
> Property is probably the oddest. Sometimes in computing 'color',
> 'size' etc. are themselves called properties, sometimes the size of
> some particular thing is counted as one of its properties, and if it
> had two different colours, they're each a property. So RDF properties
> are a close cousin to
> http://en.wikipedia.org/wiki/Property_(programming) but different too;
> I think we gain more by neighbourhood benefits than we suffer from
> sloppiness and confusion there.

Yeah, lots of situations distinguish between attributes and properties
and relations; we mush them together.   If we were starting with a blank
slate, I'd suggest that "aspect" might be the best match for what we
mean.

> "Named graph" by contrast is pretty much our phrase to do with as we
> will (e.g. the first page of Google results is all "ours"). My guess is
> that usage will get murky if we don't have a sloppier not-so-nitpicky
> phrase also to throw around.

I think "named graph" is already being thrown around quite sloppily.
Trying to answer my own survey, I found even I was very comfortable with
the sloppy usage.

I'd rather come up with some precise new terms, and allow "named graph"
that sloppy usage (in addition to the precise (u, G) pair meaning it
also has in the SPARQL spec).

> I've pretty much convinced myself that "layer" is the best metaphor
> there, and that we could productively encourage talk of data 'layers'
> while leaving 'named graph' as the thing that has a much more rigid
> official meaning.

I like that term, "layer".  Excellent....

For me, it works pretty well for what the fourth term in the quad
denotes.  It's similar to "graph container" but suggests much more
strongly that it functions best as part of a whole.  Different
subgraphs are in different layers; you can look at just one layer, or
look at the union of several layers.   It's not entirely intuitive that
nodes and arcs (especially blank nodes) can be in multiple layers, but
it's not too counter-intuitive either, I think.   I picture any node
that occurs in the same location on two layers as being the same node.

So, let's look at the example dataset I used on the survey, without its
default graph for now:
                
                @prefix :    <http://example.org/> .
                :g1 { :a :b 10 }
                :g2 { :a :b 20 }
                :g3 { :a :b 10 }

If you make the unique names assumption (UNA), then we have three layers.

If you make the closed world assumption (CWA), that the triples we see
here are all the triples there are in these layers (as one has to do
during some database operations), then we have either two or three
layers.  We can't tell whether g1 and g3 name the same layer: since they
have the same set of triples on them, they might.  We can tell that
neither g1 nor g3 names the same layer as g2, since their layers clearly
have different triples on them.

If you make neither the UNA nor the CWA, then even g1 and g2 could be
names for the same layer.  For example, suppose we later learned...

                :g2 { :a :b 10 }
                :g1 { :a :b 20 }
                
then, as far as we knew, the layers named g1 and g2 would have the same
triples on them.

I continue to like that a lot. 
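To make the three cases concrete, here is a minimal Python sketch, assuming each layer is modelled simply as a set of (subject, predicate, object) tuples built from what we have observed. The function `could_corefer` and the labels "una", "cwa" and "open" are hypothetical names chosen for illustration; the sketch answers only one question: could two names denote the same layer?

```python
from itertools import combinations

# Triples observed on each named layer (prefixed names kept as strings).
observed = {
    "g1": {(":a", ":b", 10)},
    "g2": {(":a", ":b", 20)},
    "g3": {(":a", ":b", 10)},
}

def could_corefer(name1, name2, triples, assumption):
    """Return True if name1 and name2 might denote the same layer.

    assumption (hypothetical labels):
      "una"  - unique names assumption: distinct names, distinct layers
      "cwa"  - closed world: the observed triples are all the triples
      "open" - neither assumption: more triples may turn up later
    """
    if name1 == name2:
        return True
    if assumption == "una":
        return False
    if assumption == "cwa":
        # Under the CWA a layer is fully described by its triples, so two
        # names can co-refer only if their triple sets are identical.
        return triples[name1] == triples[name2]
    if assumption == "open":
        # Any observed difference could vanish once we learn more triples.
        return True
    raise ValueError(f"unknown assumption: {assumption!r}")

for mode in ("una", "cwa", "open"):
    pairs = [p for p in combinations(sorted(observed), 2)
             if could_corefer(*p, observed, mode)]
    print(mode, pairs)
# una []
# cwa [('g1', 'g3')]
# open [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]

# The open-world example: later we learn more triples for g2 and g1.
learned = {"g2": {(":a", ":b", 10)}, "g1": {(":a", ":b", 20)}}
merged = {g: observed[g] | learned.get(g, set()) for g in observed}
print(merged["g1"] == merged["g2"])  # True
```

Under the UNA no pair survives, under the CWA only g1/g3 does, and with neither assumption every pair remains possible, which is what the merge at the end illustrates.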

What about Web Architecture?  If the name of a layer is dereferenceable,
is it reasonable to expect/require the returned content to be a
serialization of all the triples on that layer?  I think it probably
is.  So we'd start to think of the various published and maintained
FOAF files, DOAP files, environmental quality surveys (in RDF), etc.,
each as a "layer".  The triples on a layer can change over time, but
the layer is still the same thing.

So, "layer" is the new "g-box".  (Maybe "RDF Layer" when we need to be
formal.)  I think it's a much better term.  One cool thing is how
strongly it raises the question, "layer in what?"  And that's a great
question to be asking.  The answer, of course, is that a dataset is a
collection of layers, appropriately stacked, with their names attached,
and one of them flagged as the "default".
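A rough sketch of that picture in Python, assuming a dataset is nothing more than a mapping from layer names to sets of triples plus the name of the layer flagged as default. The `Dataset` class and its `add`, `view` and `default_view` methods are hypothetical, not an existing RDF library API; `view` covers both looking at one layer and looking at the union of several.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, object]  # (subject, predicate, object)

@dataclass
class Dataset:
    # Layer name -> set of triples currently on that layer.
    layers: Dict[str, Set[Triple]] = field(default_factory=dict)
    # Name of the layer flagged as the default (hypothetical field).
    default: Optional[str] = None

    def add(self, layer: str, triple: Triple) -> None:
        # Put a triple on a layer, creating the layer if needed.
        self.layers.setdefault(layer, set()).add(triple)

    def view(self, *names: str) -> Set[Triple]:
        # Union of the named layers; a single name gives just that layer.
        result: Set[Triple] = set()
        for name in names:
            result |= self.layers.get(name, set())
        return result

    def default_view(self) -> Set[Triple]:
        # Triples on the default layer, or nothing if none is flagged.
        return self.view(self.default) if self.default is not None else set()

# Hypothetical choice: flag :g1 as the default layer.
ds = Dataset(default=":g1")
ds.add(":g1", (":a", ":b", 10))
ds.add(":g2", (":a", ":b", 20))
ds.add(":g3", (":a", ":b", 10))

print(len(ds.view(":g1")))                  # 1
print(len(ds.view(":g1", ":g2")))           # 2
print(len(ds.view(":g1", ":g3")))           # 1 (identical triples collapse)
print(ds.default_view() == ds.view(":g1"))  # True
```

Because triples are compared by value, a node such as `:a` that occurs on two layers is the same node in their union, matching the picture of a node in the same location on two layers being the same node. Blank nodes would need more care than plain strings give them here.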

Preliminary +100 on "layer".

    -- Sandro
Received on Saturday, 28 April 2012 13:04:49 UTC