- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 27 Apr 2012 13:40:20 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
On Fri, 2012-04-27 at 14:15 +0100, Andy Seaborne wrote:
...
> >> This is a strong argument for a two strand approach:
...
> > Agreed, with the caveat that "minimal" may (and probably does) include
> > going a bit beyond what everyone considers "safe" and "tested" as of
> > today.
>
> Could you expand on that...? ... You touched on this a few times
> in different places but I'd find it useful to have a consolidated view
> from you.
Yes. Here's the complete design below. I'll call it "6.3". I think
partial-graph semantics, which the group seems to prefer, are much more
like quads, so I formulated it in those terms. I think it came out
pretty nicely.
Rather than argue now why each of these elements is necessary, I'll wait
and see if there are any bits you think we should put off to Part 2.
-- Sandro
========
1. An RDF Dataset is a set of Dataset Entries, where each Dataset
Entry is either an RDF Triple or an RDF "Quad". An RDF Quad is
formed by pairing an RDF Triple with another RDF Term, called the
Quad's "graph label" (or just "label"). The label is an RDF blank
node or an RDF IRI-labeled node.
The set of Triples (not in Quads) in the dataset is called the
dataset's "default graph". The set of Triples used in quads with
a particular label in a dataset is called the "named graph"
associated with that label. The set of triples which are in the
default graph or in any named graph is called the "union graph".
Comments: I believe this definition is formally equivalent to the
SPARQL definitions and the one in our draft, except for (1) some minor
terminology differences, (2) allowing blank nodes as graph labels, and (3)
allowing blank nodes to be shared between the graphs. I'm not
attached to this formulation; I just needed some way to convey how
blank nodes can be shared, and after experimenting a bit, quads
seemed like the best way to think about it.
I expect the idea of allowing blank nodes to be used as graph labels
to be controversial, but I think it's important for convenience
and to clarify the semantics in the face of possible dereference
operations. I understand it presents some issues, including
SPARQL compatibility. I propose we consider this AT RISK through
CR and see how those issues pan out.
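To make the quad formulation concrete, here is a minimal Python sketch. It is an illustration only: the string encoding of terms and every function name are hypothetical. It treats a dataset as a set of 3-tuples (triples) and 4-tuples (quads), derives the default, named, and union graphs as defined above, and reuses the data from the TriG example in item 2.

```python
from typing import Set, Tuple, Union

# Hypothetical encoding for illustration only: IRIs as "<...>", blank nodes
# as "_:...", literals in N-Triples form.
Term = str
Triple = Tuple[Term, Term, Term]
Quad = Tuple[Term, Term, Term, Term]  # (subject, predicate, object, label)
Entry = Union[Triple, Quad]


def make_quad(triple: Triple, label: Term) -> Quad:
    """Pair a triple with a graph label; the label must be an IRI or blank node."""
    if not (label.startswith("<") or label.startswith("_:")):
        raise ValueError(f"graph label must be an IRI or blank node: {label}")
    return (*triple, label)


def default_graph(dataset: Set[Entry]) -> Set[Triple]:
    """Triples that are dataset entries on their own, not inside a quad."""
    return {e for e in dataset if len(e) == 3}


def named_graph(dataset: Set[Entry], label: Term) -> Set[Triple]:
    """Triples used in quads that carry the given label."""
    return {e[:3] for e in dataset if len(e) == 4 and e[3] == label}


def labels(dataset: Set[Entry]) -> Set[Term]:
    """All graph labels used in the dataset."""
    return {e[3] for e in dataset if len(e) == 4}


def union_graph(dataset: Set[Entry]) -> Set[Triple]:
    """Triples in the default graph or in any named graph."""
    return {e[:3] for e in dataset}


XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"

# The example dataset; _:x is used both as a graph label and as a subject.
EXAMPLE = {
    ("<a>", "<b>", "<c>"),
    make_quad(("<a>", "<b>", "<c>"), "<g1>"),
    make_quad(("<a>", "<b>", "<d>"), "<g1>"),
    make_quad(("_:x", "<b>", f'"1"^^{XSD_INTEGER}'), "_:x"),
}
```

Because <a> <b> <c> appears both in the default graph and in <g1>, the union graph has three triples rather than four. The _:x entry shows a blank node used as a graph label (point (2) above) and also inside a triple.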
2. Any dataset can be serialized in TriG, N-Quads, or potentially
other languages. For example, the TriG Document:
{ <a> <b> <c> }
<g1> { <a> <b> <c>, <d> }
_:x { _:x <b> 1 }
is a serialization of the same dataset as the N-Quads document:
<a> <b> <c>.
<a> <b> <c> <g1>.
<a> <b> <d> <g1>.
_:x <b> "1"^^<http://www.w3.org/2001/XMLSchema#integer> _:x .
I propose we issue specs for both TriG and N-Quads to help clarify
what is syntax and what is semantics, and because people seem to
like both formats.
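The correspondence between the two documents can be sketched by writing the dataset out by hand (expanding TriG's object list and its bare integer shorthand) and printing one N-Quads line per entry. This is a hedged illustration only: the helper name to_nquads is hypothetical, and it keeps the abbreviated IRIs such as <a> from the example, which a strict parser would reject because N-Quads, like N-Triples, only allows absolute IRIs.

```python
from typing import Iterable, List, Tuple, Union

Term = str  # already written in N-Triples syntax, e.g. "<a>" or "_:x"
Entry = Union[Tuple[Term, Term, Term], Tuple[Term, Term, Term, Term]]

XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


def to_nquads(dataset: Iterable[Entry]) -> List[str]:
    """One N-Quads line per entry: triples get no label, quads end with it.

    Sorting is only for deterministic output; a dataset is unordered.
    """
    return sorted(" ".join(entry) + " ." for entry in dataset)


# The TriG document from item 2, expanded by hand:
#   { <a> <b> <c> }           -> one default-graph triple
#   <g1> { <a> <b> <c>, <d> } -> the object list yields two quads
#   _:x { _:x <b> 1 }         -> the bare 1 is an xsd:integer literal
TRIG_EXAMPLE = {
    ("<a>", "<b>", "<c>"),
    ("<a>", "<b>", "<c>", "<g1>"),
    ("<a>", "<b>", "<d>", "<g1>"),
    ("_:x", "<b>", f'"1"^^{XSD_INTEGER}', "_:x"),
}

for line in to_nquads(TRIG_EXAMPLE):
    print(line)
```

This prints the same four statements as the N-Quads document above, with a space before each final dot, which N-Quads permits.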
3. Datasets have truth values, like RDF Graphs. A dataset may be
said to "hold" or to be "true". Within a system (or potentially
on the open Web) a dataset may be "asserted", and there may be
logical consequences from this. Datasets may entail each other,
much as RDF Graphs may entail each other, and may be logically
consistent or inconsistent, much as RDF Graphs may be logically
consistent or inconsistent.
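The proposal does not spell out an entailment procedure. Assuming dataset entailment mirrors RDF simple entailment (D1 entails D2 when some instance of D2, obtained by replacing its blank nodes, is a subset of D1), with blank nodes, including blank graph labels, scoped over the whole dataset, a brute-force check might look like this. The function names are hypothetical and the search is exponential, so it is only suitable for toy examples.

```python
from itertools import product
from typing import Dict, Set, Tuple

Term = str
Entry = Tuple[Term, ...]  # 3 terms for a triple, 4 for a quad (last = label)


def blank_nodes(dataset: Set[Entry]) -> Set[Term]:
    return {t for entry in dataset for t in entry if t.startswith("_:")}


def instantiate(dataset: Set[Entry], mapping: Dict[Term, Term]) -> Set[Entry]:
    """Replace blank nodes (in any position, labels included) per mapping."""
    return {tuple(mapping.get(t, t) for t in entry) for entry in dataset}


def simply_entails(d1: Set[Entry], d2: Set[Entry]) -> bool:
    """Brute force: does some instance of d2 fall entirely inside d1?

    One mapping is applied to the default graph and every quad at once,
    because blank nodes are shared across the whole dataset.
    """
    blanks = sorted(blank_nodes(d2))
    candidates = sorted({t for entry in d1 for t in entry})
    for choice in product(candidates, repeat=len(blanks)):
        if instantiate(d2, dict(zip(blanks, choice))) <= d1:
            return True
    return False


D1 = {("<a>", "<b>", "<c>", "<g1>"), ("<a>", "<b>", "<d>", "<g1>")}
# "Some graph resource contains both <a> <b> <c> and <a> <b> <d>"
D2 = {("<a>", "<b>", "<c>", "_:g"), ("<a>", "<b>", "<d>", "_:g")}
# The same triple, but in the default graph instead of a quad
D3 = {("<a>", "<b>", "<c>")}
```

Here D1 entails D2, but not D3: under the truth conditions in item 4, a triple in a named graph says nothing about the default graph.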
4. A Dataset is true if and only if (1) its default graph is true
(according to the normal RDF semantics) and (2) all of its quads
are true. A quad is true if and only if (1) its label denotes
something (a "graph resource") which can conceptually "contain"
RDF Triples, and (2) that graph resource conceptually "contains"
at least the quad's Triple.
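The truth condition above can be sketched against a toy interpretation. This is a simplification, not RDF model theory: the hypothetical "world" is just a set of ground triples taken as true, plus a table saying which IRIs denote graph resources and which triples each one contains; blank nodes are assigned names from that world rather than arbitrary resources. All names here are illustrative.

```python
from itertools import product
from typing import Dict, Set, Tuple

Term = str
Triple = Tuple[Term, Term, Term]
Entry = Tuple[Term, ...]  # 3 terms = default-graph triple, 4 = quad

# Hypothetical toy world: which ground triples are simply true, and which
# IRIs denote graph resources along with the triples each one contains.
Facts = Set[Triple]
Contents = Dict[Term, Set[Triple]]


def entry_holds(entry: Entry, facts: Facts, contents: Contents) -> bool:
    if len(entry) == 3:
        return entry in facts  # ordinary truth for the default graph
    s, p, o, label = entry
    # (1) the label denotes a graph resource, and (2) that resource
    # contains at least this triple (it may contain more).
    return label in contents and (s, p, o) in contents[label]


def dataset_holds(dataset: Set[Entry], facts: Facts, contents: Contents) -> bool:
    """True if a single assignment of the dataset's blank nodes makes every
    entry hold; blank nodes are shared across all the graphs."""
    blanks = sorted({t for e in dataset for t in e if t.startswith("_:")})
    domain = set(contents) | {t for tr in facts for t in tr}
    domain |= {t for g in contents.values() for tr in g for t in tr}
    for choice in product(sorted(domain), repeat=len(blanks)):
        m = dict(zip(blanks, choice))
        if all(entry_holds(tuple(m.get(t, t) for t in e), facts, contents)
               for e in dataset):
            return True
    return False


ONE = '"1"^^<http://www.w3.org/2001/XMLSchema#integer>'
FACTS = {("<a>", "<b>", "<c>")}
CONTENTS = {
    "<g1>": {("<a>", "<b>", "<c>"), ("<a>", "<b>", "<d>"), ("<a>", "<b>", "<e>")},
    "<g2>": {("<g2>", "<b>", ONE)},
}
EXAMPLE = {
    ("<a>", "<b>", "<c>"),
    ("<a>", "<b>", "<c>", "<g1>"),
    ("<a>", "<b>", "<d>", "<g1>"),
    ("_:x", "<b>", ONE, "_:x"),
}
```

The example dataset holds in this world because _:x can be taken to denote <g2>, whose contents include a triple about <g2> itself; <g1> is allowed to contain more than the quads mention.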
5. We do not define standard types of graph resources, leaving this
open for research and future standards work. These types can be
defined so they constrain what it means for a triple to be
"contained" by a graph resource of this type. For example, one
could define these classes:
eg:Graph - the class of RDF Graphs. For the dataset semantics,
a Triple is "contained" in exactly those cases where it is in
the graph, as per RDF Semantics. From this definition, it
follows that being "contained" cannot change over time, and two
graph resources which are of type eg:Graph and are known to
contain exactly the same triples are in fact the same graph
resource.
eg:Feed - the class of Web pages which serve only RDF and are
updated to reflect changing circumstances. For the dataset
semantics, a Triple is "contained" if, in some time window, all
successful dereferences of the Feed's URL produce a
serialization of an RDF Graph which contains the triple. A
dataset using graph resources which are instances of eg:Feed
would be time dependent, much like a FOAF file which uses the
foaf:age predicate is time dependent.
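A rough sketch of the two example classes, under stated assumptions: eg:Graph is modeled as an immutable set of triples, and an eg:Feed is modeled by a hypothetical log of timestamped dereference results, with containment checked for one given time window. Requiring the window to contain at least one successful dereference is an added assumption; the text above only says "all successful dereferences". The <me> IRI and the log data are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]

# eg:Graph: the resource *is* its set of triples. A frozenset models that:
# containment can never change, and two graphs with the same triples compare
# (and hash) equal, i.e. they are the same resource.
Graph = FrozenSet[Triple]


def graph_contains(g: Graph, triple: Triple) -> bool:
    return triple in g


@dataclass(frozen=True)
class Fetch:
    """One dereference of a feed's URL (hypothetical log record)."""
    time: float
    graph: Optional[Graph]  # None if the dereference failed


def feed_contains(log: List[Fetch], triple: Triple, start: float, end: float) -> bool:
    """eg:Feed containment within the window [start, end]."""
    ok = [f.graph for f in log if start <= f.time <= end and f.graph is not None]
    # Assumption: require at least one successful dereference, so an empty
    # window does not make every triple vacuously "contained".
    return bool(ok) and all(triple in g for g in ok)


AGE = "<http://xmlns.com/foaf/0.1/age>"
AGE_30 = ("<me>", AGE, '"30"')
AGE_31 = ("<me>", AGE, '"31"')
LOG = [
    Fetch(0, frozenset({AGE_30})),
    Fetch(5, None),
    Fetch(10, frozenset({AGE_30})),
    Fetch(20, frozenset({AGE_31})),
]
```

The age-30 triple is contained in the feed during [0, 10] but not during [0, 20], which is the kind of time dependence described for foaf:age.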
======
That's it. (Unless I've forgotten something....)
Received on Friday, 27 April 2012 17:40:39 UTC