6.3 -- proposal for (informal) dataset semantics

On Fri, 2012-04-27 at 14:15 +0100, Andy Seaborne wrote:
...
> >> This is a strong argument for a two strand approach:
...
> > Agreed, with the caveat that "minimal" may (and probably does) include
> > going a bit beyond what everyone considers "safe" and "tested" as of
> > today.
> 
> Could you expand on that...?    ...  You touched on this a few times 
> in different places but I'd find it useful to have a consolidated view 
> from you.

Yes.   Here's the complete design, below.  I'll call it "6.3".  I think
partial-graph semantics, which the group seems to prefer, are much more
like quads, so I formulated it in those terms.  I think it came out
pretty nicely.

Rather than argue now why each of these elements is necessary, I'll wait
and see if there are any bits you think we should put off to Part 2.

   -- Sandro

========


1.  An RDF Dataset is a set of Dataset Entries, where each Dataset
    Entry is either an RDF Triple or an RDF "Quad".  An RDF Quad is
    formed by pairing an RDF Triple with another RDF Term, called the
    Quad's "graph label" (or just "label").  The label is an RDF blank
    node or an RDF IRI-labeled node.

    The set of Triples (not in Quads) in the dataset is called the
    dataset's "default graph".  The set of Triples used in quads with
    a particular label in a dataset is called the "named graph"
    associated with that label.  The set of triples which are in the
    default graph or in any named graph is called the "union graph".

    Comments: I believe this definition is formally equivalent to the
    SPARQL definitions and the one in our draft, except (1) some minor
    terminology, (2) allowing blank nodes as graph labels, and (3)
    allowing blank nodes to be shared between the graphs.  I'm not
    attached to this formulation; I just needed some way to convey how
    blank nodes can be shared, and after experimenting a bit, quads
    seemed like the best way to think about it.
    
    I expect the idea of allowing blank nodes to be used as graph labels
    to be controversial, but I think it's important for convenience
    and to clarify the semantics in the face of possible dereference
    operations.  I understand it presents some issues, including
    SPARQL compatibility.  I propose we consider this AT RISK through
    CR and see how those issues pan out.
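
    As a minimal sketch of the definitions in this item, assuming
    terms are represented as plain strings: the names Triple, Quad,
    default_graph, named_graph and union_graph below are hypothetical
    and chosen only for illustration.  They are not taken from any
    spec or library.

```python
from collections import namedtuple

# Hypothetical types for illustration only; terms are plain strings here.
Triple = namedtuple("Triple", "s p o")
Quad = namedtuple("Quad", "triple label")  # label: IRI or blank node


def default_graph(dataset):
    """The set of Triples in the dataset that are not inside a Quad."""
    return {e for e in dataset if isinstance(e, Triple)}


def named_graph(dataset, label):
    """The set of Triples used in Quads carrying the given label."""
    return {e.triple for e in dataset
            if isinstance(e, Quad) and e.label == label}


def union_graph(dataset):
    """Triples in the default graph or in any named graph."""
    return {e.triple if isinstance(e, Quad) else e for e in dataset}


# A dataset is just a set of entries (Triples and Quads).
t1 = Triple("<a>", "<b>", "<c>")
t2 = Triple("<a>", "<b>", "<d>")
t3 = Triple("_:x", "<b>", '"1"^^xsd:integer')
dataset = {t1, Quad(t1, "<g1>"), Quad(t2, "<g1>"), Quad(t3, "_:x")}
```

    In this sketch the blank node _:x appears both as a graph label
    and as a subject inside that graph, which is the kind of sharing
    the quad formulation is meant to convey.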

2.  Any dataset can be serialized in TriG, N-Quads, or potentially
    other languages.  For example, the TriG Document:

       { <a> <b> <c> }
       <g1> { <a> <b> <c>, <d> }
       _:x { _:x <b> 1 }

    is a serialization of the same dataset as the N-Quads document:

       <a> <b> <c>.
       <a> <b> <c> <g1>.
       <a> <b> <d> <g1>.
       _:x <b> "1"^^<http://www.w3.org/2001/XMLSchema#integer> _:x .

    I propose we issue specs for both TriG and N-Quads to help clarify
    what is syntax and what is semantics, and because people seem to
    like both formats.
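
    The correspondence between dataset entries and N-Quads lines can
    be sketched as follows.  This assumes each entry is a tuple of 3
    or 4 strings already written in N-Quads term syntax; the function
    name to_nquads is hypothetical, and the abbreviated IRIs such as
    <a> simply follow the example above rather than real absolute IRIs.

```python
def to_nquads(dataset):
    """Serialize a set of 3-tuples (triples) and 4-tuples (quads).

    Each entry becomes one line: its terms separated by spaces,
    terminated by " .".  Entries are sorted only to make the output
    deterministic; a dataset is a set, so order carries no meaning.
    """
    lines = [" ".join(entry) + " ." for entry in sorted(dataset)]
    return "\n".join(lines) + "\n"


XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"

example = {
    ("<a>", "<b>", "<c>"),                          # default graph
    ("<a>", "<b>", "<c>", "<g1>"),                  # named graph <g1>
    ("<a>", "<b>", "<d>", "<g1>"),
    ("_:x", "<b>", '"1"^^' + XSD_INTEGER, "_:x"),   # blank node label
}

print(to_nquads(example), end="")
```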

3.  Datasets have truth values, like RDF Graphs.  A dataset may be
    said to "hold" or to be "true".  Within a system (or potentially
    on the open Web) a dataset may be "asserted", and there may be
    logical consequences from this.  Datasets may entail each other,
    much as RDF Graphs may entail each other, and may be logically
    consistent or inconsistent, much as RDF Graphs may be logically
    consistent or inconsistent.

4.  A Dataset is true if and only if (1) its default graph is true
    (according to the normal RDF semantics) and (2) all of its quads
    are true.  A quad is true if and only if (1) its label denotes
    something (a "graph resource") which can conceptually "contain"
    RDF Triples, and (2) that graph resource conceptually "contains"
    at least the quad's Triple.
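
    The truth conditions in this item can be sketched roughly as
    below.  Everything here is an assumption made for illustration:
    an interpretation is modeled as a dict (denotes) from labels to
    Python sets of triples, "contains" is modeled as set membership,
    and default_graph_is_true is a hypothetical callback standing in
    for the normal RDF semantics.  The sketch also treats blank node
    labels like ordinary names and ignores their existential reading.

```python
def quad_is_true(quad, denotes):
    """A quad (s, p, o, label) is true under the mapping `denotes`."""
    s, p, o, label = quad
    resource = denotes.get(label)
    # (1) The label must denote a graph resource, i.e. something
    #     that can conceptually contain triples.
    if resource is None:
        return False
    # (2) That graph resource must contain at least this triple.
    return (s, p, o) in resource


def dataset_is_true(dataset, denotes, default_graph_is_true):
    """True iff the default graph is true and every quad is true."""
    triples = {e for e in dataset if len(e) == 3}
    quads = [e for e in dataset if len(e) == 4]
    return (default_graph_is_true(triples)
            and all(quad_is_true(q, denotes) for q in quads))


ds = {("<a>", "<b>", "<c>"),
      ("<a>", "<b>", "<c>", "<g1>"),
      ("<a>", "<b>", "<d>", "<g1>")}

# The graph resource may contain more triples than the dataset lists.
denotes = {"<g1>": {("<a>", "<b>", "<c>"),
                    ("<a>", "<b>", "<d>"),
                    ("<e>", "<f>", "<g>")}}

print(dataset_is_true(ds, denotes, lambda triples: True))
```

    Because a quad only requires that the graph resource contain "at
    least" its triple, the extra triple in denotes["<g1>"] does not
    make the dataset false in this sketch.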

5.  We do not define standard types of graph resources, leaving this
    open for research and future standards work.  These types can be
    defined so they constrain what it means for a triple to be
    "contained" by a graph resource of this type.  For example, one
    could define these classes:

       eg:Graph - the class of RDF Graphs.  For the dataset semantics,
       a Triple is "contained" in exactly those cases where it is in
       the graph, as per RDF Semantics.  From this definition, it
       follows that being "contained" cannot change over time, and two
       graph resources which are of type eg:Graph and are known to
       contain exactly the same triples are in fact the same graph
       resource.

       eg:Feed - the class of Web pages which serve only RDF and are
       updated to reflect changing circumstances.  For the dataset
       semantics, a Triple is "contained" if, in some time window, all
       successful dereferences of the Feed's URL produce a
       serialization of an RDF Graph which contains the triple.  A
       dataset using graph resources which are instances of eg:Feed
       would be time dependent, much like a FOAF file which uses the
       foaf:age predicate is time dependent.
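
    To make the contrast between the two example classes concrete,
    here is a rough sketch.  It assumes an eg:Graph can be modeled as
    an immutable frozenset of triples and an eg:Feed as a list of
    (timestamp, graph) pairs recording successful dereferences.  The
    function names are hypothetical, and treating a window with no
    successful dereferences as "not contained" is a choice made here,
    not something stated above.

```python
def graph_contains(graph, triple):
    """eg:Graph: containment is plain membership and never changes."""
    return triple in graph


def feed_contains(snapshots, triple, start, end):
    """eg:Feed: every successful dereference in [start, end] must
    yield a graph containing the triple.

    snapshots: list of (timestamp, frozenset_of_triples) pairs.
    Assumption: an empty window counts as "not contained".
    """
    window = [g for (t, g) in snapshots if start <= t <= end]
    return bool(window) and all(triple in g for g in window)


abc = ("<a>", "<b>", "<c>")
abd = ("<a>", "<b>", "<d>")

# Two eg:Graph values with exactly the same triples are the same graph.
g_one = frozenset({abc, abd})
g_two = frozenset({abd, abc})

# A feed whose content changes between dereferences at times 1, 2, 3.
feed = [(1, frozenset({abc})),
        (2, frozenset({abc, abd})),
        (3, frozenset({abd}))]

print(g_one == g_two)
print(feed_contains(feed, abc, 1, 2), feed_contains(feed, abc, 1, 3))
```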

======

That's it.   (Unless I've forgotten something....)

 

Received on Friday, 27 April 2012 17:40:39 UTC