Re: 6.3 -- proposal for (informal) dataset semantics from Pat Hayes on 2012-04-28 (public-rdf-wg@w3.org from April 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 28 Apr 2012 10:25:04 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <2BBFCD75-115F-4D50-845F-B7F9962FD296@ihmc.us>
One niggle on terminology: there is no such thing as an "IRI-labelled node". The things in the RDF triples are just IRIs. (Your terminology suggests a distinction between the 'node' and the IRI used to label it, and this is potentially confusing. Hence my bud-nipping.)

If we are going to allow blank node labels, why not allow literals as labels too? There are actual use cases for literals (denoting times, for example) whereas I don't know of any for blank nodes. And allowing blank nodes means that reasoners will have to consider inference rules like the blank-introducing rules in the 2004 semantics, which are kind of dumb but required for completeness.

A more substantial issue. The whole idea of treating the dataset as naming the graphs rather than asserting them, i.e. treating them as in effect quoted in the dataset, raises the issue of just how, if at all, they can ever get asserted. Do you have any idea in mind for specifying how I can use a name to assert the graph named by the name? If not, what is the point of even having this dataset construction at all? Most of its content is invisible to any process, it would seem. The named graphs in it are effectively content-free. (??)

Pat


On Apr 27, 2012, at 12:40 PM, Sandro Hawke wrote:

> On Fri, 2012-04-27 at 14:15 +0100, Andy Seaborne wrote:
> ...
>>>> This is a strong argument for a two strand approach:
> ...
>>> Agreed, with the caveat that "minimal" may (and probably does) include
>>> going a bit beyond what everyone considers "safe" and "tested" as of
>>> today.
>> 
>> Could you expand on that...?    ...  You touched on this a few times 
>> in different places but I'd find it useful to have a consolidated view 
>> from you.
> 
> Yes.   Here's the complete design, below.  I'll call it "6.3".  I think
> partial-graph semantics, which the group seems to prefer, are much more
> like quads, so I formulated it in those terms.  I think it came out
> pretty nicely.
> 
> Rather than argue now why each of these elements is necessary, I'll wait
> and see if there are any bits you think we should put off to Part 2.
> 
>   -- Sandro
> 
> ========
> 
> 
> 1.  An RDF Dataset is a set of Dataset Entries, where each Dataset
>    Entry is either an RDF Triple or an RDF "Quad".  An RDF Quad is
>    formed by pairing an RDF Triple with another RDF Term, called the
>    Quad's "graph label" (or just "label").  The label is an RDF blank
>    node or an RDF IRI-labeled node.
> 
>    The set of Triples (not in Quads) in the dataset is called the
>    dataset's "default graph".  The set of Triples used in quads with
>    a particular label in a dataset is called the "named graph"
>    associated with that label.  The set of triples which are in the
>    default graph or in any named graph is called the "union graph".
> 
>    Comments: I believe this definition is formally equivalent to the
>    SPARQL definitions and the one in our draft, except (1) some minor
>    terminology, (2) allowing blank nodes as graph labels, and (3)
>    allowing blank nodes to be shared between the graphs.  I'm not
>    attached to this formulation; I just needed some way to convey how
>    blank nodes can be shared, and after experimenting a bit, quads
>    seemed like the best way to think about it.
> 
>    I expect the idea of allowing blank nodes to be used as graph labels
>    to be controversial, but I think it's important for convenience
>    and to clarify the semantics in the face of possible dereference
>    operations.  I understand it presents some issues, including
>    SPARQL compatibility.  I propose we consider this AT RISK through
>    CR and see how those issues pan out.
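
One way to picture the definitions in item 1 is the rough Python sketch below. It is not part of the proposal: representing terms as plain strings and entries as 3- or 4-tuples is an assumption made only for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumption: terms are plain strings ("<a>" for an IRI, "_:x" for a blank
# node). A Dataset Entry is a 3-tuple (Triple) or a 4-tuple (Triple + label).

def default_graph(dataset):
    """Triples that appear in the dataset on their own, not inside a quad."""
    return {entry for entry in dataset if len(entry) == 3}

def named_graphs(dataset):
    """Map each graph label to the set of triples used in quads with it."""
    graphs = defaultdict(set)
    for entry in dataset:
        if len(entry) == 4:
            s, p, o, label = entry
            graphs[label].add((s, p, o))
    return dict(graphs)

def union_graph(dataset):
    """Triples in the default graph or in any named graph."""
    return {entry[:3] for entry in dataset}

# Illustrative data only; the blank node _:y is shared between two graphs.
dataset = {
    ("<s>", "<p>", "_:y"),
    ("<s>", "<p>", "_:y", "<g>"),
    ("_:y", "<q>", "<o>", "_:h"),
}
```

Note that the same blank node string can occur in entries with different labels, which is how this formulation lets blank nodes be shared between graphs.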
> 
> 2.  Any dataset can be serialized in TriG, N-Quads, or potentially
>    other languages.  For example, the TriG Document:
> 
>       { <a> <b> <c> }
>       <g1> { <a> <b> <c>, <d> }
>       _:x { _:x <b> 1 }
> 
>    is a serialization of the same dataset as the N-Quads document:
> 
>       <a> <b> <c>.
>       <a> <b> <c> <g1>.
>       <a> <b> <d> <g1>.
>       _:x <b> "1"^^<http://www.w3.org/2001/XMLSchema#integer> _:x .
> 
>    I propose we issue specs for both TriG and N-Quads to help clarify
>    what is syntax and what is semantics, and because people seem to
>    like both formats.
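
The correspondence in item 2 can be sketched as a tiny serializer. This is only an illustration under the assumption that terms are already written in their N-Quads string form; `to_nquads` is a hypothetical helper, not an API of any existing library, and it does no escaping or validation.

```python
# Assumption: each entry is a tuple of strings already in N-Quads syntax,
# with 3 items for a default-graph triple and 4 items for a quad.

def to_nquads(dataset):
    """Write one line per Dataset Entry, terminated by ' .'."""
    lines = [" ".join(entry) + " ." for entry in sorted(dataset)]
    return "\n".join(lines)

XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"

# The dataset from the TriG example in item 2.
dataset = {
    ("<a>", "<b>", "<c>"),
    ("<a>", "<b>", "<c>", "<g1>"),
    ("<a>", "<b>", "<d>", "<g1>"),
    ("_:x", "<b>", '"1"^^' + XSD_INTEGER, "_:x"),
}

print(to_nquads(dataset))
```

Sorting is there only to make the output stable; the order of lines carries no meaning, since a dataset is a set of entries.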
> 
> 3.  Datasets have truth values, like RDF Graphs.  A dataset may be
>    said to "hold" or to be "true".  Within a system (or potentially
>    on the open Web) a dataset may be "asserted", and there may be
>    logical consequences from this.  Datasets may entail each other,
>    much as RDF Graphs may entail each other, and may be logically
>    consistent or inconsistent, much as RDF Graphs may be logically
>    consistent or inconsistent.
> 
> 4.  A Dataset is true if and only if (1) its default graph is true
>    (according to the normal RDF semantics) and (2) all of its quads
>    are true.  A quad is true if and only if (1) its label denotes
>    something (a "graph resource") which can conceptually "contain"
>    RDF Triples, and (2) that graph resource conceptually "contains"
>    at least the quad's Triple.
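
The truth conditions in item 4 can be sketched roughly as follows. Everything here is an assumption for illustration: an interpretation is modelled as a dict `denotes` from labels to the set of triples the denoted graph resource "contains", the truth of the default graph under the normal RDF semantics is stubbed out as a caller-supplied function, and the existential reading of blank node labels is ignored.

```python
def quad_is_true(quad, denotes):
    """A quad holds if its label denotes a graph resource that contains
    at least the quad's triple."""
    s, p, o, label = quad
    resource = denotes.get(label)
    if resource is None:
        # The label denotes nothing that can contain triples.
        return False
    return (s, p, o) in resource

def dataset_is_true(dataset, denotes, default_graph_is_true):
    """Condition (1): the default graph is true; (2): every quad is true."""
    triples = {e for e in dataset if len(e) == 3}
    quads = [e for e in dataset if len(e) == 4]
    return default_graph_is_true(triples) and all(
        quad_is_true(q, denotes) for q in quads
    )

# Illustrative data: <g1> contains more than the dataset mentions, which is
# allowed because a quad only requires "at least" its own triple.
dataset = {
    ("<a>", "<b>", "<c>"),
    ("<a>", "<b>", "<c>", "<g1>"),
}
denotes = {"<g1>": {("<a>", "<b>", "<c>"), ("<a>", "<b>", "<e>")}}

print(dataset_is_true(dataset, denotes, lambda triples: True))
```

The "at least" in condition (2) is what makes this a partial-graph reading: the dataset never claims to list everything the graph resource contains.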
> 
> 5.  We do not define standard types of graph resources, leaving this
>    open for research and future standards work.  These types can be
>    defined so they constrain what it means for a triple to be
>    "contained" by a graph resource of this type.  For example, one
>    could define these classes:
> 
>       eg:Graph - the class of RDF Graphs.  For the dataset semantics,
>       a Triple is "contained" in exactly those cases where it is in
>       the graph, as per RDF Semantics.  From this definition, it
>       follows that being "contained" cannot change over time, and two
>       graph resources which are of type eg:Graph and are known to
>       contain exactly the same triples are in fact the same graph
>       resource.
> 
>       eg:Feed - the class of Web pages which serve only RDF and are
>       updated to reflect changing circumstances.  For the dataset
>       semantics, a Triple is "contained" if, in some time window, all
>       successful dereferences of the Feed's URL produce a
>       serialization of an RDF Graph which contains the triple.  A
>       dataset using graph resources which are instances of eg:Feed
>       would be time dependent, much like a FOAF file which uses the
>       foaf:age predicate is time dependent.
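
The two example classes in item 5 differ only in what "contained" means, which the sketch below tries to show. It is a rough illustration under stated assumptions: a feed is modelled as a list of (timestamp, graph) pairs, one per successful dereference, and a window with no successful dereference is treated as containing nothing, a choice the proposal does not make. The function names are hypothetical.

```python
def graph_contains(graph, triple):
    """eg:Graph: a triple is contained exactly when it is in the graph."""
    return triple in graph

def feed_contains(snapshots, triple, start, end):
    """eg:Feed: a triple is contained if every successful dereference in
    the window [start, end] produced a graph containing it.

    Assumption: an empty window counts as 'not contained'.
    """
    in_window = [graph for (t, graph) in snapshots if start <= t <= end]
    return bool(in_window) and all(triple in graph for graph in in_window)

# Illustrative data: the feed drops the triple at time 3.
t1 = ("<a>", "<b>", "<c>")
snapshots = [
    (1, {t1}),
    (2, {t1, ("<a>", "<b>", "<d>")}),
    (3, set()),
]

print(feed_contains(snapshots, t1, 1, 2))
print(feed_contains(snapshots, t1, 1, 3))
```

The answer for a feed depends on the window chosen, which is the time dependence the item describes, whereas `graph_contains` gives the same answer at any time.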
> 
> ======
> 
> That's it.   (Unless I've forgotten something....)
> 
> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 28 April 2012 15:25:44 UTC