Re: complete vs partial graph semantics from Sandro Hawke on 2012-04-12 (public-rdf-wg@w3.org from April 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 12 Apr 2012 13:47:32 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <1334252852.2249.314.camel@waldron>
(...Fingers crossed that maybe at this point some light of mutual
understanding will begin to emerge.  I hope, I hope....)


On Thu, 2012-04-12 at 16:54 +0100, Andy Seaborne wrote:
> 
> On 12/04/12 16:09, Sandro Hawke wrote:
> > I'm having a lot of trouble understanding the motivation for
> > partial-graph semantics.   It seems to me like a kind-of-cool but
> > way-too-complicated idea.
> 
> Dataset merge - it's not just imperative.  It's just the conclusions of 
> accepting two datasets like graph merge is accepting the conclusions of 
> two graphs.

Right.   But when we're using RDF in practice, we need some properties
to be defined.  In many cases, some of those properties will be
functional (maxCard 1).  We might or might not bother to use OWL to declare
that, but still, in our mental and software modeling, it makes no sense
to have two distinct values for that property.   In that kind of
situation, merging graphs can easily give us a contradiction.
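
A minimal sketch of that situation in Python, under stated assumptions:
graphs are modelled as plain sets of (subject, predicate, object) tuples,
FUNCTIONAL is a hypothetical set of properties treated as maxCard 1, the
dates are made-up values, and two distinct literal values for one
functional property are simply treated as a clash.  It is an illustration
only, not OWL reasoning.

```python
# Graphs are modelled as sets of (subject, predicate, object) tuples.
# FUNCTIONAL is a hypothetical set of properties treated as maxCard 1.
FUNCTIONAL = {":birthDate"}


def merge(g1, g2):
    """For ground triples (no blank nodes), graph merge is set union."""
    return g1 | g2


def functional_conflicts(graph, functional=FUNCTIONAL):
    """Return {(subject, predicate): values} wherever a functional
    property ends up with more than one distinct value."""
    seen = {}
    for s, p, o in graph:
        if p in functional:
            seen.setdefault((s, p), set()).add(o)
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


# Made-up example values; each graph is consistent on its own.
g1 = {("<#you>", ":birthDate", "1970-01-01")}
g2 = {("<#you>", ":birthDate", "1971-02-02")}

print(functional_conflicts(g1))             # nothing to report
print(functional_conflicts(merge(g1, g2)))  # one subject, two values
```

Each graph passes the check alone; only the merged graph reports a
conflict, which is the sense of "contradiction" intended here.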

That's what I'm proposing with datasets.  I'm saying that each label
should correspond to only one RDF Graph (at some point in time, but
we're handling that separately).    So when you take two datasets
together, they could easily give you a contradiction.  They'll do that
if the graphs associated with some label are different.

(but read on...)
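
As a rough sketch of the one-graph-per-label reading, assuming a dataset
is modelled as a Python dict from label to a frozenset of triples (the
names merge_datasets and LabelConflict are made up for this
illustration):

```python
class LabelConflict(Exception):
    """Raised when one label would have to name two different graphs."""


def merge_datasets(d1, d2):
    """Combine two datasets modelled as {label: frozenset_of_triples}.

    Under the one-graph-per-label reading, a label present in both
    datasets must map to the same graph; otherwise the combination is
    treated as contradictory.
    """
    merged = dict(d1)
    for label, graph in d2.items():
        if label in merged and merged[label] != graph:
            raise LabelConflict(label)
        merged[label] = graph
    return merged


d1 = {"<u>": frozenset({("<a>", "<b>", 1)})}
d2 = {"<u>": frozenset({("<a>", "<b>", 2)})}
d3 = {"<v>": frozenset({("<a>", "<b>", 2)})}

merge_datasets(d1, d3)  # fine: the labels differ
try:
    merge_datasets(d1, d2)  # same label, different graphs
except LabelConflict as exc:
    print("contradiction at label", exc)
```

Taking a dataset together with an identical copy of itself is harmless
under this reading; only a label with two different graphs is rejected.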

> Concat TriG files.

I think it'd be nice to have merge and concat mean the same thing, but
that depends what it means to have the same label used twice in the same
trig file.

> > It's like you're saying that any triple:
> >
> >      <a>  <b>  7.
> >
> > is to be understood as saying that the value of the 'b' property of 'a'
> > is not just seven, but rather it is seven-or-more.
> >
> > You see the analogy?
> 
> <#you> :name "Sandro" .
> <#you> :name "Sandro Hawkes" .
> 
> more triples come along.

If instead of :name we had :birthDate or :preferredName or :birthMother
we might reasonably have a max-cardinality of 1.  Then, when more
triples like that came along, we could easily have a contradiction.

The relationship between the label and the graph has to also be
max-cardinality 1, otherwise it can't really work as a label.   At
least, not in the sense we use those labels in SPARQL, where there is at
most one graph selected by using some particular term after the GRAPH
keyword.

I *think* the real difference in our semantic models here is that when I
see

  <u> { <a> <b> 1 }

I understand that to be saying something about "u" and the RDF graph
serialized as "<a> <b> 1".   I don't know exactly what the relationship
is, but I see the label and the g-text.  I parse the g-text to a g-snap.
Then I mentally record that this label points to that g-snap.

So if I later see

  <u> { <a> <b> 2 }

I think... what?!!   I thought <u> was related to the graph serialized
as "<a> <b> 1".   How can it also be related to this graph with a 2 in
it?  How can these both be true?   Which of those two graphs is <u>
associated with...?    They're different graphs, and it only points to
one thing!

You're telling me, I think, that if I see both those statements, I
should realize "Ohhhh!   ACTUALLY it turns out <u> is associated with
the graph { <a> <b> 1,2 }".   

As a human, that makes perfect sense.  But it is non-monotonic
reasoning, which I'm pretty sure we want to avoid.

So, how do you reconcile it?
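
To make the worry concrete, here is a small Python sketch.  It assumes
each statement is first read as giving the exact graph for its label,
and that the reading is then revised to the union when another statement
with the same label arrives.  The function name graph_of and the
list-of-pairs representation are assumptions made for this illustration.

```python
def graph_of(label, statements):
    """Accumulating reading: the graph for a label is the union of every
    g-snap seen with that label so far."""
    result = frozenset()
    for seen_label, triples in statements:
        if seen_label == label:
            result |= triples
    return result


first = ("<u>", frozenset({("<a>", "<b>", 1)}))
second = ("<u>", frozenset({("<a>", "<b>", 2)}))

before = graph_of("<u>", [first])
after = graph_of("<u>", [first, second])

# "<u> is associated with exactly the graph { <a> <b> 1 }" holds after
# the first statement and stops holding once the second one is added.
print(before == first[1])
print(after == first[1])
```

A conclusion that is withdrawn when more premises arrive is what makes
the reasoning non-monotonic.  A reading that never concludes "exactly"
in the first place would not have this problem.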

> Another issue is that I don't think you can ever claim completeness 
> without minting the <u>.
> 
> Consider the non-archiving crawler.  Just because it saw N triples, 
> doesn't mean there are N triples at a resource.  Security, cookies, 
> location etc may give someone else a different view even of an immutable 
> resource and the crawler can't know.

That argument applies to a lot more than just completeness, so it's
not an argument against completeness.

If you can't trust the server to give you the complete g-box contents as
part of a 200-OK response, then I'd think you also couldn't trust it to
be giving you the *partial* g-box contents.   It might just as well be
giving you customized triples.  In that case, it's not acting like a
g-box server.  

In that situation, yes, you have to fall back to just reporting your own
experience.  I tried to dereference this URL "u", and this is what I
got.  That's like patterns 1 or 3 on [1].    With these, you're not
claiming you know what triples are named by "u".

> >   To me,
> >
> >     <u>   {<a>  <b>  <c>  }
> >
> > looks like tagging the graph {<a>  <b>  <c>  } with the label <u>; but now
> > I'm being told you're actually tagging some unknown graph that happens
> > to contain at least the triple {<a>  <b>  <c>  }.
> 
> That isn't what I meant by labelling.  I have so far not been able to 
> work out what you mean unless it's owl:sameAs naming but we're 
> considering other labellings aren't we?  It's indirect?

Do you understand how graphs work in n3?  Can I use that to explain?

I'm saying that, in trig, when you see:
   <label> { graph_content }.
it's the same as the n3:
   <label> rdf:hasGraph { graph_content }.

Also, rdf:hasGraph is a functional (maxCard 1) property.

This means you can use any kind of term you want as a label, and still
be talking about a specific graph.   You can think in terms of g-boxes
or g-snaps -- either way works fine.   If <label> denotes a g-box, then
the graph it 'has' is the graph in the box.  If <label> denotes a
g-snap, then the graph it 'has' is itself.  

In stranger applications, if <label> denotes a g-text, then I suppose
the graph it 'has' could be the graph it parses to.    
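
A small Python sketch of how one functional rdf:hasGraph relation could
cover both the g-box and g-snap cases.  The GBox class and the has_graph
function are hypothetical names for this illustration; a g-snap is
modelled as a frozenset of triples, and the g-text case is left out
because it would need a parser.

```python
from dataclasses import dataclass


@dataclass
class GBox:
    """A mutable container whose contents may change over time."""
    contents: frozenset


def has_graph(denoted):
    """Return the single graph that the denoted thing 'has'."""
    if isinstance(denoted, GBox):
        return denoted.contents   # the graph currently in the box
    if isinstance(denoted, frozenset):
        return denoted            # a g-snap 'has' itself
    raise TypeError("not a g-box or g-snap: %r" % (denoted,))


snap = frozenset({("<a>", "<b>", "<c>")})
box = GBox(contents=snap)

print(has_graph(snap) is snap)   # the g-snap is its own graph
print(has_graph(box) == snap)    # the g-box yields what it holds now
```

Either way the function returns exactly one graph per argument, which
mirrors the maxCard 1 constraint on rdf:hasGraph.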

The cool thing about this approach -- and the only reason to use it
instead of N3, I think -- is that it means the existing SPARQL machinery
becomes, by fiat, as expressive as the N3 graph syntax.   Also, it's
pretty painless to use, I think.

> > So, back to the number analogy: it seems like the reason people want to
> > do this is because they sometimes want to increase the value of this
> > property.   Later they might want to say it's ten-or-more.  There seems
> > to be some concern that this could be a problem if we said that it was
> > now exactly seven.
> 
> That's "replace" the value, not "extend" it.

?

> RDF graphs can be merged, numbers can't.

In my analogy, merging graphs is like summing numbers.  But it was just
an analogy, and it doesn't seem to be helping, so we can ignore it.

    -- Sandro

[1] http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1/Crawler_Example
Received on Thursday, 12 April 2012 17:47:46 UTC