RE: N3 and N-Triples (was: RDF in HTML: Approaches) from Seaborne, Andy on 2002-06-06 (www-rdf-interest@w3.org from June 2002)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Thu, 6 Jun 2002 15:58:02 +0100
To: "'Graham Klyne'" <GK@NineByNine.org>
Cc: "'RDF Interest'" <www-rdf-interest@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F044B8EC2@0-mail-1.hpl.hp.com>
OK - good idea - concrete example.

Application A is querying a (local) graph.  It is, in some sense, within the
graph because it can get nodes, traverse arcs, etc.  A query returns
statements and resources, including bNodes.  The application can use the
results from one query or graph access to drive another graph access by
having the bNodes flow from the results of one query into the next.  Usual
sort of RDF API stuff.
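
A minimal sketch of that local pattern in Python, purely for illustration. The names BNode, Graph and match are hypothetical and not taken from any real RDF API, and the ex: terms are made-up example data. The point is only that a bNode returned by one query can be handed straight to the next, because both live in the same process.

```python
class BNode:
    """A blank node: no URI, identity is the Python object itself."""


class Graph:
    """Hypothetical in-memory triple store for this sketch."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None is a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


g = Graph()
addr = BNode()
g.add("ex:person", "ex:address", addr)
g.add(addr, "ex:city", "Exampleville")

# The first query returns a bNode ...
(_, _, node), = g.match(s="ex:person", p="ex:address")
# ... and that bNode drives the second query directly.
city = g.match(s=node, p="ex:city")[0][2]
```

This works only because `node` is the very same object the graph holds, which is exactly what is lost once the graph lives on another machine.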

Now, suppose application A wants to do the same thing with an RDF graph that
is on another machine.  It is a large remote graph - ideally application A
wants to interface to the graph via programmatic access, rather than copy
the whole large graph over to use it locally.

It would be nice if there were some way to construct an infrastructure that
can do this over the web.  The information that has to cross the net is such
that the application sees the same ability to query, traverse, and
manipulate the graph as it did locally. To do this, the on-the-wire form has
to contain information to pass parts of the graph over the wire, including
round-tripping things like bNodes.  Local isomorphism at bNodes of graphs
isn't enough, especially when it comes to update.

It seems to me that a serialization of RDF that could capture the graph
(shared bNodes and all) would be useful here.  MIDs would help.
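
As a rough sketch of what round-tripping bNodes over the wire might look like, assuming the "bnode@<uuid>" style of labelling floated elsewhere in this thread: the Endpoint class and its methods are hypothetical, the ex: terms are made-up data, and nothing here is an existing protocol. The holder of the graph mints a stable label the first time a bNode leaves the machine, and maps the label back when it returns in a later query.

```python
import uuid


class BNode:
    """Blank node with no URI; identity is the object itself."""


class Endpoint:
    """Hypothetical holder of a graph that labels bNodes on the wire."""

    def __init__(self):
        self.triples = set()
        self._label_of = {}   # BNode -> wire label
        self._node_of = {}    # wire label -> BNode

    def _encode(self, term):
        # Mint a label once per bNode so repeated queries agree.
        if isinstance(term, BNode):
            if term not in self._label_of:
                label = f"bnode@{uuid.uuid4()}"
                self._label_of[term] = label
                self._node_of[label] = term
            return self._label_of[term]
        return term

    def _decode(self, term):
        # Labels this endpoint never issued are left as plain strings.
        return self._node_of.get(term, term)

    def query(self, s=None, p=None, o=None):
        """Pattern match; arguments and results are wire-form strings."""
        pattern = [None if x is None else self._decode(x) for x in (s, p, o)]
        return [tuple(self._encode(x) for x in t)
                for t in self.triples
                if all(want is None or want == got
                       for want, got in zip(pattern, t))]


remote = Endpoint()
b = BNode()
remote.triples.add(("ex:doc", "ex:author", b))
remote.triples.add((b, "ex:name", "A. N. Other"))

# The label crosses the wire, then comes back in the next query.
(_, _, label), = remote.query(s="ex:doc", p="ex:author")
name = remote.query(s=label, p="ex:name")[0][2]
```

The sketch does not distinguish a label from a literal that happens to start with "bnode@"; a real wire syntax would need to keep those spaces apart.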

	Andy

-----Original Message-----
From: Graham Klyne [mailto:GK@NineByNine.org] 
Sent: 6 June 2002 14:52
To: Andy Seaborne
Cc: 'RDF Interest'
Subject: RE: N3 and N-Triples (was: RDF in HTML: Approaches)


At 01:18 PM 6/6/02 +0100, Andy Seaborne wrote:
> > the model theory is quite clear that
> > bnodes are not identified with anything outside the graph in which
> > they appear
>
>This is the key to me.  My understanding of the scope of the graph is
>limited.

If you mean that the graph defines a limited scope for a bnode, yes I agree.

(I'm not sure what it means to say the scope of a graph.)

>If I have an in-memory graph, and I write it to a serialized form then
>read it in again (same machine or different machine, same process, 
>different process), why do I get a different set of bNodes?  I guess 
>this is asking whether the graph-in-the-file is the same 
>graph-in-the-memory.  Never did understand this.

I guess that is the question:  are they truly the same graph, or are they 
two graphs that happen (at some time) to be isomorphic?  A couple of ways 
of thinking about this occur to me:

(a) model theory:  can the different presentations be simultaneously 
contemplated under different interpretations?   If so, I'd suggest they are 
different.

(b) mutation:  if one of the presentations is updated, does that update 
propagate to the other presentation?  If not, they are different.

>It is a real nuisance when using RDF graph over the network where the
>application is using a graph on a different machine.  They are talking 
>about the same graph.  But they can't.  Unless I use language dependent 
>RPC!

Well, a graph is just syntax - a description of some presumed reality.  Do 
they describe the same reality?  Do they mutually entail?  That's what 
ultimately matters, I think.

Having different bnodes in different graph instances in no way weakens 
mutual entailment between two graphs:  if the graphs are otherwise 
identical, the interpretations that satisfy one are exactly the 
interpretations that satisfy the other.
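
To illustrate "otherwise identical", here is a small brute-force check in Python for whether two graphs differ only in the identity of their bnodes. It is a sketch under assumptions: graphs are sets of string triples, blank nodes are strings starting with "_:", and the function names and ex: data are invented for the example. It tests isomorphism only, not entailment in general, and it tries every pairing of bnodes, so it is only usable on tiny graphs.

```python
from itertools import permutations


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting "_:".
    return isinstance(term, str) and term.startswith("_:")


def bnodes(graph):
    """Sorted list of the distinct blank nodes used in a graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def isomorphic(g1, g2):
    """True if a one-to-one renaming of g1's bnodes turns it into g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        rename = dict(zip(b1, perm))
        renamed = {tuple(rename.get(t, t) for t in triple) for triple in g1}
        if renamed == set(g2):
            return True
    return False


# The same file read twice, with different bnodes each time.
read_1 = {("ex:doc", "ex:author", "_:a"), ("_:a", "ex:name", "Pat")}
read_2 = {("ex:doc", "ex:author", "_:x"), ("_:x", "ex:name", "Pat")}
other = {("ex:doc", "ex:author", "_:a"), ("_:a", "ex:name", "Sam")}

same = isomorphic(read_1, read_2)
different = isomorphic(read_1, other)
```

Here `same` is True and `different` is False: the two reads say the same thing even though no bnode is shared between them.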

> > if you start introducing identifiers that describe bNodes from
> > "outside", you (a) need to have a way of scoping them to a
> > particular graph instance, or (b) be very sure that they are unique.
>
>Both (a) and (b) could be done as backwards compatible RDF syntax but
>it is a change to the syntax.  e.g. for (b) a syntax that is 
>"bnode@<uuid>".  This is not pretending to be a URI - the space of URIs 
>and this space are disjoint.  It is just a syntactic labelling of 
>variables for the purposes of serialization.

Sure... I wasn't suggesting this be done, just trying to explain why 
introducing such external identifiers was problematic.

> > "minimal identifying description" (MID)
>
>Seems fine but let's go the whole way and have the URI for a node as a
>property as a MID :-)

This takes us into the whole Skolemization debate, which others explain far 
better than I.  E.g. discussions of Skolemization in 
http://www.w3.org/TR/rdf-mt/.  Note how carefully the Skolemization lemma 
has to be stated to be logically valid.

>When processing the RDF I find that strictly I need to handle bNodes
>with isomorphism tests in the absence of such MIDs.  Labelled nodes 
>have an MID called their URI.

The problem here is that two isomorphic graphs containing bnodes just do 
not (in general) contain the information that corresponding bnodes denote 
the same value.  Adding URIs to the bnodes in a graph imposes such a result.

I don't think we're going to make a lot more progress on this debate 
without being more specific about exactly what it is that you want to
achieve.

#g
--

PS:  one possible "out" occurs to me:  if graphs themselves are considered 
resources that can be labelled with URIs (e.g. like formulae in N3), then 
we could assert that two graph presentations with the same URI were indeed 
the very same graph.  Then, the graphs must be isomorphic, or we have a 
nonsense (any graph must be isomorphic with itself, right?).  And then it 
is reasonable to say that the corresponding bnodes under graph isomorphism 
are indeed the same node.

The prime difficulty with this that I see is how to account for two graph 
presentations with the same URI that are not isomorphic:  reject it as 
nonsense (unsatisfiable)?  introduce a more subtle account of how 
presentations relate to the underlying graph (but how then to determine 
isomorphism)?  I think there could be a rathole here.


>-----Original Message-----
>From: Graham Klyne [mailto:GK@NineByNine.org]
>Sent: 6 June 2002 12:04
>To: Seaborne, Andy
>Cc: 'RDF Interest'
>Subject: RE: N3 and N-Triples (was: RDF in HTML: Approaches)
>
>
>At 10:51 AM 6/6/02 +0100, Seaborne, Andy wrote:
> >If an RDF processor reads in the same file twice, are the bNodes the
> >same or different?
>
>I'd say "different".
>
> >For compatibility with current RDF syntax, implicit bNodes in the
> >current syntax yield different bnodes in the graph created.  But 
> >there is a choice as to whether an explicit bNode (one labeled in the 
> >syntax) is scoped to the file read operation (and hence creates
> >different bNodes) or whether they get unique labels in the disjoint
> >space.
> >
> >If RDF is to be exchanged between systems across a network using a
> >serialization then the latter is desirable.  It means part of the 
> >system (an RDF application) on one machine can talk about the bNodes 
> >on another machine (the source of the graph).
>
>That sounds rather dodgy to me -- the model theory is quite clear that
>bnodes are not identified with anything outside the graph in which they 
>appear -- if you start introducing identifiers that describe bNodes 
>from "outside", you (a) need to have a way of scoping them to a
>particular graph instance, or (b) be very sure that they are unique.
>
>Because of the way that bNode semantics are defined (essentially, as
>existential variables), I don't think it really matters if you have 
>different bnodes in different places as long as the associated 
>statements about them are "isomorphic" -- there's some recent 
>discussion in the DAML list about "minimal identifying description"
>(MID) between Richard Fikes and Peter Patel-Schneider that might have
>some bearing.   I don't know where the web archive is, but look for
>messages starting about:
>
>[[
>Date: Fri, 24 May 2002 15:39:41 -0700
>From: Richard Fikes <fikes@ksl.stanford.edu>
>To: Joint Committee <joint-committee@daml.org>
>Subject: New DQL Specification
>Content-Type: multipart/mixed;
>   boundary="------------C8A05097584B9E8F59A89C7A"
>]]
>
>#g
>
>
>-------------------
>Graham Klyne
><GK@NineByNine.org>

-------------------
Graham Klyne
<GK@NineByNine.org>
Received on Thursday, 6 June 2002 10:58:17 UTC