RE: N3 and N-Triples (was: RDF in HTML: Approaches) from Graham Klyne on 2002-06-06 (www-rdf-interest@w3.org from June 2002)

From: Graham Klyne <GK@ninebynine.org>
Date: Thu, 06 Jun 2002 17:00:03 +0100
To: "Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>
Cc: "'RDF Interest'" <www-rdf-interest@w3.org>
Message-Id: <5.1.0.14.2.20020606162534.03aa9bb0@joy.songbird.com>

Andy,

Good, I see.  I think the DAML debate is chasing a similar, or related, 
issue (but I've not followed it at all closely.)   (Or maybe not:  I think 
the MID may not work here, because it's not clear how it handles remote 
updates to the graph -- this is more than just a query environment.)

Conceptually, I think that what you're trying to do is create a distributed 
representation of a single graph, so in that context I think the arguments 
for having URIs for bnodes don't need to be faced head-on.  Or, put another 
way, the requirement is to create a bnode scope that encompasses both A and 
the remote graph (let's call it R).  Anything that is stored by A could be 
taken as a projection out of R, but still part of R, not a separate 
graph.  I think there are some constraints that must be honoured, like any 
update in A is reflected in R, and that A and R cannot be assumed 
simultaneously to be under different interpretations.  This means that A 
cannot be treated as a complete graph in isolation.
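A rough sketch of that "projection" constraint, under assumptions of my own: the class names MasterGraph and Projection, the triple-as-tuple layout and the example URI are all hypothetical, and a real system would put a network protocol where the direct attribute access is.

```python
class MasterGraph:
    """Stands in for the remote graph R (hypothetical in-memory store)."""

    def __init__(self, uri):
        self.uri = uri
        self.triples = set()


class Projection:
    """A's partial view of R: it caches triples but is never a separate graph."""

    def __init__(self, master):
        self.master = master
        self.cached = set()

    def fetch(self, subject):
        # Pull the triples about one subject out of R into the local cache.
        found = {t for t in self.master.triples if t[0] == subject}
        self.cached |= found
        return found

    def add(self, triple):
        # Write through to R first, so every update made in A is reflected in R.
        self.master.triples.add(triple)
        self.cached.add(triple)


r = MasterGraph("http://example.org/graphs/R")   # assumed URI
r.triples.add(("_:b1", "ex:name", "Andy"))

a = Projection(r)
a.fetch("_:b1")
a.add(("_:b1", "ex:knows", "_:b2"))
```

Whatever A holds is always a subset of R, which is one way of saying that A is not a complete graph in isolation.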

So if this works conceptually, what to do in practice?

Thinking of my "out", maybe give R a URI and have A use that.  bnodes can 
have local identifiers that are qualified by the graph URI but are 
meaningless when treated separately.  Software in A that processes the 
(partial) graph must operate to ensure coherency with the original copy.
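To make the "qualified by the graph URI" idea concrete, here is a minimal sketch. The name ScopedBNode, the example URIs and the label "b17" are illustrative assumptions, not part of any RDF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedBNode:
    """A blank node label that only means something within one graph."""
    graph_uri: str   # URI given to the master graph R (assumed)
    local_id: str    # opaque label, meaningless outside graph_uri


def same_node(a: ScopedBNode, b: ScopedBNode) -> bool:
    # Two labels denote the same bnode only when they share a graph scope.
    return a.graph_uri == b.graph_uri and a.local_id == b.local_id


R = "http://example.org/graphs/R"                # hypothetical URI for R
in_a = ScopedBNode(R, "b17")                     # label held by application A
in_r = ScopedBNode(R, "b17")                     # label held by the remote store
other = ScopedBNode("http://example.org/graphs/S", "b17")
```

Here same_node(in_a, in_r) holds, while same_node(in_a, other) does not: the bare label "b17" carries no identity once separated from its graph URI.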

Is this making any sense...?

>It seems to me that a serialization of an RDF that could capture the graph
>(shared bNodes and all) would be useful here.  MIDs would help.

I'd suggest, rather, that you want a subgraph serialization that
(a) refers back to the master graph (probably by URI?), and
(b) has provision for local identifiers that are meaningless when isolated 
from the graph/subgraph context.

E.g. something like:

   <subgraph:RDF subgraph:ID='graph-URI'>
    :
   </subgraph:RDF>

which is much like <rdf:RDF> except for the subgraph:ID attribute, and that 
its RDF-form contents may use an additional attribute in place of rdf:ID or 
rdf:about:

   <rdf:Description subgraph:bnode='opaque-id'>
    :
   </rdf:Description>

(If I've understood recent RDF discussions correctly, standard RDF will 
ignore this attribute and simply interpret the description as a blank node.)
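As an illustration of how cooperating software might read such a document, the sketch below parses the two constructs above with Python's standard xml.etree.ElementTree. The subgraph namespace URI, the ex: vocabulary and the sample data are made up for the example; no such namespace is actually defined.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SUBGRAPH_NS = "http://example.org/subgraph#"   # assumed namespace, not a standard

DOC = f"""
<subgraph:RDF xmlns:rdf="{RDF_NS}"
              xmlns:subgraph="{SUBGRAPH_NS}"
              xmlns:ex="http://example.org/terms#"
              subgraph:ID="http://example.org/graphs/R">
  <rdf:Description subgraph:bnode="b17">
    <ex:name>Andy</ex:name>
  </rdf:Description>
  <rdf:Description subgraph:bnode="b42">
    <ex:name>Graham</ex:name>
  </rdf:Description>
</subgraph:RDF>
"""


def read_subgraph(xml_text):
    """Return the master graph URI and the (graph URI, local id) pair of each bnode."""
    root = ET.fromstring(xml_text.strip())
    graph_uri = root.get(f"{{{SUBGRAPH_NS}}}ID")
    nodes = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # The local id is only ever reported together with the graph URI.
        nodes.append((graph_uri, desc.get(f"{{{SUBGRAPH_NS}}}bnode")))
    return graph_uri, nodes


graph_uri, nodes = read_subgraph(DOC)
```

The reader never hands out a bare bnode label, which keeps the identifiers tied to the graph/subgraph context described in (b).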

What I've sketched is clearly not standard RDF (though it bears a close 
relationship).  I think this is fine, because it's essentially a private 
agreement between the software components that are implementing a 
distributed RDF graph representation.  Maybe, in time, the distributed 
graph representation is useful enough that multivendor interoperability is 
desired, and hence standardization.  (In this case, the standardization may 
well need to include some protocol elements for dealing with maintaining 
coherence.)

The key point in all this is that the goal you have outlined is, I think, 
an extension to standard RDF capabilities and should not be shoehorned into 
standard RDF.

There are many other things I could say about this, but I think they'd 
obscure the basic idea.  So I'll stop here.

#g
--

At 03:58 PM 6/6/02 +0100, Seaborne, Andy wrote:


>OK - good idea - concrete example.
>
>Application A is querying a (local) graph.  It is, in some sense, within the
>graph because it can get nodes, traverse arcs etc etc.  A query is returning
>statements and resources, including bNodes.  The application can use the
>results from one query or graph access to drive another graph access by
>having the bNodes flow from the results of one query into the next.  Usual
>sort of RDF API stuff.
>
>Now, suppose application A wants to do the same thing with a RDF graph that
>is on another machine.  It is a large remote graph - ideally application A
>wants to interface to the graph via programmatic access, not read the whole
>large graph over locally to be used.
>
>It would be nice if there were some way to construct an infrastructure that
>can do this over the web.  The information that has to cross the net is such
>that the application sees the same ability to query, traverse, and
>manipulate the graph as it did locally. To do this, the on-the-wire form has
>to contain information to pass parts of the graph over the wire, including
>round-tripping things like bNodes.  Local isomorphism at bNodes of graphs
>isn't enough, especially when it comes to update.
>
>It seems to me that a serialization of an RDF that could capture the graph
>(shared bNodes and all) would be useful here.  MIDs would help.
>
>         Andy
>
>-----Original Message-----
>From: Graham Klyne [mailto:GK@NineByNine.org]
>Sent: 6 June 2002 14:52
>To: Andy Seaborne
>Cc: 'RDF Interest'
>Subject: RE: N3 and N-Triples (was: RDF in HTML: Approaches)
>
>
>At 01:18 PM 6/6/02 +0100, Andy Seaborne wrote:
> > > the model theory is quite clear that
> > > bnodes are not identified with anything outside the graph in which
> > > they appear
> >
> >This is the key to me.  My understanding of the scope of the graph is
> >limited.
>
>If you mean that the graph defines a limited scope for a bnode, yes I agree.
>
>(I'm not sure what it means to say the scope of a graph.)
>
> >If I have an in-memory graph, and I write it to a serialized form then
> >read it in again (same machine or different machine, same process,
> >different process), why do I get a different set of bNodes?  I guess
> >this is asking whether the graph-in-the-file is the same
> >graph-in-the-memory.  Never did understand this.
>
>I guess that is the question:  are they truly the same graph, or are they
>two graphs that happen (at some time) to be isomorphic?  A couple of ways
>of thinking about this occur to me:
>
>(a) model theory:  can the different presentations be simultaneously
>contemplated under different interpretations?   If so, I'd suggest they are
>different.
>
>(b) mutation:  if one of the presentations is updated, does that update
>propagate to the other presentation?  If not, they are different.
>
> >It is a real nuisance when using RDF graph over the network where the
> >application is using a graph on a different machine.  They are talking
> >about the same graph.  But they can't.  Unless I use language dependent
> >RPC!
>
>Well, a graph is just syntax - a description of some presumed reality.  Do
>they describe the same reality?  Do they mutually entail?  That's what
>ultimately matters, I think.
>
>Having different bnodes in different graph instances in no way weakens
>mutual entailment between two graphs:  if the graphs are otherwise
>identical, the interpretations that satisfy one are exactly the
>interpretations that satisfy the other.
>
> > > if you start introducing identifiers that describe bNodes from
> > > "outside", you (a) need to have a way of scoping them to a
> > > particular graph instance, or (b) be very sure that they are unique.
> >
> >Both (a) and (b) could be done as backwards compatible RDF syntax but
> >it is a change to the syntax.  e.g. for (b) a syntax that is
> >"bnode@<uuid>" This is not pretending to be a URI - the space of URIs
> >and this space are disjoint.  It is just a syntactic labelling of
> >variables for the purposes of serialization.
>
>Sure... I wasn't suggesting this be done, just trying to explain why
>introducing such external identifiers was problematic.
>
> > > "minimal identifying description" (MID)
> >
> >Seems fine but let's go the whole way and have the URI for a node as a
> >property as a MID :-)
>
>This takes us into the whole Skolemization debate, which others explain far
>better than I.  E.g. discussions of Skolemization in
>http://www.w3.org/TR/rdf-mt/.  Note how carefully the Skolemization lemma
>has to be stated to be logically valid.
>
> >When processing the RDF I find that strictly I need to handle bNodes
> >with isomorphism tests in the absence of such MIDs.  Labelled nodes
> >have an MID called their URI.
>
>The problem here is that two isomorphic graphs containing bnodes just do
>not (in general) contain the information that corresponding bnodes denote
>the same value.  Adding URIs to the bnodes in a graph imposes such a result.
>
>I don't think we're going to make a lot more progress on this debate
>without being more specific about exactly what it is that you want to
>achieve.
>
>#g
>--
>
>PS:  one possible "out" occurs to me:  if graphs themselves are considered
>resources that can be labelled with URIs (e.g. like formulae in N3), then
>we could assert that two graph presentations with the same URI were indeed
>the very same graph.  Then, the graphs must be isomorphic, or we have a
>nonsense (any graph must be isomorphic with itself, right?).  And then it
>is reasonable to say that the corresponding bnodes under graph isomorphism
>are indeed the same node.
>
>The prime difficulty with this that I see is how to account for two graph
>presentations with the same URI that are not isomorphic:  reject it as
>nonsense (unsatisfiable)?  introduce a more subtle account of how
>presentations relate to the underlying graph (but how then to determine
>isomorphism)?  I think there could be a rathole here.
>
>
> >-----Original Message-----
> >From: Graham Klyne [mailto:GK@NineByNine.org]
> >Sent: 6 June 2002 12:04
> >To: Seaborne, Andy
> >Cc: 'RDF Interest'
> >Subject: RE: N3 and N-Triples (was: RDF in HTML: Approaches)
> >
> >
> >At 10:51 AM 6/6/02 +0100, Seaborne, Andy wrote:
> > >If an RDF processor reads in the same file twice, are the bNodes the
> > >same or different?
> >
> >I'd say "different".
> >
> > >For compatibility with current RDF syntax, implicit bNodes in the
> > >current syntax yield different bnodes in the graph created.  But
> > >there is a choice as to whether an explicit bNode (one labeled in the
> > >syntax) is scoped to the file read operation (and hence creates
> > >different bNodes) or whether they get unique labels in the disjoint
> > >space.
> > >
> > >If RDF is to be exchanged between systems across a network using a
> > >serialization then the latter is desirable.  It means part of the
> > >system (an RDF application) on one machine can talk about the bNodes
> > >on another machine (the source of the graph).
> >
> >That sounds rather dodgy to me -- the model theory is quite clear that
> >bnodes are not identified with anything outside the graph in which they
> >appear -- if you start introducing identifiers that describe bNodes
> >from "outside", you (a) need to have a way of scoping them to a
> >particular graph instance, or (b) be very sure that they are unique.
> >
> >Because of the way that bNode semantics are defined (essentially, as
> >existential variables), I don't think it really matters if you have
> >different bnodes in different places as long as the associated
> >statements about them are "isomorphic" -- there's some recent
> >discussion in the DAML list about "minimal identifying description"
> >(MID) between Richard Fikes and Peter Patel-Schneider that might have
> >some bearing.   I don't know where the web archive is, but look for
> >messages starting about:
> >
> >[[
> >Date: Fri, 24 May 2002 15:39:41 -0700
> >From: Richard Fikes <fikes@ksl.stanford.edu>
> >To: Joint Committee <joint-committee@daml.org>
> >Subject: New DQL Specification
> >Content-Type: multipart/mixed;
> >   boundary="------------C8A05097584B9E8F59A89C7A"
> >]]
> >
> >#g
> >
> >
> >-------------------
> >Graham Klyne
> ><GK@NineByNine.org>
>
>-------------------
>Graham Klyne
><GK@NineByNine.org>

-------------------
Graham Klyne
<GK@NineByNine.org>
Received on Thursday, 6 June 2002 11:47:13 UTC