Re:Matching RDF models + anon nodes from Stefan Kokkelink on 2001-07-18 (www-rdf-interest@w3.org from July 2001)

From: Stefan Kokkelink <skokkeli@mathematik.uni-osnabrueck.de>
Date: Wed, 18 Jul 2001 11:00:17 +0200
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
CC: www-rdf-interest@w3.org
Message-ID: <3B555021.804A9382@mathematik.uni-osnabrueck.de>

Hi Jeremy,

I like your approach using standard algorithms from
graph theory. I will read the paper in detail and
come back to that later.

I would like RDFCore to recognize this approach.
It shows how one can leverage existing graph 
theory.

However, there is no formal definition of RDF graphs
in the specification. (I made a quick attempt at one in [1].)
From section 5 of the specification:

"This specification shows three representations
of the data model; as 3-tuples (triples), as a graph, 
and in XML. These representations have equivalent
meaning."

That means we have four things: 
1) data model
2) triples
3) graphs
4) XML

As mentioned before [2], the data model is *not* just
a set of triples. Triples are just one representation
of the data model.

There are two basic problems:

1. Only one of these representations is formally 
   defined: XML. 

2. What does "These representations have equivalent
   meaning." really mean?

My personal view on both:

1. All of these representations should be formally
   defined in the RDF specification. I think one should 
   use NTriples to formally define 'triples'. But one should
   also formally define RDF graphs! I would like to offer 
   help here.

2. There should be explicitly given mappings (in a mathematical
   sense) between the representations. (Currently, there is only
   one: from XML to triples.) The sentence "These representations
   have equivalent meaning." should be changed to "There are
   well-defined mappings between the representations".

   RDFCore must decide if these representations should really be
   "equivalent" in the sense that every term in one representation
   must be expressible in all others. If yes, then the data model
   is redundant and can be omitted. It would be implicitly given by
   the mappings, which would be bijections in this case. If no, it
   should be explicitly mentioned which terms of the data model
   can be expressed in a given representation.

   Example: A resource is part of the data model, but can't 
   be expressed in the triple representation. A resource can
   be expressed in XML: 
   <rdf:Description about="URI"/>
   and in the graph representation. A literal is part of
   the data model, but can't be expressed in XML (nor in the
   triple representation). A literal can be expressed in a 
   graph.
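
To make the point about mappings concrete, here is a rough
sketch in Python. It is only an illustration under my own
assumptions, not a proposed formal definition: the Graph type,
the two mapping functions and the example.org URIs are all
made up for this example. A graph is modelled as a set of
nodes plus a set of labelled arcs, and the sketch shows that
the graph -> triples mapping is not a bijection once a graph
may contain a node without any arc.

```python
from typing import NamedTuple


class Graph(NamedTuple):
    """Hypothetical graph representation: nodes plus labelled arcs."""
    nodes: frozenset  # resources and literals
    arcs: frozenset   # (subject, predicate, object) tuples


def triples_to_graph(triples):
    """Map a set of triples to a graph; every subject and object becomes a node."""
    triples = frozenset(triples)
    nodes = frozenset(n for s, _, o in triples for n in (s, o))
    return Graph(nodes, triples)


def graph_to_triples(graph):
    """Map a graph back to triples; nodes without arcs have no image."""
    return frozenset(graph.arcs)


# Assumed example data (not taken from the specification).
triples = {("http://example.org/doc", "http://example.org/creator", "Stefan")}

# Triples -> graph -> triples loses nothing.
g = triples_to_graph(triples)
assert graph_to_triples(g) == frozenset(triples)

# A graph with an isolated resource does not survive the round trip.
lonely = Graph(g.nodes | {"http://example.org/isolated"}, g.arcs)
round_trip = triples_to_graph(graph_to_triples(lonely))
assert round_trip != lonely
assert "http://example.org/isolated" not in round_trip.nodes
```

Under these assumptions the mapping from triples to graphs is
injective but not surjective, which is the kind of statement
I would like to see spelled out for each pair of
representations.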

Regards,
Stefan

[1]
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Jun/0008.html 
[2]
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Jul/0028.html
 
 
Jeremy Carroll wrote:
> 
> One of the improvements in Jena-1-1-0
> http://www-uk.hpl.hp.com/people/bwm/rdf/jena/
> is a matching algorithm that can tell if two models are the same.
> 
> The algorithm aligns the anonymous resources; so that two files, identical
> except for the order of statements will compare equal.
> 
> I've written up the algorithm used, the first draft is available at:
> 
> http://www-uk.hpl.hp.com/people/jjc/tmp/matching.pdf
> 
> It's based on a standard algorithm from graph theory.
> 
> It could also be useful for deeper notions of equivalence (e.g. after we
> have decided that certain pairs of URI's actually refer to the same
> resource).
> 
> Any feedback, including stuff like typos and spelling errors, as well as
> more profound comments, would be welcome. I plan to take the doc to a second
> final version in three weeks time, when I will post a technical report
> number and a non-transitory URL.
> 
> enjoy
> 
> Jeremy Carroll
> HP Labs
Received on Wednesday, 18 July 2001 05:15:15 UTC