RDFa API object equality and canonicalization of RDF Nodes

Hi All,

We haven't catered for object equality in the RDFa API yet, thus I 
propose we augment the RDFNode interface with an equals method:

interface RDFNode {
     readonly attribute stringifier DOMString value;
     boolean equals( in RDFNode otherNode );
};

There are a lot of subtleties to equality, especially in the domains of 
javascript and RDF, so I thought it best to catalogue them all in a 
single mail.

First of all, in javascript (and many other languages) two objects (or
variables containing objects) are only equal if they are references to
the same object, that is to say that the following two objects are
*not* equal:

   var a = document.data.createIRI('http://example.org/');
   var b = document.data.createIRI('http://example.org/');
   a == b; // false

Javascript (and some other languages) does implement type coercion in
the native equality implementation; for example, if an object has a
toString() method and is compared to a native string, then the
stringified form of the object is used for comparison:

   var a = document.data.createIRI('http://example.org/');
   var b = 'http://example.org/';
   a == b; // true

Bringing RDF into the equation only compounds matters. The subtlety
above is that we've just compared an RDFResource to a String Literal URI
and received a false positive because we didn't consider canonicalization.
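
To make the two behaviours above reproducible outside a browser, here is
a minimal sketch. The createIRI() function below is a hypothetical
stand-in for document.data.createIRI() and assumes only that the
returned object stringifies to its value; it is not the real API.

```javascript
// Hypothetical stand-in for document.data.createIRI(); assumes only that
// the returned object exposes a value and a toString() method.
function createIRI(value) {
  return {
    value: value,
    toString: function () { return this.value; }
  };
}

var a = createIRI('http://example.org/');
var b = createIRI('http://example.org/');
var s = 'http://example.org/';

var sameReference = (a == b);  // false: two distinct objects
var coercedMatch = (a == s);   // true: a is stringified before comparing
var strictMatch = (a === s);   // false: === never coerces

console.log(sameReference, coercedMatch, strictMatch);
```

Note that neither == nor === gives the answer we want for RDF: the first
comparison is too strict, the second too loose.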

Thus, we will need to cater for this by adding an equality method to 
RDFNode, and by stipulating that it *must* compare canonicalized values 
of RDFNodes.

Which, by inference, requires us to define the canonical form of IRI,
BlankNode, PlainLiteral, TypedLiteral and quite possibly RDFTriple too,
since none of the existing toString or toValue methods are normalized
(they do not expose the type or language via toString/toValue, and as
above you cannot compare the objects directly).

Typically I'd suggest that the canonicalized form of any RDFNode (+
RDFTriple) is its N-Triples string value.

I also feel that the details of implementing this in the specification
would be somewhat easier to understand if we added a toNT() method to
RDFNode and RDFTriple and asserted that the implementation of the
equals() method should simply call .toNT() on both objects and compare
the returned (canonicalized) string values.
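
A rough sketch of what that could look like follows. The constructors,
the escapeNT() helper and the shared nodeEquals() function are
illustrative assumptions rather than anything in the current draft, and
the escaping covers only backslash, quote, newline, carriage return and
tab (non-ASCII escaping is left out).

```javascript
// Escape a minimal subset of characters for an N-Triples string literal.
function escapeNT(s) {
  return s.replace(/\\/g, '\\\\')
          .replace(/"/g, '\\"')
          .replace(/\n/g, '\\n')
          .replace(/\r/g, '\\r')
          .replace(/\t/g, '\\t');
}

// Shared equals(): compare the N-Triples forms of both nodes.
function nodeEquals(other) {
  return other != null &&
         typeof other.toNT === 'function' &&
         this.toNT() === other.toNT();
}

// Hypothetical IRI node.
function IRI(value) { this.value = value; }
IRI.prototype.toString = function () { return this.value; };
IRI.prototype.toNT = function () { return '<' + this.value + '>'; };
IRI.prototype.equals = nodeEquals;

// Hypothetical plain literal with an optional language tag.
function PlainLiteral(value, language) {
  this.value = value;
  this.language = language || null;
}
PlainLiteral.prototype.toString = function () { return this.value; };
PlainLiteral.prototype.toNT = function () {
  var nt = '"' + escapeNT(this.value) + '"';
  return this.language ? nt + '@' + this.language : nt;
};
PlainLiteral.prototype.equals = nodeEquals;

var iri = new IRI('http://example.org/');
var sameIri = new IRI('http://example.org/');
var literal = new PlainLiteral('http://example.org/');

console.log(iri.equals(sameIri));   // true: same N-Triples form
console.log(iri.equals(literal));   // false: <...> versus "..."
console.log(iri == literal.value);  // true: native coercion still matches
```

With this shape, two distinct IRI objects with the same value compare
equal, while an IRI and a literal carrying the same characters do not.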

I'm aware there has been some hesitance to have a clear dependency on
N-Triples (which we already have) and to expose a toNT method. However,
with the above considered, I strongly feel we should expose it: trying
to describe how to do comparison/equality operations and
canonicalization will be considerably more difficult for users, us, and
implementers if we don't.

Finally, there is one other detail missing which affects the API, namely
that none of our RDFNodes are typed. At runtime there is no way to tell
if an IRI is an IRI or if a BlankNode is a BlankNode, and likewise no
way to practically distinguish between them. Moreover, you can't even
tell if they are an RDFNode at all; they could be any old object.

This typing issue raises its head in two places: first when
implementing an optimal object equality method, and second (and indeed
far more importantly) when implementing a Serializer.

Thus I would suggest that we need to add either a method or an attribute
which exposes a name or identifier for the RDF Interface which is
implemented. As an aside: would aligning with DOM and exposing a
nodeType attribute be sensible, or would it confuse & conflate?
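
As a sketch of the attribute option, assume each node carries a string
attribute naming its interface. The attribute name interfaceName, the
makeNode() helper and the property names language and datatype are all
placeholders of mine, not proposals for the final naming.

```javascript
// Interface names taken from the node kinds discussed in this mail.
var RDF_INTERFACES = ['IRI', 'BlankNode', 'PlainLiteral', 'TypedLiteral'];

// Hypothetical factory; "interfaceName" is an assumed attribute name.
function makeNode(interfaceName, value, extra) {
  extra = extra || {};
  return {
    interfaceName: interfaceName,
    value: value,
    language: extra.language || null,
    datatype: extra.datatype || null
  };
}

// True only for objects that declare one of the known RDF interfaces.
function isRDFNode(obj) {
  return obj != null &&
         typeof obj === 'object' &&
         RDF_INTERFACES.indexOf(obj.interfaceName) !== -1;
}

// Equality that rejects mismatched interfaces before comparing values.
function quickEquals(a, b) {
  if (!isRDFNode(a) || !isRDFNode(b)) return false;
  if (a.interfaceName !== b.interfaceName) return false;
  return a.value === b.value &&
         a.language === b.language &&
         a.datatype === b.datatype;
}

var iriNode = makeNode('IRI', 'http://example.org/');
var literalNode = makeNode('PlainLiteral', 'http://example.org/');

console.log(isRDFNode(iriNode));                 // true
console.log(isRDFNode({ value: 'anything' }));   // false: any old object
console.log(quickEquals(iriNode, literalNode));  // false: interfaces differ
```

The type check lets equals() reject a mismatch cheaply, without building
a canonical string first.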

ps: I also feel we should add a DataSerializer interface to complement
DataParser, with a single method `string serialize(in DataStore
store);` - any objections?
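
To show why the typing issue matters here, a minimal N-Triples
serializer sketch follows. It assumes nodes expose a type name (called
interfaceName here), that a store can be treated as an array of triples
with subject, property and object members, and that a typed literal
keeps its datatype IRI in a datatype member. All of those are
assumptions for illustration, and string escaping is omitted for
brevity.

```javascript
// Turn one node into its N-Triples form by dispatching on its type name.
function nodeToNT(node) {
  switch (node.interfaceName) {
    case 'IRI':
      return '<' + node.value + '>';
    case 'BlankNode':
      return '_:' + node.value;
    case 'PlainLiteral':
      return '"' + node.value + '"' +
             (node.language ? '@' + node.language : '');
    case 'TypedLiteral':
      return '"' + node.value + '"^^<' + node.datatype + '>';
    default:
      // Without a type name there is nothing to dispatch on.
      throw new Error('Unknown RDF interface: ' + node.interfaceName);
  }
}

// Hypothetical DataSerializer with the single serialize(store) method.
function DataSerializer() {}
DataSerializer.prototype.serialize = function (store) {
  return store.map(function (triple) {
    return nodeToNT(triple.subject) + ' ' +
           nodeToNT(triple.property) + ' ' +
           nodeToNT(triple.object) + ' .\n';
  }).join('');
};

// Assumed store shape: a plain array of triples.
var store = [{
  subject: { interfaceName: 'IRI', value: 'http://example.org/s' },
  property: { interfaceName: 'IRI', value: 'http://example.org/p' },
  object: { interfaceName: 'PlainLiteral', value: 'hi', language: 'en' }
}];

var output = new DataSerializer().serialize(store);
console.log(output);
```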



Received on Saturday, 9 October 2010 04:11:59 UTC