
Re: JSON-LD bnode canonical naming algorithm

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Wed, 01 Jun 2011 21:46:44 -0400
Message-ID: <4DE6EB84.6080000@digitalbazaar.com>
To: "public-linked-json@w3.org" <public-linked-json@w3.org>
On 05/30/2011 07:57 PM, glenn mcdonald wrote:
> or simply author using URIs for nodes having multiple references.
> 
> +1
> 
> The idea that there even /needs/ to be a "bnode canonical naming
> algorithm" seems to me close to proof that blank nodes should be dropped
> from JSON-LD. And from LD, period. And from RDF...

I know this statement is often made half-jokingly, but we should take a
second to analyze it because I've heard it several times before and it's
rarely taken seriously. So, let's take it seriously this time. :)

As much as the bnode processing frustration resonates with all of us -
trust me, we would rather not deal with it either - we cannot simply
accept the claim that because we need a complex canonicalization
algorithm, it follows that bnodes should be dropped from Linked Data.
Canonicalization is often a complex operation - graph
canonicalization even more so. If we can remove the requirement for
bnodes and not lose anything, that would be fantastic. Unfortunately,
the fact is that when we do not support bnodes, we do lose something
very important.

A little background:

When we started using JSON-LD normalization to support digital
signatures for the PaySwarm work,

http://payswarm.com/

we made the assertion that we would not support canonicalization that
involved named bnodes with multiple references. This made the
canonicalization algorithm simpler and we went on our merry way.
Unfortunately, this is not a general solution for the Web - we cannot
enforce the same restrictions on general graph processing. We cannot
tell developers that they need to name everything that could potentially
be referenced twice. In many cases, they don't know if the bnode will be
referenced more than once. Think of the following statements:

I know somebody who is named Bob.
I am related to somebody that has high cholesterol.
I know somebody that likes somebody who owns a dog.
What is that thing that has a tail with spikes in it?

Not to mention that the six degrees of Kevin Bacon game would be
impossible to represent in Linked JSON!
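As a sketch, the first statement might be written in JSON-LD as a nested
object with no "@id" - that nested object is the bnode. (The context and
property names here are illustrative, borrowing FOAF-style terms, not
taken from any particular spec draft.)

```json
{
  "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
  "@id": "http://example.com/people#me",
  "foaf:knows": {
    "foaf:name": "Bob"
  }
}
```

Forcing authors to mint a URI for that inner "somebody" is exactly the
restriction we cannot impose on the rest of the Web.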

My point is that you cannot tell the rest of the developers of the Web
that they cannot have unnamed objects. It is true that most of them will
not need them; however, it is also true that there are problems that
cannot be solved without bnodes. BNodes are hard on implementers, but
they're not necessarily hard on the developers that need them.

So, yes, the bnode canonicalization algorithm may be difficult - but the
advantages of having it far outweigh the implementation burden.

We think we have found a generalized algorithm for graph
canonicalization that would not only apply to JSON-LD, but to all RDF
languages. This is a /big deal/ - we have never had a standard way of
doing this before. This would allow us to test graph equivalence in a
standard way, which would be a huge win for the class of people that
need to perform those operations - for example, people that need to
digitally sign a graph, to name just one very important use case.

For the people that don't care about graph normalization, it doesn't
affect how they use Linked JSON. For the people that /do/ care about
graph normalization it opens up a whole new world of possibilities.
Being able to compare graphs for equality is a fundamental primitive
required for many types of software systems - digital signatures, graph
sorting, etc.
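To make the digital signature use case concrete, here is a minimal
sketch in Python. It deliberately fakes the hard part: canonicalize()
below just sorts serialized triples, which only works when every node
is already named - producing deterministic labels for bnodes is the
whole difficulty this thread is about. Function names and the HMAC
choice are illustrative, not from any spec.

```python
import hashlib
import hmac

def canonicalize(triples):
    """Toy stand-in for real graph canonicalization: assumes every
    node is already named, so sorting the serialized statements yields
    a deterministic string. Real bnode relabeling is far harder."""
    return "\n".join(sorted(" ".join(t) for t in triples))

def sign_graph(triples, secret):
    """Sign the canonical form with HMAC-SHA256 (illustrative choice)."""
    canonical = canonicalize(triples)
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()

# Two serializations of the same graph, with triples in a different
# order, produce the same signature because canonicalization
# normalizes them before hashing.
g1 = [("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:name", '"Bob"')]
g2 = [("ex:b", "ex:name", '"Bob"'), ("ex:a", "ex:knows", "ex:b")]
assert sign_graph(g1, b"key") == sign_graph(g2, b"key")
```

The point of the sketch: once a standard canonical form exists, signing
a graph reduces to hashing a string, which is well-trodden ground.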

Furthermore, supporting graph normalization allows some of those using
Linked JSON to avoid needing a triple store and SPARQL in order to
work with the data. For example, having graph canonicalization would
mean the JSON-LD test suite does not require a triple store and SPARQL.
Instead, whether you pass a particular test would come down to a simple
canonicalization and string comparison to see if your processor
matches the expected output. In the end, isn't one algorithm simpler
than a stack of software supporting a triple store + SPARQL processor?
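A sketch of how such a test harness might look (the function names are
hypothetical, and canonicalization is again reduced to whitespace
normalization plus line sorting for illustration - a real algorithm
must also relabel bnodes):

```python
def canonicalize(nquads: str) -> str:
    """Toy canonicalization: collapse whitespace and sort the
    statement lines. Real canonicalization also relabels bnodes."""
    lines = [" ".join(line.split()) for line in nquads.splitlines()]
    return "\n".join(sorted(line for line in lines if line))

def passes_test(processor_output: str, expected_output: str) -> bool:
    """A test passes if both documents canonicalize to the same string."""
    return canonicalize(processor_output) == canonicalize(expected_output)

# Same statements, different order and spacing: the test still passes.
got = '<ex:a> <ex:knows> <ex:b> .\n<ex:b> <ex:name> "Bob" .'
want = '<ex:b>  <ex:name>  "Bob" .\n<ex:a> <ex:knows> <ex:b> .'
assert passes_test(got, want)
```

No triple store, no SPARQL engine - just canonicalize and compare.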

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarm Developer Tools and Demo Released
http://digitalbazaar.com/2011/05/05/payswarm-sandbox/
Received on Thursday, 2 June 2011 01:47:21 GMT
