Updated JSON-LD normalization and implementations

One of the concerns raised over bnode support in JSON-LD has been the 
complexity of canonically naming bnodes in normalized JSON-LD output. I 
offered a rough, preliminary algorithm for canonicalizing bnode names a 
little while back, but I have since then simplified it. I believe the 
current iteration to be straight-forward to implement. The algorithm 
makes use of memoization techniques to reduce repeated comparisons when 
determining isomorphisms amongst bnodes. In the end, the result is a 
fairly simple recursive algorithm that makes use of 5 functions as 
defined below.

There are now two inter-operable implementations of the algorithm and 
some test cases for it. There is also a Python implementation in the works.

JavaScript: https://github.com/digitalbazaar/forge/blob/master/js/jsonld.js
C++: 
https://github.com/digitalbazaar/monarch/blob/jsonld/cpp/data/json/JsonLd.cpp
Test Cases: https://github.com/digitalbazaar/forge/tree/master/tests/jsonld

The test cases will eventually be refined and merged into a JSON-LD test 
suite repository.

JSON-LD normalization algorithm:

1. Expand input (resolve keywords and CURIEs, etc).
2. Give unique temporary names to unnamed blank nodes.
3. Flatten input (remove embeds, etc.).
4. Create a memo for each bnode for storing bnode comparisons.
5. Sort blank nodes using DeepCompare.
6. Uniquely rename any blank nodes with a c14n prefix.
7. Canonically name blank nodes using DeepName.
8. Sort any property arrays that contain blank nodes.
9. Return all subjects, sorted by IRI, in an array.

DeepCompare(bnodeA, bnodeB):

1. If bnodeB's IRI is in bnodeA's memo, return the memo value.
2. Do ShallowCompare. If the return value is non-zero, store it in 
bnodeA's memo and return.
3. Do DeepCompareEdges on the bnodes' bnode properties with a new 
isomorphism mapping. If the return value is non-zero, store it in 
bnodeA's memo and return.
4. Do DeepCompareEdges on the bnode's bnode references with a new 
isomorphism mapping.
5. Store the return value in bnodeA's memo and return.

ShallowCompare(bnodeA, bnodeB):

1. The bnode with fewer properties is first.
2. The bnode with the alphabetically first property is first.
3. Do CompareObjects for each equivalent property.
4. The bnode with fewer references is first.
5. The bnode with the reference IRI (vs. bnode) is first.
6. The bnode with the alphabetically-first reference IRI is first.
7. The bnode with the alphabetically-first reference property is first.

CompareObjects(bnodeA, bnodeB, property):

1. The bnode with fewer objects is first.
2. The bnode with fewer non-bnode objects is first.
3. The bnode with a string object is first.
4. The bnode with the alphabetically-first string is first.
5. The bnode with a @literal is first.
6. The bnode with the alphabetically-first @literal is first.
7. The bnode with the alphabetically-first @datatype is first.
8. The bnode with a @language is first.
9. The bnode with the alphabetically-first @language is first.
10. The bnode with the alphabetically-first non-bnode @iri is first.

DeepCompareEdges(bnodeA, bnodeB, direction, iso):

1. Iterate over bnodeA's edges in the given direction (property or 
reference).
2. If an adjacent bnode IRI is in the iso mapping, make sure it occurs 
using the same property in bnodeB's edges. If not, the bnodes are not equal.
3. The bnode IRI is not in the iso mapping, so search bnodeB's edges for 
a matching property and adjacent bnode IRI that is also not in the iso 
mapping. If not found, the bnodes are not equal. If found, put the bnode 
IRIs into the iso mapping and run DeepCompare on the bnodes. If 
DeepCompare returns non-zero, remove the bnodes from the iso mapping, 
the bnodes are not equal.
4. If the bnodes are not equal, -1 if bnodeA is less and 1 if bnodeB is 
less. The lesser bnode is the one with the least adjacent bnode for 
bnodeA's edge. Use the same iso mapping when comparing adjacent bnodes. 
If the bnodes are equal, return 0.

DeepName(bnode):

1. If the bnode's prefix is already c14n, return.
2. Rename the bnode using the c14n prefix and incremented counter suffix.
3. Recurisvely rename the bnodes from the bnode's properties in sort order.
4. Recurisvely rename the bnodes from the bnode's references in sort order.

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
http://digitalbazaar.com

Received on Monday, 27 June 2011 17:51:57 UTC