JSON-LD serialisation and RDF lists from Hugo Mills on 2019-05-03 (public-json-ld-wg@w3.org from May 2019)

From: Hugo Mills <hugo@carfax.org.uk>
Date: Fri, 3 May 2019 16:56:35 +0000
To: public-json-ld-wg@w3.org
Message-ID: <20190503165635.GE5426@carfax.org.uk>

   Hi,

   I hope this is the right place to ask this. Please advise if it's
not appropriate here.

   I've been trying to write a JSON-LD serialiser, and I've hit an
issue with my code intermittently failing one of the test cases. I
wanted to check whether the problem is actually with my code, or if
it's an ambiguity in the specification.

   The serialisation algorithm in §8.4.2 of the draft JSON-LD 1.1
Algorithms spec seems to produce different results for nested lists,
depending on the order chosen for iterating over the usages of the
rdf:nil node, in part 5.3.

   My understanding of the algorithm is that it finds things that look
like an rdf:List, and rolls up the list from the end, for as long as a
fairly large number of conditions (5.3.3) hold. This is relatively easy to
deal with in the case of a single list, and there are lots of test
cases dealing with the various termination conditions. The one I'm
having trouble with is t0008:

<http://example.com> <http://example.com/property> _:outerlist .
_:outerlist <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:lista .
_:outerlist <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b0 .

_:lista <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "a1" .
_:lista <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:a2 .
_:a2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "a2" .
_:a2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:a3 .
_:a3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "a3" .
_:a3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .

_:c0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:c1 .
_:c0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "c1" .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c2 .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "c2" .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c3 .
_:c3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "c3" .
_:c3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .

_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b1 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c0 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "b1" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b2 .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "b2" .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b3 .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "b3" .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .

which is, in Turtle, fundamentally this graph:

<http://example.com> <http://example.com/property>
     (("a1" "a2" "a3")
      ("b1" "b2" "b3")
      ("c1" "c2" "c3")).
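
As a cross-check on that reading of the graph, the rdf:first/rdf:rest chains can be followed mechanically. The following is a minimal sketch in Python rather than Erlang, assuming the triples are transcribed by hand from the N-Quads above; `cell` and `read_list` are hypothetical helper names, not part of any JSON-LD library or of the specification.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def cell(subject, first, rest):
    """Two triples for one list cell (hypothetical helper, for brevity)."""
    return [(subject, FIRST, first), (subject, REST, rest)]


# Test t0008, transcribed by hand. Strings starting with "_:" are blank
# nodes; all other object strings stand for plain literals.
TRIPLES = (
    cell("_:outerlist", "_:lista", "_:b0")
    + cell("_:lista", "a1", "_:a2") + cell("_:a2", "a2", "_:a3") + cell("_:a3", "a3", NIL)
    + cell("_:c0", "_:c1", NIL)
    + cell("_:c1", "c1", "_:c2") + cell("_:c2", "c2", "_:c3") + cell("_:c3", "c3", NIL)
    + cell("_:b0", "_:b1", "_:c0")
    + cell("_:b1", "b1", "_:b2") + cell("_:b2", "b2", "_:b3") + cell("_:b3", "b3", NIL)
)


def read_list(head, triples):
    """Follow rdf:first/rdf:rest from head and return nested Python lists."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}

    def walk(node):
        items = []
        while node != NIL:
            item = first[node]
            # An element that is itself a list cell is read recursively.
            items.append(walk(item) if item in first else item)
            node = rest[node]
        return items

    return walk(head)


print(read_list("_:outerlist", TRIPLES))
```

This only reads the lists; it does not attempt any of the well-formedness checks that the serialisation algorithm performs.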

   Now, my implementation of the algorithm behaves differently,
depending on whether it decides to roll up any of the sub-lists before
or after it decides to roll up the parent list. If it does the parent
list first, I get the golden output for test 0008:

[
    ...
  {
    "@id": "_:lista",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#first": [ { "@value": "a1" } ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest": [
      {
        "@list": [
          { "@value": "a2" },
          { "@value": "a3" }
        ]
      }
    ]
  },
  {
    "@id": "http://example.com",
    "http://example.com/property": [
      {
        "@list": [
          { "@id": "_:lista" },
          { "@id": "_:b1" },
          { "@id": "_:c1" }
        ]
      }
    ]
  }
]

If it does the parent list last, I end up generating the doubly-nested
structure:

[{"@id": "http://example.com",
  "http://example.com/property": [{"@list":
                                   [{"@list":
                                     [{"@value": "a1"},
                                      {"@value": "a2"},
                                      {"@value": "a3"}]},
                                    {"@list":
                                     [{"@value": "b1"},
                                      {"@value": "b2"},
                                      {"@value": "b3"}]},
                                    {"@list":
                                     [{"@value": "c1"},
                                      {"@value": "c2"},
                                      {"@value": "c3"}]
                                   }]
                                 }]
}]

If the parent list comes in the middle, I get a hybrid between the
two.
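
One way this kind of order dependence can arise is sketched below. This is a deliberately naive toy model in Python, not the algorithm from the specification and not the Erlang implementation discussed here: it assumes a roll-up that mutates the graph in place and only patches references held by cells that still exist. The names `fresh_graph`, `roll_up` and `run` are hypothetical.

```python
NIL = "nil"


def fresh_graph():
    """The t0008 graph: each list cell maps to [rdf:first, rdf:rest]."""
    return {
        "root": "_:outerlist",  # value of <http://example.com/property>
        "cells": {
            "_:outerlist": ["_:lista", "_:b0"],
            "_:b0": ["_:b1", "_:c0"],
            "_:c0": ["_:c1", NIL],
            "_:lista": ["a1", "_:a2"], "_:a2": ["a2", "_:a3"], "_:a3": ["a3", NIL],
            "_:b1": ["b1", "_:b2"], "_:b2": ["b2", "_:b3"], "_:b3": ["b3", NIL],
            "_:c1": ["c1", "_:c2"], "_:c2": ["c2", "_:c3"], "_:c3": ["c3", NIL],
        },
    }


def roll_up(graph, tail):
    """Roll up the list ending at tail, mutating graph in place."""
    cells = graph["cells"]
    items, node = [], tail
    while True:
        # Collect the current rdf:first value, which may already be a list.
        items.insert(0, cells[node][0])
        prev = next((s for s, (_, r) in cells.items() if r == node), None)
        if prev is None:
            break
        node = prev
    head = node

    # Naive patching: only the root and cells that still exist are updated.
    patched = False
    if graph["root"] == head:
        graph["root"] = items
        patched = True
    for c in cells.values():
        if c[0] == head:
            c[0] = items
            patched = True

    if patched:
        # Remove the cells that were folded into the list.
        while head != NIL:
            head = cells.pop(head)[1]


def run(order):
    graph = fresh_graph()
    for tail in order:
        roll_up(graph, tail)
    return graph["root"]


print(run(["_:c0", "_:a3", "_:b3", "_:c3"]))  # parent list first
print(run(["_:a3", "_:b3", "_:c3", "_:c0"]))  # parent list last
print(run(["_:a3", "_:c0", "_:b3", "_:c3"]))  # parent list in the middle
```

In this toy, processing the parent first leaves a flat list of blank node references, processing it last gives the doubly-nested form, and processing it in the middle gives a mixture. It does not reproduce the partially rolled-up `_:lista` node of the golden output.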

   My problem is that I can't see where in the algorithm this
ambiguous behaviour is prevented. There's nothing that I can see which
stops the first bnode of a list being rolled into the list if it's
also an element in another list. I also can't see anything which
guarantees that higher-level lists are processed before lower-level
lists, which should also generate the correct output for that test.
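
For illustration, the "higher-level lists first" ordering mentioned above could be imposed explicitly by sorting the usages of rdf:nil by nesting depth. The sketch below is in Python and is an assumption about one possible approach, not something the specification prescribes; `cell` and `nil_usages_outer_first` are hypothetical names, and it assumes every list cell is referenced at most once.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def cell(subject, first, rest):
    """Two triples for one list cell (hypothetical helper, for brevity)."""
    return [(subject, FIRST, first), (subject, REST, rest)]


# Test t0008, in the same order as the N-Quads in the message.
TRIPLES = (
    cell("_:outerlist", "_:lista", "_:b0")
    + cell("_:lista", "a1", "_:a2") + cell("_:a2", "a2", "_:a3") + cell("_:a3", "a3", NIL)
    + cell("_:c0", "_:c1", NIL)
    + cell("_:c1", "c1", "_:c2") + cell("_:c2", "c2", "_:c3") + cell("_:c3", "c3", NIL)
    + cell("_:b0", "_:b1", "_:c0")
    + cell("_:b1", "b1", "_:b2") + cell("_:b2", "b2", "_:b3") + cell("_:b3", "b3", NIL)
)


def nil_usages_outer_first(triples):
    """Subjects whose rdf:rest is rdf:nil, outermost lists first."""
    rest_parent = {o: s for s, p, o in triples if p == REST}    # cell -> previous cell
    first_parent = {o: s for s, p, o in triples if p == FIRST}  # element -> containing cell

    def depth(node):
        # Walk back to the head of this list, then check whether the head
        # is itself an element (an rdf:first value) of another list.
        while node in rest_parent:
            node = rest_parent[node]
        return 1 + depth(first_parent[node]) if node in first_parent else 0

    tails = [s for s, p, o in triples if p == REST and o == NIL]
    return sorted(tails, key=depth)  # stable sort keeps document order within a depth


print(nil_usages_outer_first(TRIPLES))
```

The same `first_parent` lookup also answers the other question raised above, namely whether the head of a list is an element of another list, so a guard on that condition could be built from it.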

   What have I missed here?

   [For information, I'm implementing this in Erlang, so I've also
been having trouble converting from a very procedural,
global-data-and-mutable-variables view of the world into a functional,
immutable-variables view. This may be contributing to any
misinterpretation of the algorithm as documented.]

   Thanks,
   Hugo.

-- 
Hugo Mills             | Great films about cricket: The Third Man
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
Received on Friday, 3 May 2019 20:54:48 UTC