
Re: Stream-based processing!?

From: Olivier Grisel <olivier.grisel@ensta.org>
Date: Mon, 3 Oct 2011 14:09:37 +0200
Message-ID: <CAFvE7K5sn8+yiH-SMXYN1phS1iAwE1UfFaXMG2PbBOono_vdcA@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, public-linked-json@w3.org
2011/10/3 Ivan Herman <ivan@w3.org>:
>
> On Oct 3, 2011, at 13:51 , Olivier Grisel wrote:
>
>> 2011/10/3 Ivan Herman <ivan@w3.org>:
>>>
>>> However... the standard JSON parser in Python parses a JSON object into a dictionary, and, at least in Python, you cannot rely on the order of the keys within the dictionary (it is determined by a hashing algorithm, if I am not mistaken, but that is internal to the interpreter anyway). I.e., whether @context appears first or last makes no difference.
>>>
>>> Worse: if you then use such a structure to generate JSON, again with the 'dump' function of the standard Python json module, there is no way to control the order of those keys. In other words, if we impose such an order in JSON-LD, a Python programmer must bypass the standard JSON library module and do the dump by hand. I do not think that would be acceptable...
>>
>> In Python 2.7 and 3.2+ it is possible to have a deterministic order by
>> using the collections.OrderedDict class from the standard library. In
>> that case json.dump will respect that order. At parsing time it is
>> now possible to pass the OrderedDict class as "object_pairs_hook" to
>> avoid losing the ordering information.
>>
>>  http://docs.python.org/library/json.html
>
> Thank you, I did not know that. But (for those who are not Python users) this means that this is a feature of only the very latest versions of the two Python branches, i.e., it would force all JSON/Python users to upgrade to the latest and finest, which is not always possible for everybody. I.e., I still perceive that as a problem:-(
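A minimal sketch of the OrderedDict approach described above, using only the standard json and collections modules (Python 2.7 or 3.1 and later). The sample document and its FOAF IRI are illustrative assumptions, not taken from the thread.

```python
import json
from collections import OrderedDict

# A JSON-LD-like document whose "@context" member comes first.
text = '{"@context": {"name": "http://xmlns.com/foaf/0.1/name"}, "name": "Ivan"}'

# object_pairs_hook receives each object's (key, value) pairs in the order
# they appear in the input, so building an OrderedDict preserves that order.
doc = json.loads(text, object_pairs_hook=OrderedDict)
print(list(doc.keys()))  # ['@context', 'name']

# json.dumps writes an OrderedDict's keys in insertion order, so the
# round trip keeps "@context" in first position.
out = json.dumps(doc)
print(out == text)  # True
```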

It is still easy to reuse an existing OrderedDict implementation (in
Python or C) in a JSON-LD library to ensure that serialization puts the
@context node in first position.
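One way a JSON-LD library could do this is sketched below. `context_first` is a hypothetical helper, not part of any existing library. It copies a node into an OrderedDict with "@context" inserted first, so json.dumps emits it first whatever order the input mapping had. For brevity it only reorders the top-level node.

```python
import json
from collections import OrderedDict


def context_first(node):
    """Return a copy of a JSON-LD node (a dict) with "@context" as its first key.

    Hypothetical helper: the remaining keys keep their existing iteration
    order, and nested nodes are left untouched.
    """
    ordered = OrderedDict()
    if "@context" in node:
        ordered["@context"] = node["@context"]
    for key, value in node.items():
        if key != "@context":
            ordered[key] = value
    return ordered


# A plain dict with "@context" placed last.
doc = {"name": "Ivan", "@context": {"name": "http://xmlns.com/foaf/0.1/name"}}
serialized = json.dumps(context_first(doc))
print(serialized)
```

Because only the insertion order of the copy changes, parsing `serialized` back gives a mapping equal to the original `doc`.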

People were doing that kind of thing for many years before
OrderedDict made it into the standard library; for instance, the Django
framework has its own implementation (for deterministic JSON ordering,
among other purposes). See also the alternative implementations
mentioned in:

   http://www.python.org/dev/peps/pep-0372/#reference-implementation

For parsing, preserving order is a nice-to-have, but it is not necessary
in most cases.

It is, however, crucial to have the @context node first if you plan to
implement an optimized streaming (SAX-style or pull) parser for JSON-LD,
such as those discussed here (in Java, for instance, I use Jackson for
this):

  http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json
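To illustrate why the position of @context matters for a streaming processor, here is a toy sketch. It is a simplification and not a real JSON-LD processor. `process_stream` is hypothetical. Its `members` argument stands in for the top-level (key, value) events a real pull parser would produce, and only a flat term-to-IRI context is supported. When @context arrives first, every later member can be expanded and emitted immediately. When it arrives last, every earlier member has to be held in memory until it shows up.

```python
from typing import Any, Iterable, List, Tuple


def process_stream(members: Iterable[Tuple[str, Any]]) -> Tuple[List[Tuple[str, Any]], int]:
    """Expand top-level terms from a stream of (key, value) members.

    Hypothetical toy processor: returns the expanded (iri, value) pairs and
    the largest number of members buffered while waiting for "@context".
    """
    context = None
    pending = []  # members seen before "@context" arrived
    expanded = []
    max_buffered = 0
    for key, value in members:
        if key == "@context":
            context = value
            # The context is now known: flush everything buffered so far.
            for k, v in pending:
                expanded.append((context.get(k, k), v))
            pending = []
        elif context is None:
            # No context yet, so this member cannot be expanded: buffer it.
            pending.append((key, value))
            max_buffered = max(max_buffered, len(pending))
        else:
            # Context already known: expand and emit immediately.
            expanded.append((context.get(key, key), value))
    # If "@context" never arrived, emit the buffered members unexpanded.
    expanded.extend(pending)
    return expanded, max_buffered


ctx = {"name": "http://xmlns.com/foaf/0.1/name",
       "knows": "http://xmlns.com/foaf/0.1/knows"}
context_first_events = [("@context", ctx), ("name", "Ivan"), ("knows", "Markus")]
context_last_events = [("name", "Ivan"), ("knows", "Markus"), ("@context", ctx)]

print(process_stream(context_first_events))  # nothing buffered
print(process_stream(context_last_events))   # both members buffered
```

Both orders produce the same expansion, but only the context-first stream can be processed in constant memory, which is the property a SAX-style or pull parser relies on.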

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Received on Monday, 3 October 2011 12:10:27 GMT
