Re: Stream-based processing!?

On Oct 3, 2011, at 13:51 , Olivier Grisel wrote:

> 2011/10/3 Ivan Herman <ivan@w3.org>:
>> 
>> However... the standard JSON parser in Python parses a JSON object into a dictionary, and, at least in Python, you cannot rely on the order of the keys within a dictionary (it is determined by some hashing algorithm, if I am not mistaken, but that is internal to the interpreter anyway). I.e., whether @context appears first or last does not make any difference.
>> 
>> Worse: if you then use such a structure to generate JSON, again using the 'dump' feature of the standard Python library, there is no way to control the order of those keys. In other words, if we impose such an order in JSON-LD, that means that a Python programmer must bypass the standard JSON library module and do the dump by hand. I do not think that would be acceptable...
> 
> In python 2.7 and 3.2+ it is possible to have a deterministic order by
> using the collections.OrderedDict class from the standard library. In
> that case the json.dump will respect that order. At parsing time it is
> now possible to pass the OrderedDict class as "object_pairs_hook" to
> avoid losing the ordering information.
> 
>  http://docs.python.org/library/json.html

Thank you, I did not know that. But (for those who are not Python users) this means that it is a feature of only the very latest versions of the two Python branches, i.e., it would force all JSON/Python users to upgrade to the latest and finest, which is not always possible for everybody. I.e., I still perceive that as a problem :-(
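For readers following along, here is a minimal sketch of the approach Olivier describes, on Python versions that support object_pairs_hook (the example document is made up for illustration):

```python
import json
from collections import OrderedDict

doc = '{"@context": {"name": "http://schema.org/name"}, "name": "Ivan"}'

# Parse while preserving key order: OrderedDict records insertion order,
# so @context stays first instead of landing wherever the hash puts it.
data = json.loads(doc, object_pairs_hook=OrderedDict)

# json.dumps serializes keys in the dict's iteration order,
# so the round-trip keeps @context in first position.
out = json.dumps(data)
```

On older interpreters without object_pairs_hook, this fallback is not available, which is exactly the upgrade concern raised above.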

Ivan

> 
> So I don't think it is such a big deal to enforce the @context node
> in first position. But that will require a bit of communication effort
> for documenting and advertising such good practices to JSON-LD library
> developers.
> 
> IMHO it is very interesting to be able to do one pass / streaming
> processing of huge JSON-LD dumps without having to load the payload in
> memory.
> 
> For instance I would really like to be able to have JSON-LD dumps of
> the full DBpedia that I could pre-filter in one-pass before loading it
> to a CouchDB database or an ElasticSearch fulltext index. Such a
> JSON-LD dump would be several tens of GB uncompressed and would
> probably not fit in today's computers' main memory.
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
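The one-pass pre-filtering Olivier has in mind could look something like this sketch, assuming (hypothetically) a newline-delimited dump where each line is one JSON-LD record; the file is consumed lazily, so the whole dump never has to fit in memory:

```python
import json
from collections import OrderedDict

def filter_records(lines, predicate):
    """Yield parsed records that pass predicate, one line at a time.

    `lines` can be any iterable of strings (e.g. an open file object),
    so memory use stays bounded regardless of dump size.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Preserve key order so @context stays first on re-serialization.
        record = json.loads(line, object_pairs_hook=OrderedDict)
        if predicate(record):
            yield record

# Hypothetical usage: keep only records that declare a @context.
sample = [
    '{"@context": "http://example.org/ctx", "name": "a"}',
    '{"name": "b"}',
]
kept = list(filter_records(sample, lambda r: "@context" in r))
```

The same generator could feed a CouchDB or ElasticSearch bulk loader directly, since it yields records incrementally rather than building a list.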


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Monday, 3 October 2011 11:57:04 UTC