Re: Stream-based processing!? from Gregg Kellogg on 2011-10-03 (public-linked-json@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Mon, 3 Oct 2011 12:14:12 -0400
To: Ivan Herman <ivan@w3.org>
CC: Markus Lanthaler <markus.lanthaler@gmx.net>, "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-ID: <5930485B-EA2E-4C11-BFD1-6975170C082B@greggkellogg.net>

On Oct 3, 2011, at 4:01 AM, Ivan Herman wrote:

> 
> On Oct 2, 2011, at 22:33 , Markus Lanthaler wrote:
>> 
>> 
>>>> We could also require serializations ensure that @context is listed
>>>> first. If it isn't listed first, the processor has to save each
>>>> key-value pair until the @context is processed. This creates a memory
>>>> and complexity burden for one-pass processors.
>> 
>> Agree. I think that would make a lot of sense since you can see the context
>> as a kind of header anyway.
> 
> I must admit I do not really understand that, but that probably shows my ignorance of the wider JSON world.
> 
> However... the standard JSON parser in Python parses a JSON object into a dictionary. However, at least in Python, you cannot rely on the order of the keys within the dictionary (it is determined by some hashing algorithm, if I am not mistaken, but that is internal to the interpreter anyway). I.e., whether @context appears first or last does not make any difference.
> 
> Worse: if you then use such a structure to generate JSON using again the 'dump' feature of the standard Python parser, there is no way to control the order of those keys. In other words, if we impose such an order in JSON-LD, that means that a Python programmer must bypass the standard JSON library module and do the dump by hand. I do not think that would be acceptable...

I think we're conflating the in-memory representation of an object with its serialized form. This may be an issue if it is not possible to guarantee the order of entries in an object when serializing, but a stream-based parser really isn't making use of a dictionary (ordered or otherwise) when parsing. It looks at the characters sequentially as they come in, rather than waiting for a dictionary to be created and then iterating over its members. In this respect, it is most similar to a SAX-based XML parser.
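To make the distinction concrete, here is a minimal Python sketch. It is not a streaming parser (the standard json module reads the whole text); it only uses the module's object_pairs_hook to show that members reach the parser in serialized order before any dictionary is built. The function name and the example document are hypothetical.

```python
import json

def pairs_in_document_order(text):
    """Return the members of a JSON object as (key, value) pairs,
    in the order they appear in the serialized text."""
    # object_pairs_hook is handed each object's members as an ordered
    # list of pairs, so no dictionary is ever constructed here.
    # Nested objects are returned as lists of pairs as well.
    return json.loads(text, object_pairs_hook=lambda pairs: pairs)

# Hypothetical document in which @context is NOT the first member.
doc = (
    '{"name": "Ivan",'
    ' "@context": {"name": "http://xmlns.com/foaf/0.1/name"},'
    ' "@subject": "http://example.org/ivan"}'
)

pairs = pairs_in_document_order(doc)
keys = [key for key, _ in pairs]
print(keys)
```

The order in keys is the order of the text itself, which is what a SAX-style processor would observe.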

My earlier point about having @subject follow @context was based on this observation: if it is important for a stream-based processor to see @context first, so that it can interpret the data that follows, then it would also seem important to see @subject before any key/value pairs that would have triples generated from them.

If we do not have a MUST for the order of @context and @subject within a serialized JSON object, then a processor cannot rely on that order; it must save everything until the entire object has been processed, and so may as well use a dictionary representation and give up on single-pass processing.
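A rough sketch of that buffering cost, assuming a hypothetical processor that receives (key, value) pairs in serialized order and emits one triple per ordinary member. The names one_pass, ctx and subj are illustrative only, and the sketch ignores objects that never supply @context or @subject.

```python
def one_pass(pairs):
    """Consume (key, value) pairs in serialized order.

    Returns the generated triples and the peak number of pairs that had
    to be buffered while waiting for @context and @subject.
    """
    context = None
    subject = None
    pending = []   # pairs seen before @context and @subject were known
    triples = []
    peak = 0
    for key, value in pairs:
        if key == "@context":
            context = value
        elif key == "@subject":
            subject = value
        else:
            pending.append((key, value))
        if context is not None and subject is not None:
            # Both are known, so everything held so far can be emitted.
            for k, v in pending:
                triples.append((subject, context.get(k, k), v))
            pending.clear()
        peak = max(peak, len(pending))
    return triples, peak

ctx = {"name": "http://xmlns.com/foaf/0.1/name"}
subj = "http://example.org/ivan"

# @context and @subject first: nothing is ever held back.
ordered = [("@context", ctx), ("@subject", subj), ("name", "Ivan")]
# @context and @subject last: every earlier pair must be saved.
unordered = [("name", "Ivan"), ("nick", "ivan"), ("@context", ctx), ("@subject", subj)]

ordered_triples, ordered_peak = one_pass(ordered)
unordered_triples, unordered_peak = one_pass(unordered)
print(ordered_peak, unordered_peak)
```

With the ordering guaranteed the buffer stays empty; without it the buffer grows with the number of members that precede @context and @subject.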

My own processor makes use of a JSON library, and so operates over a hash (dictionary) representation of objects, so this is not particularly important for me. But if I were to implement something stream-based, I'd definitely like the spec to specify the order of elements needed for efficient processing.

> Ivan

Received on Monday, 3 October 2011 16:15:06 UTC