Re: Stream-based processing!? from Ivan Herman on 2011-10-04 (public-linked-json@w3.org from October 2011)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 4 Oct 2011 10:06:10 +0200
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-Id: <58C8F5AD-8E92-4702-9CBA-4E8450EAC342@w3.org>
On Oct 3, 2011, at 18:14 , Gregg Kellogg wrote:

> On Oct 3, 2011, at 4:01 AM, Ivan Herman wrote:
> 
>> 
>> On Oct 2, 2011, at 22:33 , Markus Lanthaler wrote:
>>> 
>>> 
>>>>> We could also require serializations ensure that @context is listed
>>>>> first. If it isn't listed first, the processor has to save each
>>>>> key-value pair until the @context is processed. This creates a memory
>>>>> and complexity burden for one-pass processors.
>>> 
>>> Agree. I think that would make a lot of sense since you can see the context
>>> as a kind of header anyway.
>> 
>> I must admit I do not really understand that, but that probably shows my ignorance of the wider JSON world.
>> 
>> However... the standard JSON parser in Python parses a JSON object into a dictionary and, at least in Python, you cannot rely on the order of the keys within a dictionary (it is determined by some hashing algorithm, if I am not mistaken, but that is internal to the interpreter anyway). I.e., whether @context appears first or last does not make any difference.
>> 
>> Worse: if you then use such a structure to generate JSON, again using the 'dump' feature of the standard Python parser, there is no way to control the order of those keys. In other words, if we impose such an order in JSON-LD, a Python programmer must bypass the standard JSON library module and do the dump by hand. I do not think that would be acceptable...
> 
> I think we're conflating the in-memory representation of an object with its serialized form. This may be an issue if it is not possible to guarantee the order of entries in an object when serializing, but a stream-based parser really isn't making use of a dictionary (ordered or otherwise) when doing the parsing. It looks at the characters sequentially as they come in, rather than waiting for a dictionary to be created and then iterating over its members. In this respect, it's most similar to an XML SAX-based parser.
> 
> My earlier point about having @subject follow @context was based on this observation: if it is important for a stream-based processor to see @context first, so that it can interpret the data that follows, then it would also seem to be important to see @subject before any key/values that would have triples generated as a result.
> 
> If we do not have a MUST for the order of @context and @subject within a serialized JSON object, then a processor cannot take advantage of the ordering: it must save everything until the entire object has been processed, and so may as well use a dictionary representation and give up on single-pass processing.
> 
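
To make the buffering cost concrete, here is a minimal Python sketch of a one-pass consumer. It is only an illustration, not any existing processor: process_pairs is a hypothetical name, the input is assumed to be (key, value) pairs already delivered in serialized order by some streaming tokenizer, and "applying the context" is reduced to a plain term-to-IRI lookup.

```python
def process_pairs(pairs):
    """Consume (key, value) pairs in serialized order.

    Returns (resolved_pairs, max_buffered), where max_buffered is the
    largest number of pairs that had to be held back while waiting
    for @context to arrive.
    """
    context = None
    pending = []        # pairs seen before @context
    max_buffered = 0
    resolved = []

    for key, value in pairs:
        if key == "@context":
            context = value
            # Flush everything that was waiting for the context.
            for k, v in pending:
                resolved.append((context.get(k, k), v))
            pending = []
        elif context is None:
            # Cannot interpret the key yet, so it must be saved.
            pending.append((key, value))
            max_buffered = max(max_buffered, len(pending))
        else:
            # Context already known: the pair is handled immediately.
            resolved.append((context.get(key, key), value))

    # No @context at all: keys are passed through unchanged.
    resolved.extend(pending)
    return resolved, max_buffered


# Hypothetical sample data, with the same members in two different orders.
ctx = {"name": "http://xmlns.com/foaf/0.1/name"}
context_first = [("@context", ctx), ("name", "Ivan"), ("nick", "ivan")]
context_last = [("name", "Ivan"), ("nick", "ivan"), ("@context", ctx)]

first_result, first_buffered = process_pairs(context_first)
last_result, last_buffered = process_pairs(context_last)
```

Both orders yield the same resolved pairs, but with @context first nothing is buffered, while with @context last every preceding pair is held until the end of the object. The same reasoning would apply to @subject.
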
> My own processor makes use of a JSON library, and so operates over a hash (dictionary) representation of objects, so it's not particularly important for me, but if I were to implement something based on stream-based processing, I'd definitely like the order of elements needed for efficient processing to be specified.
> 

And I fully understand that. But, based on the finding about the RFC [1], the choice we have is whether JSON-LD should explicitly require a JSON feature that contradicts the official JSON RFC.

The question is really one of practice, and we may have to look around for feedback. I.e., how easy is it, for the different JSON libraries out there, to ensure order in parsing and, more importantly, in the production of JSON? Or do we require JSON-LD programmers, data producers, etc., to use non-standard tools? In other words, does the JSON community ignore the RFC and, in effect, use ordering?
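
As one data point only, here is a small sketch of what Python's standard json module allows. It relies on the object_pairs_hook argument of json.loads and on collections.OrderedDict; the helper names keys_in_document_order and dump_context_first are made up for this illustration, and it says nothing about JSON libraries in other languages.

```python
import json
from collections import OrderedDict


def keys_in_document_order(text):
    """Return the top-level keys of a JSON object in serialized order."""
    # object_pairs_hook receives each object as a list of (key, value)
    # tuples in the order they appear in the text.
    pairs = json.loads(text, object_pairs_hook=list)
    return [key for key, _ in pairs]


def dump_context_first(obj):
    """Serialize a mapping so that @context, if present, is emitted first."""
    ordered = OrderedDict()
    if "@context" in obj:
        ordered["@context"] = obj["@context"]
    for key, value in obj.items():
        if key != "@context":
            ordered[key] = value
    # json.dumps writes members in the mapping's iteration order
    # (sort_keys is False by default).
    return json.dumps(ordered)


doc = '{"name": "Ivan", "@context": {"name": "http://xmlns.com/foaf/0.1/name"}}'
seen_order = keys_in_document_order(doc)
reserialized = dump_context_first(json.loads(doc))
```

So in this one library, both reading the serialized order and producing a chosen order are possible without writing the dump by hand, at the cost of using an ordered mapping instead of a plain dictionary.
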

Ivan



[1] http://lists.w3.org/Archives/Public/public-linked-json/2011Oct/0034.html




----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 4 October 2011 08:05:10 UTC