Re: [JSON] Constraining JSON serialization discussion from Peter Frederick Patel-Schneider on 2011-03-25 (public-rdf-wg@w3.org from March 2011)

From: Peter Frederick Patel-Schneider <pfps@research.bell-labs.com>
Date: Thu, 24 Mar 2011 23:19:17 -0400
To: <msporny@digitalbazaar.com>
CC: <public-rdf-wg@w3.org>
Message-ID: <20110324.231917.2033001889204246322.pfps@research.bell-labs.com>
From: Manu Sporny <msporny@digitalbazaar.com>
Subject: [JSON] Constraining JSON serialization discussion
Date: Thu, 24 Mar 2011 20:35:34 -0500

> On 23 Mar 2011, at 19:00, Peter Frederick Patel-Schneider wrote:
>> I'm really interested in just what *is* JSON?  Is there a standard?
> 
> JSON means many different things based on the context. Here is what the
> context for this group should be: JSON - the serialization format.
> 
> The serialization format is defined by RFC4627:
> 
> http://www.ietf.org/rfc/rfc4627.txt
> 
> Constraint #1: The grammar that this WG MUST use is defined in RFC4627

OK.  Your opinion so far, but I think that you are right.
Let's get a WG decision on this ASAP!

>> On 23 Mar 2011, at 19:00, Peter Frederick Patel-Schneider wrote: 
>> which is again only a syntax.  Perhaps JSON is only a syntax and
>> there is no data model!
> 
> It depends on what you mean by "data model", but formally - there is no
> defined data model for JSON and people get by just fine without there
> being one. It just so happens that JSON maps well to almost every
> programming language's native datatypes (associative arrays in most
> cases), but the data model is ultimately defined by the language.
> 
> Constraint #2: The JSON data model is not defined across all programming
> languages, and does not need to be in order to be useful for the work in
> this WG.

It is not OK to me that the WG does not have a good written-down notion
of what some piece of JSON means.  This doesn't have to be anything
fancy, by the way, but I remain astonished that there is not some
generally-agreed-on language-independent notion of what JSON is supposed
to map to.   

>> On 23 Mar 2011, at 19:00, Peter Frederick Patel-Schneider wrote:
>> Is there a notion of round-tripping in JSON?
> 
> If you mean: Are there services that output JSON and then expect the
> same JSON structure to be posted back to them? Yes.
> 
>> { "foo" : 3 , "foo" : 1 , "foo" : 4 , "foo" : 5 , "foo" : 9 }
>> 
>> is valid JSON.
>> 
>> Is this correct?
> 
> According to RFC4627 it is valid, however many of the programming
> languages use associative arrays to store their values, which require
> unique keys. We MUST NOT depend on this functionality, it won't work
> across all of the popular JSON implementations.

Is the WG supposed to be doing something that works across all of the
popular JSON implementations?  What are they?  Should the WG
investigate to see what data model these implementations impose on JSON? 
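
A minimal sketch of the duplicate-name question, assuming Python's
standard json module as one example of a popular implementation (the
choice of Python is an assumption; the thread does not single out any
parser).  It suggests that the data model imposed here is "last value
wins", although the parser can be asked to expose every pair:

```python
import json

# The example text from this thread, with the name "foo" repeated.
text = '{ "foo" : 3 , "foo" : 1 , "foo" : 4 , "foo" : 5 , "foo" : 9 }'

# Default behaviour: the parser builds a dict, so only the last duplicate survives.
as_dict = json.loads(text)

# object_pairs_hook receives every name/value pair in document order,
# so the duplicates can be observed before they are collapsed.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

Other implementations may keep the first value or reject the text, so
this shows one parser's behaviour, not a rule of JSON.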

>>> Nathan wrote:
>>> Isn't the data model simply Javascript objects, as defined in
>>> ECMA-262?
> 
> No, it's not as simple as that. ECMA is just one of the data models that
> JSON can map to.
>
>> informally perhaps, but even just a single boolean value, or a
>> number, or a string is valid JSON.
> 
> No, this is absolutely not correct! RFC4627 specifically forbids that.
> 
>>> Nathan wrote: well we haven't defined if it is or is not :) we
>>> could also treat it as syntax sugar for multiple space separated
>>> keys, as with relLists in HTML rel="key foo bar".
>> 
>> 
>> Is this actually allowed in JSON?  If so, where is it stated?
> 
> It is allowed per RFC4627, Section 2.5: Strings

Sure, spaces are allowed in strings.  But do the popular JSON
implementations treat spaces in names as a shorthand?  This would be a
very big surprise to me!  What other surprises are there?  How can
anyone tell what should be considered to be a surprise?
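
A small check of how one parser treats such names, again assuming
Python's standard json module.  The names "dc:title" and
"http://example.org/name" and all of the values are hypothetical,
chosen only to cover the cases raised in this thread (spaces, colons,
URIs):

```python
import json

# Hypothetical names covering spaces, colons, and URIs; the values are made up.
text = '{"key foo bar": 1, "dc:title": 2, "http://example.org/name": 3}'

obj = json.loads(text)

# Each name comes back as one opaque string; nothing is split on spaces or colons.
names = list(obj)
print(names)
```

Any reading of "key foo bar" as three separate keys would therefore
have to be a convention layered on top of the parser, at least for
this implementation.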

>>> http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf
>>> 
>> Is this really the JSON spec?
> 
> There is no such thing as /the/ "JSON spec" - there is RFC4627 and then
> there is one document formalizing how code snippets that look very much
> like the grammar specified in RFC4627 map to the ECMA object model.
> Don't confuse the two - this WG will base all of the serialization
> advice off of RFC4627 because it will be far simpler to do so. It MAY
> refer to ECMA, but probably only in non-normative sections.
> 
>> The reviver, replacer, and space optional arguments appear to be able
>> to greatly affect the situation.  Are these also part of the JSON 
>> specification?
> 
> They are a part of ECMA, and we won't have to ever mention them in the
> spec we're creating. If we find that we do, we've screwed up.
>
>> Does the WG have to take them into account?
> 
> No.

But they are part of a popular JSON implementation, so shouldn't the WG
take them into account?

>> Could the WG exploit them?
> 
> In general - No. We should not do anything "fancy" or "exploit"ive with
> JSON. There is a very high likelihood that this will rathole the
> conversation.

But what is fancy and should be avoided?  I was somewhat surprised that
colons in names might be considered to be fancy!

>>> Yes,  http://www.ietf.org/rfc/rfc4627.txt  it is "just" a grammar.
>>> 
>> 
>> I note that the character encoding here appears to be different from 
>> that in the JavaScript document.
> 
> It is - we will have to make the decision on whether to use UTF-8 or
> UTF-16. I think we should use UTF-8 because, unless I'm mistaken, that
> is what the majority of the documents on the Web use for
> JSON/JavaScript. I also need to find data to back this viewpoint up. :)
> 
>>> The mapping of JSON into the object model of the parsers language
>>> is not specified.
>> 
>> So then, how can the WG talk about round-tripping, etc., etc.?
> 
> The same way that JSON folks talk about round-tripping today. In its
> simplest form, you serialize something to JSON when you receive a GET
> and send it out. If you get the same serialization back via a POST - the
> two parties have accomplished a round-trip. 

How often is this (very strong) notion of successful round-tripping
achieved?  This requires outputting the same whitespace and the same
order of object members.
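
The gap between the strong (character-for-character) notion and a
weaker (same parsed value) notion can be sketched as follows, assuming
Python's standard json module with its default output settings.  The
sample text is made up for illustration:

```python
import json

# A made-up JSON text with extra whitespace and a line break.
original = '{ "b" : 1,\n  "a" : [ 2, 3 ] }'

value = json.loads(original)
reserialized = json.dumps(value)

# Strong notion: identical characters after parse + serialize.
same_text = reserialized == original

# Weak notion: both texts parse to equal values.
same_value = json.loads(reserialized) == value

# Member order does not affect equality of the parsed dicts either.
reordered_equal = json.loads('{"a": [2, 3], "b": 1}') == value

print(same_text, same_value, reordered_equal)
```

Under these assumptions the strong notion fails on whitespace alone,
while the weak notion holds.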

> Conformant implementations
> are not supposed to change values between serializing and deserializing
> (doubles, integers, booleans, etc.). 

Where did doubles and integers come into the mix?  I thought that JSON
only had a single generic numeric, and didn't even have boolean as a
syntactic category.
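
One place the distinction can come from is the parser rather than the
grammar.  A sketch, assuming Python's standard json module: the same
"number" production is mapped to two different language types,
depending on how the number is written.

```python
import json

# Wrapped in an array because RFC 4627 only allows an object or array at top level.
values = json.loads('[1, 1.0, 1e0, true]')

# The grammar has a single number production; the int/float split is the parser's choice.
kinds = [type(v).__name__ for v in values]
print(kinds)

# The split is kept on output, so 1 and 1.0 serialize to different texts.
again = json.dumps(values)
print(again)
```

A parser for a language with a single numeric type would not make
this split, so the behaviour above is specific to the implementation.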

> This is a snag point in some cases
> of serializing RDF values to JSON that we'll have to be careful with.
> However, we can have that discussion without knowing what the object
> model is, or by placing reasonable constraints on the object model if
> necessary.

OK, but what are these "reasonable" constraints?

>>> While it does say SHOULD, it is in reality a MUST.
>> 
>> Except that there are lots of "MUST"s in the document, so one SHOULD
>> be able to have non-unique names, if the circumstances warrant.
> 
> Yes, but doing this would be incredibly short-sighted of us, for all of
> the reasons outlined. Not to mention that no JSON implementation would
> do the right thing in this scenario, so even if we were clever - it
> wouldn't work in all of the most common languages with JSON parsers.

Again, what is our target here?

>> To understand JSON this way is extraordinarily difficult and
>> expensive, requiring deep knowledge of the innards of ECMAScript.
> 
> You are asking questions that require a deep knowledge of the innards of
> ECMAScript, or a few months of JavaScript programming experience.

No, or at least I hope not.  I thought that I was asking what the
meaning of JSON was.  

> At this point in your responses you get increasingly wary of JSON
> because you're attempting to learn JavaScript simultaneously. 

Only because that's where the documents lead.  I don't care (in this
context, at least) about JavaScript.

> You are
> looking at a programming language specification in order to understand
> /something/. I don't quite know what you're attempting to understand,

JSON, no more than that!

> but I think you should stop looking at the ECMA specification 

Fine ...

> and start
> asking more concrete questions. 

... but this is where following my nose got me.

> We could spend a month discussing why
> parts of the spec are written the way that they are, or how certain
> scenarios don't make sense if you hold a certain world view. I have a
> feeling that most of that is not going to help this group come to grips
> with the serialization aspect. You may learn a great deal of things that
> are not applicable to what we're attempting to accomplish here.

Well, here is where we part company.  I feel that I need to know what
JSON syntactic structures are supposed to mean.  I asked a few
questions, was pointed to the ECMA spec, and then ended up in the depths
of ECMAScript parsing.  I'm certainly willing not to have to look at
this document, but what else is there?

>> Well, I, for one, find it hard to work on standardizing against
>> anything when I don't know the target.
> 
> The target serialization format is RFC4627. ECMA absolutely SHOULD NOT
> be the target, but should inform our direction. 

So then I should understand it!

> ECMA is just ONE example
> of how JSON works with a programming language's data model. There are
> additional ones for Python, Ruby, C++, C, Haskell, Java, PHP, Perl, Lua,
> Clojure and many other languages and data models.

Is there any commonality between them?  If so, what is it?  If not, then
what can the WG do?

>>> Yep, I believe most JavaScript JSON parsers rely on the browser to
>>> "Do the right thing", which they do.
>> 
>> But what *is* the right thing?  (I'm not opposed to "the right thing"
>> to be in some extra document, but it sure would be nice to have such
>> a document, and have the WG agree on it.)
> 
> There is no document that specifies what "the right thing" is because
> that document would have to cover all programming languages. JSON has
> done just fine without this document. I assert that we will do just fine
> without it. If there is a case where something needs to be defined, such
> as a constraint placed on all programming languages, we can put it in
> our spec.

JSON may be doing fine, probably because there is some common
perception of its reality.  But what is this common perception?  How can
the WG (not all being JSON devotees) move forward without knowing it?

>>> A JSON object that is parsed into a language is -likely- to be
>>>> serialized back out the same way.
>> 
>> Hmm.  I expect that most JSON objects will reserialize as a quite 
>> different sequence of characters, even ignoring white space.
> 
> The person that responded to you is not correct. JSON objects often do
> re-serialize as different sequences of characters. That, however, does
> not mean that the data that they represent changes - often it does not.

Often?  This sounds very scary to me.  I would be much happier with
"almost always" and even happier with "always except ...".

> There are exceptions, like PHP's annoying backslash-escaping - but even
> in that case, I don't believe that the data represented changes.
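
The backslash-escaping case can be checked with a small sketch,
assuming Python's standard json module as the reader.  The URL is a
made-up example; the escaped form imitates the style of output
described above and was not produced by PHP here:

```python
import json

# Escaped form: every "/" written as "\/" and the non-ASCII letter as a \u escape.
escaped = r'["http:\/\/example.org\/caf\u00e9"]'

# Plain form of the same string (the Python literal "\u00e9" is the letter e-acute).
plain = '["http://example.org/caf\u00e9"]'

same_text = escaped == plain
same_value = json.loads(escaped) == json.loads(plain)

print(same_text, same_value)
```

The two texts differ as character sequences but parse to the same
value under these assumptions.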

>>> Exactly what it looks like while in
>>>> the language isn't part of JSON, but is part of JavaScript.
>> However, the WG is supposed to be relating the RDF data model (i.e., 
>> graphs, or whatever) to JSON, so it sure would be nice to have some 
>> reference for what data structure corresponds to some JSON text.
> 
> I think that's the wrong way to go about writing this specification.
> Stating "This is how the RDF data model is represented in JSON" is
> problematic. Stating "The following JSON will result in the following
> triples" is much better. In other words - approach the problem from the
> other direction and writing the specification becomes much easier and
> also makes the language far more flexible.

See my previous proposal.

>> YAML
> 
> Forget that YAML was even mentioned, it's a rathole.
> 
>> Well, we are already in what appear to be the corner cases:
> 
> These are not corner cases as far as the JSON serialization grammar is
> concerned:
> 
>> - colons in names
> 
> Allowed per RFC4627.

But claimed to be problematic because of dot notation.

>> - multiple values for properties
> 
> Allowed per RFC4627, but all popular implementations don't support this
> feature.

OK, problematic.  What else is problematic?

>> - spaces in names indicating multiple properties
> 
> Allowed per RFC4627.

The RFC doesn't say anything about this that I can find.

>> - URIs as names
> 
> Allowed per RFC4627.

OK, but also claimed to be problematic.

> I hope all of those responses answer your questions in a definitive way.
> I tried to be thorough and exact without getting into too many of the
> gory details. Let me know if you have any follow-up questions.

See above.  :-)

> -- manu

What I think is really needed for the WG to proceed much further is at
least initial drafts for:
1/ effective syntax for JSON (RFC4627 gives the reference syntax, but
   what are the problematic parts of this syntax)
2/ some notion of the meaning of JSON 

peter
Received on Friday, 25 March 2011 03:20:04 UTC