[JSON] Constraining JSON serialization discussion

On 23 Mar 2011, at 19:00, Peter Frederick Patel-Schneider wrote:
> I'm really interested in just what *is* JSON?  Is there a standard?

JSON means many different things based on the context. Here is what the
context for this group should be: JSON - the serialization format.

The serialization format is defined by RFC4627:

http://www.ietf.org/rfc/rfc4627.txt

Constraint #1: The grammar that this WG MUST use is defined in RFC4627

> which is again only a syntax.  Perhaps JSON is only a syntax and
> there is no data model!

It depends on what you mean by "data model", but formally - there is no
defined data model for JSON and people get by just fine without there
being one. It just so happens that JSON maps well to almost every
programming language's native datatypes (associative arrays in most
cases), but the data model is ultimately defined by the language.

Constraint #2: The JSON data model is not defined across all programming
languages, and does not need to be in order to be useful for the work in
this WG.

> Is there a notion of round-tripping in JSON?

If you mean: Are there services that output JSON and then expect the
same JSON structure to be posted back to them? Yes.

> { "foo" : 3 , "foo" : 1 , "foo" : 4 , "foo" : 5 , "foo" : 9 }
> 
> is valid JSON.
> 
> Is this correct?

According to RFC4627 it is valid. However, many programming languages
store JSON objects in associative arrays, which require unique keys. We
MUST NOT depend on this functionality, because it won't work across all
of the popular JSON implementations.
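
As a rough illustration (a minimal sketch, assuming Python and its
standard json module, which are my choice of example and not something
this thread prescribes), here is a parser that accepts the duplicate
names but keeps only one value once the object lands in an associative
array. The object_pairs_hook argument is used only to show that all of
the pairs really were in the serialized text:

```python
import json

text = '{ "foo" : 3 , "foo" : 1 , "foo" : 4 , "foo" : 5 , "foo" : 9 }'

# Default behaviour: the result is a dict, so only one "foo" can survive.
as_dict = json.loads(text)

# object_pairs_hook receives every name/value pair in document order,
# which shows the duplicates were present in the serialization itself.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)   # {'foo': 9}
print(as_pairs)  # [('foo', 3), ('foo', 1), ('foo', 4), ('foo', 5), ('foo', 9)]
```

Other parsers may keep the first value or reject the text outright,
which is exactly why relying on duplicates is not portable.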

>> Nathan wrote:
>> Isn't the data model simply Javascript objects, as defined in
>> ECMA-262?

No, it's not as simple as that. The ECMA-262 object model is just one of
the data models that JSON can map to.

> informally perhaps, but even just a single boolean value, or a
> number, or a string is valid JSON.

No, this is absolutely not correct! RFC4627 specifically forbids that:
it defines a JSON text as a serialized object or array.
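
Be aware that many parsers are more lenient than RFC4627 here. A minimal
sketch, assuming Python's standard json module (which does accept bare
scalars): the helper name is_rfc4627_text is hypothetical, and it checks
only the top-level rule, nothing else in the grammar.

```python
import json


def is_rfc4627_text(text: str) -> bool:
    """Hypothetical helper: True only if `text` parses and its top-level
    value is an object or an array, as RFC4627 requires of a JSON text."""
    try:
        value = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(value, (dict, list))


print(is_rfc4627_text('{"a": 1}'))  # True
print(is_rfc4627_text('[1, 2]'))    # True
print(is_rfc4627_text('3'))         # False, although json.loads accepts it
print(is_rfc4627_text('"hello"'))   # False
```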

>> Nathan wrote: well we haven't defined if it is or is not :) we
>> could also treat it as
>>> syntax sugar for multiple space separated keys, as with relLists
>>> in HTML rel="key foo bar".
> 
> 
> Is this actually allowed in JSON?  If so, where is it stated?

It is allowed per RFC4627, Section 2.5: Strings

>> http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf
>
> Is this really the JSON spec?

There is no such thing as /the/ "JSON spec" - there is RFC4627 and then
there is one document formalizing how code snippets that look very much
like the grammar specified in RFC4627 map to the ECMA object model.
Don't confuse the two - this WG will base all of the serialization
advice off of RFC4627 because it will be far simpler to do so. It MAY
refer to ECMA, but probably only in non-normative sections.

> The reviver, replacer, and space optional arguments appear to be able
> to greatly affect the situation.  Are these also part of the JSON 
> specification?

They are a part of ECMA, and we won't have to ever mention them in the
spec we're creating. If we find that we do, we've screwed up.

> Does the WG have to take them into account?

No.

> Could the WG exploit them?

In general - No. We should not do anything "fancy" or "exploit"ive with
JSON. There is a very high likelihood that this will rathole the
conversation.

>> Yes,  http://www.ietf.org/rfc/rfc4627.txt  it is "just" a grammar.
>> 
> 
> I note that the character encoding here appears to be different from 
> that in the JavaScript document.

It is - we will have to make the decision on whether to use UTF-8 or
UTF-16. I think we should use UTF-8 because, unless I'm mistaken, that
is what the majority of the documents on the Web use for
JSON/JavaScript. I also need to find data to back this viewpoint up. :)
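
For reference, RFC4627 (Section 3) notes that because the first two
characters of a JSON text are always ASCII, the encoding can be
determined from the pattern of null octets in the first four octets. A
minimal sketch of that check, assuming Python; the function name
sniff_rfc4627_encoding is hypothetical and the sketch ignores byte order
marks:

```python
import json


def sniff_rfc4627_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of a JSON text from the
    null-octet pattern of its first four octets (RFC4627, Section 3)."""
    if len(data) < 4:
        # "{}" or "[]" in UTF-8; any wider encoding needs at least 4 octets.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


sample = '{"a": 1}'
for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    octets = sample.encode(name)
    guessed = sniff_rfc4627_encoding(octets)
    # Decoding with the guessed encoding recovers the same data.
    print(name, guessed, json.loads(octets.decode(guessed)))
```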

>> The mapping of JSON into the object model of the parser's language
>> is not specified.
> 
> So then, how can the WG talk about round-tripping, etc., etc.?

The same way that JSON folks talk about round-tripping today. In its
simplest form, you serialize something to JSON when you receive a GET
and send it out. If you get the same serialization back via a POST - the
two parties have accomplished a round-trip. Conformant implementations
are not supposed to change values between serializing and deserializing
(doubles, integers, booleans, etc.). This is a snag point in some cases
of serializing RDF values to JSON that we'll have to be careful with.
However, we can have that discussion without knowing what the object
model is, or by placing reasonable constraints on the object model if
necessary.
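
A minimal sketch of that kind of round-trip check, assuming Python's
standard json module (the member names and values are made up for
illustration). It compares values and native types after one
serialize/deserialize cycle, without saying anything about a shared
object model:

```python
import json

# Made-up payload covering the value kinds mentioned above.
original = {
    "count": 3,       # integer
    "ratio": 0.1,     # double
    "price": 1.0,     # double with an integral value
    "ok": True,       # boolean
    "label": "x",
    "missing": None,
}

wire = json.dumps(original)   # what one party would send out
returned = json.loads(wire)   # what the other party would get back

same_values = returned == original
same_types = all(type(returned[k]) is type(original[k]) for k in original)

print(same_values, same_types)  # True True
```

This holds for Python's parser. A language with a single number type,
such as JavaScript, cannot keep the integer/double distinction for a
value like 1.0, which is the sort of case we need to be careful with.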

>> While it does say SHOULD, but it is in reality a MUST.
> 
> Except that there are lots of "MUST"s in the document, so one SHOULD
> be able to have non-unique names, if the circumstances warrant.

Yes, but doing this would be incredibly short-sighted of us, for all of
the reasons outlined. Not to mention that no JSON implementation would
do the right thing in this scenario, so even if we were clever - it
wouldn't work in all of the most common languages with JSON parsers.

> To understand JSON this way is extraordinarily difficult and
> expensive, requiring deep knowledge of the innards of ECMAScript.

You are asking questions that require a deep knowledge of the innards of
ECMAScript, or a few months of JavaScript programming experience.

At this point in your responses you get increasingly wary of JSON
because you're attempting to learn JavaScript simultaneously. You are
looking at a programming language specification in order to understand
/something/. I don't quite know what you're attempting to understand,
but I think you should stop looking at the ECMA specification and start
asking more concrete questions. We could spend a month discussing why
parts of the spec are written the way that they are, or how certain
scenarios don't make sense if you hold a certain world view. I have a
feeling that most of that is not going to help this group come to grips
with the serialization aspect. You may learn a great many things that
are not applicable to what we're attempting to accomplish here.

> Well, I, for one, find it hard to work on standardizing against
> anything when I don't know the target.

The target serialization format is RFC4627. ECMA absolutely SHOULD NOT
be the target, but should inform our direction. ECMA is just ONE example
of how JSON works with a programming language's data model. There are
additional ones for Python, Ruby, C++, C, Haskell, Java, PHP, Perl, Lua,
Clojure and many other languages and data models.
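
To make the "one mapping among many" point concrete, here is a minimal
sketch of how one non-ECMA implementation, Python's standard json
module, maps the same RFC4627 text onto its own native types (the member
names are made up for illustration):

```python
import json

text = ('{"object": {}, "array": [], "string": "x", "integer": 1, '
        '"real": 1.5, "flag": true, "nothing": null}')

parsed = json.loads(text)

# Record the native Python type each JSON value was mapped to.
native_types = {name: type(value).__name__ for name, value in parsed.items()}

for name, type_name in native_types.items():
    print(f"{name:8} -> {type_name}")
# object   -> dict
# array    -> list
# string   -> str
# integer  -> int
# real     -> float
# flag     -> bool
# nothing  -> NoneType
```

Note that Python distinguishes int from float where ECMAScript has a
single Number type, so the two data models already disagree on numbers.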

>> Yep, I believe most JavaScript JSON parsers rely on the browser to
>> "Do
>>> the right thing", which they do.
> 
> But what *is* the right thing?  (I'm not opposed to "the right thing"
> to be in some extra document, but it sure would be nice to have such
> a document, and have the WG agree on it.)

There is no document that specifies what "the right thing" is because
that document would have to cover all programming languages. JSON has
done just fine without this document. I assert that we will do just fine
without it. If there is a case where something needs to be defined, such
as a constraint placed on all programming languages, we can put it in
our spec.

>> A JSON object that is parsed into a language is -likely- to be
>>> serialized back out the same way.
> 
> Hmm.  I expect that most JSON objects will reserialize as a quite 
> different sequence of characters, even ignoring white space.

The person that responded to you is not correct. JSON objects often do
re-serialize as different sequences of characters. That, however, does
not mean that the data that they represent changes - often it does not.
There are exceptions, like PHP's annoying backslash-escaping - but even
in that case, I don't believe that the data represented changes.
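
A minimal sketch of that distinction, assuming Python's standard json
module and a made-up example.org URL. The two texts differ in slash
escaping, whitespace and member order, yet parse to the same data:

```python
import json

# Two serializations of the same data. The first escapes "/" as "\/",
# which the RFC4627 string grammar permits.
escaped = r'{"url":"http:\/\/example.org\/a","n":1}'
plain = '{ "n": 1, "url": "http://example.org/a" }'

different_text = escaped != plain
same_data = json.loads(escaped) == json.loads(plain)

print(different_text, same_data)  # True True
```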

>> Exactly what it looks like while in
>>> the language isn't part of JSON, but is part of JavaScript.
> However, the WG is supposed to be relating the RDF data model (i.e., 
> graphs, or whatever) to JSON, so it sure would be nice to have some 
> reference for what data structure corresponds to some JSON text.

I think that's the wrong way to go about writing this specification.
Stating "This is how the RDF data model is represented in JSON" is
problematic. Stating "The following JSON will result in the following
triples" is much better. In other words - approach the problem from the
other direction and writing the specification becomes much easier and
also makes the language far more flexible.
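
To show the shape of a "this JSON results in these triples" statement,
here is a minimal sketch in Python. The mapping is entirely
hypothetical and is NOT a proposal for the spec: it assumes a member
named "subject" holds the subject and that every other member becomes
one triple. The function name and the example.org URL are likewise made
up.

```python
import json
from typing import List, Tuple

Triple = Tuple[str, str, str]


def triples_from_json(text: str) -> List[Triple]:
    """Hypothetical mapping, for illustration only: the "subject" member
    names the subject; every other member yields (subject, name, value)."""
    obj = json.loads(text)
    subject = obj["subject"]
    return [
        (subject, name, str(value))
        for name, value in obj.items()
        if name != "subject"
    ]


text = ('{"subject": "http://example.org/people#manu", '
        '"http://xmlns.com/foaf/0.1/name": "Manu Sporny"}')

for triple in triples_from_json(text):
    print(triple)
```

The point is the direction of the definition: the input is a JSON text
and the output is a set of triples, with no claim about how a given
language represents the JSON in memory.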

> YAML

Forget that YAML was even mentioned; it's a rathole.

> Well, we are already in what appear to be the corner cases:

These are not corner cases as far as the JSON serialization grammar is
concerned:

> - colons in names

Allowed per RFC4627.

> - multiple values for properties

Allowed per RFC4627, but none of the popular implementations support
this feature.

> - spaces in names indicating multiple properties

Allowed per RFC4627.

> - URIs as names

Allowed per RFC4627.
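
A minimal sketch, assuming Python's standard json module, showing that
names containing colons, spaces or whole URIs are ordinary strings to a
parser (the names and values here are made up for illustration):

```python
import json

text = (
    '{"foaf:name": "Manu", '
    '"key foo bar": 1, '
    '"http://xmlns.com/foaf/0.1/homepage": "http://example.org/"}'
)

parsed = json.loads(text)
names = list(parsed)

print(names)
# ['foaf:name', 'key foo bar', 'http://xmlns.com/foaf/0.1/homepage']
```

Whether a space inside a name *means* "multiple properties" is a
convention we would have to define ourselves; the parser just sees one
string.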

I hope all of those responses answer your questions in a definitive way.
I tried to be thorough and exact without getting into too many of the
gory details. Let me know if you have any follow-up questions.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Payment Standards and Competition
http://digitalbazaar.com/2011/02/28/payment-standards/

Received on Friday, 25 March 2011 01:36:07 UTC