- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Fri, 3 Nov 2023 11:33:48 -0400
- To: Antoine Zimmermann <antoine.zimmermann@emse.fr>, RDF-star Working Group <public-rdf-star-wg@w3.org>
On 11/3/23 05:11, Antoine Zimmermann wrote:
> Peter,
>
> The way you put it is strange: you first say that all JSON should be allowed,
> then immediately show that it would be a problem.
>
> So, can you explain why it *should* be so?
>
> --AZ

The short version is that disallowing parts of JSON is worse than allowing all of JSON, but that allowing all of JSON has difficulties that need to be addressed. For a longer version, see below.

Disclaimer: The following is all my understanding of JSON, Javascript, Unicode, and the various ECMA and RDF documents on them. I've spent a lot of time trying to make sense of all of them, but I don't consider myself a full expert.

The basic idea underlying JSON is quite simple. JSON allows computer-language-independent transfer of values, which in turn are either strings, numbers, arrays, or objects.

The problems arise when one actually tries to use JSON. What are JSON strings? What are JSON numbers? What are JSON objects?

JSON documentation by and large leads one to believe that the answers are all idealistic: JSON strings are Unicode strings, JSON numbers are not limited in range or precision, JSON arrays are sequences of values, and JSON objects are bags (or, more likely, sets) of string-value pairs.

But the initial uses of JSON were in Javascript; JSON is the JavaScript Object Notation, after all. So the historical answers are different: JSON strings are supposed to be Javascript strings, JSON numbers are supposed to be Javascript numbers, and JSON objects are supposed to be Javascript objects.

Looking further into Javascript leads to the following. JSON strings are supposed to be finite sequences of UTF-16 code units. JSON numbers are supposed to be IEEE floating-point doubles, with the lexical-to-value mapping as in Javascript. JSON objects are supposed to be finite maps from JSON strings to JSON values.
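Python's json module happens to agree with Javascript's JSON.parse on the examples used in this message, so it can serve as a rough illustration of the Javascript-oriented reading above. This is a sketch, not a description of any specification: Python strings are sequences of code points rather than UTF-16 code units, so a hypothetical helper, `utf16_units`, shows the code units a Javascript string would hold; and Python keeps plain integers exact, so only numbers written with a fraction or exponent are shown becoming doubles.

```python
import json


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units a Javascript string holding s would contain."""
    # "surrogatepass" lets lone surrogates, which JSON text may contain, be encoded.
    data = s.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Numbers with a fraction or exponent become the nearest IEEE double.
pi = json.loads("3.141592653589793238462643383279")
big = json.loads("1E1000")
print(pi)   # 3.141592653589793 (the extra digits are lost)
print(big)  # inf (beyond the range of a double)

# Objects: with repeated names, the last one wins.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}

# Strings: "\uD800\uDEAD" is one code point (U+102AD) but two UTF-16 code units.
pair = json.loads('"\\uD800\\uDEAD"')
print(len(pair), [hex(u) for u in utf16_units(pair)])  # 1 ['0xd800', '0xdead']

# A lone surrogate is accepted, although it is not a Unicode scalar value.
lone = json.loads('"\\uDEAD"')
print([hex(u) for u in utf16_units(lone)])  # ['0xdead']
```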
This state of affairs has been codified by RFC 8785, JSON Canonicalization Scheme (JCS) https://www.rfc-editor.org/rfc/rfc8785, which itself depends on The I-JSON Message Format https://www.rfc-editor.org/rfc/rfc7493. I-JSON is a syntactic restriction of JSON that forbids duplicate names in objects, prohibits both Unicode surrogate code points and Unicode noncharacters, and suggests that numbers be restricted to those that map nicely onto IEEE floating-point doubles. So I-JSON prohibits "\uDEAD" and {"a": 1, "a": 2}, and suggests not using 3.141592653589793238462643383279 and 1E1000, but allows "\uD800\uDEAD". Why is the last allowed? Because Javascript strings are sequences of UTF-16 code units and JSON string escapes only allow 16-bit escapes, so a Unicode character outside the Basic Multilingual Plane can only be written as an escape by using a pair of surrogate code units; "\uD800\uDEAD" is just the escaped form of U+102AD.

So RFC 8785 provides the basis of an RDF datatype for I-JSON. In essence, one takes I-JSON syntax, processes it as it would be processed in Javascript, and outputs it as it would be printed in Javascript. OK so far, except that there are some peculiarities that expose the fact that Javascript uses UTF-16 internally (for example, JCS sorts member names by their UTF-16 code units rather than by code points). These are (only) (very) annoying.

But there are two problems. First, it is unclear what actually counts as I-JSON. Should all JSON numbers be allowed, or only JSON numbers that map nicely onto IEEE floating-point doubles? Second, what about the prohibited parts of JSON? It is not so hard to do what RFC 8785 does, but for all of JSON. One could just say "do as Javascript does", or one could extend RFC 8785 by saying that for repeated names in objects the last one is taken, JSON strings are sequences of UTF-16 code units, and all JSON numbers are allowed.

There is also the question of what the value space for rdf:JSON is.
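The I-JSON restrictions and the UTF-16 ordering peculiarity can be made concrete with a minimal Python sketch. The names `parse_ijson`, `jcs_key_order`, and `NotIJSON` are hypothetical, not part of any library. I-JSON itself only says numbers SHOULD NOT exceed the magnitude or precision of a double, so the number test here is one assumed, strict reading of "maps nicely": the shortest decimal form of the nearest double must have the same value as the number as written. `jcs_key_order` covers only the member-name ordering of RFC 8785, not the rest of canonicalization.

```python
import json
from decimal import Decimal


class NotIJSON(ValueError):
    """Raised when a JSON text falls outside this sketch's reading of I-JSON."""


def _is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane (U+xxFFFE, U+xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def _check_string(s: str) -> None:
    for ch in s:
        cp = ord(ch)
        # json.loads has already merged valid \uD8xx\uDCxx escape pairs into one
        # code point, so anything left in the surrogate range is a lone surrogate.
        if 0xD800 <= cp <= 0xDFFF:
            raise NotIJSON(f"lone surrogate U+{cp:04X}")
        if _is_noncharacter(cp):
            raise NotIJSON(f"noncharacter U+{cp:04X}")


def _reject_duplicates(pairs):
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise NotIJSON("duplicate member name")
    return dict(pairs)


def _nice_number(text: str) -> float:
    # Assumed reading of "maps nicely": the shortest repr of the nearest double
    # must have the same numeric value as the number as written.
    value = float(text)  # out-of-range text becomes inf, which fails the test
    if Decimal(repr(value)) != Decimal(text):
        raise NotIJSON(f"number {text} does not map cleanly onto a double")
    return value


def _reject_constant(name: str):
    # json.loads accepts NaN, Infinity and -Infinity by default; JSON does not.
    raise NotIJSON(f"{name} is not valid JSON")


def _walk(value) -> None:
    if isinstance(value, str):
        _check_string(value)
    elif isinstance(value, list):
        for item in value:
            _walk(item)
    elif isinstance(value, dict):
        for name, item in value.items():
            _check_string(name)
            _walk(item)


def parse_ijson(text: str):
    """Parse text, raising NotIJSON if it violates the restrictions above."""
    value = json.loads(
        text,
        object_pairs_hook=_reject_duplicates,
        parse_float=_nice_number,
        parse_int=_nice_number,  # like Javascript, integers also become doubles
        parse_constant=_reject_constant,
    )
    _walk(value)
    return value


def jcs_key_order(names):
    # RFC 8785 sorts member names by UTF-16 code units; comparing big-endian
    # UTF-16 encodings byte by byte yields exactly that order.
    return sorted(names, key=lambda n: n.encode("utf-16-be", "surrogatepass"))


for text in ['"\\uDEAD"', '{"a": 1, "a": 2}', "3.141592653589793238462643383279",
             "1E1000", '"\\uD800\\uDEAD"']:
    try:
        parse_ijson(text)
        print("accepted:", text)
    except NotIJSON as err:
        print("rejected:", text, "-", err)

# Code-point order puts U+FB01 first; UTF-16 order puts U+1F600 (D83D DE00) first.
print(jcs_key_order(["\ufb01", "\U0001f600"]) == ["\U0001f600", "\ufb01"])  # True
```

Of the five examples from the message, only "\uD800\uDEAD" is accepted; the last line shows the UTF-16 ordering disagreeing with code-point ordering.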
JSON-LD uses strings, but it seems to me that the value space for rdf:JSON should be the data that a JSON text encodes, i.e., a recursively defined datatype such as

Definition: The value space for rdf:JSON is recursively defined as finite sequences of Unicode UTF-16 code units, IEEE floating-point numbers excluding infinities and not-a-numbers, finite sequences of elements of the value space for rdf:JSON, or finite mappings from finite sequences of Unicode UTF-16 code units to elements of the value space for rdf:JSON.

or

Definition: The value space for rdf:JSON is recursively defined as finite sequences of Unicode code points, IEEE floating-point numbers excluding infinities and not-a-numbers, finite sequences of elements of the value space for rdf:JSON, or finite mappings from finite sequences of Unicode code points to elements of the value space for rdf:JSON.

or

Definition: The value space for rdf:JSON is recursively defined as finite sequences of Unicode code points, IEEE floating-point numbers, finite sequences of elements of the value space for rdf:JSON, or sets of pairs of finite sequences of Unicode code points and elements of the value space for rdf:JSON.

peter
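The first definition above can be sketched in Python by mapping a JSON text to a hashable value, so that two texts denote the same rdf:JSON value exactly when their results compare equal. The names `json_value` and `to_value` and the tagged-tuple representation are hypothetical. Three choices are assumptions of this sketch: repeated names keep the last value (the extension to RFC 8785 suggested above), infinities and NaNs are rejected as the first definition requires, and, because the definitions as written do not mention true, false, and null, those are kept as separately tagged literals.

```python
import json
import math


def _utf16_units(s: str) -> tuple[int, ...]:
    # "surrogatepass" keeps lone surrogates as single code units.
    data = s.encode("utf-16-be", "surrogatepass")
    return tuple(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))


def _reject_constant(name: str):
    raise ValueError(f"{name} is not in the value space")


def to_value(node):
    """Turn a parsed JSON node into a hashable, tagged value.

    Tags keep the kinds apart: without them the empty string and the empty
    array would both become ().
    """
    if isinstance(node, bool) or node is None:
        # The definitions do not mention true, false and null; tagging them
        # as literals is this sketch's assumption.
        return ("literal", json.dumps(node))
    if isinstance(node, float):
        if not math.isfinite(node):
            raise ValueError("infinities and NaNs are excluded")
        # Python compares -0.0 and 0.0 as equal, so this sketch does not
        # distinguish them.
        return ("number", node)
    if isinstance(node, str):
        return ("string", _utf16_units(node))
    if isinstance(node, list):
        return ("array", tuple(to_value(item) for item in node))
    # A dict: a finite mapping from UTF-16 code-unit sequences to values.
    return ("object", frozenset((_utf16_units(k), to_value(v)) for k, v in node.items()))


def json_value(text: str):
    """Map a JSON text to its value under the first definition (a sketch)."""
    parsed = json.loads(
        text,
        parse_int=float,                  # every number becomes an IEEE double
        parse_constant=_reject_constant,  # NaN/Infinity literals are not JSON
    )  # the default object handling keeps the last of repeated names
    return to_value(parsed)


# Member order, number spelling, escapes and repeated names do not matter here.
a = json_value('{"x": 1, "y": [true, "\\u00e9"]}')
b = json_value('{"y": [true, "\u00e9"], "x": 1.0E0, "x": 10E-1}')
print(a == b)                                # True
print(json_value('""') == json_value("[]"))  # False
```

Switching to the second definition would amount to keeping Python strings (code-point sequences) instead of calling `_utf16_units`.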
Received on Friday, 3 November 2023 15:33:55 UTC