Re: rdf:JSON

On 11/3/23 05:11, Antoine Zimmermann wrote:
> Peter,
> 
> The way you put it is strange: you first say that all JSON should be allowed, 
> then immediately show that it would be a problem.
> 
> So, can you explain why it *should* be so?
> 
> --AZ
The short version is that disallowing parts of JSON is worse than allowing all 
of JSON but that allowing all of JSON has difficulties that need to be 
addressed.  For a longer version see below.



Disclaimer:  The following is all my understanding of JSON, Javascript, 
Unicode, and the various ECMA and RDF documents on them.  I've spent a lot of 
time trying to make sense of all of them but I don't consider myself a full 
expert.


The basic idea underlying JSON is quite simple.  JSON allows 
computer-language-independent transfer of values, which in turn are either 
strings, numbers, arrays, or objects.

The problems arise when one actually tries to use JSON.  What are JSON 
strings? What are JSON numbers?  What are JSON objects?  JSON documentation by 
and large leads one to believe that the answers are all idealistic - JSON 
strings are Unicode strings, JSON numbers are not limited in range or 
precision, JSON arrays are sequences of values, and JSON objects are bags (or, 
more likely, sets) of string-value pairs.

But the initial uses of JSON were in Javascript - JSON is the JavaScript 
Object Notation after all.  So the historical answers are different - JSON 
strings are supposed to be Javascript strings, JSON numbers are supposed to be 
Javascript numbers, JSON objects are supposed to be Javascript objects.

Looking further into Javascript leads to the following.  JSON strings are 
supposed to be finite sequences of UTF-16 code units.  JSON numbers are 
supposed to be IEEE double-precision floating point numbers, with the 
lexical-to-value mapping as in Javascript.  JSON objects are supposed to be 
finite maps from JSON strings to JSON values.


This state of affairs has been codified by RFC 8785 JSON Canonicalization 
Scheme (JCS) https://www.rfc-editor.org/rfc/rfc8785, which itself depends on 
The I-JSON Message Format https://www.rfc-editor.org/rfc/rfc7493.  I-JSON is a 
syntactic restriction of JSON that forbids duplicate names in objects, 
prohibits both Unicode surrogate code points and Unicode noncharacters, and 
suggests that numbers be restricted to those that map nicely onto IEEE 
floating point double.

So I-JSON prohibits

"\uDEAD"

and

{"a": 1, "a": 2}

and suggests not using

3.141592653589793238462643383279

and

1E1000

but allows

"\uD800\uDEAD"

Why is the last allowed?  Because Javascript strings are sequences of UTF-16 
code units and JSON string escapes are only 16 bits wide, so any Unicode 
character outside the Basic Multilingual Plane, if it is escaped at all, has 
to be written as a pair of surrogate escapes.
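
As a rough illustration, here is what one widely used non-Javascript parser, 
the json module in Python's standard library, does with the examples above.  
This only shows one implementation's behaviour, not anything that either RFC 
mandates, and the variable names are mine.

```python
import json

# Duplicate names: this parser keeps the last value for a repeated name.
dup = json.loads('{"a": 1, "a": 2}')

# A lone surrogate escape is accepted and kept as a one-element string.
lone = json.loads('"\\uDEAD"')

# Extra digits are silently rounded to the nearest IEEE double.
pi = json.loads("3.141592653589793238462643383279")

# A number beyond the range of IEEE double becomes infinity.
big = json.loads("1E1000")

# A surrogate pair escape decodes to a single character outside the BMP.
pair = json.loads('"\\uD800\\uDEAD"')

print(dup)             # {'a': 2}
print(len(lone))       # 1
print(pi)              # 3.141592653589793
print(big)             # inf
print(hex(ord(pair)))  # 0x102ad
```

So a parser that is not I-JSON-aware can accept every one of these texts 
without complaint, while quietly changing what some of them mean.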

So RFC 8785 provides the basis of an RDF datatype for I-JSON.  In essence one 
takes I-JSON syntax, processes it as it would be processed in Javascript, and 
outputs it as it would be printed in Javascript.  OK so far, except that there 
are some peculiarities that expose the fact that Javascript uses UTF-16 
internally.  These are (only) (very) annoying.
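
One such peculiarity, as I read RFC 8785, is that object names are sorted by 
their UTF-16 code units and not by Unicode code point.  The following is a 
minimal sketch of the difference, not a JCS implementation; the helper name 
utf16_units is hypothetical and the two sample names are chosen only because 
they sort differently under the two orderings.

```python
def utf16_units(name: str) -> list[int]:
    """Return the UTF-16 code units of a string as a list of integers."""
    data = name.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+FB33 is inside the BMP; U+1F600 is outside it and needs a surrogate pair.
names = ["\ufb33", "\U0001F600"]

# Python compares strings by code point, so U+FB33 comes first.
by_code_point = sorted(names)

# By UTF-16 code unit, 0xD83D (the high surrogate of U+1F600) is below 0xFB33.
by_utf16 = sorted(names, key=utf16_units)

print([hex(ord(n)) for n in by_code_point])  # ['0xfb33', '0x1f600']
print([hex(ord(n)) for n in by_utf16])       # ['0x1f600', '0xfb33']
```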

But there are two problems.  First, it is unclear what actually counts as 
I-JSON.  Should all JSON numbers be allowed?  Or only JSON numbers that nicely 
map into IEEE floating point double?  Second, what about the prohibited parts 
of JSON?

It is not so hard to do the same thing that RFC 8785 does except for all of 
JSON.  One could just say "do as Javascript does" or one could extend RFC 8785 
by saying that for repeated names in an object the last one is taken, JSON 
strings are sequences of UTF-16 code units, and all JSON numbers are allowed.
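
To make that extension concrete, here is a minimal sketch of such a 
lexical-to-value mapping.  It rests on several assumptions: it uses Python's 
json module as a stand-in for "what Javascript does", the function names 
to_value, convert, and utf16_units are hypothetical, and the literals true, 
false, and null are simply passed through.

```python
import json

def utf16_units(s):
    """Represent a string as a tuple of UTF-16 code units."""
    data = s.encode("utf-16-be", "surrogatepass")
    return tuple(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))

def convert(v):
    """Recursively turn a parsed JSON value into the sketched value space."""
    if isinstance(v, str):
        return utf16_units(v)
    if isinstance(v, list):
        return tuple(convert(x) for x in v)
    if isinstance(v, dict):
        return {utf16_units(k): convert(x) for k, x in v.items()}
    return v  # floats, plus true/false/null passed through unchanged

def to_value(text):
    # parse_int=float maps every JSON number to an IEEE double.
    # For repeated names, json.loads already keeps the last one.
    # Out-of-range numbers such as 1E1000 come out as inf here; how they
    # should be treated is a question this sketch does not settle.
    return convert(json.loads(text, parse_int=float))

# Different JSON texts, same value.
print(to_value('{"a": 1, "a": 2}') == to_value('{"a": 2.0}'))  # True
print(to_value('"\\uD800\\uDEAD"'))                            # (55296, 57005)
```

Under this mapping a lone surrogate such as "\uDEAD" is an ordinary 
one-element sequence, so no string has to be rejected.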


There is also the question of what the value space for rdf:JSON is.   JSON-LD 
uses strings but it seems to me that the value space for rdf:JSON should be 
the data that a JSON text encodes, i.e., a recursively defined datatype such as

Definition:  The value space for rdf:JSON is recursively defined as finite 
sequences of Unicode UTF-16 code units, IEEE floating point numbers excluding 
infinities and not-a-numbers, finite sequences of elements of the value space 
for rdf:JSON, or finite mappings from finite sequences of Unicode UTF-16 code 
units to elements of the value space for rdf:JSON.

or

Definition:  The value space for rdf:JSON is recursively defined as finite 
sequences of Unicode code points, IEEE floating point numbers excluding 
infinities and not-a-numbers, finite sequences of elements of the value space 
for rdf:JSON, or finite mappings from finite sequences of Unicode code points 
to elements of the value space for rdf:JSON.

or

Definition:  The value space for rdf:JSON is recursively defined as finite 
sequences of Unicode code points, IEEE floating point numbers, finite 
sequences of elements of the value space for rdf:JSON, or sets of pairs of 
finite sequences of Unicode code points and elements of the value space for 
rdf:JSON.


peter

Received on Friday, 3 November 2023 15:33:55 UTC