RE: datatype coercion issues

On Wednesday, March 28, 2012 7:47 AM, Gregg Kellogg wrote: 

> On Mar 27, 2012, at 10:59 AM, Dave Longley wrote:
> 
> > One idea is to treat datatypes as opaque values unless a special
> > "primitive flag" is provided to compaction or expansion. The flag has
> > three settings, which are: off, convert all natives to xsd type
> @values,
> > and convert all xsd type @values to natives. This would mean:
> 
> Not a big fan of such flags, but I'm open to consider it.

I'm not a big fan of such flags either. I think to address this issue we
should look at compaction and expansion from a different point of view than
normalization and RDF round-tripping.

Let me start with expansion (as that's the base for all algorithms). I think
every value should stay in its native form; no automatic type conversions
should be done at all. If there's a coercion, the value just gets expanded
into the expanded object form (@value) so that no information is lost.
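A minimal sketch of that expansion behavior (the `expand_value` helper and its signature are hypothetical, just to illustrate the rule, not from any spec text):

```python
# Hypothetical sketch: expand a native JSON value under an optional
# type coercion, without converting the value itself.
def expand_value(value, coerce_type=None):
    if coerce_type is None:
        # No coercion: the native value passes through untouched.
        return value
    # Coercion present: wrap in expanded object form so the datatype
    # information isn't lost, but keep the native value as-is.
    return {"@value": value, "@type": coerce_type}

print(expand_value(5.3))                # 5.3 (unchanged)
print(expand_value(5.3, "xsd:double"))  # {'@value': 5.3, '@type': 'xsd:double'}
```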

The tricky part is normalization. I think the only problem we have there is
numbers with fractions (i.e., doubles), although I'm not sure we really have
an issue there (at least not one we can solve).

The problem is that JSON doesn't specify the value space at all, and since
everything is a string on the wire there are no rounding issues *till you
parse it*. And there, in my opinion, lies the problem. We can define any
number format (%1.16E, %1.15E), but we still have no control over the
result: the value could already have been changed during parsing, and we
don't define the parsing at all but rely on existing implementations. So I
think the only sensible way is to use the parser's built-in to-string
conversion without applying anything else. That would at least make sure
that you won't see different values within your own systems. The problem is
that this would render normalization useless - I think. But at least it
would provide good RDF round-tripping.
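To make the parsing problem concrete, here's what happens in one real implementation (Python's json module; other parsers may behave differently):

```python
import json

# The wire format is just a string; precision is only at risk once a
# parser maps it into the host language's number type.
wire = '100000000000000000000.1'
parsed = json.loads(wire)  # becomes an IEEE-754 double in Python
print(parsed)              # 1e+20 -- the fraction is already gone

# Re-serializing with the parser's own to-string conversion at least
# keeps the value stable within this one system:
print(json.dumps(parsed))  # 1e+20
```

The fraction was lost before any JSON-LD algorithm ever ran, which is why defining an output format alone can't guarantee round-tripping.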

I don't have a proposal or even a solution for this normalization issue. So,
ignoring the normalization issue for a moment, I would propose the following
for conversion to RDF:

- if a value is not coerced, it will stay in its native form during
expansion

- if a value is coerced, it will still stay in its native form during
expansion but be put in a @value-object

- if a value is converted to RDF (normalized??), the following happens:


Native Numbers:
  - if value is not coerced and has no fractions, or is coerced to
xsd:integer, it will be converted to an integer string (you'll lose the
fractions if you coerce a double to int)

  - if value is not coerced and has fractions, or is coerced to xsd:double,
it will be converted to a double string

  - for any other coercion, we rely on the parser's to-string method to
create a string
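The number rules above could be sketched roughly like this (the helper name is mine, and %1.15E is just one candidate canonical double format, not a settled choice):

```python
# Hypothetical sketch of the native-number-to-RDF rules above.
def number_to_rdf(value, coerce_type=None):
    has_fraction = isinstance(value, float) and not value.is_integer()
    if coerce_type == "xsd:integer" or (coerce_type is None and not has_fraction):
        # Integer string; coercing a double to int drops the fraction.
        return ("%d" % int(value), "xsd:integer")
    if coerce_type == "xsd:double" or (coerce_type is None and has_fraction):
        # Canonical double string; %1.15E is one possible format.
        return ("%1.15E" % value, "xsd:double")
    # Any other coercion: fall back to the parser's own to-string method.
    return (str(value), coerce_type)

print(number_to_rdf(5))                   # ('5', 'xsd:integer')
print(number_to_rdf(5.7, "xsd:integer"))  # ('5', 'xsd:integer') -- fraction lost
print(number_to_rdf(5.3))                 # ('5.300000000000000E+00', 'xsd:double')
```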


Booleans:
  - if value is not coerced, it will be coerced to xsd:boolean and
transformed to a string

  - if coerced to anything other than xsd:boolean, the value will still be
transformed to a string; no type checking is done
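In sketch form (again with a hypothetical helper name):

```python
# Hypothetical sketch of the boolean rules above: natives become the
# strings "true"/"false"; an explicit coercion is kept without checking
# that the value actually fits that datatype.
def boolean_to_rdf(value, coerce_type=None):
    literal = "true" if value else "false"
    return (literal, coerce_type or "xsd:boolean")

print(boolean_to_rdf(True))                 # ('true', 'xsd:boolean')
print(boolean_to_rdf(False, "xsd:string"))  # ('false', 'xsd:string')
```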


And conversion from RDF would do the following (note: not compaction):
  - if xsd:integer, convert to native number, keep string (with @type) if
conversion fails

  - if xsd:double, convert to native number, keep string (with @type) if
conversion fails

  - if xsd:boolean, convert to native boolean, if not "true" or "false",
keep string (with @type)


Compaction could then be simplified to do the following:

  - if @type matches the type in the term definition, convert @value object
to scalar value
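That single compaction rule is about one line of code (sketch; names are hypothetical):

```python
# Hypothetical sketch: collapse a @value object back to a scalar when
# its @type matches the coercion in the term definition; otherwise
# leave it alone.
def compact_value(value_object, term_type):
    if isinstance(value_object, dict) and value_object.get("@type") == term_type:
        return value_object["@value"]
    return value_object

print(compact_value({"@value": 5.3, "@type": "xsd:double"}, "xsd:double"))  # 5.3
```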



I think that would generally work and should be very predictable. The only
open question is whether we rely on some specific number-to-string
conversion or leave that to the JSON parser. I'm fine with either, as it
really just affects RDF round-tripping (and *maybe* normalization - not
sure we can solve that issue at all).

What do you think? Does this make sense? Did I forget something?


--
Markus Lanthaler
@markuslanthaler

Received on Thursday, 29 March 2012 11:45:30 UTC