Re: datatype coercion issues

One idea is to treat datatypes as opaque values unless a special 
"primitive flag" is provided to compaction or expansion. The flag has 
three settings: off, convert all natives to xsd type @values, and 
convert all xsd type @values to natives. This would mean:

When the primitive flag is off (this is the default behavior):
Note: Remember that expansion always occurs before compaction in the 
compaction algorithm.

During expansion, if a value is already in expanded form (@value), it is 
left alone. If a value is a native JSON string, number, or boolean, then 
the value becomes a @value with a @type if there is a coercion rule. How 
a native double is converted to a string could be left unspecified or we 
could use %1.16e. If there is no coercion rule, the value is left alone; 
it remains its native type.

During compaction, any value with a coercion rule becomes a string. No 
attempt is made to understand any xsd types at all; they are treated as 
opaque, just like any other datatype. If there is no coercion rule, the 
value is left alone; it remains its native type.
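
To make the flag-off case concrete, here is a rough Python sketch. The 
names expand_value, compact_value and native_to_lexical are made up for 
illustration and are not from any implementation, and the use of %1.16e 
for doubles is only one of the two options mentioned above.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def native_to_lexical(value):
    """Turn a native JSON string, number, or boolean into a string."""
    # bool must be checked first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        # Assumption: %1.16e; the proposal allows leaving this unspecified.
        return "%1.16e" % value
    return str(value)

def expand_value(value, coerce_type=None):
    """Flag off: datatypes are opaque, only the coercion rule matters."""
    if isinstance(value, dict) and "@value" in value:
        return value  # already in expanded form, left alone
    if coerce_type is None:
        return value  # no coercion rule, stays native
    return {"@value": native_to_lexical(value), "@type": coerce_type}

def compact_value(value, coerce_type=None):
    """Flag off: expansion runs first, then coerced values become strings."""
    expanded = expand_value(value, coerce_type)
    if coerce_type is None:
        return expanded  # no coercion rule, left alone
    if isinstance(expanded, dict) and expanded.get("@type") == coerce_type:
        return expanded["@value"]
    # Assumption: a value whose @type differs from the rule stays expanded.
    return expanded
```

With this, compact_value(5, XSD + "integer") gives the string "5", while 
compact_value(5) with no coercion rule gives back the native 5.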

When the primitive flag is set to convert natives to xsd type @values:

During expansion, any native JSON number or boolean with a coercion rule 
is treated the same way as when the primitive flag is off. If the value 
has no coercion rule, then a JSON number that contains a decimal point 
will be treated as if it had an xsd double coercion rule, a JSON number 
without a decimal point will be treated as if it had an xsd integer 
coercion rule, and a JSON boolean will be treated as if it had an xsd 
boolean coercion rule.

During compaction, all values with coercion rules become strings. In 
other words, the same as when the primitive flag is off.
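
A rough Python sketch of the expansion side of this setting follows. 
The names implied_type, to_lexical and expand_native are hypothetical. 
It also assumes that the parser's int/float split stands in for "has a 
decimal point", which is not quite the same thing: Python's json module 
parses 1e5 as a float even though it contains no decimal point.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

def implied_type(value):
    """Pick the xsd type implied by a native value with no coercion rule."""
    # bool must be checked first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, float):
        return XSD + "double"
    if isinstance(value, int):
        return XSD + "integer"
    return None  # strings and everything else imply nothing

def to_lexical(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        return "%1.16e" % value  # assumption, as in the flag-off case
    return str(value)

def expand_native(value, coerce_type=None):
    """Expansion with the flag set to convert natives to xsd @values."""
    if isinstance(value, dict) and "@value" in value:
        return value  # already expanded, left alone
    datatype = coerce_type or implied_type(value)
    if datatype is None:
        return value  # e.g. a plain string with no coercion rule
    return {"@value": to_lexical(value), "@type": datatype}

doc = json.loads('{"a": 2.5, "b": 2, "c": true, "d": "text"}')
expanded = {key: expand_native(val) for key, val in doc.items()}
```

Here "a" picks up xsd double, "b" xsd integer and "c" xsd boolean, 
while "d" stays a plain string.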

When the primitive flag is set to convert xsd type @values to natives:

During both expansion and compaction, any value with a @type or coercion 
rule that is an xsd type of integer, boolean, or double is converted to 
its corresponding native JSON type. If the value is not within the 
lexical space of the given xsd type, then some rules are used to convert 
it. If the type is xsd integer or double, and the value is a boolean, 
then true is 1 and false is 0; if it is a string, then the initial part 
of the string that is an integer or a double will be the resulting value 
where 0 is used if the initial string contains no digits. If the type is 
xsd boolean, a value of "false", "0", or 0 will be considered false, and 
anything else will be considered true.

Any other values expand and compact the same way that they do when the 
primitive flag is off.
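
The fallback rules for this setting could look something like the 
Python sketch below. The function name to_native and the two regular 
expressions are my own illustration, and truncating a native 2.5 to 2 
for xsd integer is an assumption that the rules above do not cover.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Leading part of a string that reads as an integer or as a double.
_INT_PREFIX = re.compile(r"[+-]?\d+")
_DOUBLE_PREFIX = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def to_native(value, xsd_type):
    """Convert a value typed as xsd integer/boolean/double to a native."""
    if xsd_type == XSD + "boolean":
        # "false", "0" and 0 are false (native False equals 0 in Python);
        # anything else is true.
        return value not in ("false", "0", 0)
    if xsd_type == XSD + "integer":
        if isinstance(value, (bool, int, float)):
            return int(value)  # true -> 1, false -> 0; 2.5 -> 2 (assumed)
        match = _INT_PREFIX.match(value)
        return int(match.group()) if match else 0
    if xsd_type == XSD + "double":
        if isinstance(value, (bool, int, float)):
            return float(value)  # true -> 1.0, false -> 0.0
        match = _DOUBLE_PREFIX.match(value)
        return float(match.group()) if match else 0.0
    return value  # any other type stays opaque
```

For example, to_native("2011-03-27", XSD + "integer") gives 2011 and 
to_native("foo", XSD + "boolean") gives True, which shows how lossy 
these fallbacks are for values outside the lexical space.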

This approach allows us to treat xsd types like any other type by 
default (as opaque), but gives us the option to treat them differently 
in order to convert to/from native types. Also, it treats native type 
conversion as an orthogonal issue to expansion/compaction.

Another minor tweak to this idea would be to allow a more fine-grained 
setting of the primitive flag for each type of native.

Thoughts?


On 03/27/2012 11:52 AM, Gregg Kellogg wrote:
> On the call today, we spent a lot of time discussing issues 87 [1] and 81 [2], relating to coercion and round tripping. There are basically several things that come out of this, for which we probably need separate issues:
>
> What is the range of the coercion operator in JSON-LD? As indicated by issue 87, it is any value (not an object or an array). This would include boolean and numeric, in addition to string. One possibility is limiting this to string, or doing it on a case-by-case basis. (boolean could coerce numeric types based on 0 or not 0, integer or double could coerce boolean or other numeric).
>
> When is coercion applied? If applied in expansion, this implies that every term having a coercion rule with an appropriate value is placed in @value form. We currently say that native types are not converted, but we contradict ourselves for xsd:double.
>
> Do strings not having the lexical form of a coerced datatype have coercion applied? For example does "foo" coerced to a boolean result in "foo"^^xsd:boolean, or just "foo".
>
> For some other examples, consider the following:
>
> {
>    "@context": {
>      "xsd": "http://www.w3.org/2001/XMLSchema#",
>      "boolean": {"@id": "xsd:boolean", "@type": "xsd:boolean"},
>      "integer": {"@id": "xsd:integer", "@type": "xsd:integer"},
>      "double": {"@id": "xsd:double", "@type": "double"},
>      "date": {"@id": "xsd:date", "@type": "date"}
>    },
>    "boolean": [true, "false", 1, "0", 5, "5", 2.5, "2.5E0", "2011-03-27", "2011-03-27T01:23:45"],
>    "integer": [true, "false", 1, "0", 5, "5", 2.5, "2.5E0", "2011-03-27", "2011-03-27T01:23:45"],
>    "double": [true, "false", 1, "0", 5, "5", 2.5, "2.5E0", "2011-03-27", "2011-03-27T01:23:45"],
>    "date": [true, "false", 1, "0", 5, "5", 2.5, "2.5E0", "2011-03-27", "2011-03-27T01:23:45"]
> }
>
> My implementation currently results in the following (although I don't necessarily agree with all of these conversions):
>
> @prefix xsd:<http://www.w3.org/2001/XMLSchema#>  .
> [ xsd:boolean "true"^^xsd:boolean,
>                "false"^^xsd:boolean,
>                "5"^^xsd:boolean,
>                "2.5"^^xsd:boolean,
>                "2.5E0"^^xsd:boolean,
>                "2011-03-27"^^xsd:boolean,
>                "2011-03-27T01:23:45"^^xsd:boolean;
>    xsd:integer "true"^^xsd:boolean,
>                "false"^^xsd:integer,
>                "1"^^xsd:integer,
>                "0"^^xsd:integer,
>                "5"^^xsd:integer,
>                "2"^^xsd:integer,
>                "2.5E0"^^xsd:integer,
>                "2011-03-27"^^xsd:integer,
>                "2011-03-27T01:23:45"^^xsd:integer;
>    xsd:double "true"^^xsd:double,
>                "false"^^xsd:double,
>                "1.0E0"^^xsd:double,
>                "0"^^xsd:double,
>                "5.0E0"^^xsd:double,
>                "2.5E0"^^xsd:double;
>    xsd:date    "true"^^xsd:boolean,
>                "false"^^xsd:date,
>                "1"^^xsd:date,
>                "0"^^xsd:date,
>                "5"^^xsd:date,
>                "2.5"^^xsd:date,
>                "2.5E0"^^xsd:date,
>                "2011-03-27"^^xsd:date,
>                "2011-03-27T01:23:45"^^xsd:date
> ] .
>
> We could also only perform coercion when the lexical form of the representation matches the XSD definition, although this would be at odds with use in RDF parsers, such as Turtle. We currently say that native representations, whether coerced or not, remain in their original form, although xsd:double currently contradicts that, always coercing any value to an @value representation using "1.16E".
>
> When compacting, when can data-typed @value representations be turned into native form? One possible solution would be to convert anything to native form where the lexical representation in @value matches that of the associated XSD definition.
>
> Is 1.16E preserving of 64-bit doubles? In my Ruby implementation it produces rounding errors:
>
> "%1.16E" % 5.2 =>  "5.2000000000000002E+00"
> "%1.16E" % 5.3 =>  "5.2999999999999998E+00"
>
> %1.15E does not result in rounding errors. It would be useful for others to check their implementations.
>
> Gregg
>
> [1] https://github.com/json-ld/json-ld.org/issues/87
> [2] https://github.com/json-ld/json-ld.org/issues/81


-- 
Dave Longley
CTO
Digital Bazaar, Inc.

Received on Tuesday, 27 March 2012 17:59:57 UTC