Re: Use of XSD namespace in RDF recommendations from Gregg Kellogg on 2012-09-04 (public-rdf-comments@w3.org from September 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Tue, 4 Sep 2012 18:54:14 -0400
To: Richard Cyganiak <richard@cyganiak.de>
CC: public-rdf-comments Comments <public-rdf-comments@w3.org>
Message-ID: <E4EA4CD6-956A-4DEF-A73A-207BC37E77AC@greggkellogg.net>
On Sep 4, 2012, at 2:55 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> 
> On 4 Sep 2012, at 22:18, Gregg Kellogg wrote:
>>> I guess I like the idea of informatively linking to both the 2006 SWBP Note on datatypes [1] and to the OWL 2 datatype definition mechanism [2], stating that both XML Schema and OWL 2 provide facilities for formally defining RDF datatypes, but that support for neither mechanism is required for RDF.
>> 
>> Perhaps I'm missing something, but
> 
> We are talking about formally defining custom datatypes. If you want an ex:HexRGBColor datatype, how do you define or describe the datatype?
> 
>> it seems that RDF Concepts does have a normative relationship to XSD,
> 
> Of course.
> 
>> as literals with no datatype IRI or language tag get the datatype xsd:string.
> 
> Well, in RDF 1.1 Concepts, there is no such thing as a literal without a datatype IRI.

Yes, that's what I meant; I was just referring to the Turtle description.

> [[
> Concrete syntaxes may support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string
> ]]
> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
> 
>> Also, in Turtle, native number representations are associated with xsd:integer, xsd:decimal and xsd:double.
>> 
>> true/false values are represented as literals with xsd:boolean.
> 
> Right.
> 
>> We considered this in JSON-LD; JSON numbers are translated to xsd:integer or xsd:double, and true/false to xsd:boolean when transforming to RDF.
> 
> Doing this differently from SPARQL in the case of xsd:decimal (that is, fraction but no exponent) is a bad idea. You'll get situations where 1.0 in a SPARQL query doesn't match 1.0 in a JSON-LD document because of different numeric datatypes, and where 1.0 written in Turtle and 1.0 written in JSON-LD produce different literals.

The problem is, in JSON, there's only a single number type, so you can't distinguish between decimal and double. You can distinguish between double and integer due to the presence or absence of a decimal point. JSON-LD supports all datatyped literals using the expanded format:

{ "@value": "1.1", "@type": "xsd:decimal"}

To ensure fidelity of numeric types in JSON-LD, it's usually best to avoid using native JSON types.
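As a rough illustration of the mapping described above, here is a minimal Python sketch. It assumes the document has already been parsed with the standard json module, which yields int for numbers without a fraction or exponent and float otherwise. The function name json_value_to_literal and the handling of only the "xsd:" prefix are assumptions made for this example, not part of any JSON-LD API.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

def json_value_to_literal(value):
    """Return (lexical form, datatype IRI) for a parsed JSON-LD value."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return ("true" if value else "false", XSD + "boolean")
    if isinstance(value, int):
        return (str(value), XSD + "integer")
    if isinstance(value, float):
        # repr() is for illustration only; it is not a canonical xsd:double form.
        return (repr(value), XSD + "double")
    if isinstance(value, dict) and "@value" in value:
        datatype = value.get("@type", "xsd:string")
        # Assumption: only the "xsd:" prefix needs expanding in this sketch.
        if datatype.startswith("xsd:"):
            datatype = XSD + datatype[len("xsd:"):]
        return (value["@value"], datatype)
    return (str(value), XSD + "string")

# A native number can only come out as integer or double ...
print(json_value_to_literal(json.loads("1")))
print(json_value_to_literal(json.loads("1.1")))
# ... while the expanded form can carry xsd:decimal explicitly.
print(json_value_to_literal(json.loads('{"@value": "1.1", "@type": "xsd:decimal"}')))
```

The native 1.1 ends up typed as xsd:double, whereas the expanded form keeps xsd:decimal, which is the mismatch with Turtle and SPARQL that Richard describes.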

>> When going from RDF, strings are used unless an option is specified to use xsd types.
> 
> That doesn't quite make sense to me.
> 
> xsd:string strings should always be JSON strings and never XSD types in JSON-LD.

Yes, xsd:strings are always presented as simple strings, or as an expanded value with only a @value key.

> The xsd:integer, xsd:decimal, xsd:double and xsd:boolean types should always be represented with the native JSON number / boolean representation, and never as XSD types.

The problem is that this can be lossy, in the case of xsd:decimal and xsd:double. There is also some subtle interaction when expanding; native types are never expanded. When compacting, only string representations will match when there is a datatype coercion. This is so that, when working within JSON, the use of native types (numeric and boolean anyway) is lossless across the different algorithms.
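To make the lossiness concrete, here is a small Python sketch. It assumes a native JSON number is held as an IEEE double, which is how Python's float behaves; the helper name round_trips is made up for this example.

```python
from decimal import Decimal

def round_trips(lexical_form):
    """True if a decimal lexical form survives conversion to a float and back."""
    return Decimal(repr(float(lexical_form))) == Decimal(lexical_form)

# A valid xsd:decimal lexical form with more digits than a double can hold.
precise = "1.00000000000000000001"

print(float(precise))        # the extra digits are gone
print(round_trips(precise))  # the value did not survive
print(round_trips("1.5"))    # exactly representable, so it survives
```

Even when the value itself survives, the datatype does not: a native number read back into RDF can only become xsd:integer or xsd:double, never xsd:decimal.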

We did discuss always using the native representations for xsd:integer and xsd:double, but this was deemed to introduce too much chance of data corruption. See "Data Round Tripping" in the API [1] and the discussion in issues 98 [2] and 81 [3].

> For the other types (rdf:XMLLiteral, rdf:HTML, rdf:langString, xsd:xxx, any custom data types) I would argue quite strongly that the default should be to retain all information (hence allowing round-trips from RDF to JSON-LD back to RDF). Perhaps there could be a switch that, if manually enabled, serializes all these literals as plain strings.

All other typed literals are expressed using the expanded notation, for example:

{ "@value": "e = mc<sup>2</sup>", "@type": "rdf:HTML"}

If type coercion is specified in the context, these will be serialized as plain strings. For example, if the term "text" was defined to expand to "schema:text" and @type was set to "rdf:HTML", this would be rendered simply as follows:

{
  "text": "e = mc<sup>2</sup>"
}
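The coercion rule can be sketched in a few lines of Python. This is a simplified illustration, not the JSON-LD compaction algorithm: the function compact_value and the flat dictionary used as a context are assumptions for this example, and IRI expansion, language tags and term selection are ignored.

```python
def compact_value(term, expanded, context):
    """Compact an expanded value for `term` using a simplified context."""
    definition = context.get(term, {})
    coerced_type = definition.get("@type")
    # Drop the wrapper only when the value's datatype matches the coercion.
    if coerced_type is not None and expanded.get("@type") == coerced_type:
        return expanded["@value"]
    return expanded

context = {"text": {"@id": "schema:text", "@type": "rdf:HTML"}}
value = {"@value": "e = mc<sup>2</sup>", "@type": "rdf:HTML"}

print({"text": compact_value("text", value, context)})
```

A value whose @type does not match the coercion stays in expanded form, so no datatype information is dropped silently.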

> But my preferred option would still be that toRDF can be invoked with a context object, and if I want some property with a custom datatype to be serialized as a simple JSON string, then I can provide a term definition with type coercion for that property in the context object. I'm not sure if the JSON-LD APIs support something like this at the moment.

Yes, just specify the mapping in the context used to compact and this is the form which is used. However, within the JSON-LD API methods, there is no way to convert from native types to expanded (or string compacted) values without going through RDF and using the "useNativeTypes" flag.
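For illustration, the effect of such a flag when converting from RDF might look like the Python sketch below. Only the flag name "useNativeTypes" comes from the API discussion; the function literal_to_json, its signature, and the exact set of datatypes it converts are assumptions for this example.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def literal_to_json(lexical, datatype, use_native_types=False):
    """Convert an RDF literal to a JSON-LD value (simplified sketch)."""
    # xsd:string is always a plain JSON string.
    if datatype == XSD + "string":
        return lexical
    if use_native_types:
        if datatype == XSD + "boolean" and lexical in ("true", "false"):
            return lexical == "true"
        if datatype == XSD + "integer":
            return int(lexical)
        if datatype == XSD + "double":
            return float(lexical)
    # Everything else keeps its lexical form and datatype.
    return {"@value": lexical, "@type": datatype}

print(literal_to_json("42", XSD + "integer"))
print(literal_to_json("42", XSD + "integer", use_native_types=True))
print(literal_to_json("1.1", XSD + "decimal", use_native_types=True))
```

In this sketch xsd:decimal is never converted to a native number, so its lexical form and datatype are preserved regardless of the flag.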

> Best,
> Richard


Gregg

[1] http://json-ld.org/spec/latest/json-ld-api/#data-round-tripping
[2] https://github.com/json-ld/json-ld.org/issues/98
[3] https://github.com/json-ld/json-ld.org/issues/81
Received on Tuesday, 4 September 2012 22:54:53 UTC