Re: JSON serialization from Gregg Kellogg on 2011-10-03 (public-html-data-tf@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Mon, 3 Oct 2011 19:14:54 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>, Ivan Herman <ivan@w3.org>, Gregg Kellogg <gregg@kellogg-assoc.com>
Message-ID: <4E3E6D0F-2AB6-4753-8C48-2C797BF4249F@greggkellogg.net>
Gregg

On Oct 3, 2011, at 11:21 AM, Jeni Tennison wrote:

> Gregg,
> 
> On 3 Oct 2011, at 17:32, Gregg Kellogg wrote:
>> On Oct 3, 2011, at 5:14 AM, Ivan Herman wrote:
>>> On Oct 2, 2011, at 19:23 , Gregg Kellogg wrote:
>>> 
>>>> The Microdata spec [1] now references only a JSON serialization. JSON is becoming more and more interesting for developers, and the RDF WG has looked at two different specs, RDF/JSON [2] and JSON-LD [3]. Recently, the RDF WG stopped working on RDF/JSON with strong interest in further incubating JSON-LD, which is now a W3C Community Group [4].
>>>> 
>>>> Should this task force provide guidance for HTML serializations on JSON serialization,
>>> 
>>> Gregg, I am not sure what you mean. The task force description does refer to a possible RDFa->microdata/JSON mapping. Is this what you mean? Or is it a microdata -> JSON-LD that you are considering?
>> 
>> As you note, the group's charter is to provide specifications for mapping RDFa to Microdata's JSON format. The Microdata format is lossy, in that it (at least) loses language information. When considering JSON serialization, IMO RDF round-tripping is important. If data is serialized to JSON, we should be able to get all the semantic content back out. With Microdata's JSON, this raises some issues that might be addressed by providing some suggestions for improvement to the HTML WG. This could conceivably be to replace the existing JSON representation with something that may be a standard way to represent RDF in JSON, thus dealing with round-tripping issues.
> 
> Undoubtedly it will be useful to have JSON versions of the RDF extracted from the RDFa/microdata/microformats embedded within an HTML page. Assuming that we manage to get a good mapping from microdata to RDF (Gregg, I'm really hoping you'll take a lead on that),

I can start by resurrecting the Microdata-RDF bits and putting them on the wiki. We can then add modifications we feel are necessary, such as the @itemprop token to URI generation. It could also be used to discuss other proposed changes, such as @itemdatatype.

> as far as I'm aware a mapping to JSON-LD follows naturally.

If we go Microdata->RDF->JSON-LD, there is a loss issue, depending on rules we decide for generating RDF from multi-valued properties. RDFa->RDF->JSON-LD wouldn't be so much of a problem.

Alternatively, we can consider suggesting changes to the Microdata JSON format so that Microdata->JSON->RDF has full fidelity. Describing Microdata->JSON-LD in terms of the existing processing rules would be pretty straightforward too.

> Am I wrong? Are there extra things that a processor needs to know to map from RDF to JSON-LD, where the way the data was expressed within the HTML page could help inform the mapping?

Semantically, there shouldn't be any other information necessary (assuming things like @lang are preserved). Syntactically, JSON-LD could benefit from the prefix mappings, @base and @vocab from RDFa to construct an @context that would allow for better use of prefixes and terms within the encoding, but that's sugar. We could also use the order of item definitions to create an equivalent tree.
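
As a rough illustration of that sugar, here is a minimal Python sketch that turns RDFa-style prefix mappings and a @vocab into an @context and uses it to shorten IRIs. The function names (build_context, compact_iri), the layout of the context, and the example namespaces are assumptions made for this sketch, not something taken from either spec.

```python
def build_context(prefixes, vocab=None):
    """Build a JSON-LD-style @context from RDFa prefix mappings and an optional @vocab."""
    context = dict(prefixes)
    if vocab is not None:
        context["@vocab"] = vocab
    return context


def compact_iri(iri, context):
    """Shorten an IRI to a bare term (via @vocab) or prefix:suffix form, if possible."""
    vocab = context.get("@vocab")
    if vocab and iri.startswith(vocab) and len(iri) > len(vocab):
        return iri[len(vocab):]
    best = None
    for prefix, namespace in context.items():
        if prefix.startswith("@"):
            continue  # skip keywords such as @vocab
        # Prefer the longest matching namespace.
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is not None:
        return best[0] + ":" + iri[len(best[1]):]
    return iri  # no mapping applies; keep the full IRI


# Hypothetical mappings, as they might be collected from an RDFa document.
context = build_context({"foaf": "http://xmlns.com/foaf/0.1/"},
                        vocab="http://schema.org/")
print(compact_iri("http://xmlns.com/foaf/0.1/name", context))  # foaf:name
print(compact_iri("http://schema.org/name", context))          # name
```

Either way the underlying triples are the same; the context only changes how compactly they are written.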

> What we also need to consider is the reverse case, where there are consumers who do not use RDF as their internal data model (and do not use a vocabulary-specific internal data model either) but want to aggregate data that is expressed in RDFa alongside that expressed in microdata and microformats. The data models used by microdata and microformats-2 are indications of what that might look like: lists (not graphs) of entities with properties that have string or array or object values. The microformats-2 data model preserves some datatyping (what's a URI, what's a date/time). You're absolutely right that this is lossy from RDFa, which has a richer data model, but if you're a consumer that isn't using RDF as your data model then presumably you don't care.
> 
> Perhaps it would make sense for us to create a technical comparison of the three data models, show how to map to each from each syntax and make clear what information is lost with each mapping (and hence from any other formats / APIs created from those data models).
> 
> FWIW, there are two distinct points of view both on datatypes and on the ordering of values in multi-valued properties.
> 
> On datatypes, some people think that the type of a value is integral to the value and should be carried around with the value; others think that applications that use the value will convert it to an appropriate type when they need to use it.

This could also be a role for datatype entailment regimes, but these were rejected from RDFa because they lead to data duplication (e.g., "1.0" and "1.0"^^xsd:decimal).
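
A toy Python sketch of that duplication, assuming literals are modelled as (lexical form, datatype) pairs and that a processor knows the expected datatype of a property. The rule below (add_typed_variants) is a deliberately simplified stand-in, not the RDF semantics definition of datatype entailment, and the property IRI is made up.

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def add_typed_variants(triples, datatype_for):
    """For each plain literal on a property with a known datatype, also assert the typed literal."""
    result = set(triples)
    for subject, prop, (lexical, datatype) in triples:
        if datatype is None and prop in datatype_for:
            result.add((subject, prop, (lexical, datatype_for[prop])))
    return result


# One statement with a plain literal, as extracted from the markup.
graph = {("_:offer", "http://example.org/price", ("1.0", None))}
expanded = add_typed_variants(graph, {"http://example.org/price": XSD_DECIMAL})

# The plain and typed literals are distinct terms, so the graph now
# holds two triples for what the author wrote once.
print(len(graph), len(expanded))  # 1 2
```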

> On multi-valued properties, some people think that whether a property holds a sequence of values or has multiple values with no inherent order is something integral to the data and therefore needs to be carried around with the data; others think that as long as order is always preserved, applications are free to ignore that order if it isn't important for the processing they are doing.

If we always go from HTML to JSON, we can preserve order. If we go through RDF first, the order can be lost.
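
A small Python sketch of the difference, assuming an item's properties arrive as (name, value) pairs in document order. The helper names and sample values are hypothetical; the "properties" layout with array values follows Microdata's JSON form, and the RDF side is modelled as a plain set of triples.

```python
def microdata_json(item_props):
    """Group (name, value) pairs into arrays, keeping document order."""
    properties = {}
    for name, value in item_props:
        properties.setdefault(name, []).append(value)
    return {"properties": properties}


def to_triples(subject, item_props):
    """The same data as a set of triples: no order among the values."""
    return {(subject, name, value) for name, value in item_props}


def from_triples(triples):
    """Rebuild property arrays from triples. Sorting is only for a
    deterministic result; a graph has no order to recover."""
    properties = {}
    for _, name, value in sorted(triples):
        properties.setdefault(name, []).append(value)
    return {"properties": properties}


props = [("author", "Zoe"), ("author", "Adam"), ("author", "Mia")]
direct = microdata_json(props)
via_rdf = from_triples(to_triples("_:item", props))

print(direct["properties"]["author"])   # ['Zoe', 'Adam', 'Mia']
print(via_rdf["properties"]["author"])  # ['Adam', 'Mia', 'Zoe']
```

The set of values survives the trip through RDF, but the author order from the page does not unless the mapping encodes it explicitly (for example as an RDF list).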

> These are philosophical differences. It would be great if we could document and explain them but it's probably going to be counterproductive for us to spend too much time discussing them.
> 
> Cheers,
> 
> Jeni
> 
>>>> [1] http://dev.w3.org/html5/md/#json
>>>> [2] http://docs.api.talis.com/platform-api/output-types/rdf-json
>>>> [3] http://json-ld.org/
>>>> [4] http://www.w3.org/community/json-ld/
> 
> -- 
> Jeni Tennison
> http://www.jenitennison.com
>
Received on Monday, 3 October 2011 23:15:47 UTC