Re: Mapping RDFa to microdata+json from Ivan Herman on 2011-11-25 (public-html-data-tf@w3.org from November 2011)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 25 Nov 2011 11:05:26 +0100
To: Jeni Tennison <jeni@jenitennison.com>
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>, Gregg Kellogg <gregg@kellogg-assoc.com>
Message-Id: <C1953417-D6FC-49F1-90B3-1E89D791A64E@w3.org>
On Nov 25, 2011, at 10:45, Jeni Tennison wrote:

> On 25 Nov 2011, at 08:11, Ivan Herman wrote:
>> What remains is the question of how JSON-LD could/should be mapped onto microdata+json. However, wouldn't that be a lossy mapping? Wouldn't we hit the issues of multiple typing, datatypes, etc., again?
> 
> 
> Multiple typing isn't a problem: in microdata+json, any item can have multiple types and they are all just strings (the limitation that they have to be part of the same vocabulary is in the microdata specification, not the definition of the JSON vocabulary).
> 
> Datatypes and languages are obviously not represented in microdata+json, but the fact is that consumers who use these formats don't care. If they did, they would be using a richer data model and a richer syntax to support it.
> 
> The main question as far as I'm concerned is the source of the data for the mapping:
> 
>  1. the RDFa markup itself
>  2. the two graphs that RDFa produces
>  3. any RDF
>  4. JSON-LD
> 
> #1 (directly from RDFa markup) would mean repeating the algorithm for RDFa parsing within a mapping specification (with the potential for the specs getting out of sync), but would mean that the generated microdata+json tree can closely match the structure of the original RDFa markup in the same way that it does for microformats-2 or microdata. It could also mean that @vocab is picked up to determine when short property names are used, for example, which again makes the resulting microdata+json simpler to use.
> 
> #2 (using the graphs generated from RDFa) means that the converter can just plug in to the results of an RDFa parser, and the spec can be simpler, but I'm not sure that having information from both graphs gives much benefit over #3 (any RDF).
> 
> #3 (converting from any RDF) has the advantage of being usable from any RDF, not just that generated from RDFa, but means the generated microdata+json could be quite different from the original RDFa markup. It raises gnarly questions about when/whether/how to flatten the graph and when/whether/how to use short names for properties.
> 
> #4 (converting from JSON-LD) introduces a dependency on something that is not a W3C standard (might that mean IP issues?) and is still changing. On the other hand, it may well mean that the mapping to microdata+json can be defined easily because it shunts the difficult questions into the mapping from RDF(a) to JSON-LD instead.
> 
> I think overall my preference is probably for #1.
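Jeni's point that multiple types are harmless but datatypes and languages disappear can be made concrete. The following is a minimal sketch, assuming a hypothetical `Literal` tuple as a stand-in for an RDF literal and made-up example values; the type and property IRIs are only illustrative. It shows that an item's types are just an array of strings, and that two RDF literals that differ only in language tag become indistinguishable once reduced to plain microdata+json strings.

```python
import json
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (not a real library type)."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

def to_microdata_value(lit: Literal) -> str:
    # microdata+json values are plain strings: datatype and language are dropped.
    return lit.lexical

rdf_values = [
    Literal("2011-11-25", datatype=XSD + "date"),
    Literal("Chat", lang="fr"),
    Literal("Chat", lang="en"),  # a different RDF literal from the French one
]

item = {
    # Multiple types are simply strings in an array, here from two vocabularies.
    "type": ["http://schema.org/CreativeWork", "http://purl.org/ontology/bibo/Document"],
    "properties": {
        "http://purl.org/dc/terms/issued": [to_microdata_value(rdf_values[0])],
        "http://schema.org/name": [to_microdata_value(v) for v in rdf_values[1:]],
    },
}

print(json.dumps({"items": [item]}, indent=2))
```

In the output the two names are both just "Chat"; whether that loss matters depends on the consumer, which is exactly Jeni's argument.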

The problem I have with #1 is that not only could the specifications get out of sync (although, once RDFa 1.1 is a Rec, that issue becomes moot), but so could the implementations. For example, my current RDFa implementation relies on an RDF environment (RDFLib): triples are generated by the distiller as it goes through the DOM tree, but the management of those triples (avoiding duplicates, managing BNodes, literals, etc.) is all done outside the RDFa distiller code proper. Serialization into different formats is done by the package, too (my JSON-LD serializer is a plugin to RDFLib, not, in this sense, part of the RDFa distiller code). #1 would require repeating that implementation, which is error prone.
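The division of labour described above (the distiller only emits triples; storage, de-duplication and serialization live in the RDF environment) can be sketched with the standard library alone. This is not RDFLib's API: `Graph`, `register`, `toy_distiller` and the "lines" format are hypothetical names, and the distiller handles only `@about`/`@property`/`@content` on a single element, far less than a real RDFa processor.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]

class Graph:
    """Toy stand-in for an RDF environment such as RDFLib: it owns triple
    storage and serialization, so the distiller only has to emit triples."""

    serializers: Dict[str, Callable[["Graph"], str]] = {}

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()  # a set silently drops duplicate triples

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    @classmethod
    def register(cls, name: str) -> Callable:
        """Register a serializer function under a format name (plugin style)."""
        def decorator(func: Callable[["Graph"], str]) -> Callable[["Graph"], str]:
            cls.serializers[name] = func
            return func
        return decorator

    def serialize(self, fmt: str) -> str:
        return self.serializers[fmt](self)

@Graph.register("lines")
def to_lines(graph: Graph) -> str:
    # N-Triples-like output with no escaping, so only suitable for this toy.
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in sorted(graph.triples))

def toy_distiller(root: ET.Element, graph: Graph) -> None:
    """Hypothetical and drastically simplified: @about + @property (+ optional
    @content) on the same element only; no CURIEs, @vocab, inheritance or chaining."""
    for el in root.iter():
        about, prop = el.get("about"), el.get("property")
        if about and prop:
            graph.add((about, prop, el.get("content", el.text or "")))

doc = ET.fromstring(
    '<div>'
    '<span about="http://example.org/a" property="http://purl.org/dc/terms/title">Hi</span>'
    '<meta about="http://example.org/a" property="http://purl.org/dc/terms/title" content="Hi"/>'
    '</div>'
)
g = Graph()
toy_distiller(doc, g)
print(g.serialize("lines"))  # one line: the Graph, not the distiller, merged the duplicate
```

Repeating this inside a mapping spec (option #1) would mean re-implementing the `Graph` side as well, which is the duplication Ivan wants to avoid.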

#2 has an extra benefit over #3 precisely on the vocab issue: the RDFa spec requires that information on the usage of @vocab be put into the output graph. If only one @vocab is used, which is probably the case 90% of the time, the serializer can easily exploit that.
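A serializer working from the RDFa output graph could exploit the `rdfa:usesVocabulary` triple that RDFa 1.1 generates for each @vocab. The sketch below assumes triples are plain `(subject, predicate, object)` string tuples; `single_vocab` and `short_name` are hypothetical helpers. When exactly one vocabulary was declared, property IRIs in it are shortened to their local names; otherwise full IRIs are kept.

```python
from typing import Iterable, Optional, Tuple

# RDFa 1.1 records each @vocab use as: <document> rdfa:usesVocabulary <vocab IRI>
RDFA_USES_VOCABULARY = "http://www.w3.org/ns/rdfa#usesVocabulary"

Triple = Tuple[str, str, str]

def single_vocab(triples: Iterable[Triple]) -> Optional[str]:
    """Return the vocabulary IRI if exactly one @vocab was used, else None."""
    vocabs = {o for _s, p, o in triples if p == RDFA_USES_VOCABULARY}
    return next(iter(vocabs)) if len(vocabs) == 1 else None

def short_name(prop: str, vocab: Optional[str]) -> str:
    """Use the local name when prop is in the single vocabulary; else keep the full IRI."""
    if vocab and prop.startswith(vocab):
        local = prop[len(vocab):]
        # Microdata names that are not URLs may not contain "." or ":" (or spaces).
        if local and not any(c in ".:" or c.isspace() for c in local):
            return local
    return prop

triples = [
    ("http://example.org/page", RDFA_USES_VOCABULARY, "http://schema.org/"),
    ("http://example.org/page#me", "http://schema.org/name", "Ivan"),
    ("http://example.org/page#me", "http://xmlns.com/foaf/0.1/nick", "ivan"),
]
vocab = single_vocab(triples)
print([short_name(p, vocab) for _s, p, _o in triples[1:]])
# ['name', 'http://xmlns.com/foaf/0.1/nick']
```

Falling back to full IRIs when several vocabularies are in play keeps the output unambiguous at the cost of verbosity in the remaining 10% of cases.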

The dependency problem in #4 is temporary. The current goal is, once JSON-LD is finalized by the W3C Community Group and if the result is appropriate for all, to put it back onto the Rec track at W3C.

I think that my preference, if we go down that route, is #2. That would mean, for example, a special (and lossy) serialization plugin, similar to the JSON-LD one.
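What such a lossy serializer might do can be sketched without RDFLib. This is a standalone illustration, not an RDFLib plugin and not a specified mapping: terms are plain Python values (IRIs as strings, blank nodes as strings starting with `_:`, literals as a hypothetical `Literal` dataclass), and nesting blank nodes that are referenced exactly once is just one possible answer to the flattening questions Jeni raised. It produces the `{"items": [...]}` shape with `type`, `id` and array-valued `properties`.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

@dataclass(frozen=True)
class Literal:
    """Hypothetical literal term; str terms are IRIs, or blank nodes if they start with '_:'."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

Node = Union[str, Literal]
Triple = Tuple[str, str, Node]

def to_microdata_json(triples: Set[Triple]) -> dict:
    by_subject: Dict[str, List[Tuple[str, Node]]] = defaultdict(list)
    refs: Dict[str, int] = defaultdict(int)
    for s, p, o in sorted(triples, key=repr):  # sorted for deterministic output
        by_subject[s].append((p, o))
        if isinstance(o, str):
            refs[o] += 1
    emitted: Set[str] = set()

    def is_nestable(node: str) -> bool:
        # Policy choice: a blank node used as an object exactly once is nested in place.
        return node.startswith("_:") and refs[node] == 1 and node in by_subject

    def build(subject: str) -> dict:
        emitted.add(subject)
        item: dict = {}
        types = [o for p, o in by_subject[subject] if p == RDF_TYPE and isinstance(o, str)]
        if types:
            item["type"] = types
        if not subject.startswith("_:"):
            item["id"] = subject
        props: Dict[str, list] = {}
        for p, o in by_subject[subject]:
            if p == RDF_TYPE and isinstance(o, str):
                continue
            if isinstance(o, Literal):
                value: object = o.lexical  # lossy: datatype and language are dropped
            elif is_nestable(o) and o not in emitted:
                value = build(o)
            else:
                value = o  # IRI, shared blank node, or a cycle back-reference as a string
            props.setdefault(p, []).append(value)
        item["properties"] = props
        return item

    items = [build(s) for s in by_subject if not is_nestable(s)]
    # Blank nodes that only reference each other in a cycle are not reached above.
    items += [build(s) for s in by_subject if s not in emitted]
    return {"items": items}

me = "http://example.org/#ivan"
example = {
    (me, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (me, RDF_TYPE, "http://schema.org/Person"),
    (me, "http://xmlns.com/foaf/0.1/name", Literal("Ivan", lang="en")),
    (me, "http://schema.org/address", "_:a"),
    ("_:a", "http://schema.org/addressLocality", Literal("Amsterdam")),
}
print(json.dumps(to_microdata_json(example), indent=2))
```

Combined with vocabulary-based short names, this is roughly the extra work such a plugin would take on; everything else (parsing, de-duplication) stays in the existing RDF environment.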

Cheers 

Ivan



> 
> Jeni
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 25 November 2011 10:02:37 UTC