RE: Defining a common convention for marking up JSON from Michael Pizzo on 2013-08-29 (public-linked-json@w3.org from August 2013)

From: Michael Pizzo <mikep@microsoft.com>
Date: Thu, 29 Aug 2013 22:34:53 +0000
To: "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-ID: <d36e5bf9a84b4918a61ee6477cb3111c@BN1PR03MB220.namprd03.prod.outlook.com>
Thanks for the quick response and thoughts Markus.  I'm glad to see, from the responses so far, that there is interest in exploring some type of alignment.



A few comments below:



Just some quick thoughts/comments/questions.



>> Dear JSON-LD Community;

>>

>> JSON-LD, OData's JSON format, and other formats built on JSON are

>> trying to do very similar things (add "markup" to a JSON payload for

>> things like ids, types, etc.). Unfortunately, since JSON doesn't

>> define a way to differentiate properties from markup, each

>> specification invents its own naming conventions to differentiate

>> properties from markup.

>>

>> We have a real opportunity to align efforts here in defining a common

>> convention for marking up JSON payloads.

>>

>> JSON-LD adds markup to JSON payloads by defining a set of keywords

>> that begin with the "@" symbol. JSON-LD parsers understand these

>> keywords and treat them differently than other properties.

>

>More or less the only reason we chose to prefix JSON-LD's keywords with an

>@ symbol was to reduce the likelihood of a collision with already existing

>property names.



Right. Microsoft did the same thing in an early OData JSON format by prefixing keywords with double underscore to try and avoid collisions ("__metadata", "__count", etc). We later found we needed a more general/extensible mechanism that allowed third parties to annotate JSON objects and properties.



>> OData's JSON format separates properties from markup through a

>> namespacing mechanism similar to XML. Properties that contain a dot

>> (.) (which most JSON parsers already treat differently) are "namespace

>> qualified" names - the prefix before the dot is the namespace and the

>> part after the last dot is the keyword within that namespace.

>

>By "treat differently" you mean that you can't access them using the dot

>notation anymore, right?



Exactly.



>> This general mechanism allows anyone to extend a JSON payload with

>> "markup", and JSON clients to differentiate markup from data, and

>> ignore markup that they don't know/care about.

>

>Unless properties with a dot in them are already used. The same obviously

>applies to properties colliding with one of the JSON-LD keywords



Right. Since JSON doesn't define a means of annotating data with additional information, both OData and JSON-LD have defined conventions that attempt to differentiate data properties from other types of meta-information. We could discuss whether we need a convention less likely to conflict, but having a common mechanism seems incredibly valuable.



> (please

>note that it is fine to use properties starting with an @ as long as it is

>not a defined keyword from a JSON-LD perspective).



I was wondering whether the list of keywords is hard-coded or whether the @ prefix is a general mechanism. There are advantages to both, of course: a fixed list is less restrictive for general property names, while a general prefix is more extensible.



>Also, I don't think (at

>least for JSON-LD) that we can differentiate between "markup" and "data".

>It's not like HTML where you just markup some text. Losing, e.g., an

>identifier of an entity is not really desired and most people wouldn't

>classify that as markup - at least I wouldn't.



Markup may be a poor choice of words. The general idea is that there is "data" and "meta" or "control information" (such as type, etc.). A simple JSON processor wouldn't know what to do with type, and wouldn't have to; it could just skip it.



Even for the identifier, a general control that's just trying to paint data on a screen may be perfectly fine ignoring the identifier for an entity. It's only a consumer that understands that this JSON is JSON-LD, and wants to do something like link to the object, that cares about the identifier. That doesn't mean it's not there for consumers that do care about it, just that a namespacing mechanism for properties enables generic parsers to be trained to look for the meta-information they care about and ignore the rest.
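
To make that concrete, here is a minimal Python sketch of such a generic consumer. It assumes only the simple rule discussed in this thread (a property name containing a dot is namespaced control information). The function name `split_payload`, the sample payload, and the "odata.type" value are made up for illustration and are not taken from either specification.

```python
import json


def split_payload(obj):
    """Split a decoded JSON object into (data, annotations).

    Assumption for this sketch: any property name containing a dot is
    namespaced control information; everything else is plain data.
    """
    data, annotations = {}, {}
    for name, value in obj.items():
        if "." in name:
            annotations[name] = value
        else:
            data[name] = value
    return data, annotations


# Hypothetical payload mixing annotations from two namespaces with data.
payload = json.loads("""
{
  "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
  "odata.type": "Example.Person",
  "name": "John Lennon",
  "born": "1940-10-09"
}
""")

data, annotations = split_payload(payload)
```

A control that only paints data on a screen would use `data` and never look at `annotations`; a consumer that wants to link to the object would look up "jsonld.id" in `annotations`.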



>> OData uses this *general mechanism* to add odata-specific markup,

>> defined in the "odata" namespace. So "odata.id" is clearly recognized

>> as the id keyword defined by the OData specification, and "odata.type"

>> is clearly recognized as the type keyword defined by the OData

>> specification (there is clearly an opportunity to align in some of

>> these moving forward, but for right now I'm more interested in having

>> a common "markup" convention).

>

>I haven't had a look at the latest OData draft yet, but how does a processor

>know what odata (or any other prefix) stands for? Who owns it? Is there a

>central registry for those prefixes?



Good question. The answer today is somewhat specific to OData ("odata" is reserved, and the document references a metadata document that defines the other prefixes). This is certainly an area that we could collaborate on as well. We could define a registry of well-known prefixes, together with a mechanism, like the one XML has, for defining ad-hoc prefixes.
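
As a rough illustration of how a registry plus ad-hoc declarations could fit together, here is a small Python sketch. Everything in it is an assumption made for the sake of discussion: the registry contents, the example.org namespace URIs, the `resolve_prefix` name, and the choice that reserved prefixes cannot be redefined by a payload.

```python
# Hypothetical registry of well-known prefixes. The URIs are placeholders,
# not identifiers defined by OData, JSON-LD, or any registry.
WELL_KNOWN_PREFIXES = {
    "odata": "http://example.org/ns/odata",
    "jsonld": "http://example.org/ns/jsonld",
}


def resolve_prefix(prefix, declared=None):
    """Return the namespace identifier for a prefix, or None if unknown.

    `declared` holds ad-hoc prefixes declared alongside the payload.
    Design assumption: well-known prefixes are reserved, so an ad-hoc
    declaration cannot override them.
    """
    if prefix in WELL_KNOWN_PREFIXES:
        return WELL_KNOWN_PREFIXES[prefix]
    return (declared or {}).get(prefix)


# Ad-hoc declarations a third party might ship with its payload.
ad_hoc = {
    "acme": "http://example.com/ns/acme",
    "odata": "http://example.com/ns/not-odata",  # ignored: "odata" is reserved
}

acme_ns = resolve_prefix("acme", ad_hoc)
odata_ns = resolve_prefix("odata", ad_hoc)
unknown_ns = resolve_prefix("other", ad_hoc)
```

A processor that gets `None` back simply treats the annotation as unknown and ignores it.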



>> Following this same common convention, JSON-LD could mark up a payload

>> as:

>>

>> {

>>   "jsonld.context": "http://json-ld.org/contexts/person.jsonld",

>>   "jsonld.id": "http://dbpedia.org/resource/John_Lennon",

>>   "name": "John Lennon",

>>   "born": "1940-10-09",

>>   "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"

>> }

>

>You can do that already, although you would have to add a context (which in

>the case of a JSON document could also be referenced by an HTTP Link header

>[1]) aliasing the keywords [2]. For the sake of simplicity, I embed it

>directly in the following example:

>

>{

>  "@context": [

>    { "jsonld.id": "@id" },

>    "http://json-ld.org/contexts/person.jsonld"

>  ],

>  "jsonld.id": "http://dbpedia.org/resource/John_Lennon",

>  "name": "John Lennon",

>  "born": "1940-10-09",

>  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"

>}



Interesting. So (except for the context) you could make the JSON-LD keywords look like OData JSON annotations. That's actually really encouraging, but it still feels like a one-off for making JSON-LD look (mostly) like OData JSON, and not a general solution for custom/third-party annotations.
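
If I understand the aliasing correctly, its effect on the top-level property names can be sketched in a few lines of Python. This is not a JSON-LD processor and does not implement the context processing from the specification; `apply_aliases` is a hypothetical helper that only renames aliased properties back to the keyword they stand for.

```python
def apply_aliases(obj, aliases):
    """Rename aliased property names to the keywords they stand for.

    Sketch only: handles a single flat object and a flat alias map,
    with no nested objects and no real @context processing.
    """
    return {aliases.get(name, name): value for name, value in obj.items()}


# The alias from the embedded context in the example above.
aliases = {"jsonld.id": "@id"}

doc = {
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}

unaliased = apply_aliases(doc, aliases)
```

After the call, the identifier sits under "@id" again and the data properties are untouched.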



>> Regardless of the syntax, providing a common convention for namespace

>> qualifying "markup" keywords gives us a real opportunity to foster

>> consistency, reuse, and interoperability.

>

>If we are talking about namespacing, we shouldn't talk about JSON-LD's

>keywords but its compact IRIs [3] which use colons as separator which is

>aligned with XML CURIEs and all RDF serialization formats. In contrast to

>keywords, that's something you can't change in JSON-LD. You can however,

>work around it by explicitly mapping terms (as we call them) to CURIEs,

>e.g.,

>

>   "foaf.name": "foaf:name"

>

>

>> Both JSON-LD and OData are close to releasing an initial standard

>> (OData has just progressed to a Committee Specification in OASIS), so

>> the window is very close to closing on alignment, but the potential

>> upside could be huge. Imagine being able to mark up the same JSON

>> payload with JSON-LD keywords, odata keywords, and other

>> "annotations".

>

>Is there anything that prevents that today? JSON-LD processors would ignore

>all odata.xyz properties unless they are mapped to something in a context.

>What are OData processors doing with JSON-LD keywords?



I'm sure we could train processors to understand both OData's JSON format and JSON-LD as one-offs, but the problem arises when the next JSON-based format comes along and defines its own way to add control information, or when someone simply wants to add custom annotations to a JSON payload.



A namespacing mechanism allows a processor to understand a single, simple rule (such as "names containing a dot are namespaced"), and anybody can add their own specific information to a payload without worrying about conflicts.

Processors/applications can pick and choose what they want to pay attention to.
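
Here is a minimal Python sketch of that pick-and-choose step, using the rule described earlier: the part before the last dot is the namespace, the part after it is the keyword. The function name `annotations_by_namespace`, the "com.example" namespace, and the sample values are hypothetical.

```python
from collections import defaultdict


def annotations_by_namespace(obj):
    """Group namespaced properties as {namespace: {keyword: value}}.

    The name is split at the LAST dot, so "com.example.rating" belongs
    to the namespace "com.example" with keyword "rating". Names without
    a dot are data properties and are skipped.
    """
    grouped = defaultdict(dict)
    for name, value in obj.items():
        namespace, dot, keyword = name.rpartition(".")
        if dot and namespace and keyword:
            grouped[namespace][keyword] = value
    return dict(grouped)


# Hypothetical payload annotated by three different parties.
payload = {
    "odata.id": "People(1)",
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "com.example.rating": 5,
    "name": "John Lennon",
}

grouped = annotations_by_namespace(payload)

# An OData-aware client looks only at its own namespace and ignores the rest.
odata_annotations = grouped.get("odata", {})
```

The client never has to know what "jsonld" or "com.example" mean in order to skip them safely.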



>> JSON parsers would have a common way to differentiate

>> markup from data, and could consume/ignore/expose whatever markup they

>> chose.

>

>As already said above, I don't think we can differentiate between markup and

>data in JSON-LD.



Really? I think it would be very useful for a general JSON processor to recognize the data properties of a JSON-LD payload, even if just to paint it on a screen, without needing to know/understand/ignore all of the JSON-LD keywords.



>> Would the JSON-LD community be open to working with the OData

>> community to agree on a standard, extensible, namespaced mechanism

>> that all JSON-based formats could use to extend JSON?

>

>We are a very open community and open for all suggestions that simplify

>developers' lives. I can't say much at the moment because I haven't had a

>look at OData for quite a while. Maybe it becomes a bit clearer to me when

>you answer my questions above. From what I understand, a JSON-LD processor

>wouldn't have any problem ignoring "OData markup".



Again, thanks for taking the time for a detailed response. I actually learned a lot, and am encouraged that there may be a happy path here. I hope my answers above make sense, and help clarify the goal of moving from static, predefined keywords in each JSON-based format to a general, extensible, customizable annotation mechanism that everyone can use/understand.



>[1] http://json-ld.org/spec/latest/json-ld/#interpreting-json-as-json-ld

>[2] http://json-ld.org/spec/latest/json-ld/#aliasing-keywords

>[3] http://json-ld.org/spec/latest/json-ld/#compact-iris

>

>

>

>--

>Markus Lanthaler

>@markuslanthaler
>
>
>
Received on Thursday, 29 August 2013 22:35:25 UTC