RE: Defining a common convention for marking up JSON

Thanks for the quick response and thoughts Markus.  I'm glad to see, from the responses so far, that there is interest in exploring some type of alignment.



A few comments below:



Just some quick thoughts/comments/questions.



>> Dear JSON-LD Community;

>>

>> JSON-LD, OData's JSON format, and other formats built on JSON are

>> trying to do very similar things (add "markup" to a JSON payload for

>> things like ids, types, etc.). Unfortunately, since JSON doesn't

>> define a way to differentiate properties from markup, each

>> specification invents its own naming conventions to differentiate

>> properties from markup.

>>

>> We have a real opportunity to align efforts here in defining a common

>> convention for marking up JSON payloads.

>>

>> JSON-LD adds markup to JSON payloads by defining a set of keywords

>> that begin with the "@" symbol. JSON-LD parsers understand these

>> keywords and treat them differently than other properties.

>

>More or less the only reason we chose to prefix JSON-LD's keywords with an

>@ symbol was to reduce the likelihood of a collision with already existing

>property names.



Right. Microsoft did the same thing in an early OData JSON format by prefixing keywords with double underscore to try and avoid collisions ("__metadata", "__count", etc). We later found we needed a more general/extensible mechanism that allowed third parties to annotate JSON objects and properties.



>> OData's JSON format separates properties from markup through a

>> namespacing mechanism similar to XML. Properties that contain a dot

>> (.) (which most JSON parsers already treat differently) are "namespace

>> qualified" names - the prefix before the dot is the namespace and the

>> part after the last dot is the keyword within that namespace.

>

>By "treat differently" you mean that you can't access them using the dot

>notation anymore, right?



Exactly.
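
To make the naming rule quoted above concrete, here is a minimal Python sketch of how a consumer might split a property name at its last dot. The function name split_qualified_name is hypothetical and appears in neither specification; the only assumption is the rule as stated above (everything before the last dot is the namespace, the rest is the keyword).

```python
def split_qualified_name(name):
    """Split a property name into (namespace, keyword).

    Names without a dot are treated as plain data properties,
    so their namespace is None.
    """
    # rpartition splits at the LAST dot, matching the rule that the
    # keyword is the part after the last dot.
    namespace, dot, keyword = name.rpartition(".")
    if not dot:
        return None, name
    return namespace, keyword


print(split_qualified_name("odata.type"))  # ('odata', 'type')
print(split_qualified_name("name"))        # (None, 'name')
```

Splitting at the last dot leaves room for namespaces that themselves contain dots, which is one possible reading of the rule rather than a settled design.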



>> This general mechanism allows anyone to extend a JSON payload with

>> "markup", and JSON clients to differentiate markup from data, and

>> ignore markup that they don't know/care about.

>

>Unless properties with a dot in them are already used. The same obviously

>applies to properties colliding with one of the JSON-LD keywords



Right. Since JSON doesn't define a means of annotating data with additional information, both OData and JSON-LD have defined conventions that attempt to differentiate data properties from other types of meta-information. We could discuss whether we need a convention that is less likely to conflict, but having a common mechanism seems incredibly valuable.



> (please

>note that it is fine to use properties starting with an @ as long as it is

>not a defined keyword from a JSON-LD perspective).



I was wondering whether the list of keywords is hard-coded or whether the @ prefix is a general mechanism. There are advantages to both, of course: a fixed list is less restrictive for general property names, while a general prefix is more extensible.



>Also, I don't think (at

>least for JSON-LD) that we can differentiate between "markup" and "data".

>It's not like HTML where you just markup some text. Losing, e.g., an

>identifier of an entity is not really desired and most people wouldn't

>classify that as markup - at least I wouldn't.



Markup may be a poor choice of words. The general idea is that there is "data" and "meta" or "control information" (such as type, etc.). A simple JSON processor wouldn't know what to do with type, and wouldn't have to; it could just skip it.



Even for the identifier, a general control that's just trying to paint data on a screen may be perfectly fine ignoring the identifier for an entity. It's only a consumer that understands that this JSON is JSON-LD, and wants to do something like link to the object, that cares about the identifier. That doesn't mean it's not there for consumers that do care about it, just that a namespacing mechanism for properties enables generic parsers to be trained to look for the meta-information they care about and ignore the rest.



>> OData uses this *general mechanism* to add odata-specific markup,

>> defined in the "odata" namespace. So "odata.id" is clearly recognized

>> as the id keyword defined by the OData specification, and "odata.type"

>> is clearly recognized as the type keyword defined by the OData

>> specification (there is clearly an opportunity to align in some of

>> these moving forward, but for right now I'm more interested in having

>> a common "markup" convention).

>

>I haven't had a look at the latest OData draft yet, but how does a processor

>know what odata (or any other prefix) stands for? Who owns it? Is there a

>central registry for those prefixes?



Good question. The answer today is somewhat specific to OData ("odata" is reserved, and the document references a metadata document that defines the prefixes). This is certainly an area that we could collaborate on as well. We could define a registry of well-known prefixes, together with a mechanism, like the one XML has, for defining ad-hoc prefixes.



>> Following this same common convention, JSON-LD could mark up a payload

>> as:

>>

>> {

>>   "jsonld.context": "http://json-ld.org/contexts/person.jsonld",

>>   "jsonld.id": "http://dbpedia.org/resource/John_Lennon",

>>   "name": "John Lennon",

>>   "born": "1940-10-09",

>>   "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"

>> }

>

>You can do that already, although you would have to add a context (which in

>the case of a JSON document could also be referenced by an HTTP Link header

>[1]) aliasing the keywords [2]. For the sake of simplicity, I embed it

>directly in the following example:

>

>{

>  "@context": [

>    { "jsonld.id": "@id" },

>    "http://json-ld.org/contexts/person.jsonld"

>  ],

>  "jsonld.id": "http://dbpedia.org/resource/John_Lennon",

>  "name": "John Lennon",

>  "born": "1940-10-09",

>  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"

>}



Interesting. So (except for the context) you could make the JSON-LD keywords look like OData JSON annotations. That's really encouraging, but it still feels like a one-off for making JSON-LD look (mostly) like OData JSON, not a general solution for custom/third-party annotations.
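
To check my understanding of the aliasing in the example above, here is a rough Python sketch of what a consumer might do with it. It is not a JSON-LD processor: the function name expand_keyword_aliases is hypothetical, it only looks at context objects embedded in the document, it does not fetch remote contexts, and it only rewrites top-level keys.

```python
def expand_keyword_aliases(doc):
    """Rename aliased top-level keys back to the keywords they alias."""
    contexts = doc.get("@context", [])
    if not isinstance(contexts, list):
        contexts = [contexts]

    # Collect alias -> keyword pairs from inline context objects only.
    # Remote contexts (plain URL strings) are skipped in this sketch.
    aliases = {}
    for ctx in contexts:
        if isinstance(ctx, dict):
            for term, target in ctx.items():
                if isinstance(target, str) and target.startswith("@"):
                    aliases[term] = target

    return {aliases.get(key, key): value for key, value in doc.items()}


doc = {
    "@context": [
        {"jsonld.id": "@id"},
        "http://json-ld.org/contexts/person.jsonld",
    ],
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
}
print(expand_keyword_aliases(doc)["@id"])
```

The point of the sketch is that each alias has to be declared individually in a context, which is why it reads to me as a per-keyword mapping rather than a general namespacing rule.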



>> Regardless of the syntax, providing a common convention for namespace

>> qualifying "markup" keywords gives us a real opportunity to foster

>> consistency, reuse, and interoperability.

>

>If we are talking about namespacing, we shouldn't talk about JSON-LD's

>keywords but its compact IRIs [3] which use colons as separator which is

>aligned with XML CURIEs and all RDF serialization formats. In contrast to

>keywords, that's something you can't change in JSON-LD. You can however,

>work around it by explicitly mapping terms (as we call them) to CURIEs,

>e.g.,

>

>   "foaf.name": "foaf:name"

>

>

>> Both JSON-LD and OData are close to releasing an initial standard

>> (OData has just progressed to a Committee Specification in OASIS), so

>> the window is very close to closing on alignment, but the potential

>> upside could be huge. Imagine being able to mark up the same JSON

>> payload with JSON-LD keywords, odata keywords, and other

>> "annotations".

>

>Is there anything that prevents that today? JSON-LD processors would ignore

>all odata.xyz properties unless they are mapped to something in a context.

>What are OData processors doing with JSON-LD keywords?



I'm sure we could train processors to understand both OData's JSON format and JSON-LD as one-offs, but the problem arises when the next JSON-based format comes along and defines its own way to add control information, or when someone simply wants to add custom annotations to a JSON payload.



A namespacing mechanism allows a processor to understand a single, simple rule (like names containing a dot are namespaced) and anybody can add their own specific information to a payload, without worrying about conflicts.

Processors/applications can pick and choose what they want to pay attention to.
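
As a minimal sketch of that single rule, assuming "a name containing a dot is namespaced" is the whole convention, a generic consumer could separate data from annotations like this. The function name partition_payload and the "Example.Person" type value are made up for illustration.

```python
from collections import defaultdict


def partition_payload(payload):
    """Separate plain data properties from namespaced annotations."""
    data = {}
    annotations = defaultdict(dict)
    for name, value in payload.items():
        # Split at the last dot; names without a dot are plain data.
        namespace, dot, keyword = name.rpartition(".")
        if dot:
            annotations[namespace][keyword] = value
        else:
            data[name] = value
    return data, dict(annotations)


payload = {
    "odata.type": "Example.Person",  # hypothetical type name
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
}
data, annotations = partition_payload(payload)

# A control that only paints data on a screen uses `data` and stops here.
# A consumer that cares about one namespace picks it out and ignores the rest.
jsonld_info = annotations.get("jsonld", {})
```

Nothing in the function knows about "odata" or "jsonld" specifically, so a third party's annotations would be handled the same way without any change to the consumer.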



>> JSON parsers would have a common way to differentiate

>> markup from data, and could consume/ignore/expose whatever markup they

>> chose.

>

>As already said above, I don't think we can differentiate between markup and

>data in JSON-LD.



Really? I think it would be very useful for a general JSON processor to recognize the data properties of a JSON-LD payload, even if just to paint it on a screen, without needing to know/understand/ignore all of the JSON-LD keywords.



>> Would the JSON-LD community be open to working with the OData

>> community to agree on a standard, extensible, namespaced mechanism

>> that all JSON-based formats could use to extend JSON?

>

>We are a very open community and open for all suggestions that simplify

>developers' lives. I can't say much at the moment because I haven't had a

>look at OData for quite a while. Maybe it becomes a bit clearer to me when

>you answer my questions above. From what I understand, a JSON-LD processor

>wouldn't have any problem ignoring "OData markup".



Again, thanks for taking the time for a detailed response. I actually learned a lot, and am encouraged that there may be a happy path here. I hope my answers above make sense, and help clarify the goal of moving from static, predefined keywords in each JSON-based format to a general, extensible, customizable annotation mechanism that everyone can use/understand.



>[1] http://json-ld.org/spec/latest/json-ld/#interpreting-json-as-json-ld

>[2] http://json-ld.org/spec/latest/json-ld/#aliasing-keywords

>[3] http://json-ld.org/spec/latest/json-ld/#compact-iris

>

>

>

>--

>Markus Lanthaler

>@markuslanthaler
>
>
>

Received on Thursday, 29 August 2013 22:35:25 UTC