RE: Defining a common convention for marking up JSON

On Friday, August 30, 2013 12:35 AM, Michael Pizzo wrote:
> Thanks for the quick response and thoughts, Markus.  I’m glad to see,
> from the responses so far, that there is interest in exploring some
> type of alignment.

There definitely is, but please bear with me until I fully understand the
"problem", because at the moment I don't think I can see it.

[...] 
>> (please
>> note that it is fine to use properties starting with an @ as long as it is
>> not a defined keyword from a JSON-LD perspective).
>
> I was wondering if the list of keywords was hard-coded or if the
> @ prefix were a general mechanism. There are advantages to both,
> of course; one is less restrictive for general property names and
> the other is more extensible. 

The list is hardcoded. However, the spec contains the following statement
discouraging third parties from defining new keywords:

    To avoid forward-compatibility issues, a term SHOULD NOT start with
    an @ character as future versions of JSON-LD may introduce additional
    keywords.
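
To make the distinction concrete, here is a minimal Python sketch of how a
processor might classify property names against the hardcoded JSON-LD 1.0
keyword list. The function name `classify_property` and its return labels are
hypothetical, for illustration only; a real processor resolves non-keyword
names through the term definitions in the context, not through a lookup like
this.

```python
# The JSON-LD 1.0 keywords (the hardcoded list).
JSONLD_KEYWORDS = frozenset({
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
})


def classify_property(name: str) -> str:
    """Classify a property name from a JSON-LD processor's point of view (sketch)."""
    if name in JSONLD_KEYWORDS:
        return "keyword"
    if name.startswith("@"):
        # Not a keyword today, but discouraged: a future version of JSON-LD
        # might introduce it as a keyword.
        return "discouraged-term"
    return "term"


if __name__ == "__main__":
    for prop in ["@id", "@foo", "name"]:
        print(prop, classify_property(prop))
```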


>> Also, I don't think (at
>> least for JSON-LD) that we can differentiate between "markup" and "data".
>> It's not like HTML where you just markup some text. Losing, e.g., an
>> identifier of an entity is not really desired and most people wouldn't
>> classify that as markup - at least I wouldn't.
>
> Markup may be a poor choice of words. The general idea is that there is
> "data" and "meta" or "control information" (such as type, etc.). A simple
> JSON processor wouldn't know what to do with type, and wouldn't have to;
> it could just skip it.

That's the part where I think we disagree most. In JSON-LD the type, the
language, etc. are part of the data. There's no markup (well, you could argue
that @index is markup, but that's really it). That's one thing. The other
thing is that a "simple JSON processor" doesn't know what to do with any of
the properties. The whole document is an opaque structure. All it can do is
transform a string into an in-memory representation.
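
A quick illustration, using Python's standard `json` module as the "simple
JSON processor" (my choice for the sketch; nothing here is JSON-LD specific):
the parser returns a dict, and "@id" is just another key to it.

```python
import json

# A shortened version of the John Lennon example from this thread.
text = '{"@id": "http://dbpedia.org/resource/John_Lennon", "name": "John Lennon"}'
data = json.loads(text)

# The parser only builds an in-memory dict; "@id" has no special meaning here.
for key, value in data.items():
    print(f"{key!r}: {value!r}")
```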

It is up to the application on top of the JSON parser to interpret the
data. Unfortunately, that application has to depend on out-of-band
information to be able to interpret it. JSON-LD tries to bring that
information in band (just as OData does) by making the data unambiguous.
AFAIK OData does much more, since it also defines service interfaces etc., and
that's probably the reason why you are talking about "markup". In JSON-LD
you would need a separate vocabulary to describe that "metadata". LDP [1] is
such a vocabulary; Hydra is another one [2-3].


> Even for the identifier, a general control that's just trying to paint
> data on a screen may be perfectly fine ignoring the identifier for an
> entity.

Right, as you say, it *may* be fine to ignore the identifier. But you
don't know. It is up to that application to decide which *data* it renders
and which it ignores.


> It's only a consumer that understands that this JSON is JSON-LD, and wants
> to do something like link to the object, that cares about the identifier.

That can be said about every other property as well.


> That doesn't mean it's not there for consumers that do care about it, just
> that a namespacing mechanism for properties enables generic parsers to be
> trained to look for the meta-information they care about and ignore the
> rest.

The question is whether that's metadata or not. Would you classify the
primary key in a DB record as metadata? I wouldn't. Of course, an
application might ignore it nevertheless because it doesn't need it.


[...]
>> I haven't had a look at the latest OData draft yet, but how does a processor
>> know what odata (or any other prefix) stands for? Who owns it? Is there a
>> central registry for those prefixes?
>
> Good question. The answer is currently somewhat specific to OData
> ("odata" is reserved, and the document references a metadata document that
> defines the prefixes).

Does OData still use application/json as its media type? If that's the case, how
would a processor know whether this is really intended to be OData or
whether someone just accidentally called a property odata.something? JSON-LD
doesn't redefine the semantics of existing JSON. It has its own media type
(application/ld+json), which defines the semantics of the keywords in such
a document. If you want to serve it as JSON, you would have to associate a
context with it (using an HTTP Link header with a very specific relation). So,
unlike with OData, there's no risk of overwriting other namespaces. Everything
is visible at the HTTP level.
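
A rough Python sketch of the decision a client could make from the HTTP
response headers. The function names are hypothetical and the Link header
parsing is deliberately simplified (no commas inside IRIs, first matching link
wins); the relation IRI http://www.w3.org/ns/json-ld#context is the one the
JSON-LD spec uses for this purpose. As I read the spec, a context Link header
is ignored when the document is served as application/ld+json.

```python
import re
from typing import Optional

# Link relation used to reference a JSON-LD context from a plain JSON document.
CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_from_link_header(link_header: str) -> Optional[str]:
    """Return the target IRI of the first Link entry whose rel is CONTEXT_REL.

    Simplified on purpose: assumes no commas inside IRIs or parameter values.
    """
    for entry in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", entry)
        if not match:
            continue
        target, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params)
        if rel and CONTEXT_REL in rel.group(1).split():
            return target
    return None


def interpretation(content_type: str, link_header: str = "") -> str:
    """Decide how to interpret a response body from its HTTP headers (sketch)."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/ld+json":
        # Keyword semantics come from the media type itself; a context
        # Link header is not consulted for this media type.
        return "json-ld"
    if media_type == "application/json":
        context = context_from_link_header(link_header)
        if context is not None:
            return f"json-ld (context: {context})"
        # Without the Link header, "@id" or "odata.id" are just opaque keys.
        return "plain json"
    return "unknown"


print(interpretation("application/ld+json"))
print(interpretation("application/json"))
print(interpretation(
    "application/json",
    '<http://json-ld.org/contexts/person.jsonld>; '
    'rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"',
))
```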


> This is certainly an area that we could collaborate on as well. We could define
> a registry of well-known prefixes, together with a mechanism like XML has to
> define ad-hoc prefixes.

We already have such a mechanism: the context. It's completely
decentralized. You can host your context on any site and reference it from
any JSON-LD document.


>> You can do that already, although you would have to add a context (which in
>> the case of a JSON document could also be referenced by an HTTP Link header
>> [1]) aliasing the keywords [2]. For the sake of simplicity, I embed it
>> directly in the following example:
>>
>> {
>>   "@context": [
>>     { "jsonld.id": "@id" },
>>     "http://json-ld.org/contexts/person.jsonld"
>>   ],
>>   "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
>>   "name": "John Lennon",
>>   "born": "1940-10-09",
>>   "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
>> }
>
> Interesting. So (except for context) you could make the JSON-LD keywords
> information look like ODATA-JSON annotations.

Exactly.
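
For illustration, a minimal Python sketch of what the alias in the example
above does. It only collects alias-to-keyword mappings from the embedded
context object and rewrites the top-level keys; it is a simplification under
stated assumptions, not the expansion algorithm of the API spec [4]. The
remote person context is not fetched, so "name", "born" and "spouse" are left
untouched, and the function names are hypothetical.

```python
from typing import Any, Dict

# Keywords that may be aliased (JSON-LD 1.0 does not allow aliasing @context).
ALIASABLE_KEYWORDS = frozenset({
    "@id", "@value", "@language", "@type", "@container", "@list",
    "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
})


def keyword_aliases(context: Any) -> Dict[str, str]:
    """Collect alias -> keyword mappings from embedded context objects.

    Remote contexts given as IRI strings are skipped (never fetched), and only
    simple term definitions (term -> string) are considered.
    """
    contexts = context if isinstance(context, list) else [context]
    aliases: Dict[str, str] = {}
    for ctx in contexts:
        if isinstance(ctx, dict):
            for term, definition in ctx.items():
                if isinstance(definition, str) and definition in ALIASABLE_KEYWORDS:
                    aliases[term] = definition
    return aliases


def unalias_top_level(document: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite aliased top-level keys back to keywords (no full expansion)."""
    aliases = keyword_aliases(document.get("@context", []))
    return {aliases.get(key, key): value for key, value in document.items()}


doc = {
    "@context": [
        {"jsonld.id": "@id"},
        "http://json-ld.org/contexts/person.jsonld",
    ],
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}

print(unalias_top_level(doc)["@id"])  # http://dbpedia.org/resource/John_Lennon
```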


> That’s actually really encouraging, but still feels like a one-off
> for making JSON-LD look (mostly) like OData JSON, and not a general
> solution for custom/third party annotations.

Why not? Just define the context and host it at a well-known location such
as http://odata.org/context.jsonld. Everyone who wants to use JSON-LD
in that way then simply references that context, and that's it. Documents
that aren't using those keywords can be automatically transformed to do so
by our API [4].


[...]
> I'm sure we could train processors to understand both OData's JSON format
> and JSON-LD as one-offs, but the problem becomes when the next JSON-based
> format comes along and defines their own way to add control information.
> Or, when someone simply wants to add custom annotations to a JSON payload.

That's exactly why media types exist. You cannot override the
semantics of an existing media type. Of course you can define in your spec
that all properties starting with "odata." mean something very specific to
an OData processor, but such a processor wouldn't have any way to find out if
that is really what the author intended if the documents are served as
application/json. The author just tells you that it is JSON. If you go and
look up RFC 4627, which defines application/json, you obviously won't find
anything about "odata.".


> A namespacing mechanism allows a processor to understand a single, simple rule
> (like names containing a dot are namespaced) and anybody can add their own
> specific information to a payload, without worrying about conflicts.
> Processors/applications can pick and choose what they want to pay attention to.

Right, but that rule has to be defined at the media type level. We define it
for application/ld+json. We can try to align that with what you are doing,
but we cannot force it on anyone using application/json. It is not under
our control.
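
Here is a minimal Python sketch of the rule you propose ("names containing a
dot are namespaced"), just to make the discussion concrete. The function name
is hypothetical, and the sketch assumes the rule has actually been defined for
the media type in question; served as plain application/json, nothing tells a
processor that the author meant it.

```python
from typing import Any, Dict, Tuple


def split_annotations(obj: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Partition an object's properties under the proposed dot rule.

    A name of the form "prefix.local" is treated as a namespaced annotation;
    every other name is treated as data.
    """
    data: Dict[str, Any] = {}
    annotations: Dict[str, Dict[str, Any]] = {}
    for name, value in obj.items():
        prefix, dot, local = name.partition(".")  # split on the first dot only
        if dot and prefix and local:
            annotations.setdefault(prefix, {})[local] = value
        else:
            data[name] = value
    return data, annotations


payload = {
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
}
data, annotations = split_annotations(payload)
print(data)         # {'name': 'John Lennon', 'born': '1940-10-09'}
print(annotations)  # {'jsonld': {'id': 'http://dbpedia.org/resource/John_Lennon'}}
```

The weakness is the one discussed above: a data property that merely happens
to contain a dot, say "version.major", would be misclassified as an annotation
in a "version" namespace.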


>>> JSON parsers would have a common way to differentiate
>>> markup from data, and could consume/ignore/expose whatever markup they
>>> chose.
>>
>> As already said above, I don't think we can differentiate between markup and
>> data in JSON-LD.
>
> Really? I think it would be very useful for a general JSON processor to
> recognize the data properties of a JSON-LD payload, even if just to paint it
> on a screen, without needing to know/understand/ignore all of the JSON-LD
> keywords.

Yeah, really :-) A general JSON processor will never recognize any property.
They are all opaque to a JSON processor. We have to talk about JSON-LD
processors and OData processors and see how we can align them.


> Again, thanks for taking the time for a detailed response. I actually
> learned a lot, and am encouraged that there may be a happy path here.
> I hope my answers above make sense, and help clarify the goal of moving
> from static, predefined keywords in each JSON-based format to a general,
> extensible, customizable annotation mechanism that everyone can
> use/understand.

They definitely helped me to understand your position. I think the key
difference between JSON-LD and OData is that OData does have metadata
properties whereas JSON-LD keywords are solely used as syntactic constructs
to express data. JSON-LD's goal is to make the data self-descriptive and
eliminate out-of-band information. It does not define service interfaces as
OData does. As such, I think there's no metadata in JSON-LD documents that
could be ignored without losing information, but there is in OData
documents. Is that classification correct? If so, it would be very valuable
to at least be able to interpret OData *data* as JSON-LD.



[1] http://www.w3.org/TR/ldp/
[2] http://www.w3.org/community/hydra/
[3] http://www.markus-lanthaler.com/hydra/
[4] http://www.w3.org/TR/json-ld-api/


--
Markus Lanthaler
@markuslanthaler

Received on Friday, 30 August 2013 09:13:32 UTC