Re: Defining a common convention for marking up JSON from Robert Sanderson on 2013-09-24 (public-linked-json@w3.org from September 2013)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 24 Sep 2013 15:57:48 -0600
Cc: "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-ID: <CABevsUEvat_JCP1ih+y+RiWqL8DhEQ5t+UNy-NgRCAnTETq2iw@mail.gmail.com>
It's a little more than that I think...

You also need the other part of XML namespaces ... the URI for the
namespace, rather than the prefix, to distinguish my "@jsonld." from
someone else's "@jsonld."  And then you need a way to link them, and thus a
magic equivalent to xmlns.
Alternatively you need to maintain a registry of "@(namespace)."
descriptions.
And finally you need a way to convince JSON developers that this is at all
useful to anyone.

I think the last of those is the most challenging, especially when it's not
built into the JSON specification.

Best of luck! :)

Rob



On Tue, Sep 24, 2013 at 3:36 PM, Michael Pizzo <mikep@microsoft.com> wrote:

> Picking back up on this thread (how quickly time passes).
>
> I think we got a bit off track by talking about whether keywords were "data" or "markup". Regardless of whether we consider information like context, type, etc., to be markup, we have the basic problem that we want to add information to the JSON payload that doesn't conflict with the data members (or "terms") being serialized.
>
> The fact that JSON-LD prefixes keywords with "@" and recommends that terms NOT start with "@" illustrates the point exactly. We are trying to avoid conflicts between term names, keywords, and other potential (important) information about the collection of terms.
>
> Prefixing such "keywords" with "@" works for one definer of keywords (i.e., those defined by JSON-LD), but if other parties want to add additional keywords to the same payload, without conflicting with term names, JSON-LD, OR EACH OTHER, they also have to come up with a convention. Which means that someone wanting to persist terms compatible with multiple systems needs to know about, and avoid, multiple conventions.
>
> Again, if we take the example of XML, one thing XML does very well is allow a payload to be made up of elements defined by multiple different parties, with multiple different purposes. By namespacing the element names, these elements can all co-exist and processors understand exactly how to process the results according to their own set of rules.
>
> JSON-LD and OData are both *so* close to this model. Both JSON-LD and OData are adding properties to the JSON payload that we need to make sure don't conflict with the terms being serialized OR EACH OTHER. JSON-LD chooses to prefix keywords with "@" to try to avoid conflicts with term names, but this is not yet common nor general. OData defines a common mechanism of qualifying the names to allow multiple different parties to add content without conflicts, but relies on the simple presence of a dot in the name to distinguish term names from namespaced keywords (which precludes term names from containing dots).
>
> If we could both bend just a little; if OData would prefix keywords with "@odata.*", and JSON-LD would prefix keywords with "@jsonld.*" then we could share a convention for adding keywords that don't conflict with term names without conflicting between the two (i.e., @odata.context is different from @jsonld.context). At the same time we could allow other parties to add information to the payload (i.e., "@org.iso.unitsofmeasure":"meters") and further reduce the likelihood of having term names conflict with any of these keywords (i.e., even if there was a term named "@context" it wouldn't conflict with JSON-LD's "@jsonld.context").
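
A minimal Python sketch of how a consumer might apply the "@{namespace}.{name}" convention proposed above. It is an illustration only: the helper name `split_payload` and the rule that the last dot separates the namespace from the keyword name are assumptions, not something either specification defines. A bare "@context" carries no namespace, so the sketch treats it as an ordinary term.

```python
def split_payload(obj):
    """Separate namespaced annotations from ordinary terms.

    Assumption: an annotation key looks like "@<namespace>.<name>",
    and the LAST dot separates the namespace from the name.
    """
    annotations = {}  # namespace -> {name: value}
    terms = {}        # everything that is not a namespaced annotation
    for key, value in obj.items():
        if key.startswith("@") and "." in key:
            namespace, _, name = key[1:].rpartition(".")
            annotations.setdefault(namespace, {})[name] = value
        else:
            terms[key] = value
    return annotations, terms


# Hypothetical payload mixing annotations from several parties.
payload = {
    "@odata.context": "http://example.org/$metadata",
    "@jsonld.context": "http://example.org/context.jsonld",
    "@org.iso.unitsofmeasure": "meters",
    "@context": "an ordinary term that happens to start with @",
    "height": 1.8,
}

annotations, terms = split_payload(payload)
```

A processor that only understands OData would read `annotations.get("odata", {})` and skip the other namespaces; the data members stay in `terms`.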
>
> The OASIS OData Technical Committee has signed off on a Committee Specification 01 as the final OData specification ready for standardization. I have recommended that the Technical Committee delay standardization by producing a Committee Specification 02 that aligns with JSON-LD by prefixing all annotations with "@". So all OData keywords within a JSON payload will start with "@odata.", and other annotations that follow the convention will start with "@{namespace}".
>
> It would be fantastic if JSON-LD would support this convention by prefixing keywords with "@jsonld.*".
>
> On Friday, August 30, 2013 12:35 AM, Michael Pizzo wrote:
> > Thanks for the quick response and thoughts Markus.  I’m glad to see,
> > from the responses so far, that there is interest in exploring some
> > type of alignment.
>
> There definitely is but please bear with me till I fully understand the
> "problem" because at the moment I think I can't see it.
>
> [...]
> >> (please
> >> note that it is fine to use properties starting with an @ as long as it is
> >> not a defined keyword from a JSON-LD perspective).
> >
> > I was wondering if the list of keywords was hard-coded or if the
> > @ prefix were a general mechanism. There are advantages to both,
> > of course; one is less restrictive for general property names and
> > the other is more extensible.
>
> The list is hardcoded. There's however the following statement in the spec
> discouraging third parties from defining new keywords:
>
>     To avoid forward-compatibility issues, a term SHOULD NOT start with
>     an @ character as future versions of JSON-LD may introduce additional
>     keywords.
>
> >> Also, I don't think (at
> >> least for JSON-LD) that we can differentiate between "markup" and "data".
> >> It's not like HTML where you just mark up some text. Losing, e.g., an
> >> identifier of an entity is not really desired and most people wouldn't
> >> classify that as markup - at least I wouldn't.
> >
> > Markup may be a poor choice of words. The general idea is that there is
> > "data" and "meta" or "control information" (such as type, etc.). A simple
> > JSON processor wouldn't know what to do with type, and wouldn't have to;
> > it could just skip it.
>
> That's the part where I think we disagree most. In JSON-LD the type, the
> language etc. are part of the data. There's no markup (well, you could argue
> that @index is markup, but that's really it). That's one thing. The other
> thing is that a "simple JSON processor" doesn't know what to do with any of
> the properties. The whole document is an opaque structure. All it can do is
> to transform a string into an in-memory representation.
>
> It depends on the application on top of the JSON parser to interpret the
> data. Unfortunately, that application has to depend on out-of-band
> information to be able to interpret it. JSON-LD tries to bring that
> information in band (just as OData does) by making the data unambiguous.
> AFAIK OData does much more since it also defines service interfaces etc. And
> that's probably the reason why you are talking about "markup". In JSON-LD
> you would need a separate vocabulary to describe that "metadata". LDP [1] is
> such a vocabulary, Hydra another one [2-3].
>
> > Even for the identifier, a general control that's just trying to paint
> > data on a screen may be perfectly fine ignoring the identifier for an
> > entity.
>
> Right, just as you say it *may* be fine ignoring the identifier. But you
> don't know. It is up to that application to decide which *data* it renders
> and which it ignores.
>
> > It's only a consumer that understands that this JSON is JSON-LD, and wants
> > to do something like link to the object, that cares about the identifier.
>
> That can be said about every other property as well.
>
> > That doesn't mean it's not there for consumers that do care about it, just
> > that a namespacing mechanism for properties enables generic parsers to be
> > trained to look for the meta-information they care about and ignore the rest.
>
> The question is whether that's metadata or not. Would you classify the
> primary key in a DB record as metadata? I wouldn't. Of course, an
> application might ignore it nevertheless because it doesn't need it.
>
> [...]
> >> I haven't had a look at the latest OData draft yet, but how does a processor
> >> know what odata (or any other prefix) stands for? Who owns it? Is there a
> >> central registry for those prefixes?
> >
> > Good question. The answer today is currently somewhat specific to OData
> > ("odata" is reserved, and the document references a metadata document that
> > defines the prefixes).
>
> Does OData still use application/json as media type? If that's the case, how
> would a processor know whether this is really intended to be OData or
> whether someone just accidentally called a property odata.something? JSON-LD
> doesn't redefine the semantics of existing JSON. It has its own media type
> (application/ld+json) which defines the semantics of those keywords in such
> a document. If you want to serve it as JSON, you would have to associate a
> context to it (using an HTTP link header with a very specific relation). So
> there's no risk of overwriting other namespaces as OData does. Everything is
> visible at the HTTP level.
>
> > This is certainly an area that we could collaborate on as well. We could define
> > a registry of well-known prefixes, together with a mechanism like XML has to
> > define ad-hoc prefixes.
>
> We already have such a mechanism, the context. It's completely
> decentralized. You can host your context on any site and reference it from
> any JSON-LD document.
>
> >> You can do that already, although you would have to add a context (which in
> >> the case of a JSON document could also be referenced by an HTTP Link header
> >> [1]) aliasing the keywords [2]. For the sake of simplicity, I embed it
> >> directly in the following example:
> >>
> >> {
> >>   "@context": [
> >>     { "jsonld.id": "@id" },
> >>     "http://json-ld.org/contexts/person.jsonld"
> >>   ],
> >>   "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
> >>   "name": "John Lennon",
> >>   "born": "1940-10-09",
> >>   "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
> >> }
>
> >** **
>
> > Interesting. So (except for context) you could make the JSON-LD keywords****
>
> > information look like ODATA-JSON annotations.****
>
> ** **
>
> Exactly****
>
> ** **
>
> ** **
>
> > That’s actually really encouraging, but still feels like a one-off****
>
> > for making JSON-LD look (mostly) like OData JSON, and not a general****
>
> > solution for custom/third party annotations.****
>
> ** **
>
> Why not? Just define the context and host it at a well-known location such****
>
> as, e.g., http://odata.org/context.jsonld. Everyone who wants to use JSON-LD****
>
> in that way, then simply references that context and that's it. Documents****
>
> that aren't using those keywords, can be automatically transformed to do so****
>
> by our API [4].****
>
> [...]
> > I'm sure we could train processors to understand both OData's JSON format
> > and JSON-LD as one-offs, but the problem becomes when the next JSON-based
> > format comes along and defines their own way to add control information.
> > Or, when someone simply wants to add custom annotations to a JSON payload.
>
> That's exactly why there exist media types. You cannot override the
> semantics of an existing media type. Of course you can define in your spec
> that all properties starting with "odata." mean something very specific for
> an OData processor but such a processor wouldn't have any way to find out if
> that is really what the author intended if they are served as
> application/json. The author just tells you that it is JSON. If you go and
> look up RFC 4627 which defines application/json you obviously won't find
> anything about "odata.".
>
> > A namespacing mechanism allows a processor to understand a single, simple rule
> > (like names containing a dot are namespaced) and anybody can add their own
> > specific information to a payload, without worrying about conflicts.
> > Processors/applications can pick and choose what they want to pay attention to.
>
> Right, but that rule has to be defined at the media type level. We define it
> for application/ld+json. We can try to align that with what you are doing
> but we cannot force that on anyone using application/json. It is not under
> our control.
>
> >>> JSON parsers would have a common way to differentiate
> >>> markup from data, and could consume/ignore/expose whatever markup they
> >>> chose.
> >>
> >> As already said above, I don't think we can differentiate between markup and
> >> data in JSON-LD.
> >
> > Really? I think it would be very useful for a general JSON processor to
> > recognize the data properties of a JSON-LD payload, even if just to paint it
> > on a screen, without needing to know/understand/ignore all of the JSON-LD
> > keywords.
>
> Yeah really :-) A general JSON processor will never recognize any property.
> They are all opaque for a JSON processor. We have to talk about JSON-LD
> processors and OData processors and see how we can align them.
>
> > Again, thanks for taking the time for a detailed response. I actually
> > learned a lot, and am encouraged that there may be a happy path here.
> > I hope my answers above make sense, and help clarify the goal of moving
> > from static, predefined keywords in each JSON-based format to a general,
> > extensible, customizable annotation mechanism that everyone can
> > use/understand.
>
> They definitely helped me to understand your position. I think the key
> difference between JSON-LD and OData is that OData does have metadata
> properties whereas JSON-LD keywords are solely used as syntactic constructs
> to express data. JSON-LD's goal is to make the data self-descriptive and
> eliminate out of band information. It does not define service interfaces as
> OData does. As such, I think there's no metadata in JSON-LD documents that
> could be ignored without losing information, but there is in OData
> documents. Is that classification correct? If so, it would be very valuable
> to at least be able to interpret OData *data* as JSON-LD.
>
> [1] http://www.w3.org/TR/ldp/
> [2] http://www.w3.org/community/hydra/
> [3] http://www.markus-lanthaler.com/hydra/
> [4] http://www.w3.org/TR/json-ld-api/
>
> --
> Markus Lanthaler
> @markuslanthaler
>
Received on Tuesday, 24 September 2013 21:58:17 UTC