Re: Ecosystem issues for JSON-LD 1.1

> On Apr 24, 2018, at 11:22 AM, Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> Dear all,
> 
> The following issues came out of a discussion with Dan Brickley of Google and Charles McCathie Nevile, recently of Yandex, last week.  This is my summary, and does not necessarily reflect the opinions of Dan or Chaals, or their employers.
> 
> * Practical backwards compatibility with 1.0
> It is important to ensure that the next version of the specification is backwards compatible with the current 1.0.  There are hundreds of millions of sites using JSON-LD 1.0 that we cannot afford to invalidate overnight.  Processing a 1.0 context in a 1.1 environment should have the same results in practice, even if there are bugs in the 1.0 spec that get fixed.  Or, put another way, if schema.org put @context:1.1 in their context document, no one should really notice. One idea was to scrape the JSON-LD out of the Common Crawl [1], and use it as a testbed for validating the effects of algorithmic changes in practice.  Any incompatibilities need to be very carefully considered with strong justification.

(`@version: 1.1`, I think). This is pretty much the case, as evidenced by almost all of the 1.0 test cases working for either a 1.0 or 1.1 processor. The exceptions are in term selection and Compact IRI prefix generation, which are really corner cases, but it would be worth comparing with the Common Crawl dataset, if possible, to see where the real world may conflict with this.

Note that, by intention, a 1.0 processor seeing `@version: 1.1` will fail, as it otherwise would ignore things specified in the context that would generate different results. If we considered errata to 1.0 that more critically evaluate the possible keys in a term definition and values of `@container`, we could potentially get rid of `@version`, but those processors would still need to fail.
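
For concreteness, the sort of context under discussion looks something like this (a purely illustrative sketch, not taken from schema.org's actual context):

```json
{
  "@context": {
    "@version": 1.1,
    "name": "http://schema.org/name"
  }
}
```

A conforming 1.0 processor encountering the `@version` entry errors out rather than silently producing different results, while a 1.1 processor applies 1.1 semantics to the rest of the context.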

> * Complexity / Value
> As with any transformation or mapping language (comparisons were drawn with GRDDL [2] and XSLT [3]), there is a trade-off between complexity and expressivity.  As we add more features to the @context mapping language, we add the ability to express further use cases, but at the cost of fewer complete implementations.  This can be managed in several ways, including structured documentation and explicitly calling out what systems MUST or SHOULD implement and the consequences of not doing so; better yet, however, is not to add the complexity in the first place.  A systemic approach to evaluating the costs and advantages of adding features should be adopted for the working group. Each feature should have stronger use cases for when it is important to use, rather than just examples of its use.

Of course, there should be no frivolous features; but of course, that is in the eye of the beholder, and everything that was added served the needs of an important constituency.

As with other specs, if implementations do not implement a requirement that their user base requires, the problem becomes self-correcting.

SHOULDs are difficult or impossible to test for; I usually interpret a SHOULD as a MUST where possible. I think that some of the ordering requirements in the API algorithms could be considered SHOULD, as they can have a performance impact.

That said, some of the `@graph` container use cases might have a narrow base.
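
For those who have not followed the feature, a 1.1 context using a `@graph` container looks roughly like the following (the `ex:claims` term is invented purely for illustration):

```json
{
  "@context": {
    "@version": 1.1,
    "ex": "http://example.org/vocab#",
    "claims": {"@id": "ex:claims", "@container": "@graph"}
  }
}
```

Values of `claims` are then interpreted as graph objects rather than ordinary node objects, which is exactly the kind of specialized behavior that may only matter to a few communities.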

> * Context Best Practices in a Distributed Ecosystem
> Regardless of the functionality available, how _should_ contexts be deployed that take into account the wide variety of use cases and environments? This needs to encompass the possibility that browsers would process JSON-LD embedded in HTML to make additional functionality available for the user. Consider the Gmail (and now Outlook) email actions [4] ... but encountered in pages by the browser.  If contexts are to matter beyond a flag for further processing, there needs to be a method for not performing a massive denial-of-service attack on an unsuspecting server as every web browser suddenly starts pinging it for its context.

I think we might strengthen context dereferencing further with some SHOULD language, to at least require that processors perform persistent caching of contexts based on HTTP Cache-Control headers and idempotent URI schemes.
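
As a rough sketch of the behavior such a SHOULD might describe (in-memory rather than persistent, simplified handling of Cache-Control, and not tied to any particular processor's document-loader API):

```python
import json
import re
import time
import urllib.request

# Tiny in-memory cache: context URL -> (expiry timestamp, parsed document)
_context_cache = {}


def load_context(url, default_ttl=86400):
    """Fetch a remote JSON-LD context, reusing a cached copy until it expires.

    The lifetime comes from the Cache-Control max-age directive when present,
    falling back to default_ttl seconds otherwise.
    """
    now = time.time()
    cached = _context_cache.get(url)
    if cached and cached[0] > now:
        return cached[1]

    request = urllib.request.Request(
        url, headers={"Accept": "application/ld+json, application/json"})
    with urllib.request.urlopen(request) as response:
        document = json.load(response)
        cache_control = response.headers.get("Cache-Control", "")

    match = re.search(r"max-age=(\d+)", cache_control)
    ttl = int(match.group(1)) if match else default_ttl
    _context_cache[url] = (now + ttl, document)
    return document
```

A real deployment would persist the cache across processes and honor more of HTTP caching (no-store, revalidation with ETags), but even this much keeps a processor from re-fetching a widely used context for every document it encounters.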

> As we move towards a new Working Group, these sorts of systemic issues should be taken into account as well as the details of the algorithms and context/framing description languages.

+1

Gregg

> Thoughts on these sorts of issues would be appreciated!
> 
> Rob
> 
> -- 
> Rob Sanderson
> Semantic Architect
> The Getty Trust
> Los Angeles, CA 90049

Received on Wednesday, 25 April 2018 17:27:50 UTC