Re: Whither the schema.org JSON-LD context? from Gregg Kellogg on 2014-01-06 (public-vocabs@w3.org from January 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Mon, 6 Jan 2014 12:24:21 -0800
To: Dan Brickley <danbri@google.com>
Cc: Ramanathan Guha <guha@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-Id: <E3C30A1E-3478-4966-AF71-1508D10DF19F@greggkellogg.net>

On Jan 6, 2014, at 11:56 AM, Dan Brickley <danbri@google.com> wrote:

> +Cc: Guha
> 
> On 6 January 2014 18:48, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>> For some time, we've been expecting schema.org to publish a JSON-LD context at http://schema.org/ via content negotiation when the request is made with an Accept header including application/ld+json. On behalf of the Linked JSON Community Group, I'd like to get an update on this.
>> 
>> To get around this, many (most) JSON-LD tool suppliers have provided their own context based on the schema.org vocabulary definition, but this is prone to error and to differences in implementation between the various tools. I understand that there could be some concern about excessive requests for the context when it's not necessary; however, it's hard to see that this would even approach the number of requests for http://schema.org/ itself from tools that encounter that in HTML.
>> 
>> Any timeline on when this might be available?
> 
> I think it's reasonable to expect a static file published this
> quarter. However you're right that we do have concerns about the
> schema.org *website* forming an integral part of numerous unknown
> software systems and applications. It ought to be possible to do
> useful things with schema.org-based json-ld without a dependency on
> the Web site.
> 
> W3C's experience with XML parsers that auto-fetch
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and
> http://www.w3.org/1999/xhtml when parsing XML is relevant here:
> 
> Excerpting from
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
> 
> "Handling all these requests costs us considerably: servers,
> bandwidth and human time spent analyzing traffic patterns and devising
> methods to limit or block excessive new request patterns."
> 
> If someone has millions of schema.org-based JSON-LD documents that
> they want to parse into RDF or otherwise consume via json-ld tooling,
> are there code snippets and examples for the popular toolkits that
> make it likely that schema.org will see one request (per session, day,
> application invocation etc.) rather than millions?

I think that this is reasonable; we can discuss it on the next JSON-LD call. HTTP headers that allow caching, and that let a client wait 24 hours before checking back using Last-Modified or ETag, would do this. On your part, if your terms of use restrict overuse of the service, returning something like a status 429 (Too Many Requests) would allow you to blacklist sites that are abusing the system and create push-back on vendors to adhere to the terms of use and caching policy. A request to http://schema.org/ with an Accept header of application/ld+json would then return something like the following HTTP headers:

Content-Type: application/ld+json
Last-Modified: ...
ETag: ...
Cache-Control: public, max-age=86400
Vary: Accept
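
On the client side, honoring headers like these could look roughly like the following Python sketch. It is only an illustration, not the API of any existing JSON-LD toolkit: the names ContextCache and http_fetch are made up, the 24-hour lifetime is hard-coded rather than parsed from Cache-Control, and it assumes the server answers a conditional If-None-Match request with 304 when the context is unchanged.

```python
import time
import urllib.error
import urllib.request

CONTEXT_URL = "http://schema.org/"
MAX_AGE = 86400  # seconds; mirrors "Cache-Control: max-age=86400" (assumed fixed)


def http_fetch(url, etag=None):
    """Request url as JSON-LD and return (status, body, etag)."""
    headers = {"Accept": "application/ld+json"}
    if etag:
        headers["If-None-Match"] = etag  # conditional revalidation
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib raises for 304 and 429; turn them back into plain statuses.
        return err.code, None, err.headers.get("ETag")


class ContextCache:
    """Hypothetical in-memory cache holding a single remote context."""

    def __init__(self, fetch=http_fetch, clock=time.time):
        self.fetch = fetch
        self.clock = clock
        self.body = None
        self.etag = None
        self.fetched_at = None

    def get(self, url=CONTEXT_URL):
        now = self.clock()
        if self.body is not None and now - self.fetched_at < MAX_AGE:
            return self.body  # still fresh: no network request at all
        status, body, etag = self.fetch(url, self.etag)
        if status == 200:
            self.body, self.etag = body, etag
        elif self.body is None:
            raise RuntimeError("no cached context and server returned %d" % status)
        # On 304, or on an error such as 429, keep the cached copy and
        # wait another MAX_AGE seconds before asking again.
        self.fetched_at = now
        return self.body
```

With a cache like this shared across an application, parsing millions of documents inside one 24-hour window would cost the server a single full request, followed by at most one cheap revalidation per day.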

Some allowance should be made for production vs testing environments, so returning a 429 should probably be avoided unless a truly excessive number of requests is detected, or through some webmaster intervention.
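
As a rough illustration of that policy on the server side, here is a minimal Python sketch of a sliding-window counter that only answers 429 above a generous threshold, plus a manual block list standing in for webmaster intervention. The class name, the threshold and the window length are all hypothetical; nothing here reflects how schema.org is actually served.

```python
import time
from collections import defaultdict, deque

WINDOW = 3600  # seconds; hypothetical window length
LIMIT = 1000   # requests per window per client; hypothetical threshold


class ContextRateLimiter:
    """Decide between 200 and 429 for requests for the context."""

    def __init__(self, limit=LIMIT, window=WINDOW, clock=time.time, blocked=()):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.blocked = set(blocked)      # clients blocked by hand
        self.hits = defaultdict(deque)   # client -> recent request times

    def status_for(self, client):
        if client in self.blocked:
            return 429
        now = self.clock()
        recent = self.hits[client]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)  # rejected requests are counted as well
        return 429 if len(recent) > self.limit else 200
```

A test suite that fetches the context a few dozen times would stay well under a limit like this, while a crawler refetching it for every document would not.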

Gregg

> If JSON is the new XML and JSON-LD is the emerging best practice for
> interoperable JSON, it isn't unreasonable to expect XML-levels of
> usage. So let's try to learn from the W3C XML DTD experience.
> 
> Dan

Received on Monday, 6 January 2014 20:24:52 UTC