RE: Whither the schema.org JSON-LD context? from Markus Lanthaler on 2014-01-07 (public-linked-json@w3.org from January 2014)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Tue, 7 Jan 2014 11:16:44 +0100
To: "'Dan Brickley'" <danbri@danbri.org>, "'Sandro Hawke'" <sandro@hawke.org>
Cc: "'Gregg Kellogg'" <gregg@greggkellogg.net>, "'Dan Brickley'" <danbri@google.com>, "'Ramanathan Guha'" <guha@google.com>, "'W3C Web Schemas Task Force'" <public-vocabs@w3.org>, "'Linked JSON'" <public-linked-json@w3.org>
Message-ID: <000001cf0b91$92920400$b7b60c00$@lanthaler@gmx.net>

On Monday, January 06, 2014 8:56 PM, Dan Brickley wrote:
> I think it's reasonable to expect a static file published this
> quarter.

Great!

> However you're right that we do have concerns about the
> schema.org *website* forming an integral part of numerous unknown
> software systems and applications. It ought to be possible to do
> useful things with schema.org-based json-ld without a dependency on
> the Web site.

Sure, and as you know it's quite simple. All you would have to do is change the example to start with

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  ...
}

instead of the slightly simpler

{
  "@context": "http://schema.org/",
  ...
}
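Here is why the first form removes the dependency on the Web site. With an inline `@vocab`, a JSON-LD processor resolves a bare term such as `name` by prepending the vocabulary IRI, so nothing has to be fetched. The string form instead names a remote context that must be dereferenced first. The Python below is a deliberately simplified sketch that only handles top-level keys and ignores most of the real expansion algorithm (value objects, arrays, compact IRIs, nesting). `expand_with_vocab` is a hypothetical helper, not a library API.

```python
def expand_with_vocab(doc: dict) -> dict:
    """Hypothetical, simplified expansion of top-level terms via an inline @vocab."""
    context = doc.get("@context", {})
    if not isinstance(context, dict):
        # A string such as "http://schema.org/" names a remote context that
        # a processor would first have to dereference over the network.
        raise ValueError(f"remote context {context!r} must be fetched first")
    vocab = context.get("@vocab", "")
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not copied into the output
        if key == "@type" and isinstance(value, str) and ":" not in value:
            expanded[key] = vocab + value  # type names also resolve against @vocab
        elif key.startswith("@") or ":" in key:
            expanded[key] = value  # other keywords and absolute IRIs pass through
        else:
            expanded[vocab + key] = value  # bare term -> vocabulary IRI + term
    return expanded


inline = {
    "@context": {"@vocab": "http://schema.org/"},
    "@type": "Person",
    "name": "Jane Doe",
}
print(expand_with_vocab(inline))
# {'@type': 'http://schema.org/Person', 'http://schema.org/name': 'Jane Doe'}
```

A conforming JSON-LD processor produces richer output than this (for example, it wraps values in arrays and `@value` objects). The point is only that the inline form resolves locally.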

> W3C's experience with XML parsers that auto-fetch
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and
> http://www.w3.org/1999/xhtml when parsing XML is relevant here:
[...]
> 
> If JSON is the new XML and JSON-LD is the emerging best practice for
> interoperable JSON, it isn't unreasonable to expect XML-levels of
> usage. So let's try to learn from the W3C XML DTD experience.

I think there's a very important difference from that experience. XML namespaces are not links and are thus not *expected* to be dereferenced. As a result, AFAICT, for a long time those URLs returned non-cacheable HTTP error responses. A JSON-LD context, in contrast, is meant to be dereferenced, and if you know that a document is going to be requested often, you can plan for it (CDN, long cache validity, etc.). I know it's important to keep these things in mind, but I'm still not convinced that serving a small static file (even if it is requested millions of times) incurs significant costs. Otherwise, all the free JavaScript library CDNs would have been shut down long ago.

On Tuesday, January 07, 2014 9:14 AM, Dan Brickley wrote:
> On 7 January 2014 02:03, Sandro Hawke <sandro@hawke.org> wrote:
> > There's a kind of natural feedback loop here that if schema.org starts
> > to get overloaded and slow, clients will have more motivation to cache.
> > Perhaps that's the solution to the many-people-on-one-IP-address; rather
> > than giving a 429, just de-prioritize or temporarily tar-pit folks
> > asking too fast.   It would sure be nice if there was a way to give an
> > error message, or at least know who to contact.   I bet user-agent
> > fields are not set very well in general....
> 
> I'm going to ignore the remarks about controlling data on the Web, and
> focus on the fact that this sounds like a giant science experiment.
> 
> How about if content-negotiated requests for the json-ld version of
> schema.org's homepage had a 60 second (or so) pause built-in?

For the first request or for subsequent requests? I think it's a great idea to 

> encourage better use of caching and
> avoidance of fresh fetches within tight code loops.

but a very bad idea if everyone has to pay that price.
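The caching being encouraged here can be done on the client by memoizing remote contexts, so that a tight loop over many documents hits the network only once per context URL. The sketch below is a minimal version that assumes a pluggable `fetch` function. Here a stub stands in for a real HTTP GET so that the code runs offline. `CachingContextLoader` and `fake_fetch` are hypothetical names, not part of any JSON-LD library.

```python
from typing import Callable, Dict


class CachingContextLoader:
    """Hypothetical loader that fetches each remote context URL at most once."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch  # injected, e.g. an HTTP GET; stubbed below
        self._cache: Dict[str, dict] = {}
        self.fetch_count = 0  # how many times the "network" was actually hit

    def load(self, url: str) -> dict:
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.fetch_count += 1
        return self._cache[url]


def fake_fetch(url: str) -> dict:
    # Stand-in for a real request; returns a minimal context document.
    return {"@context": {"@vocab": "http://schema.org/"}}


loader = CachingContextLoader(fake_fetch)
for _ in range(1000):  # a "tight code loop" processing many documents
    ctx = loader.load("http://schema.org/")
print(loader.fetch_count)  # 1
```

Caching forever is the simplest policy. A real loader would more likely honour the server's `Cache-Control`/`Expires` headers so that updates to the context are eventually picked up.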

> BTW is a redirect URL a legitimate response to such a request, or does
> the JSON have to be returned directly?

It is, but it considerably increases latency and should thus be avoided. Again, changing the examples on schema.org and elsewhere to use a dedicated context URL such as http://schema.org/context would solve the conneg problem. In summary, I think there are enough options. We just have to choose one and execute it as soon as possible; the longer we wait, the more difficult it becomes.
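The two request styles differ as follows. With content negotiation, the client must ask for JSON-LD explicitly via the `Accept` header on the homepage URL. A dedicated context URL needs no negotiation and can be served as a plain static file. This sketch uses Python's standard `urllib.request` and only builds the requests without sending them. `http://schema.org/context` is the example URL from this message, and nothing is assumed about what either URL actually returns.

```python
import urllib.request

# Content negotiation: the homepage URL returns JSON-LD only when the
# client asks for it explicitly.
conneg_req = urllib.request.Request(
    "http://schema.org/",
    headers={"Accept": "application/ld+json"},
)

# Dedicated context URL: no Accept header is needed, so the response can be
# a static file that is trivially cacheable by CDNs and clients.
static_req = urllib.request.Request("http://schema.org/context")

# Nothing is sent here. urllib.request.urlopen(req) would perform the GET and
# follow any redirect transparently, at the cost of an extra round trip per hop.
print(conneg_req.get_header("Accept"), static_req.get_header("Accept"))
```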

Cheers,
Markus

--
Markus Lanthaler
@markuslanthaler

Received on Tuesday, 7 January 2014 10:17:12 UTC