Re: Whither the schema.org JSON-LD context? from Dan Brickley on 2014-01-07 (public-vocabs@w3.org from January 2014)

From: Dan Brickley <danbri@danbri.org>
Date: Tue, 7 Jan 2014 12:51:14 +0000
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: Sandro Hawke <sandro@hawke.org>, Gregg Kellogg <gregg@greggkellogg.net>, Dan Brickley <danbri@google.com>, Ramanathan Guha <guha@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-ID: <CAFfrAFp7aht92qu2r6hEH5zaBfvr_NSbvjRvSbs91hgCXA8j4w@mail.gmail.com>

On 7 January 2014 10:16, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:

>> W3C's experience with XML parsers that auto-fetch
>> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and
>> http://www.w3.org/1999/xhtml when parsing XML is relevant here:
> [...]
>>
>> If JSON is the new XML and JSON-LD is the emerging best practice for
>> interoperable JSON, it isn't unreasonable to expect XML-levels of
>> usage. So let's try to learn from the W3C XML DTD experience.
>
> I think there's a very important difference to that experience. XML namespaces are not links and are thus not *expected* to be dereferenced. Thus, AFAICT, for a long time those URLs returned non-cacheable HTTP error responses. If you know that a document is going to be requested often, you can plan for it (CDN, long cache validity etc.). I know it's important to keep these things in mind but I'm still not convinced that serving a small static file (even if it is requested millions of times) incurs much cost. Otherwise, all the free JavaScript library CDNs etc. would have been shut down a long time ago.

The main lesson from
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
is DTD-related rather than schema-related. Schema-fetching is
generally seen as more optional.

The important difference is (I haven't found the exact spec reference
but...) the XML 1.0 spec says that when parsing XML with validation
enabled, external references to DTDs must be de-referenced.

This is something that anyone learning-through-doing XML handling
might not even think about, if they're using a spec-compliant XML
parser. Many users of such libraries have no idea that their
application code is hammering w3.org with repeated HTTP requests.
Let's think about how we can help novice JSON-LD toolkit users avoid
finding themselves in the same position. Perhaps the default behaviour of a
JSON-LD toolkit / parser could keep a global fetches-per-minute count,
and complain to STDERR if the application is over-fetching? (alongside
sensible caching defaults etc)
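
To make that suggestion concrete, here is a rough Python sketch of what
such a default might look like. It is only an illustration: the class
name CountingContextLoader, the fetch callable, and the threshold of 10
fetches per minute are all hypothetical and not taken from any existing
JSON-LD toolkit or spec. It keeps an in-process cache, counts network
fetches over a sliding 60-second window, and prints a warning to STDERR
when the count goes over the threshold.

```python
import sys
import time
from collections import deque


class CountingContextLoader:
    """Hypothetical wrapper: caches remote contexts, warns on over-fetching."""

    def __init__(self, fetch, max_per_minute=10,
                 clock=time.monotonic, log=sys.stderr):
        self._fetch = fetch          # callable(url) -> document, supplied by caller
        self._max = max_per_minute   # assumed threshold, not from any spec
        self._clock = clock          # injectable so the window can be tested
        self._log = log              # where warnings are written
        self._cache = {}             # url -> document, for the process lifetime
        self._recent = deque()       # timestamps of fetches in the last 60 s

    def load(self, url):
        # Serve from the cache whenever possible; cached hits are not counted.
        if url in self._cache:
            return self._cache[url]

        now = self._clock()
        # Forget fetches that fell out of the 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        self._recent.append(now)

        if len(self._recent) > self._max:
            print(
                f"warning: {len(self._recent)} remote context fetches in the "
                f"last minute (latest: {url}); consider caching",
                file=self._log,
            )

        doc = self._fetch(url)
        self._cache[url] = doc
        return doc
```

A toolkit could wrap whatever HTTP function it already uses, e.g.
CountingContextLoader(my_http_get), where my_http_get stands in for the
toolkit's own fetcher. A real implementation would also want to honour
HTTP cache headers rather than caching forever, which this sketch
ignores.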

Dan

Received on Tuesday, 7 January 2014 12:51:45 UTC