- From: Wes Turner <wes.turner@gmail.com>
- Date: Tue, 7 Jan 2014 10:17:22 -0600
- To: Dan Brickley <danbri@danbri.org>
- Cc: Dan Brickley <danbri@google.com>, Markus Lanthaler <markus.lanthaler@gmx.net>, Linked JSON <public-linked-json@w3.org>, Ramanathan Guha <guha@google.com>, Gregg Kellogg <gregg@greggkellogg.net>, Sandro Hawke <sandro@hawke.org>, W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-ID: <CACfEFw_wd9VeuLz5JbUaS=KRGz0BdXfPLnjF-zyvHFkDibXTFg@mail.gmail.com>
On Jan 7, 2014 6:53 AM, "Dan Brickley" <danbri@danbri.org> wrote:
>
> On 7 January 2014 10:16, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:
>
> >> W3C's experience with XML parsers that auto-fetch
> >> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and
> >> http://www.w3.org/1999/xhtml when parsing XML is relevant here:
> > [...]
> >>
> >> If JSON is the new XML and JSON-LD is the emerging best practice for
> >> interoperable JSON, it isn't unreasonable to expect XML-levels of
> >> usage. So let's try to learn from the W3C XML DTD experience.
> >
> > I think there's a very important difference to that experience. XML
> > namespaces are not links and are thus not *expected* to be
> > dereferenced. Thus, AFAICT, for a long time those URLs returned
> > non-cacheable HTTP error responses. If you know that a document is
> > going to be requested often, you can plan for it (CDN, long cache
> > validity, etc.). I know it's important to keep these things in mind,
> > but I'm still not convinced that serving a small static file (even if
> > it is requested millions of times) causes much cost. Otherwise, all
> > the free JavaScript library CDNs etc. would have been shut down a
> > long time ago.

Last time I tried to run a free CDN, it wasn't inexpensive. Can we create
a validated whitelist of schema URIs, or should we rely upon sensible
server-side caching (ETags, Cache-Control)? The Norvig XKCD solution may
be helpful here.

> The main lesson from
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
> is DTD-related rather than schema-related. Schema-fetching is
> generally seen as more optional.

So: <link> elements with full URLs/URIs, or @vocab and/or prefixes, which
a 'normal' client won't prefetch or unnecessarily dereference?

> The important difference is (I haven't found the exact spec reference
> but...) the XML 1.0 spec says that when parsing XML with validation
> enabled, external references to DTDs must be de-referenced.
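The "sensible server-side caching (ETags, Cache-Control)" idea mentioned above could look roughly like the sketch below: a small static JSON-LD context served with an ETag and a long cache lifetime, so well-behaved clients revalidate with a cheap 304 instead of re-downloading. All names here (serve_context, CONTEXT_BODY) are illustrative assumptions, not from any spec or toolkit.

```python
# Hypothetical sketch: serve a small static JSON-LD context with an
# ETag and Cache-Control so repeat fetches can be answered with 304.
import hashlib

CONTEXT_BODY = b'{"@context": {"schema": "http://schema.org/"}}'
ETAG = '"%s"' % hashlib.sha256(CONTEXT_BODY).hexdigest()[:16]

def serve_context(if_none_match=None):
    """Return (status, headers, body) for a (conditional) GET."""
    headers = {
        "ETag": ETAG,
        "Cache-Control": "public, max-age=86400",  # cacheable for a day
        "Content-Type": "application/ld+json",
    }
    if if_none_match == ETAG:
        return 304, headers, b""  # client's cached copy is still fresh
    return 200, headers, CONTEXT_BODY

status, headers, body = serve_context()                # first fetch
status2, _, body2 = serve_context(if_none_match=ETAG)  # revalidation
```

With this shape, even "millions of requests" mostly become 304s against a strong validator, which is the cheap case Markus's argument relies on.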
A separate thread is probably more appropriate for a question like "which
libraries / frameworks / toolkits / user-agents are needlessly requesting
unnecessary levels of server resources?"

> This is something that anyone learning-through-doing XML handling
> might not even think about, if they're using a spec-compliant XML
> parser. Many users of such libraries have no idea that their
> application code is hammering w3.org with repeated HTTP requests.
> Let's think about how we can help novice JSON-LD toolkit users avoid
> finding themselves in the same position. Perhaps the default behaviour
> of a JSON-LD toolkit / parser could keep a global fetches-per-minute
> count, and complain to STDERR if the application is over-fetching?
> (alongside sensible caching defaults, etc.)

IIRC the reddit PRAW API enforces client-side rate-limiting with the
requests library, but requests_cache seems not to work with it.
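Dan's fetches-per-minute suggestion could be sketched as below: a global sliding-window counter that a hypothetical JSON-LD context loader bumps on every remote fetch, complaining to STDERR past a threshold. The function name, threshold, and window size are assumptions for illustration, not part of any JSON-LD specification or existing toolkit.

```python
# Hypothetical sketch: warn to STDERR when a process fetches remote
# JSON-LD contexts faster than some fetches-per-minute threshold.
import sys
import time
from collections import deque

MAX_FETCHES_PER_MINUTE = 60  # illustrative threshold, not from any spec
_fetch_times = deque()

def record_fetch(url, now=None):
    """Record one remote context fetch; return True if over-fetching."""
    now = time.monotonic() if now is None else now
    _fetch_times.append(now)
    # Drop entries that fell out of the sliding 60-second window.
    while _fetch_times and now - _fetch_times[0] > 60:
        _fetch_times.popleft()
    if len(_fetch_times) > MAX_FETCHES_PER_MINUTE:
        print("warning: %d context fetches in the last minute (last: %s); "
              "consider caching" % (len(_fetch_times), url),
              file=sys.stderr)
        return True
    return False

# Simulated clock: 61 fetches within one second trips the warning once.
over = [record_fetch("http://example.org/context.jsonld", now=i * 0.01)
        for i in range(61)]
```

A real loader would call something like record_fetch() just before each network request, alongside the "sensible caching defaults" the message asks for.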
Received on Tuesday, 7 January 2014 16:17:52 UTC