Re: Whither the schema.org JSON-LD context?

Wes Turner
On Jan 7, 2014 6:53 AM, "Dan Brickley" <danbri@danbri.org> wrote:
>
> On 7 January 2014 10:16, Markus Lanthaler <markus.lanthaler@gmx.net>
> wrote:
>
> >> W3C's experience with XML parsers that auto-fetch
> >> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and
> >> http://www.w3.org/1999/xhtml when parsing XML is relevant here:
> > [...]
> >>
> >> If JSON is the new XML and JSON-LD is the emerging best practice for
> >> interoperable JSON, it isn't unreasonable to expect XML-levels of
> >> usage. So let's try to learn from the W3C XML DTD experience.
> >
> > I think there's a very important difference to that experience. XML
> > namespaces are not links and are thus not *expected* to be
> > dereferenced. Thus, AFAICT, for a long time those URLs returned
> > non-cacheable HTTP error responses. If you know that a document is
> > going to be requested often, you can plan for it (CDN, long cache
> > validity, etc.). I know it's important to keep these things in mind,
> > but I'm still not convinced that serving a small static file (even if
> > it is requested millions of times) incurs much cost. Otherwise, all
> > the free JavaScript library CDNs would have been shut down a long
> > time ago.

Last time I tried to run a free CDN, it wasn't inexpensive.

Can we create a validated whitelist of schema URIs, or should we rely on
sensible server-side caching (ETags, Cache-Control)?
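
For concreteness, here's a rough Python sketch of the client-side half
of that (the URL, Accept header, and cache structure are illustrative
assumptions, not anyone's actual API):

    import requests

    CONTEXT_URL = "http://schema.org/"  # placeholder; served as application/ld+json
    _cache = {}  # url -> (etag, parsed body)

    def fetch_context(url=CONTEXT_URL):
        headers = {"Accept": "application/ld+json"}
        cached = _cache.get(url)
        if cached and cached[0]:
            # revalidate with the stored ETag instead of refetching the body
            headers["If-None-Match"] = cached[0]
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304 and cached:
            return cached[1]  # server confirms our copy is still fresh
        resp.raise_for_status()
        _cache[url] = (resp.headers.get("ETag", ""), resp.json())
        return _cache[url][1]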

The Norvig XKCD solution may be helpful here.

>
> The main lesson from
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
> is DTD-related rather than schema-related. Schema-fetching is
> generally seen as more optional.

So: <link> elements with full URLs/URIs, or @vocab and/or prefixes that
a 'normal' client won't prefetch or needlessly dereference?
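
To make the contrast concrete, a sketch of the two options (written as
Python dicts; the context URL is just a placeholder for whatever
schema.org ends up serving):

    # Option 1: a remote context URL, which a conforming JSON-LD
    # processor will dereference (the DTD-like failure mode above).
    remote_context_doc = {
        "@context": "http://schema.org/",  # fetched on expansion unless cached
        "@type": "Person",
        "name": "Jane Doe",
    }

    # Option 2: an inline @vocab, which needs no network round trip.
    inline_vocab_doc = {
        "@context": {"@vocab": "http://schema.org/"},
        "@type": "Person",
        "name": "Jane Doe",
    }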

>
> The important difference is (I haven't found the exact spec reference
> but...) the XML 1.0 spec says that when parsing XML with validation
> enabled, external references to DTDs must be de-referenced.

A separate thread is probably more appropriate for a question like
"which libraries / frameworks / toolkits / user-agents are needlessly
consuming server resources?"

>
> This is something that anyone learning-through-doing XML handling
> might not even think about, if they're using a spec-compliant XML
> parser. Many users of such libraries have no idea that their
> application code is hammering w3.org with repeated HTTP requests.
> Let's think about how we can help novice JSON-LD toolkit users avoid
> finding themselves in the same position.

> Perhaps the default behaviour of a
> JSON-LD toolkit / parser could keep a global fetches-per-minute count,
> and complain to STDERR if the application is over-fetching? (alongside
> sensible caching defaults etc)
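
A rough sketch of what that might look like (the threshold and the
wrapper signature are made up for illustration):

    import sys
    import time

    # Sketch of the suggested default: count context fetches globally and
    # complain to STDERR when an application over-fetches.
    MAX_FETCHES_PER_MINUTE = 10
    _fetch_times = []

    def counted_fetch(fetch, url):
        now = time.time()
        # keep only timestamps from the last 60 seconds, then record this one
        _fetch_times[:] = [t for t in _fetch_times if now - t < 60]
        _fetch_times.append(now)
        if len(_fetch_times) > MAX_FETCHES_PER_MINUTE:
            print("warning: %d context fetches in the last minute; "
                  "consider caching %s" % (len(_fetch_times), url),
                  file=sys.stderr)
        return fetch(url)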

IIRC, the reddit PRAW API wrapper enforces client-side rate limiting on
top of the requests library, but requests_cache seems not to work with
it.
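
For reference, the transparent caching I'd hope for looks roughly like
this (assuming requests_cache's install_cache API; whether it composes
with a given HTTP client is exactly the open question):

    import requests
    import requests_cache

    # Transparently cache requests-based fetches on disk for an hour, so
    # repeated lookups of the context document never hit the network.
    requests_cache.install_cache("jsonld_contexts", expire_after=3600)

    resp = requests.get("http://schema.org/",
                        headers={"Accept": "application/ld+json"})
    print(resp.from_cache)  # False on the first call, True afterwards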

Received on Tuesday, 7 January 2014 16:17:52 UTC