Re: Trailing content in JSON-LD from Andy Seaborne on 2015-08-23 (public-linked-json@w3.org from August 2015)

From: Andy Seaborne <andy@seaborne.org>
Date: Sun, 23 Aug 2015 19:00:18 +0100
To: Gregg Kellogg <gregg@greggkellogg.net>
CC: Linked JSON <public-linked-json@w3.org>
Message-ID: <55DA0A32.9000008@seaborne.org>

On 23/08/15 18:34, Gregg Kellogg wrote:

> However, as a practical matter, JSON may be included in an HTML script tag, which could conceivably be wrapped in CDATA. Sometimes other non-JSON content (such as a // comment) is also found. Because these are seen in the wild, my reader removes everything preceding “{” or “[” and everything trailing “}” or “]” to look for a valid JSON document. The specific substitution pattern I use is the following:
>
> input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
>
> While this is technically invalid IMO, practically speaking not eating such garbage will break real-world usage (perhaps mostly in schema.org examples). I could see generating an error if this is seen when validating, but otherwise I’m inclined to eat such garbage in my implementation.
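A rough sketch of the lenient-by-default, strict-when-validating behaviour described in the quoted message, reusing the substitution pattern quoted above. The helper names `trim_json_garbage` and `parse_embedded_json` and the `validate:` keyword are hypothetical, not part of any published JSON-LD reader.

```ruby
require 'json'

# Hypothetical helper: strip everything before the first "{" or "[" and
# everything after the last "}" or "]", using the quoted pattern.
# Returns the trimmed text plus a flag saying whether anything was removed.
def trim_json_garbage(input)
  text = input.to_s
  trimmed = text.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
  [trimmed, trimmed != text]
end

# Hypothetical wrapper: silently eat the garbage normally, but raise
# when the caller asks for validation and something had to be removed.
def parse_embedded_json(input, validate: false)
  trimmed, changed = trim_json_garbage(input)
  if changed && validate
    raise ArgumentError, 'non-JSON content before or after the JSON document'
  end
  JSON.parse(trimmed)
end
```

The heuristic only works when the leading junk contains no "{" or "[" and the trailing junk contains no "}" or "]"; anything else is passed through to the JSON parser unchanged.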

Agreed, embedding is important.

I think it's better to talk about an extraction step to identify the 
content before invoking the JSON(-LD) specs.  There may be escaping or 
encoding issues from the enclosing content.  I see chopping junk as an 
example of such a step.
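One possible shape for such an extraction step, assuming the caller already has the text content of a <script type="application/ld+json"> element. It undoes only wrappers that come from the enclosing HTML and then parses strictly, rather than chopping at the first bracket. Note that the chop pattern quoted above does not cope with a bare CDATA wrapper: "<![CDATA[" itself contains "[", so the leading substitution stops inside the marker. The name `extract_script_json` is hypothetical.

```ruby
require 'json'

# Hypothetical extraction step for the body of a JSON-LD script element.
def extract_script_json(script_text)
  body = script_text.to_s
  # Drop whole-line "//" comments. JSON strings cannot contain raw
  # newlines, so a line starting with "//" is never part of the JSON text.
  body = body.gsub(%r{^[ \t]*//.*$}, '')
  # Drop a CDATA section or HTML comment wrapped around the whole payload.
  body = body.sub(%r{\A\s*(?:<!\[CDATA\[|<!--)}, '')
             .sub(%r{(?:\]\]>|-->)\s*\z}, '')
  # Anything left over that is not JSON is reported by the strict parser.
  JSON.parse(body)
end
```

This is deliberately narrower than chopping: unknown trailing text such as "{...} trailing" still raises JSON::ParserError, which keeps the JSON(-LD) processing itself unchanged.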

 Andy

Received on Sunday, 23 August 2015 18:00:48 UTC