- From: Andy Seaborne <andy@seaborne.org>
- Date: Sun, 23 Aug 2015 19:00:18 +0100
- To: Gregg Kellogg <gregg@greggkellogg.net>
- CC: Linked JSON <public-linked-json@w3.org>
On 23/08/15 18:34, Gregg Kellogg wrote:
> However, as a practical matter, JSON may be included in an HTML script tag, which could conceivably be in CDATA. Sometimes other non-JSON content (such as a // comment) is also found. Because these are seen in the wild, my reader removes everything preceding “{” or “[” and everything trailing “}” or “]” to look for a valid JSON document. The specific substitution pattern I use is the following:
>
>   input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
>
> While this is technically invalid IMO, practically speaking not eating such garbage will break real-world usage (perhaps mostly in schema.org examples). I could see generating an error if this is seen when validating, but otherwise I’m inclined to eat such garbage in my implementation.

Agreed, embedding is important.

I think it's better to talk about an extraction step to identify the content before invoking the JSON(-LD) specs. There may be escaping or encoding issues from the enclosing content. I see chopping junk as an example of such a step.

Andy
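As an illustration only, the "chop the junk, then parse" idea discussed in this message could be sketched as a separate extraction step that runs before the JSON parser. The helper name `extract_json` and the sample input are hypothetical; the two substitutions are the pattern quoted in the message, and the parser is Ruby's standard `json` library.

```ruby
require 'json'

# Hypothetical helper (not from any JSON-LD library): trims non-JSON text
# around an embedded document using the substitution pattern quoted above.
def extract_json(input)
  input.to_s
       .sub(%r(\A[^{\[]*)m, '')  # drop leading text up to the first "{" or "["
       .sub(%r([^}\]]*\Z)m, '')  # drop trailing text after the last "}" or "]"
end

# Assumed sample: a JSON object surrounded by // comments, as might be
# copied out of a script tag.
wrapped = "// example data\n{\"name\": \"Andy\"}\n// end of example\n"

# Extraction happens first; only the trimmed string reaches the parser.
doc = JSON.parse(extract_json(wrapped))
```

Keeping extraction in its own function means a validating mode could compare the trimmed string with the original and report an error when they differ. Note that surrounding text which itself contains a brace or bracket, such as a literal `<![CDATA[` ... `]]>` wrapper, would not be fully removed by this pattern and would need its own handling in the extraction step.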
Received on Sunday, 23 August 2015 18:00:48 UTC