Re: Trailing content in JSON-LD

As someone writing software to make requests for JSON-LD documents and
parse them, I am really not interested in trying to figure out how to parse
the input you provided. There could be thousands of different ways a
response could contain a fragment that is a valid JSON-LD document and
_other stuff_, and I don't want to support one case of _other stuff_,
because there's no end to the possibilities of what else I might find. If I
request 'application/ld+json', I don't want anything other than the
document that I can send straight to my JSON parser.
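
To make that concrete, here is a minimal sketch of the strict behaviour I have in mind, in Ruby since that is what the quoted pattern uses. The helper name parse_json_ld and the example.org document are made up for illustration, and the only library call is the standard json library's JSON.parse. This is just one way a client might do it, not a claim about what any existing reader does.

```ruby
require 'json'

# Hypothetical helper (name is an assumption): accept a response body only
# if it was served as application/ld+json, then hand the whole body to the
# JSON parser with no pre-processing.
def parse_json_ld(body, content_type)
  # Drop any media type parameters such as "; charset=utf-8".
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  unless media_type == 'application/ld+json'
    raise ArgumentError, "unexpected media type: #{media_type.inspect}"
  end
  # JSON.parse raises JSON::ParserError if the body is not a single JSON value.
  JSON.parse(body)
end

clean   = '{"@id": "http://example.org/a"}'
wrapped = "<![CDATA[\n" + clean + "\n]]>"

# A bare document parses as-is.
doc = parse_json_ld(clean, 'application/ld+json; charset=utf-8')

# The same document wrapped in other stuff is rejected, not repaired.
rejected = begin
  parse_json_ld(wrapped, 'application/ld+json')
  false
rescue JSON::ParserError
  true
end
```

Any extraction from HTML or CDATA would then have to happen before this function is called, by whoever knows what the enclosing format is.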

Nate

On Sun, Aug 23, 2015 at 11:00 AM, Andy Seaborne <andy@seaborne.org> wrote:

>
>
> On 23/08/15 18:34, Gregg Kellogg wrote:
>
>> However, as a practical matter, JSON may be included in an HTML script
>> tag, which could conceivably be in CDATA. Sometimes other non-JSON content
>> (such as a // comment) is also found. Because these are seen in the wild,
>> my reader removes everything preceding “{” or “[” and everything trailing
>> “}” or “]” to look for a valid JSON document. The specific substitution
>> pattern I use is the following:
>>
>> input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
>>
>> While this is technically invalid IMO, practically speaking not eating
>> such garbage will break real-world usage (perhaps mostly in schema.org
>> examples). I could see generating an error if this is seen when validating,
>> but otherwise I’m inclined to eat such garbage in my implementation.
>>
>
> Agreed, embedding is important.
>
> I think it's better to talk about an extraction step to identify the
> content before invoking the JSON(-LD) specs.  There may be escaping or
> encoding issues from the enclosing content.  I see chopping junk as an
> example of such a step.
>
>         Andy
>
>

Received on Sunday, 23 August 2015 18:28:42 UTC