- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Sun, 23 Aug 2015 10:34:06 -0700
- To: Andy Seaborne <andy@seaborne.org>
- Cc: Linked JSON <public-linked-json@w3.org>
> On Aug 23, 2015, at 3:15 AM, Andy Seaborne <andy@seaborne.org> wrote:
>
> I'm having trouble pinning down what the spec status is of this input (this is for an issue in jsonld-java).
>
> Does the trailing content mean it is illegal JSON-LD or not or is it outside the spec altogether in some cases?
>
> ----------------------
> {
> "@id" : "http://example/s",
> "http://example/p" : "str"
> }
> xxxxxxxxx
> ----------------------
>
> The question is whether the whole input is the "JSON Document" or whether the trailing junk is considered to be outside the JSON Document.
>
> In the first case, it is a parse error, and any output is undefined.
> In the second case, there would be triples and no parse error.
>
> I currently think that the spec says this is illegal JSON-LD but the argument is convoluted and relies on the input coming from HTTP. If it were some other source (a file with a non jsonld extension [tut, tut]), it is unstated.
>
> The spec chase:
>
> Section 8 =>
>
> """
> A JSON-LD document MUST be a valid JSON document as described in [RFC4627].
>
> A JSON-LD document MUST be a single node object or an array whose elements are each node objects at the top level.
> """
Adding "exclusive of enclosing whitespace".
> RFC4627 is the media type registration for JSON.
>
> The definition link for "JSON-LD document" is descriptive:
> """
> A JSON-LD document serializes a generalized RDF Dataset [RDF11-CONCEPTS], which is a collection of graphs that comprises exactly one default graph and zero or more named graphs.
> """
>
> so it does not say, to my reading, that the "JSON-LD document" includes or excludes the content after the "}".
I believe a JSON-LD document is the entire document, and in my case, anyway, the entire document is passed to the Ruby JSON parser, where I expect to see an Array or Object. Non-whitespace trailing characters would be a syntax error.
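As a rough sketch of that behaviour (assuming Ruby's standard json library; the helper name parse_top_level is hypothetical and not taken from any JSON-LD implementation):

```ruby
require 'json'

# Hypothetical helper: parse the whole input as one JSON text and
# insist on an Array or Object at the top level.
def parse_top_level(input)
  value = JSON.parse(input)
  unless value.is_a?(Hash) || value.is_a?(Array)
    raise ArgumentError, "expected an Array or Object at the top level"
  end
  value
end

doc = %({ "@id": "http://example/s", "http://example/p": "str" })

# Leading and trailing whitespace is accepted by the parser.
parse_top_level("  " + doc + "\n  ")

# Non-whitespace trailing characters are rejected as a syntax error.
begin
  parse_top_level(doc + "\nxxxxxxxxx")
rescue JSON::ParserError => e
  puts "syntax error: #{e.message}"
end
```

Because the entire input is handed to the parser, no separate decision is needed about where the "document" ends.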
> RFC4627 talks about a "JSON text" when defining the media type.
> Because that is the whole of the HTTP body, I think it means that "JSON text" includes everything. Then "MUST be a single node object" applies => it's a parse error.
+1
> Proposed spec fix 1:
> If it said that
> """
> A JSON-LD document MUST be a valid JSON *text* as described in [RFC4627].
> """
>
> then it would be clearer but still only applies if the media type can be invoked and sometimes it can't (e.g. a stream of chars from a non-HTTP stream).
Yes, I think this is clear.
> A sentence in the grammar explicitly stating that no trailing content is permitted, making it a syntax issue rather than a context issue, would cover all cases.
With the provision that leading and trailing whitespace is permitted.
However, as a practical matter, JSON may be included in an HTML script tag, which could conceivably be wrapped in CDATA. Sometimes other non-JSON content (such as a // comment) is also found. Because these are seen in the wild, my reader removes everything preceding the first "{" or "[" and everything following the last "}" or "]" to look for a valid JSON document. The specific substitution pattern I use is the following:
input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
While such input is technically invalid IMO, practically speaking, not eating such garbage would break real-world usage (perhaps mostly in schema.org examples). I could see generating an error if this is seen when validating, but otherwise I’m inclined to eat such garbage in my implementation.
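To illustrate the trade-off, here is a minimal sketch rather than the actual reader code. The names strip_wrapping and read_json and the validate: option are hypothetical, and only Ruby's standard json library is assumed:

```ruby
require 'json'

# Hypothetical helper: drop everything before the first "{" or "["
# and everything after the last "}" or "]".
def strip_wrapping(input)
  input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
end

# Hypothetical reader: with validate: true, anything other than whitespace
# around the JSON is reported; otherwise it is silently dropped.
def read_json(input, validate: false)
  stripped = strip_wrapping(input)
  if validate && stripped != input.to_s.strip
    raise ArgumentError, "non-whitespace content surrounds the JSON document"
  end
  JSON.parse(stripped)
end

wrapped = <<~TEXT
  // an example comment
  { "@id": "http://example/s", "http://example/p": "str" }
  xxxxxxxxx
TEXT

read_json(wrapped)                    # lenient: the surrounding text is ignored

begin
  read_json(wrapped, validate: true)  # strict: the surrounding text is an error
rescue ArgumentError => e
  puts "invalid: #{e.message}"
end
```

Note that the stripping only looks for the outermost bracket characters, so surrounding text that itself contains "{", "[", "}" or "]" would still end up in the string handed to the parser.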
Gregg
> Andy
>
Received on Sunday, 23 August 2015 17:34:36 UTC