- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Sun, 23 Aug 2015 10:34:06 -0700
- To: Andy Seaborne <andy@seaborne.org>
- Cc: Linked JSON <public-linked-json@w3.org>
> On Aug 23, 2015, at 3:15 AM, Andy Seaborne <andy@seaborne.org> wrote:
>
> I'm having trouble pinning down what the spec status is of this input (this is for an issue in jsonld-java).
>
> Does the trailing content mean it is illegal JSON-LD or not or is it outside the spec altogether in some cases?
>
> ----------------------
> {
>   "@id" : "http://example/s",
>   "http://example/p" : "str"
> }
> xxxxxxxxx
> ----------------------
>
> The question is whether the whole input is the "JSON Document" or whether the trailing junk is considered to be outside the JSON Document.
>
> In the first case, it is a parse error, and any output is undefined.
> In the second case, there would be triples and no parse error.
>
> I currently think that the spec says this is illegal JSON-LD but the argument is convoluted and relies on the input coming from HTTP. If it were some other source (a file with a non jsonld extension [tut, tut]), it is unstated.
>
> The spec chase:
>
> Section 8 =>
>
> """
> A JSON-LD document MUST be a valid JSON document as described in [RFC4627].
>
> A JSON-LD document MUST be a single node object or an array whose elements are each node objects at the top level.
> """

Adding "exclusive of enclosing whitespace".

> RFC4627 is the media type registration for JSON.
>
> The definition link for "JSON-LD document" is descriptive:
> """
> A JSON-LD document serializes a generalized RDF Dataset [RDF11-CONCEPTS], which is a collection of graphs that comprises exactly one default graph and zero or more named graphs.
> """
>
> so it does not say, to my reading, that the "JSON-LD document" includes or excludes the content after the "}".

I believe a JSON-LD document is the entire document, and in my case, anyway, the entire document is passed to the Ruby JSON parser, where I expect to see an Array or Object. Non-whitespace trailing characters would be a syntax error.

> RFC4627 talks about a "JSON text" when defining the media type.
> Because that is the whole of the HTTP body, I think it means that "JSON text" includes everything. Then "MUST be a single node object" applies => it's a parse error.

+1

> Proposed spec fix 1:
> If it said that
> """
> A JSON-LD document MUST be a valid JSON *text* as described in [RFC4627].
> """
>
> then it would be clearer but still only applies if the media type can be invoked and sometimes it can't (e.g. a stream of chars from a non-HTTP stream).

Yes, I think this is clear.

> A sentence in the grammar explicitly, making it a syntax issue, not a context issue, stating that no trailing content is permitted would cover all cases.

With the provision that leading and trailing whitespace is permitted.

However, as a practical matter, JSON may be included in an HTML script tag, which could conceivably be in CDATA. Sometimes other non-JSON content (such as a // comment) is also found. Because these are seen in the wild, my reader removes everything preceding the first "{" or "[" and everything following the last "}" or "]" to look for a valid JSON document. The specific substitution pattern I use is the following:

    input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')

While this is technically invalid IMO, practically speaking, not eating such garbage will break real-world usage (perhaps mostly in schema.org examples). I could see generating an error if this is seen when validating, but otherwise I'm inclined to eat such garbage in my implementation.

Gregg

> Andy
>
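A minimal Ruby sketch, assuming only the standard `json` library, of the two behaviours discussed in this thread. In strict (validating) mode the whole input goes to the JSON parser, so non-whitespace trailing content is a syntax error. In lenient mode the same two substitutions quoted in the message are applied first. The method names `strip_surrounding_junk` and `parse_json_ld_input` and the `validate:` keyword are hypothetical, chosen for illustration; they are not the API of any existing JSON-LD reader.

```ruby
require 'json'

# Hypothetical helper: the two substitutions quoted in the message.
# Drops everything before the first "{" or "[" and everything after
# the last "}" or "]".
def strip_surrounding_junk(input)
  input.to_s.sub(%r(\A[^{\[]*)m, '').sub(%r([^}\]]*\Z)m, '')
end

# Hypothetical entry point. With validate: true the whole input must be
# a single JSON text (surrounding whitespace is allowed), so trailing
# junk makes JSON.parse raise JSON::ParserError. Otherwise surrounding
# junk is stripped first, as a lenient reader might do for real-world input.
def parse_json_ld_input(input, validate: false)
  text = input.to_s
  candidate = validate ? text : strip_surrounding_junk(text)
  data = JSON.parse(candidate)
  # JSON-LD expects a node object or an array at the top level.
  unless data.is_a?(Hash) || data.is_a?(Array)
    raise ArgumentError, 'top level must be a JSON object or array'
  end
  data
end

# Andy's example: a node object followed by trailing junk.
input = %({\n  "@id" : "http://example/s",\n  "http://example/p" : "str"\n}\nxxxxxxxxx\n)

# Lenient mode: the trailing "xxxxxxxxx" is stripped and the object parses.
p parse_json_ld_input(input)

# Strict mode: the trailing junk is reported as a parse error.
begin
  parse_json_ld_input(input, validate: true)
rescue JSON::ParserError => e
  puts "strict mode rejected the input: #{e.class}"
end
```

The lenient pattern stops at the first "[", so input wrapped as `//<![CDATA[ ... //]]>` would be left starting with `[CDATA[` and would still fail to parse. Unwrapping CDATA needs a separate step.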
Received on Sunday, 23 August 2015 17:34:36 UTC