Re: Permit external JSON-LD files? from Gregg Kellogg on 2022-08-31 (public-schemaorg@w3.org from August 2022)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Wed, 31 Aug 2022 14:20:41 -0700
To: Dan Brickley <danbri@google.com>
Cc: Martin Bean <martin@martinbean.co.uk>, Jarno van Driel <jarnovandriel@gmail.com>, Gregory Saumier-Finch <gregory@culturecreates.com>, "schema.org Mailing List" <public-schemaorg@w3.org>, Dave Vieglais <dave.vieglais@gmail.com>
Message-Id: <BAD9DF2C-F6FB-4870-B8DE-3F1009BA7314@greggkellogg.net>

> On Aug 31, 2022, at 6:13 AM, Dan Brickley <danbri@google.com> wrote:
> 
> On Wed, 31 Aug 2022 at 13:52, Martin Bean <martin@martinbean.co.uk> wrote:
> > As for the <link rel="alternate"> HTML element, I've asked Gregg Kellogg about this in the past and he indicated this isn't supported by design as it would force parsers to be able to parse HTML.
> 
> Does it not have to do that already to find a <script> tag containing JSON-LD data, or HTML marked up with Schema.org properties?
> 
> Minimally, yes. When we (Schema.org people) proposed that the JSON-LD Working Group define a way of putting JSON-LD inside HTML pages (https://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents), this was an explicit concern that they (including Gregg) raised. JSON(-LD) is relatively straightforward to parse, whereas HTML in all its variants is a bit of a nightmare. They didn't want to bloat every JSON-LD implementation with the burden of being able to fully parse HTML (including the DOM, JS, etc.). That work was also not particularly aligned with browser / web platform conversations; lots of server-side data-to-data use cases drove it.
> 
> I believe where we ended up was that https://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents (or rather its predecessor in JSON-LD 1.0) would do something very constrained, so that JSON-LD could be extracted from HTML without running it through a fully compliant HTML parser. For example, I think that affected conversations around determining base URLs.
> 
> I expect there are more details in https://github.com/ruby-rdf/json-ld/tree/develop/lib/json/ld of how a JSON-LD tool handles the HTML part.

Yes, it’s a question of separation of concerns. JSON-LD describes how JSON-LD can be contained in a script element (similar to how other formats, such as Turtle, have described it). Once extracted, it can simply be processed as JSON-LD, with provisions for identifying the base IRI and potentially referencing JSON-LD in other script blocks. This is optional: the expectation is that systems already processing HTML might want a way of locating these JSON-LD script islands and processing them as defined in the JSON-LD specs, not that a native JSON-LD implementation would need to know how to process HTML.
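The division of labour described above can be sketched as follows: an HTML-aware front end finds the `<script type="application/ld+json">` islands and an optional `<base href>`, and only the parsed JSON is handed on to a JSON-LD processor. This is a minimal Python sketch using the standard-library `html.parser`; the names `JsonLdIslandExtractor` and `extract_jsonld` are hypothetical, and it does not attempt every rule of the JSON-LD 1.1 HTML embedding section. It keeps each script's `id` so that, as we understand the spec, a fragment identifier could select one particular island.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class JsonLdIslandExtractor(HTMLParser):
    """Hypothetical helper: collects JSON-LD script islands and the base IRI.

    Not a full HTML processor: no DOM is built and no scripts are run.
    """

    def __init__(self, document_url):
        super().__init__()
        self.base_iri = document_url   # default base: the document's own URL
        self.islands = []              # list of (script id or None, parsed JSON)
        self._in_jsonld = False
        self._current_id = None
        self._buffer = []
        self._seen_base = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and not self._seen_base and attrs.get("href"):
            # Only the first <base href> is used, resolved against the document URL.
            self.base_iri = urljoin(self.base_iri, attrs["href"])
            self._seen_base = True
        elif tag == "script":
            # Ignore media-type parameters such as ;profile=...
            script_type = (attrs.get("type") or "").split(";")[0].strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._current_id = attrs.get("id")
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # From here on it is plain JSON; no HTML knowledge is needed.
            self.islands.append((self._current_id, json.loads("".join(self._buffer))))
            self._in_jsonld = False


def extract_jsonld(html_text, document_url):
    """Return (base IRI, [(id, json), ...]) for an HTML document."""
    parser = JsonLdIslandExtractor(document_url)
    parser.feed(html_text)
    parser.close()
    return parser.base_iri, parser.islands
```

For example, a page served from `https://example.com/index.html` containing `<base href="/docs/">` yields the base IRI `https://example.com/docs/` together with each island's `id` and parsed JSON; ordinary `<script>` elements are skipped.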

JSON-LD, along with other RDF formats, describes how link tags can be used to locate separate encodings that describe the same resource; this goes back to the early days of the Semantic Web, when it was thought that web pages might be described in parallel through separately encoded files (as Dan remembers, I’m sure). It’s a design pattern that isn’t used much, but it is commonly described. Extending this to HTML link tags would be natural for systems that want to do that.
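For the HTTP side of this pattern, JSON-LD 1.1 document loading can, as we read it, follow a `Link` header with `rel="alternate"` and `type="application/ld+json"` when the fetched resource is not itself JSON-LD. Below is a simplified Python sketch of picking out such targets; `parse_link_header` and `find_jsonld_alternates` are hypothetical helpers, and the parser assumes there are no commas or semicolons inside URLs or quoted parameter values, which a full RFC 8288 parser would have to handle.

```python
from urllib.parse import urljoin


def parse_link_header(value):
    """Split an HTTP Link header into (target, params) pairs.

    Simplified assumption: no commas or semicolons inside URLs or quoted values.
    """
    links = []
    for part in value.split(","):
        part = part.strip()
        if not part.startswith("<") or ">" not in part:
            continue  # not a well-formed link-value; skip it
        target, _, rest = part[1:].partition(">")
        params = {}
        for param in rest.split(";"):
            name, sep, val = param.strip().partition("=")
            if sep:
                # Parameter names are case-insensitive; drop surrounding quotes.
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def find_jsonld_alternates(link_header, request_url):
    """Absolute URLs of links with rel=alternate and a JSON-LD media type."""
    found = []
    for target, params in parse_link_header(link_header):
        rels = params.get("rel", "").lower().split()   # rel may list several values
        media_type = params.get("type", "").split(";")[0].strip().lower()
        if "alternate" in rels and media_type == "application/ld+json":
            # Relative targets resolve against the URL that was requested.
            found.append(urljoin(request_url, target))
    return found
```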

IMHO, using rel=“alternate” is probably superior to inventing a scheme such as a “.meta” file: it is based on standards, and it allows for various alternative formats of such metadata, not just JSON-LD.
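To make the rel="alternate" suggestion concrete: a page could advertise parallel metadata in several formats with `<link rel="alternate" type="..." href="...">`, and a consumer could group the targets by media type. This is a minimal Python sketch assuming hypothetical paths such as `/meta/page.jsonld`; the `AlternateLinkFinder` class is illustrative only, ignores `<base href>`, and does not describe a convention that schema.org consumers are known to support.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkFinder(HTMLParser):
    """Hypothetical helper: collects <link rel="alternate"> targets by media type."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.alternates = {}  # media type -> list of absolute URLs

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "alternate" in rels and href:
            media_type = (attrs.get("type") or "").split(";")[0].strip().lower()
            # Resolve against the page URL only (no <base> handling in this sketch).
            self.alternates.setdefault(media_type, []).append(urljoin(self.page_url, href))
```

Because the result is keyed by media type, the same markup can point at JSON-LD, Turtle, or any other encoding of the metadata, which is the flexibility Gregg describes.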

Note that there have been proposals for CBOR-LD [1], which would be a much smaller encoding than JSON-LD, although nothing has reached standardization at this point. There’s also active work on YAML-LD [2], which can be somewhat more compact, and certainly more human-readable. Both build heavily on JSON-LD.

If download size is a consideration, binary formats have several advantages. Effective caching of such resources is also pretty important, in particular of the schema.org JSON-LD context file.
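As an illustration of the caching point, here is a minimal Python sketch of a loader that keeps fetched documents (such as a JSON-LD context) in memory for a fixed time. `CachingLoader` and its injected `fetch` callable are hypothetical stand-ins for whatever HTTP client a processor actually uses; a production loader would more likely honor HTTP caching headers (`Cache-Control`, `ETag`) than a fixed TTL.

```python
import time
from typing import Callable, Dict, Tuple


class CachingLoader:
    """Hypothetical loader: wraps a fetch function with a time-limited cache."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # e.g. an HTTP GET returning the body as text
        self._ttl = ttl_seconds        # how long a cached copy is considered fresh
        self._clock = clock            # injectable for testing
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def load(self, url: str) -> str:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # fresh copy: no network round trip
        body = self._fetch(url)        # missing or stale: fetch again
        self._cache[url] = (now, body)
        return body
```

With this arrangement, every document that references the same context URL shares one download until the entry expires.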

Gregg

[1] https://digitalbazaar.github.io/cbor-ld-spec/
[2] https://json-ld.github.io/yaml-ld/spec/

> Dan 
>  
> 
> > On 31 Aug 2022, at 13:38, Jarno van Driel <jarnovandriel@gmail.com> wrote:
> > 
> > As for the <link rel="alternate"> HTML element, I've asked Gregg Kellogg about this in the past and he indicated this isn't supported by design as it would force parsers to be able to parse HTML.
>

Received on Wednesday, 31 August 2022 21:20:57 UTC