
Re: HTML Content Algorithms don't take external JSON-LD data into account

From: Benjamin Young <byoung@bigbluehat.com>
Date: Tue, 21 Apr 2020 15:01:33 +0000
To: "Hoekstra, Rinke (ELS-AMS)" <r.hoekstra@elsevier.com>, "public-json-ld-wg@w3.org" <public-json-ld-wg@w3.org>
CC: "Breebaart, Matthijs (ELS-AMS)" <m.breebaart@elsevier.com>, "Townsend, Andrew S. (ELS)" <a.townsend@elsevier.com>
Message-ID: <BN7PR06MB51546C07FEF8645E9B29AE98B2D50@BN7PR06MB5154.namprd06.prod.outlook.com>
Thanks for reaching out, Rinke.

The JSON-LD in HTML stuff is based on "data block" script elements (though it seems like the "data block" phrase fell out of the JSON-LD spec content at some point...).

Data blocks are defined in the HTML spec as a very limited, unprocessed script element for which no other attributes have an effect (at the HTML processing level, at least):
> Setting the attribute to any other value means that the script is a data block, which is not processed. None of the script<https://html.spec.whatwg.org/multipage/scripting.html#the-script-element> attributes (except type<https://html.spec.whatwg.org/multipage/scripting.html#attr-script-type> itself) have any effect on data blocks. Authors must use a valid MIME type string<https://mimesniff.spec.whatwg.org/#valid-mime-type> that is not a JavaScript MIME type essence match<https://mimesniff.spec.whatwg.org/#javascript-mime-type-essence-match> to denote data blocks.

Sadly, browsers do not currently encourage the use of "remote" data blocks--which is essentially what you've described: `<script type="..." src="..."></script>`

In the meantime, JSON-LD already recommends an HTTP Link header with `rel="alternate"` for discovering a JSON-LD variant of the requested resource. The same approach can be applied to a link element, with the same conceptual result: `<link rel="alternate" type="application/ld+json" href="..." />`
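
As a rough illustration of what a consumer of such link elements might do, here is a minimal sketch using only Python's standard `html.parser`. The class name `JsonLdLinkFinder`, the helper `find_jsonld_alternates`, and the example URL are made up for this example; nothing here is mandated by the JSON-LD specs.

```python
from html.parser import HTMLParser


class JsonLdLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/ld+json">.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        a = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rels = (a.get("rel") or "").lower().split()
        # Drop any MIME parameters (e.g. ";profile=...") before comparing.
        mime = (a.get("type") or "").split(";")[0].strip().lower()
        if "alternate" in rels and mime == "application/ld+json" and a.get("href"):
            self.hrefs.append(a["href"])


def find_jsonld_alternates(html):
    """Return the hrefs of all JSON-LD alternate links found in the HTML text."""
    finder = JsonLdLinkFinder()
    finder.feed(html)
    return finder.hrefs


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/ld+json" '
        'href="https://example.org/data.jsonld" />'
        '</head></html>'
    )
    print(find_jsonld_alternates(page))
```

Relative href values would still need to be resolved against the document's base IRI before dereferencing; that step is left out here.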

However, you may also want to relate other JSON-LD to your page that isn't a 1-to-1 alternate representation for the current resource, so you'd want to express those with other rel values:

Would using the link element approach work for your use case?

There's certainly more to explore in this area. 🙂





From: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
Sent: Tuesday, April 21, 2020 8:28 AM
To: public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>
Cc: Breebaart, Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <a.townsend@elsevier.com>
Subject: HTML Content Algorithms don't take external JSON-LD data into account

Hi All,

We stumbled upon something odd when going through the HTML Content Algorithms (section 9.5 of the JSON-LD 1.1 API document, [1]).

The algorithm extracts the JSON-LD from the textContent of script elements with a JSON-LD mime type as value for the "type" attribute.

We have cases where, similar to e.g. JavaScript, our HTML documents refer to JSON-LD data that is hosted external to the HTML document itself.

Our current approach is to use an empty script element with "type" set to the JSON-LD mime type, and "src" set to the dereferenceable IRI of the JSON-LD dataset that we want to process.

Our assumption was that JSON-LD processing of HTML documents would automatically consume these external datasets, but the current algorithm doesn't allow for this. That is, if we indeed read the specs correctly.
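
To make the behaviour described above concrete, here is a simplified sketch, loosely modeled on (and not a faithful implementation of) the extraction step: only the inline text of matching script elements is parsed, and a "src" attribute is never dereferenced. It assumes Python's standard `html.parser` and `json` modules; the names `JsonLdScriptExtractor` and `extract_inline_jsonld`, and the choice to skip empty blocks and record their "src" values rather than raise an error, are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Parses inline JSON-LD script blocks; never fetches external ones.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.documents = []      # parsed inline JSON-LD values
        self.ignored_srcs = []   # src values seen but NOT dereferenced
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        a = dict(attrs)
        mime = (a.get("type") or "").split(";")[0].strip().lower()
        if mime == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
            if a.get("src"):
                # Recorded only to show that the reference is not followed.
                self.ignored_srcs.append(a["src"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            text = "".join(self._buffer).strip()
            if text:  # simplification: empty blocks are skipped silently
                self.documents.append(json.loads(text))
            self._in_jsonld = False


def extract_inline_jsonld(html):
    """Return (parsed inline documents, src values that were ignored)."""
    extractor = JsonLdScriptExtractor()
    extractor.feed(html)
    return extractor.documents, extractor.ignored_srcs
```

With this sketch, an empty script element that only carries a "src" attribute contributes nothing to the extracted documents, which matches the gap described in this message.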

I appreciate that it's a bit late in the game, but it would be good to at least have the algorithm state explicitly that loading such external JSON-LD data using a "src" attribute is OPTIONAL. We'd rather not standardise on this internally when the JSON-LD spec may opt for using e.g. link elements at a later stage.



[1] https://www.w3.org/TR/json-ld11-api/#html-content-algorithms

Dr. Rinke Hoekstra
Lead Architect - Knowledge
Elsevier, Amsterdam


Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.
Received on Tuesday, 21 April 2020 15:01:51 UTC
