Re: HTML Content Algorithms don't take external JSON-LD data into account

From: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
Date: Wed, 22 Apr 2020 11:34:41 +0000
To: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
CC: "public-json-ld-wg@w3.org" <public-json-ld-wg@w3.org>, "Breebaart, Matthijs (ELS-AMS)" <m.breebaart@elsevier.com>, "Townsend, Andrew S. (ELS)" <a.townsend@elsevier.com>
Message-ID: <BYAPR08MB5767A401387F76E17A5F29E5E3D20@BYAPR08MB5767.namprd08.prod.outlook.com>

Hi Pierre,

Perhaps I haven't explained clearly.

The current JSON-LD spec caters for two (non-exclusive) use cases:

  1.  Providing an alternate JSON-LD representation of an HTML document, e.g. through content negotiation. Here there's an implied 1:1 mapping between the HTML document and the JSON-LD, in that they are both representations of the same thing. This is very much in line with the Linked Data principles.
  2.  Extracting JSON-LD content from one or more script elements contained in an HTML document. Here the HTML document contains information expressed as JSON-LD that may be the alternate representation (as in 1), or additional information about the document that cannot be expressed in HTML (but is still needed for proper interpretation, e.g. typical metadata such as authorship, creation date etc.), or additional information of a different nature or about other things (e.g. resources mentioned in the HTML, provenance information associated with the file).
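
Use case 1 can be sketched as a plain HTTP request that asks for JSON-LD at the same IRI that also serves the HTML. This is only a minimal illustration in Python: the function name, the example IRI and the exact Accept header value are assumptions, and the request is built but not sent.

```python
from urllib.request import Request


def build_jsonld_request(iri):
    """Build (but do not send) a request that prefers JSON-LD over HTML.

    Hypothetical helper: the q-value for text/html is an arbitrary choice.
    """
    return Request(
        iri,
        headers={"Accept": "application/ld+json, text/html;q=0.5"},
    )


# Hypothetical IRI, used for illustration only.
request = build_jsonld_request("https://example.org/articles/1")
print(request.full_url, request.get_header("Accept"))
```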

The use case that we're looking for is an extension of 2:

What if the "extra" JSON-LD that we want to associate with the document in 2) is maintained independently from the document, or is too large/unwieldy to store as part of the HTML document (e.g. provenance)? In that case we can of course use the JSON-LD data enclosed in the HTML to link out to that external information, for instance by a) using link traversal (dereferencing outgoing links from the JSON-LD), b) "borrowing" an import mechanism from another standard (such as OWL), or c) using the JSON-LD @import keyword.

All three have individual drawbacks: for a) there is no "end" to it (how many steps do we need to traverse to obtain the full graph of information that was meant to "belong" to the document?); b) is meant for schema-to-schema imports, not data-to-data imports; and c) only imports contexts, not actual data.

None of the three strategies therefore allows us to augment documents with information in a distributed fashion while still maintaining some notion of document integrity (what belongs to the document and what does not). It feels natural to allow the JSON-LD + HTML combination to cater for each of these situations: embedded JSON-LD, external JSON-LD that still pertains to the document, or both. This is very much the same way that you can embed JavaScript in an HTML document or link to an external JS file.

I acknowledge that perhaps it's a bit "the wrong way 'round" and would be better suited as part of the HTML spec. On the other hand, given the attention the JSON-LD spec spends on processing JSON-LD content as part of an HTML document, it feels like an omission that could be easily fixed.

We'll be adopting the link strategy that Benjamin suggested, using the "describedby" rel type. Although it doesn't make clear that this is an "import"-like function, it suits our purposes for now.
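
A minimal sketch of how a consumer might discover such links, assuming Python's standard html.parser. The class and function names, the example IRIs, and the choice to skip links that declare a non-JSON-LD type are assumptions for illustration; nothing here is defined by the JSON-LD spec.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

JSONLD_TYPE = "application/ld+json"


class DescribedByFinder(HTMLParser):
    """Collects absolute IRIs of <link rel="describedby"> elements."""

    def __init__(self, base_iri):
        super().__init__()
        self.base_iri = base_iri
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "describedby" not in rels or not href:
            return
        # Assumption: a link with an explicit non-JSON-LD type is ignored.
        declared = (attributes.get("type") or JSONLD_TYPE).strip().lower()
        if declared == JSONLD_TYPE:
            self.targets.append(urljoin(self.base_iri, href))


def find_describedby_links(html, base_iri):
    finder = DescribedByFinder(base_iri)
    finder.feed(html)
    finder.close()
    return finder.targets


# Hypothetical document and IRIs, for illustration only.
page = """<html><head>
<link rel="describedby" type="application/ld+json" href="provenance.jsonld">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""
print(find_describedby_links(page, "https://example.org/articles/1.html"))
```

Each returned IRI could then be passed to a JSON-LD document loader separately; whether the result is merged with the embedded data is left to the application.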

Best,

Rinke

---
Dr. Rinke Hoekstra
Lead Architect - Knowledge
Elsevier, Amsterdam
r.hoekstra@elsevier.com

________________________________
From: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
Sent: 22 April 2020 12:08
To: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
Cc: public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>; Breebaart, Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <a.townsend@elsevier.com>
Subject: Re: HTML Content Algorithms dont' take external JSON-LD data into account


Dear Rinke,

If the JSON-LD document is available at its own IRI, you can pass this IRI directly to your document loader.
Why would you want to pass the IRI of the HTML document?
I fail to see your use case...

Best

On Tue, 21 Apr 2020 at 16:35, Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com> wrote:
Hi All,

We stumbled upon something odd when going through the HTML Content Algorithms (section 9.5 of the JSON-LD 1.1 API document, [1]).

The algorithm extracts the JSON-LD from the textContent of script elements whose "type" attribute is set to a JSON-LD MIME type.
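
The extraction step can be approximated as follows. This is a simplified sketch in Python using the standard html.parser, not the algorithm as written in the spec: it only looks at the "type" attribute and the text content, and the names and sample document are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Collects parsed JSON from script elements typed application/ld+json."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        # Ignore media type parameters such as ";profile=...".
        declared = (dict(attrs).get("type") or "").split(";")[0].strip().lower()
        if declared == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            if text:  # an empty script element contributes nothing
                self.documents.append(json.loads(text))


def extract_embedded_jsonld(html):
    parser = JsonLdScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical HTML document, for illustration only.
page = """<html><head>
<script type="application/ld+json">
{"@id": "https://example.org/articles/1", "name": "Example article"}
</script>
<script type="text/javascript">var ignored = true;</script>
</head><body></body></html>"""
print(extract_embedded_jsonld(page))
```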

We have cases where, similar to e.g. JavaScript, our HTML documents refer to JSON-LD data that is hosted external to the HTML document itself.

Our current approach is to use an empty script element with "type" set to the JSON-LD mime type, and "src" set to the dereferenceable IRI of the JSON-LD dataset that we want to process.

Our assumption was that JSON-LD processing of HTML documents would automatically consume these external datasets, but the current algorithm doesn't allow for this. That is, if we indeed read the specs correctly.
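
To illustrate the pattern described above, here is a minimal sketch in Python of what a consumer would have to do itself to find such external datasets. The names and the example.org IRIs are hypothetical, and resolving "src" on a JSON-LD script element is the convention described in this message, not behaviour defined by the spec.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExternalJsonLdFinder(HTMLParser):
    """Collects "src" IRIs of script elements typed application/ld+json."""

    def __init__(self, base_iri):
        super().__init__()
        self.base_iri = base_iri
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        declared = (attributes.get("type") or "").split(";")[0].strip().lower()
        src = attributes.get("src")
        if declared == "application/ld+json" and src:
            # Resolve relative references against the HTML document's IRI.
            self.sources.append(urljoin(self.base_iri, src))


def find_external_jsonld(html, base_iri):
    finder = ExternalJsonLdFinder(base_iri)
    finder.feed(html)
    finder.close()
    return finder.sources


# Hypothetical document: an empty script element pointing at external data.
page = """<html><head>
<script type="application/ld+json" src="/data/provenance.jsonld"></script>
<script src="/js/app.js"></script>
</head><body></body></html>"""
print(find_external_jsonld(page, "https://example.org/articles/1.html"))
```

An extractor that only reads the text content of the first script element above gets an empty string, which is why the external dataset is never loaded.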

I appreciate that it's a bit late in the game, but it would be good to at least have the algorithm state explicitly that loading such external JSON-LD data using a "src" attribute is OPTIONAL. We'd rather not standardise on this internally when the JSON-LD spec may opt for using e.g. link elements at a later stage.

Thanks,

Rinke


[1] https://www.w3.org/TR/json-ld11-api/#html-content-algorithms

---
Dr. Rinke Hoekstra
Lead Architect - Knowledge
Elsevier, Amsterdam
r.hoekstra@elsevier.com

________________________________

Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.

Received on Wednesday, 22 April 2020 11:34:57 UTC
