
Re: HTML Content Algorithms don't take external JSON-LD data into account

From: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
Date: Tue, 21 Apr 2020 15:52:55 +0000
To: Benjamin Young <byoung@bigbluehat.com>, "public-json-ld-wg@w3.org" <public-json-ld-wg@w3.org>
CC: "Breebaart, Matthijs (ELS-AMS)" <m.breebaart@elsevier.com>, "Townsend, Andrew S. (ELS)" <a.townsend@elsevier.com>
Message-ID: <BYAPR08MB57671AA2023F6F9B236CAFE3E3D50@BYAPR08MB5767.namprd08.prod.outlook.com>
Hi Benjamin,

Thank you for the quick and detailed response. Indeed, we don't want to just serve an alternate JSON-LD version of the HTML document; rather, we want to link out to other JSON-LD sources that, together with the JSON-LD embedded in the document, give the full "relevant" description of the source.

Given the restrictions in the specs on interpreting script elements, the link element seems to be the best option for our case.

Looking at the link types, I think the POWDER "describedby" seems the most apt (though "item" comes close). It would be great to have something close to the "@import" mechanism that JSON-LD uses for contexts, *and* to have some kind of recommendation in the spec on linking out to external JSON-LD sources from HTML.

-Rinke

---
Dr. Rinke Hoekstra
Lead Architect - Knowledge
Elsevier, Amsterdam
r.hoekstra@elsevier.com
________________________________
From: Benjamin Young <byoung@bigbluehat.com>
Sent: 21 April 2020 17:04
To: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>; public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>
Cc: Breebaart, Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <a.townsend@elsevier.com>
Subject: Re: HTML Content Algorithms don't take external JSON-LD data into account


Also, it looks like we do use the "data block" phrasing in the Syntax document:
https://www.w3.org/TR/json-ld11/#embedding-json-ld-in-html-documents

FWIW. 🙂


--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung

________________________________
From: Benjamin Young <byoung@bigbluehat.com>
Sent: Tuesday, April 21, 2020 11:01 AM
To: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>; public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>
Cc: Breebaart, Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <a.townsend@elsevier.com>
Subject: Re: HTML Content Algorithms don't take external JSON-LD data into account

Thanks for reaching out, Rinke.

The JSON-LD in HTML stuff is based on "data block" script elements (though it seems like the "data block" phrase fell out of the JSON-LD spec content at some point...).

Data blocks are defined in the HTML spec as a very limited, un-processed script element for which no other attributes have an effect (from an HTML processing level at least):
> Setting the attribute to any other value means that the script is a data block, which is not processed. None of the script attributes (except type itself) have any effect on data blocks. Authors must use a valid MIME type string that is not a JavaScript MIME type essence match to denote data blocks.
https://html.spec.whatwg.org/multipage/scripting.html#the-script-element:attr-script-type-4

Sadly, browsers do not currently encourage the use of "remote" data blocks--which is essentially what you've described: `<script type="..." src="..."></script>`

In the meantime, JSON-LD already recommends the use of an HTTP Link header with `rel="alternate"` for discovering a JSON-LD variant of the currently requested resource. So, the same approach can be applied to a link element with the same conceptual result: `<link rel="alternate" type="application/ld+json" href="..." />`

However, you may also want to relate other JSON-LD to your page that isn't a 1-to-1 alternate representation for the current resource, so you'd want to express those with other rel values:
https://www.iana.org/assignments/link-relations/link-relations.xhtml
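
As a rough illustration of how a consumer might discover such links, here is a minimal Python sketch using only the standard library. It is not taken from any spec: the class name `JsonLdLinkCollector`, the default set of rel values, and the example.org URLs are all assumptions made up for this example, and it ignores any `<base>` element in the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JsonLdLinkCollector(HTMLParser):
    """Hypothetical helper: collect <link> elements that point at JSON-LD."""

    def __init__(self, base_url, rels=("alternate", "describedby")):
        super().__init__()
        self.base_url = base_url                  # used to resolve relative hrefs
        self.rels = {rel.lower() for rel in rels}  # rel values we care about
        self.links = []                            # (rel, absolute URL) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Compare only the media type essence, dropping parameters such as profile.
        media_type = (attributes.get("type") or "").split(";")[0].strip().lower()
        href = attributes.get("href")
        if media_type != "application/ld+json" or not href:
            return
        # The rel attribute is a space-separated list of link types.
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in self.rels:
                self.links.append((rel, urljoin(self.base_url, href)))


page = """
<link rel="alternate" type="application/ld+json" href="1.jsonld" />
<link rel="describedby" type="application/ld+json" href="/meta/1.jsonld">
<link rel="stylesheet" href="style.css">
"""

collector = JsonLdLinkCollector("http://example.org/articles/1")
collector.feed(page)
collector.close()
for rel, url in collector.links:
    print(rel, url)
```

Keeping the rel value next to each URL lets the consumer decide what to do with it: an "alternate" link stands in for the page itself, while other rel values point at related data that would have to be fetched and processed separately.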

Would using the link element approach work for your use case?

There's certainly more to explore in this area. 🙂

Cheers!
Benjamin


--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung

________________________________
From: Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
Sent: Tuesday, April 21, 2020 8:28 AM
To: public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>
Cc: Breebaart, Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <a.townsend@elsevier.com>
Subject: HTML Content Algorithms don't take external JSON-LD data into account

Hi All,

We stumbled upon something odd when going through the HTML Content Algorithms (section 9.5 of the JSON-LD 1.1 API document, [1]).

The algorithm extracts the JSON-LD from the textContent of script elements whose "type" attribute is set to a JSON-LD MIME type.

We have cases where, similar to e.g. JavaScript, our HTML documents refer to JSON-LD data that is hosted external to the HTML document itself.

Our current approach is to use an empty script element with "type" set to the JSON-LD MIME type, and "src" set to the dereferenceable IRI of the JSON-LD dataset that we want to process.

Our assumption was that JSON-LD processing of HTML documents would automatically consume these external datasets, but the current algorithm doesn't allow for this. That is, if we indeed read the specs correctly.
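
To make the gap concrete, here is a minimal Python sketch of an extraction that reads only the text content of JSON-LD script elements. It is a simplification for illustration, not the normative algorithm from the spec: the class name `JsonLdScriptExtractor`, the `ignored_src` list, and the example.org IRIs are hypothetical. The point is that a "src" attribute is noticed but never dereferenced, so an empty script element contributes nothing.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Hypothetical helper: parse embedded JSON-LD, never fetch src."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False  # True while inside a JSON-LD script element
        self._buffer = []        # text chunks of the current script element
        self.documents = []      # parsed JSON-LD, one entry per non-empty script
        self.ignored_src = []    # src values seen but not dereferenced

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        # Compare only the media type essence, dropping parameters such as profile.
        media_type = (attributes.get("type") or "").split(";")[0].strip().lower()
        if media_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
            if attributes.get("src"):
                self.ignored_src.append(attributes["src"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            if text:  # an empty script element yields no JSON-LD at all
                self.documents.append(json.loads(text))


page = """
<script type="application/ld+json">
{"@id": "http://example.org/article/1", "http://schema.org/name": "Example"}
</script>
<script type="application/ld+json" src="http://example.org/external.jsonld"></script>
"""

extractor = JsonLdScriptExtractor()
extractor.feed(page)
extractor.close()
print(len(extractor.documents), "embedded document(s) parsed")
print("src attributes left unprocessed:", extractor.ignored_src)
```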

I appreciate that it's a bit late in the game, but it would be good to at least have the algorithm state explicitly that loading such external JSON-LD data using a "src" attribute is OPTIONAL. We'd rather not standardise on this internally when the JSON-LD spec may opt for using e.g. link elements at a later stage.

Thanks,

Rinke


[1] https://www.w3.org/TR/json-ld11-api/#html-content-algorithms

---
Dr. Rinke Hoekstra
Lead Architect - Knowledge
Elsevier, Amsterdam
r.hoekstra@elsevier.com

________________________________

Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.

Received on Tuesday, 21 April 2020 15:53:12 UTC
