Re: HTML Content Algorithms don't take external JSON-LD data into account from Pierre-Antoine Champin on 2020-04-22 (public-json-ld-wg@w3.org from April 2020)

From: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
Date: Wed, 22 Apr 2020 14:46:23 +0200
To: "Hoekstra, Rinke (ELS-AMS)" <r.hoekstra@elsevier.com>
Cc: "public-json-ld-wg@w3.org" <public-json-ld-wg@w3.org>, "Breebaart, Matthijs (ELS-AMS)" <m.breebaart@elsevier.com>, "Townsend, Andrew S. (ELS)" <a.townsend@elsevier.com>
Message-ID: <CA+OuRR_9qLXaK17Bf-xhGAx7-Le9d-WoCQO8+ZAvbFurtx36Lg@mail.gmail.com>
Dear Rinke,

thanks, I can see more clearly what you are trying to do here. And +1 to
Benjamin's proposed solution.

 best

On Wed, 22 Apr 2020 at 13:34, Hoekstra, Rinke (ELS-AMS) <
r.hoekstra@elsevier.com> wrote:

> Hi Pierre,
>
> Perhaps I haven't explained clearly.
>
> The current JSON-LD spec caters for two (non-exclusive) use cases:
>
>    1. Providing an alternate JSON-LD representation of an HTML
>    document through e.g. content negotiation. Here there's an implied 1:1
>    mapping between the HTML document and the JSON-LD in that they are both
>    representations of the same thing. This is very much in line with the
>    Linked Data principles.
>    2. Extracting JSON-LD content from one or more script elements
>    contained in an HTML document. Here the HTML document contains information
>    expressed as JSON-LD that may be the alternate representation (as in 1), or
>    it may be additional information about the document that cannot be expressed
>    in HTML (but is still needed for proper interpretation, e.g. typical
>    metadata such as authorship, creation date etc.), or it may be additional
>    information of a different nature or about other things (e.g. resources
>    mentioned in the HTML, provenance information associated with the file).
>
> The use case that we're looking for is an extension of 2:
>
> What if the "extra" JSON-LD that we want to associate with the document in
> 2) is maintained independently from the document, or is too large/unwieldy
> to store as part of the HTML document (e.g. provenance)? In that case we
> can of course use the JSON-LD data enclosed in the HTML to link out to that
> external information, for instance by a) using link traversal
> (dereferencing outgoing links from the JSON-LD), b) "borrowing" an import
> mechanism from another standard (such as OWL), or c) using the JSON-LD
> @import keyword.
>
> All three have individual drawbacks: for a) there is no "end" to it (how
> many steps do we need to traverse to obtain the full graph of information
> that was meant to "belong" to the document?); b) is meant for
> schema-to-schema imports, not data-to-data imports; and c) is only
> for importing contexts, not actual data.
>
> None of the three strategies therefore allows us to augment documents with
> information in a distributed fashion while still maintaining some notion of
> document integrity (what belongs to the document and what does not). It
> feels natural to allow the JSON-LD + HTML combination to cater for each of
> these situations: having embedded JSON-LD, and having external JSON-LD that
> still pertains to the document, and both. Really very much in the same way
> that you can embed JavaScript in an HTML document, or link to an external
> JS file.
>
> I acknowledge that perhaps it's a bit "the wrong way 'round" and is rather
> better suited as part of the HTML spec. On the other hand, with the
> attention spent in the JSON-LD spec on processing JSON-LD content as part
> of an HTML document, it feels like an omission that could be easily fixed.
>
> We'll be adopting the link strategy that Benjamin suggested using the
> "describedby" rel type. Though it also doesn't make it clear that this is
> an "import"-like function, it suits our purposes for now.
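The "describedby" link strategy mentioned above could look roughly like the following on the consuming side. This is only a minimal sketch under stated assumptions: that the links are expressed as HTML link elements with rel="describedby" and type="application/ld+json", and that relative href values are resolved against the IRI of the HTML document. The class and function names (DescribedByLinkFinder, find_described_by) are hypothetical, and a base element in the HTML is not taken into account.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

JSONLD_TYPE = "application/ld+json"


class DescribedByLinkFinder(HTMLParser):
    """Collects the targets of <link rel="describedby"> elements typed as JSON-LD."""

    def __init__(self, base_iri):
        super().__init__()
        self.base_iri = base_iri  # IRI of the HTML document (assumed known to the caller)
        self.targets = []         # absolute IRIs of the external JSON-LD resources

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # "rel" is a space-separated list of link types, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        # Ignore media type parameters such as ";profile=...".
        media_type = (attributes.get("type") or "").split(";")[0].strip().lower()
        href = attributes.get("href")
        if "describedby" in rels and media_type == JSONLD_TYPE and href:
            self.targets.append(urljoin(self.base_iri, href))


def find_described_by(html, base_iri):
    """Returns the absolute IRIs of JSON-LD resources that describe the document."""
    finder = DescribedByLinkFinder(base_iri)
    finder.feed(html)
    finder.close()
    return finder.targets
```

Each returned IRI can then be handed to a JSON-LD document loader in the usual way; the sketch itself does no network access, so the decision of what "belongs" to the document stays with the caller.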
>
> Best,
>
> Rinke
>
> ---
> Dr. Rinke Hoekstra
> Lead Architect - Knowledge
> Elsevier, Amsterdam
> r.hoekstra@elsevier.com
>
> ------------------------------
> *From:* Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
> *Sent:* 22 April 2020 12:08
> *To:* Hoekstra, Rinke (ELS-AMS) <r.hoekstra@elsevier.com>
> *Cc:* public-json-ld-wg@w3.org <public-json-ld-wg@w3.org>; Breebaart,
> Matthijs (ELS-AMS) <m.breebaart@elsevier.com>; Townsend, Andrew S. (ELS) <
> a.townsend@elsevier.com>
> *Subject:* Re: HTML Content Algorithms don't take external JSON-LD data
> into account
>
>
> Dear Rinke,
>
> if the JSON-LD document is available at its own IRI, you can pass this IRI
> directly to your document loader.
> Why would you want to pass the IRI of the HTML document?
> I fail to see your use case...
>
>  best
>
> On Tue, 21 Apr 2020 at 16:35, Hoekstra, Rinke (ELS-AMS) <
> r.hoekstra@elsevier.com> wrote:
>
> Hi All,
>
> We stumbled upon something odd when going through the HTML Content
> Algorithms (section 9.5 of the JSON-LD 1.1 API document, [1]).
>
> The algorithm extracts the JSON-LD from the textContent of script elements
> with a JSON-LD MIME type as value for the "type" attribute.
>
> We have cases where, similar to e.g. JavaScript, our HTML documents refer
> to JSON-LD data that is hosted external to the HTML document itself.
>
> Our current approach is to use an empty script element with "type" set to
> the JSON-LD MIME type, and "src" set to the dereferenceable IRI of the
> JSON-LD dataset that we want to process.
>
> Our assumption was that JSON-LD processing of HTML documents would
> automatically consume these external datasets, but the current algorithm
> doesn't allow for this. That is, if we indeed read the specs correctly.
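To make the distinction concrete, here is a rough sketch (not the normative algorithm from the spec) of a scanner that reads the text content of JSON-LD script elements and, separately, records any "src" attribute it encounters on them. The first part approximates the extraction step described above; the second part reflects the approach described in this message and is an assumption, not behaviour defined by the JSON-LD API. The names JsonLdScriptScanner and scan_jsonld_scripts are hypothetical.

```python
import json
from html.parser import HTMLParser

JSONLD_TYPE = "application/ld+json"


class JsonLdScriptScanner(HTMLParser):
    """Collects inline JSON-LD and, separately, external "src" references."""

    def __init__(self):
        super().__init__()
        self.inline = []     # parsed JSON-LD taken from script text content
        self.external = []   # "src" values found on JSON-LD script elements
        self._buffer = None  # text chunks of the JSON-LD script being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        # Ignore media type parameters such as ";profile=...".
        media_type = (attributes.get("type") or "").split(";")[0].strip().lower()
        if media_type != JSONLD_TYPE:
            return
        if attributes.get("src"):
            # Only recorded here, never fetched: loading it is outside the spec.
            self.external.append(attributes["src"])
        self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:  # an empty script element carries no inline data
                self.inline.append(json.loads(text))
            self._buffer = None


def scan_jsonld_scripts(html):
    """Returns (inline_documents, external_src_values) for an HTML string."""
    scanner = JsonLdScriptScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.inline, scanner.external
```

A processor that follows only the text-content step would return the inline documents and never look at the second list, which is the gap this message describes.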
>
> I appreciate that it's a bit late in the game, but it would be good to at
> least have the algorithm state explicitly that loading such external
> JSON-LD data using a "src" attribute is OPTIONAL. We'd rather not
> standardise on this internally when the JSON-LD spec may opt for using e.g.
> link elements at a later stage.
>
> Thanks,
>
> Rinke
>
>
> [1] https://www.w3.org/TR/json-ld11-api/#html-content-algorithms
>
> ---
> Dr. Rinke Hoekstra
> Lead Architect - Knowledge
> Elsevier, Amsterdam
> r.hoekstra@elsevier.com
>
> ------------------------------
>
> Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The
> Netherlands, Registration No. 33156677, Registered in The Netherlands.
>
Received on Wednesday, 22 April 2020 12:46:50 UTC