Re: accept intralink when Embedding JSON-LD in HTML Documents from Pierre-Antoine Champin on 2021-11-08 (public-json-ld@w3.org from November 2021)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Mon, 8 Nov 2021 09:26:33 +0100
To: Peter Krauss <ppkrauss@gmail.com>, public-json-ld@w3.org
Message-ID: <756461c8-f578-9b1b-cff5-34dda9474859@ercim.eu>
Dear Peter,

my comments inline

On 01/11/2021 01:24, Peter Krauss wrote:
> I am suggesting a small feature, for a future version of JSON-LD, but 
> it's not clear whether it's a matter of  Syntax, API or  in some other 
> document that depends on the component specifications.
>
> *ABSTRACT*: it is intended to make *content-reuse* as easy as 
> Microdata or RDFa+HTML make. The Embedding JSON-LD in HTML Documents 
> <https://www.w3.org/TR/json-ld11/#embedding-json-ld-in-html-documents> 
> is a good replacement for Microdata or RDFa+HTML, but to be complete 
> need to reuse /HTML content/ *instead to duplicate it*. As in the 
> Microdata interpretation 
> <https://html.spec.whatwg.org/multipage/microdata.html#the-basic-syntax>, 
> the string to be translated, from cited (by /id/) HTML element to 
> plain text must use a method defined by DOM standard 
> <https://en.wikipedia.org/wiki/Document_Object_Model>, the 
> /HTMLElement.*innerText*/ 
> <https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText>.
>
> *EXAMPLE*: supposing an ID in the Example 164 
> <https://www.w3.org/TR/json-ld/#example-164-html-that-describes-a-book-using-microdata> 
> HTML fragment, something as
> ...
>   <dt>Title</dt>
>   <dd><cite id="tit01" itemprop="http://purl.org/dc/elements/1.1/title">Just a Geek</cite></dd>
> ....
> the Example 165 
> <https://www.w3.org/TR/json-ld/#example-165-same-book-description-in-json-ld-avoiding-contexts> 
> JSON-LD fragment will be something as
> [
>    {
>      "@id":"http://purl.oreilly.com/works/45U8QJGZSQKDH8N",
>      "@type":"http://purl.org/vocab/frbr/core#Work",
>      "http://purl.org/dc/elements/1.1/title": {"@id":"#tit01"},
>      "http://purl.org/dc/elements/1.1/creator":"Wil Wheaton",
>      "http://purl.org/vocab/frbr/core#realization": [...]
>    },...
> ]
> So, by the *innerText* rule, the /title /line will be equivalent to
>     "http://purl.org/dc/elements/1.1/title":"Just a Geek"

Both snippets are valid under the current specification, but they are 
NOT equivalent. More precisely, in their RDF interpretation, the object 
of dc:title is an IRI in the first one and a literal in the second one.
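
To make the difference concrete, here is a minimal JavaScript sketch (not a JSON-LD processor) of how the two values are read when no context coerces the property. The base IRI http://example.org/book.html and the helper name rdfObjectOf are assumptions made up for illustration; they appear nowhere in the spec.

```javascript
// Hypothetical base IRI of the HTML page that embeds the JSON-LD.
const base = "http://example.org/book.html";

// Describe how a JSON-LD value is read as an RDF object when no context
// coerces the property (as in the context-free snippets quoted above).
// Simplification: only plain strings and {"@id": ...} objects are handled.
function rdfObjectOf(value, baseIri) {
  if (typeof value === "string") {
    // A plain string is a literal.
    return { kind: "literal", value };
  }
  if (value !== null && typeof value === "object" &&
      typeof value["@id"] === "string") {
    // A relative reference such as "#tit01" is resolved against the base.
    return { kind: "iri", value: new URL(value["@id"], baseIri).href };
  }
  return { kind: "other", value };
}

const first = rdfObjectOf({ "@id": "#tit01" }, base);
const second = rdfObjectOf("Just a Geek", base);
console.log(first);
console.log(second);
```

The first value comes out as the IRI http://example.org/book.html#tit01, which names the HTML element; the second is the literal "Just a Geek". Nothing in the current processing rules dereferences the former to obtain the latter.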

Changing the spec to make them equivalent would be a breaking change, 
and would be detrimental to many use-cases. I don't see this happening.

> Note: is a simple get-and-convert algorithm. In a Javascript parser 
> the string can be obtained by 
> |document.getElementById("tit01").innerText|.
>
> *RATIONALE*: reuse content, avoiding waste, errors or malicious 
> changes... And it is not only reuse, many legal applications (for 
> Internet or digital preservation scope), in Science and Justice, need 
> document transparency and *integrity guarantee*. Examples: JATS 
> <https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite>, 
> SchemaOrg/Legislation <https://schema.org/Legislation>, 
> SchemaOrg/ScholarlyArticle <https://schema.org/ScholarlyArticle>, and 
> other official documents.

I sympathize with this rationale, but I believe it could be addressed 
with some additional mechanisms /on top/ of the existing spec, rather 
than by changing the spec.

You could define a dedicated property ex:innerTextOf. The semantics of 
this property would be as follows: its subject (typically a blank node) 
must be interpreted as semantically equivalent (as with owl:sameAs) to 
the literal obtained by applying innerText to its object (assumed to be 
an IRI with a fragment identifier, resolving to some HTML fragment).

With the appropriate context, your snippet above would become:

[
   {
     "@id":"http://purl.oreilly.com/works/45U8QJGZSQKDH8N",
     "@type":"http://purl.org/vocab/frbr/core#Work",
     "http://purl.org/dc/elements/1.1/title": {"innerTextOf":"#tit01"},
     "http://purl.org/dc/elements/1.1/creator":"Wil Wheaton",
     "http://purl.org/vocab/frbr/core#realization": [...]
   },...
]

It would still require some "innerTextOf-aware" post-processing to 
produce the expected result (i.e. to replace the blank node with the 
corresponding literal).
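
A rough JavaScript sketch of such post-processing, under several assumptions: it works on the JSON tree rather than on the RDF graph, it only recognizes objects whose sole key is "innerTextOf", and the names resolveInnerTextOf, lookupText and fakeDom are invented for this example. A Map stands in for the HTML document so that the sketch runs outside a browser.

```javascript
// Hypothetical key, matching the term "innerTextOf" used in the snippet above.
const INNER_TEXT_OF = "innerTextOf";

// Recursively replace every {"innerTextOf": "#some-id"} object with the
// plain string returned by lookupText("some-id"). The input is not modified.
function resolveInnerTextOf(value, lookupText) {
  if (Array.isArray(value)) {
    return value.map((item) => resolveInnerTextOf(item, lookupText));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const keys = Object.keys(value);
  const ref = value[INNER_TEXT_OF];
  if (keys.length === 1 && typeof ref === "string" && ref.startsWith("#")) {
    const text = lookupText(ref.slice(1));
    // Leave the object untouched when the fragment cannot be resolved.
    return typeof text === "string" ? text : value;
  }
  const out = {};
  for (const key of keys) {
    out[key] = resolveInnerTextOf(value[key], lookupText);
  }
  return out;
}

// In a browser, the lookup could be the call quoted earlier in this thread:
//   (id) => document.getElementById(id)?.innerText
// Here a Map plays the role of the HTML document.
const fakeDom = new Map([["tit01", "Just a Geek"]]);
const lookup = (id) => fakeDom.get(id);

const input = [
  {
    "@id": "http://purl.oreilly.com/works/45U8QJGZSQKDH8N",
    "http://purl.org/dc/elements/1.1/title": { innerTextOf: "#tit01" },
    "http://purl.org/dc/elements/1.1/creator": "Wil Wheaton",
  },
];

const output = resolveInnerTextOf(input, lookup);
console.log(JSON.stringify(output, null, 2));
```

After this step the title is the plain string "Just a Geek", so a standard JSON-LD processor would read it as a literal. A reference that cannot be resolved is left as is, so the failure stays visible instead of being silently dropped.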

Would that be satisfactory?

   pa
Received on Monday, 8 November 2021 08:26:38 UTC