Re: A question on RWPM: why the 'metadata' tag? from Leonard Rosenthol on 2018-01-09 (public-publ-wg@w3.org from January 2018)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 9 Jan 2018 15:16:14 +0000
To: Ivan Herman <ivan@w3.org>
CC: Hadrien Gardeur <hadrien.gardeur@feedbooks.com>, "W3C Publishing Working Group" <public-publ-wg@w3.org>
Message-ID: <D32F1DCF-A740-4C15-9F6F-26113B8528D0@adobe.com>
>    JSON-LD is a funny beast: data in JSON-LD can be looked at as pure JSON to be used by various applications, so we want to make it comfortable for that purpose. 
>
True, though you also want to make sure that you give JSON-LD clients the benefits and not "bastardize" it (or else, just do JSON to begin with).


>However, we should also be careful about the "quality" of the RDF it encodes. These two requirements are sometimes contradictory, and may influence design decisions to find the right balance. 
>What I did was to look at this along those lines.
>
Agreed, but not just quality but "quantity" - meaning that you need to look at all possible ways in which RDF can be used and evaluate how each could/should be encoded into JSON-LD.  And as you know, there are a number of "sharp edges" in RDF, many of which aren't even odd use cases.   For example, what do you do with xml:lang?   And not only when it is present on a simple element, but also when it is present on arrays/lists or bags?
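
To make the xml:lang question concrete, here is a minimal Python sketch (standard library only) of how a language-tagged string can be written in JSON-LD using the `@value`/`@language` value objects from JSON-LD 1.0. The helper name `lang_value` and the sample titles are hypothetical, and the harder case raised above (a language tag on a whole list or bag rather than on each item) is deliberately not covered.

```python
import json

def lang_value(text, lang=None):
    """Return a JSON-LD value object, or the plain string if no language is given."""
    if lang is None:
        return text
    return {"@value": text, "@language": lang}

# Hypothetical sample data: one title given in two languages.
titles = [
    lang_value("Moby-Dick", "en"),
    lang_value("Moby Dick ou le Cachalot", "fr"),
]

doc = {
    "@context": {"name": "http://schema.org/name"},
    "name": titles,
}

print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Each item carries its own tag here; whether a tag on the container itself can be represented at all is exactly the kind of edge that would need to be evaluated.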


Leonard


On 1/9/18, 6:33 AM, "Ivan Herman" <ivan@w3.org> wrote:

    
    
    > On 9 Jan 2018, at 15:04, Leonard Rosenthol <lrosenth@adobe.com> wrote:
    > 
    > Sorry for coming in late and without background here - been swamped with completely unrelated work lately :(.
    > 
    > It appears (for reasons that aren't clear to me yet) that you are looking to serialize RDF-based data into JSON/JSON-LD, is that correct?
    
    Well… almost but not exactly.
    
    JSON-LD is a funny beast: data in JSON-LD can be looked at as pure JSON to be used by various applications, so we want to make it comfortable for that purpose. However, we should also be careful about the "quality" of the RDF it encodes. These two requirements are sometimes contradictory, and may influence design decisions to find the right balance. What I did was to look at this along those lines.
    
    > If so, then I will point you to a new Work Item in ISO TC 130 WG2TF4, the committee where XMP (the industry standard RDF-based metadata model for assets) is standardized.  This new work item, which is being done jointly with other groups such as the IPTC, is to standardize a JSON-LD serialization of XMP.  Which sounds like *exactly* what you are looking for as well.   Yes??
    
    It is obviously close. Thanks for the pointers!
    
    Ivan
    
    > 
    > Leonard
    > 
    > On 1/9/18, 4:18 AM, "Ivan Herman" <ivan@w3.org> wrote:
    > 
    >    Sorry, stupid subject line mistake, s/manifest/metadata/ :-)
    > 
    >    I.
    > 
    >> On 9 Jan 2018, at 13:13, Ivan Herman <ivan@w3.org> wrote:
    >> 
    >> Hadrien,
    >> 
    >> (I did not want to put this as an issue at this moment; it may become relevant if we go down the RWPM way but, until then, this should not yet be an 'official' issue. So let us keep to the ML for now.)
    >> 
    >> (My apologies to readers who do not have an RDF background; this is a fairly technical mail…)
    >> 
    >> I looked at the opening example in [1]. I was curious to see what all this looks like in RDF. I used a converter that generated the following triples (I use Turtle, which is much more RDF-like…):
    >> 
    >> <<<<
    >> 
    >> @prefix ns1: <owl:> .
    >> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    >> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    >> @prefix schema: <http://schema.org/> .
    >> @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    >> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    >> 
    >> <urn:isbn:978031600000X> a schema:Book ;
    >>   schema:author "Herman Melville" ;
    >>   schema:dateModified "2015-09-29T17:00:00+00:00"^^xsd:dateTime ;
    >>   schema:inLanguage "en" ;
    >>   schema:name "Moby-Dick" .
    >> 
    >> [] schema:hasPart [ schema:fileFormat "text/html" ;
    >>           schema:name "Chapter 2" ;
    >>           schema:url "c002.html" ],
    >>       [ schema:fileFormat "text/html" ;
    >>           schema:name "Chapter 1" ;
    >>           schema:url "c001.html" ] ;
    >>   ns1:sameAs <urn:isbn:978031600000X> .
    >> 
    >> <<<<
    >> 
    >> Which, mostly, looks fine, except for that trick of using owl:sameAs to identify the canonical object (the book) with a blank node. I see several issues with that:
    >> 
    >> - Die-hard RDF/Linked Data people really try to avoid the usage of blank nodes, because they are a source of constant problems in various RDF-related routines, algorithms, etc. There are cases when they are almost necessary (the objects in the schema:hasPart construction above look perfectly fine to me), but the outer blank node (ie, the '[]') would really put many people off.
    >> 
    >> - I have not looked at the RDF tool landscape lately but, afaik, OWL is often ignored by RDF-related tools. owl:sameAs _may_ be an exception here and there (there are triple stores that do owl:sameAs reasoning on their data), but this is not universal. (E.g., the Python RDFLib tool does not do that automatically; you have to use external libraries.) This also means that, e.g., a SPARQL query for the "c002.html" part (in the example above) may fail if it only uses the ISBN identifier although, semantically, this should be fine.
    >> 
    >> - (I am not sure the schema.org tools are prepared for this, although that may not be a strong argument)
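
The owl:sameAs point in the list above can be illustrated with a small Python sketch that uses plain tuples instead of a real RDF library. The blank-node labels (`_:b0`, `_:c1`, `_:c2`) and the helper names `part_urls` and `apply_same_as` are hypothetical, and the "reasoning" is a deliberately naive one-pass copy, not what any particular triple store does.

```python
OWL_SAME_AS = "owl:sameAs"
BOOK = "urn:isbn:978031600000X"

# A subset of the triples from the first Turtle listing, with the outer
# blank node labelled "_:b0" and the two chapters "_:c1" and "_:c2".
triples = {
    (BOOK, "schema:name", "Moby-Dick"),
    ("_:b0", "schema:hasPart", "_:c1"),
    ("_:b0", "schema:hasPart", "_:c2"),
    ("_:c1", "schema:url", "c001.html"),
    ("_:c2", "schema:url", "c002.html"),
    ("_:b0", OWL_SAME_AS, BOOK),
}

def part_urls(graph, subject):
    """URLs of the schema:hasPart objects of `subject`, by plain pattern matching."""
    parts = {o for s, p, o in graph if s == subject and p == "schema:hasPart"}
    return sorted(o for s, p, o in graph if s in parts and p == "schema:url")

def apply_same_as(graph):
    """Naive owl:sameAs handling: copy each alias's outgoing triples to its target."""
    aliases = {s: o for s, p, o in graph if p == OWL_SAME_AS}
    extra = {(aliases[s], p, o) for s, p, o in graph
             if s in aliases and p != OWL_SAME_AS}
    return graph | extra

print(part_urls(triples, BOOK))                 # []
print(part_urls(apply_same_as(triples), BOOK))  # ['c001.html', 'c002.html']
```

Without the extra sameAs step, a lookup keyed on the ISBN finds no chapters, which is the failure mode described for SPARQL queries above.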
    >> 
    >> So I was trying to get rid of this. The usage of owl:sameAs here is an artefact of mapping the "metadata" term against "owl:sameAs" in the context file. This is necessary because, in the RWPM, you do have this separate "metadata" term:
    >> 
    >> 
    >> <<<<
    >> 
    >> "metadata" : {
    >>   "@type": "http://schema.org/Book",
    >>   "title": "Moby-Dick",
    >>   "author": "Herman Melville",
    >>   "identifier": "urn:isbn:978031600000X",
    >>   "language": "en",
    >>   "modified": "2015-09-29T17:00:00Z"
    >> }
    >> 
    >> <<<<
    >> 
    >> and if "metadata" is not mapped in the context, it will be ignored (together with the JSON object).
    >> 
    >> Hence the question: what does this "metadata" term bring to the table in the first place? Why can't one have the example in [1] be simply
    >> 
    >> <<<<
    >> {
    >>   "@context": "http://…",
    >>   "identifier": "urn:isbn:978031600000X",
    >>   "author": "Herman Melville",
    >>   …
    >>   "spine": [
    >>      …
    >>   ]
    >>   …
    >> }
    >> <<<<
    >> 
    >> It strikes me as much more straightforward for authors/users as well. I have used this and got the much more straightforward Turtle output:
    >> 
    >> <<<<
    >> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    >> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    >> @prefix schema: <http://schema.org/> .
    >> @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    >> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    >> 
    >> <urn:isbn:978031600000X> a schema:Book ;
    >>   schema:author "Herman Melville" ;
    >>   schema:dateModified "2015-09-29T17:00:00+00:00"^^xsd:dateTime ;
    >>   schema:hasPart [ schema:fileFormat "text/html" ;
    >>           schema:name "Chapter 2" ;
    >>           schema:url "c002.html" ],
    >>       [ schema:fileFormat "text/html" ;
    >>           schema:name "Chapter 1" ;
    >>           schema:url "c001.html" ] ;
    >>   schema:inLanguage "en" ;
    >>   schema:name "Moby-Dick" .
    >> <<<<
    >> 
    >> I can imagine that there *are* some terms that you do not want to appear in RDF. And that is fine: you already use the trick (e.g., for resources) whereby a term that has no mapping in a JSON-LD context (or is not a URI by itself) is ignored by a JSON-LD processor, ie, you can hide anything you want.
    >> 
    >> WDYT?
    >> 
    >> Ivan
    >> 
    >> P.S. Note, b.t.w., that the JSON-LD 1.1 document [2], which is currently a CG draft, introduces the notion of 'nested properties' [3], which does something similar: it essentially says "ignore this term and the resulting nesting, it is semantically meaningless". Ie, if "metadata" were defined as "@nest" in the context, we would get the same simplified Turtle. There are plans to propose a W3C WG that would publish this work as a new version of JSON-LD, but that is not yet in motion.
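
As a rough illustration of what "ignore this term and the resulting nesting" amounts to, the following Python sketch hoists the members of "metadata" to the top level of the manifest before any JSON-LD processing. This is only an approximation written for this thread: the function name `hoist`, the placeholder context URL and the empty "spine" are assumptions, and a real JSON-LD 1.1 processor handles `@nest` itself (including colliding keys, which this sketch simply rejects).

```python
import json

def hoist(doc, nest_key="metadata"):
    """Return a copy of `doc` with the members of doc[nest_key] moved to the top level."""
    flat = {k: v for k, v in doc.items() if k != nest_key}
    for key, value in doc.get(nest_key, {}).items():
        if key in flat:
            # Simplification: a real processor would merge values instead.
            raise ValueError(f"duplicate key after hoisting: {key}")
        flat[key] = value
    return flat

manifest = {
    "@context": "http://example.org/context.jsonld",  # placeholder URL
    "metadata": {
        "@type": "http://schema.org/Book",
        "title": "Moby-Dick",
        "author": "Herman Melville",
        "identifier": "urn:isbn:978031600000X",
    },
    "spine": [],  # contents elided in this sketch
}

print(json.dumps(hoist(manifest), indent=2))
```

The result has the same shape as the flattened example proposed above: "identifier", "author" and the rest sit next to "spine", with no "metadata" wrapper.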
    >> 
    >> 
    >> 
    >> [1] https://github.com/readium/webpub-manifest
    >> [2] https://json-ld.org/spec/latest/json-ld
    >> [3] https://json-ld.org/spec/latest/json-ld/#nested-properties
    >> 
    >> 
    >> ----
    >> Ivan Herman, W3C
    >> Publishing@W3C Technical Lead
    >> Home: http://www.w3.org/People/Ivan/
    >> mobile: +31-641044153
    >> ORCID ID: http://orcid.org/0000-0003-0782-2704
    >> 
    > 
    > 
    > 
    > 
    > 
    
    
Received on Tuesday, 9 January 2018 15:16:43 UTC