A question on RWPM: why the 'manifest' tag? from Ivan Herman on 2018-01-09 (public-publ-wg@w3.org from January 2018)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 9 Jan 2018 13:13:38 +0100
To: Hadrien Gardeur <hadrien.gardeur@feedbooks.com>
Cc: W3C Publishing Working Group <public-publ-wg@w3.org>
Message-Id: <F0CA88B5-47DD-4547-AF3E-E4CD16F95D83@w3.org>
Hadrien,

(I did not want to put this as an issue at this moment; it may become relevant if we go down the RWPM way but, until then, this should not yet be an 'official' issue. So let us keep to the ML for now.)

(My apologies to readers who do not have an RDF background; this is a fairly technical mail…)

I looked at the opening example in [1]. I was curious to see what all this looks like in RDF. I used a converter that generated the following triples (I use Turtle, which is much more RDF-like…):

<<<<

@prefix ns1: <owl:> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:isbn:978031600000X> a schema:Book ;
    schema:author "Herman Melville" ;
    schema:dateModified "2015-09-29T17:00:00+00:00"^^xsd:dateTime ;
    schema:inLanguage "en" ;
    schema:name "Moby-Dick" .

[] schema:hasPart [ schema:fileFormat "text/html" ;
            schema:name "Chapter 2" ;
            schema:url "c002.html" ],
        [ schema:fileFormat "text/html" ;
            schema:name "Chapter 1" ;
            schema:url "c001.html" ] ;
    ns1:sameAs <urn:isbn:978031600000X> .

<<<<

Which, mostly, looks fine, except for that trick of using owl:sameAs to identify the canonical object (the book) with a blank node. I see several issues with that:

- Die-hard RDF/Linked Data people really try to avoid the usage of blank nodes, because they are a constant source of problems in various RDF-related routines, algorithms, etc. There are cases when they are almost necessary (the objects in the schema:hasPart construction above look perfectly fine to me), but the outer blank node (i.e., the '[]') would really put many people off.

- I have not looked at the RDF tool landscape lately, but, afaik, OWL is often ignored by RDF related tools. owl:sameAs _may_ be an exception here and there (there are triple stores that do an owl:sameAs reasoning against their data) but this is not universal. (E.g., the Python RDFLib tool does not do that automatically, you have to use external libraries.) This also means that, e.g., SPARQL requests may fail on querying (in the example above) the "c002.html" part if they only use the ISBN identifier although, semantically, this should be fine.

- (I am not sure the schema.org tools are prepared for this, although that may not be a strong argument)
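
To make the second point concrete, here is a minimal sketch in plain Python. It is a toy model and not a real RDF library: triples are simple tuples, the names `with_same_as`, `flat`, and `part_urls` are made up for illustration, and the lookup does literal matching only, which is roughly what a tool without owl:sameAs reasoning would do.

```python
# Toy model: triples as plain tuples, no RDF library and no OWL reasoning.
ISBN = "urn:isbn:978031600000X"
BNODE = "_:b0"  # stands in for the outer '[]' blank node

# Shape of the converter output: parts hang off a blank node that is
# only linked to the book through owl:sameAs.
with_same_as = [
    (ISBN, "rdf:type", "schema:Book"),
    (ISBN, "schema:name", "Moby-Dick"),
    (BNODE, "schema:hasPart", "_:c1"),
    (BNODE, "schema:hasPart", "_:c2"),
    (BNODE, "owl:sameAs", ISBN),
    ("_:c1", "schema:url", "c001.html"),
    ("_:c2", "schema:url", "c002.html"),
]

# Hypothetical alternative: parts hang directly off the ISBN node.
flat = [
    (ISBN, "rdf:type", "schema:Book"),
    (ISBN, "schema:name", "Moby-Dick"),
    (ISBN, "schema:hasPart", "_:c1"),
    (ISBN, "schema:hasPart", "_:c2"),
    ("_:c1", "schema:url", "c001.html"),
    ("_:c2", "schema:url", "c002.html"),
]


def part_urls(triples, subject):
    """Return the schema:url of every schema:hasPart of `subject` (literal matching only)."""
    parts = [o for s, p, o in triples if s == subject and p == "schema:hasPart"]
    return sorted(o for s, p, o in triples if s in parts and p == "schema:url")


print(part_urls(with_same_as, ISBN))  # nothing is found via the ISBN
print(part_urls(flat, ISBN))          # both chapters are found
```

Without something that interprets owl:sameAs, the lookup by ISBN finds no parts in the first list, although the two lists are meant to say the same thing.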

So I was trying to get rid of this. The usage of owl:sameAs here is an artefact of mapping the "metadata" term against "owl:sameAs" in the context file. This is necessary because, in the RWPM, you do have this separate "metadata" term:


<<<<

"metadata" : {
   "@type": "http://schema.org/Book",
    "title": "Moby-Dick",
    "author": "Herman Melville",
    "identifier": "urn:isbn:978031600000X",
    "language": "en",
    "modified": "2015-09-29T17:00:00Z"
}

<<<<

and if "metadata" is not mapped in context, it will be ignored (together with the JSON object).
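
A rough sketch of that behaviour, in plain Python. This is only a toy stand-in for what a JSON-LD processor does with unmapped terms, not an implementation of the JSON-LD algorithms; the function `keep_mapped` and the two context dictionaries are assumptions made for illustration (the mappings of "title" and "author" are guessed from the Turtle output above).

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def keep_mapped(node, context):
    """Toy stand-in for JSON-LD expansion: drop every key that has no mapping in `context`."""
    result = {}
    for key, value in node.items():
        if key.startswith("@"):
            result[key] = value  # keywords such as @type pass through
        elif key in context:
            mapped = context[key]
            result[mapped] = keep_mapped(value, context) if isinstance(value, dict) else value
        # any other key is silently dropped, together with its value
    return result


manifest = {
    "metadata": {
        "@type": "http://schema.org/Book",
        "title": "Moby-Dick",
        "author": "Herman Melville",
    }
}

# Assumed context: "metadata" itself is NOT mapped.
context_without = {
    "title": "http://schema.org/name",
    "author": "http://schema.org/author",
}
# Assumed context: "metadata" mapped to owl:sameAs.
context_with = dict(context_without, metadata=OWL_SAME_AS)

print(keep_mapped(manifest, context_without))  # the whole object is gone
print(keep_mapped(manifest, context_with))     # the object survives under owl:sameAs
```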

Hence the question: what does this "metadata" term bring to the table in the first place? Why can't one have the example in [1] be simply

<<<<
{
    "@context": "http://…",
    "identifier": "urn:isbn:978031600000X",
    "author": "Herman Melville",
    …
    "spine": [
       …
    ]
    …
}
<<<<

It strikes me as much more straightforward for authors/users as well. I have used this and got the much more straightforward Turtle output:

<<<<
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:isbn:978031600000X> a schema:Book ;
    schema:author "Herman Melville" ;
    schema:dateModified "2015-09-29T17:00:00+00:00"^^xsd:dateTime ;
    schema:hasPart [ schema:fileFormat "text/html" ;
            schema:name "Chapter 2" ;
            schema:url "c002.html" ],
        [ schema:fileFormat "text/html" ;
            schema:name "Chapter 1" ;
            schema:url "c001.html" ] ;
    schema:inLanguage "en" ;
    schema:name "Moby-Dick" .
<<<<

I can imagine that there *are* some terms that you do not want to appear in RDF. And that is fine: you already use the trick (e.g., for resources) whereby a term that has no mapping in a JSON-LD context (or is not a URI by itself) is ignored by a JSON-LD processor, i.e., you can hide anything you want.

WDYT?

Ivan

P.S. Note, b.t.w., that the JSON-LD 1.1 document [2], which is currently a CG draft, introduces the notion of 'nested properties' [3], which does something similar: it essentially says "ignore this term and the resulting nesting, it is semantically meaningless". I.e., if "metadata" were defined as "@nest" in the context, we would get the same simplified Turtle. There are plans to submit the JSON-LD 1.1 work to a possible WG at W3C, which would issue it as a new version of JSON-LD, but that is not yet in motion.
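
As a rough illustration of what "@nest" would mean for "metadata", here is a toy sketch in plain Python. It is not a JSON-LD 1.1 processor; the function `lift_nested` and the sample manifest (including the "spine" values) are hypothetical and only show the idea that the nesting term disappears and its properties attach to the enclosing object.

```python
def lift_nested(node, context):
    """Toy model of '@nest': merge the properties of any key mapped to '@nest' into the parent."""
    result = {}
    for key, value in node.items():
        if context.get(key) == "@nest" and isinstance(value, dict):
            # The nesting term itself carries no meaning: lift its content one level up.
            result.update(lift_nested(value, context))
        else:
            result[key] = value
    return result


# Hypothetical context fragment and manifest, for illustration only.
context = {"metadata": "@nest"}
manifest = {
    "metadata": {
        "identifier": "urn:isbn:978031600000X",
        "title": "Moby-Dick",
        "author": "Herman Melville",
    },
    "spine": ["c001.html", "c002.html"],
}

print(lift_nested(manifest, context))
```

The result has "identifier", "title", "author" and "spine" side by side, which is the same shape as the flat variant shown earlier.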



[1] https://github.com/readium/webpub-manifest
[2] https://json-ld.org/spec/latest/json-ld
[3] https://json-ld.org/spec/latest/json-ld/#nested-properties


----
Ivan Herman, W3C
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 9 January 2018 12:13:47 UTC