Re: For the discussion on the PWP from Ivan Herman on 2016-11-23 (public-digipub-ig@w3.org from November 2016)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 23 Nov 2016 09:56:32 +0100
To: Hadrien Gardeur <hadrien.gardeur@feedbooks.com>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ric Wright <rkwright@geofx.com>, Laurent Le Meur <laurent.lemeur@edrlab.org>
Message-Id: <C7E794A4-AD6A-4792-83C4-5615A853FDDA@w3.org>
> On 23 Nov 2016, at 03:30, Hadrien Gardeur <hadrien.gardeur@feedbooks.com> wrote:
> 
> You mean links in the resources within a WP? My answer would be no. The resources, when put into a package, should be unchanged (with a possible exception for the manifest, maybe).
> 
> But if they're unchanged, then we hit the issue that I've described before, where it becomes tricky for a reading system to properly display such a packaged publication.
> You either have to dynamically create your own Service Worker, act as a proxy for a webview (same idea as a Service Worker) or rewrite those links dynamically in the RS.
> 

Exactly. We essentially hit this issue. The non-satisfactory answer was that it should be possible to construct a manifest coming from different sources: combine the original manifest with something that, say, L provides in our example with further information. We went down that line, but that sounded fairly complicated and, by common agreement, we backpedalled. But that may mean that there are some cases that we simply cannot cover.


> Now, to go back to the example that Dave has provided, let's get into a little more detail:
> Publisher P publishes Orlando and uses https://www.publisher-P/new/Orlando/ as the unique identifier for it
> The manifest for Orlando is available at https://www.publisher-P/new/Orlando/manifest.json
> User A gets access directly to the Web Publication Manifest (I'm using the syntax that I've used until now to illustrate this entirely):
> 
> {
>   "metadata": {
>     "identifier": "https://www.publisher-P/new/Orlando/",
>     "title": "Orlando"
>   },
> 
>   "links": [
>     {"rel": "self", "href": "https://www.publisher-P/new/Orlando/manifest.json", "type": "application/webpub+json"}
>   ],
> 
>   "spine": [
>     {"href": "c001.html", "type": "text/html"},
>     {"href": "c002.html", "type": "text/html"},
>     {"href": "c003.html", "type": "text/html"},
>     {"href": "c004.html", "type": "text/html"}
>   ]
> }
> 
> To reference "c001.html" in this specific publication, a locator could either use:
> the identifier (https://www.publisher-P/new/Orlando/) + https://www.publisher-P/new/Orlando/c001.html
> the canonical link to the manifest (https://www.publisher-P/new/Orlando/manifest.json) + https://www.publisher-P/new/Orlando/c001.html
> All three references should remain stable, no matter how you access the publication.
> 
> Now User B gets access to a packaged version of that book. There are many different ways this publication could have been packaged, for instance:
> by the publisher, at the same time that the Web Publication itself was published on the Web
> by a third party client, that simply accessed the manifest and its resources to create a new package
> While I've read quite a few times in this group mentions that packaging the publication should not impact the path to the resources, I think that this is completely unrealistic.

I am not sure why you say that. I think it is perfectly fine to say that once I create a WP, the _relative_ paths of the content remain unchanged. Whether I (temporarily) package it and unpackage it later, the file hierarchy should remain unchanged, only the root should change. If I change the file hierarchy then it is _another_ WP as far as I am concerned.
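A minimal sketch of that point, assuming the spine entries are plain relative references: resolving the same relative hrefs against two different roots changes only the prefix, not the hierarchy. The second root (`https://reader.example/unpacked/Orlando/`) is a made-up location used purely for illustration.

```python
from urllib.parse import urljoin

# Relative spine entries, as in the manifest quoted earlier in this thread.
SPINE = ["c001.html", "c002.html", "c003.html", "c004.html"]


def resolve_spine(root, spine):
    """Resolve each relative href against the given root URL."""
    return [urljoin(root, href) for href in spine]


# Root on the publisher's site (taken from the example in this thread).
online = resolve_spine("https://www.publisher-P/new/Orlando/", SPINE)

# Hypothetical root after the publication has been unpackaged elsewhere.
unpacked = resolve_spine("https://reader.example/unpacked/Orlando/", SPINE)

for a, b in zip(online, unpacked):
    print(a, "<->", b)
```

Only the root differs between the two lists; the relative part of every URL is identical.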

> But frankly, it doesn't really matter as long as the canonical location of each resource is preserved.
> 
> For instance, let's say that User B version of the same packaged publication looks like this:
> 
> {
>   "metadata": {
>     "identifier": "https://www.publisher-P/new/Orlando/",
>     "title": "Orlando"
>   },
> 
>   "links": [
>     {"rel": "self", "href": "https://www.publisher-P/new/Orlando/manifest.json", "type": "application/webpub+json"}
>   ],
> 
>   "spine": [
>     {"href": "chapter1.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c001.html", "type": "text/html"},
>     {"href": "chapter2.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c002.html", "type": "text/html"},
>     {"href": "chapter3.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c003.html", "type": "text/html"},
>     {"href": "chapter4.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c004.html", "type": "text/html"}
>   ]
> }
> 
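For readers following the proposal, here is a rough sketch of how a reading system might use it. Note that "hrefsrc" is only the property name proposed in the quoted manifest, not part of any published specification, and the helper name `local_href_for` is invented for this illustration.

```python
# Spine of the packaged manifest from the quoted example.
# "hrefsrc" is the proposed (non-standard) property carrying the canonical URL.
PACKAGED_SPINE = [
    {"href": "chapter1.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c001.html"},
    {"href": "chapter2.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c002.html"},
    {"href": "chapter3.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c003.html"},
    {"href": "chapter4.html", "hrefsrc": "https://www.publisher-P/new/Orlando/c004.html"},
]


def local_href_for(canonical_url, spine):
    """Return the in-package href for a canonical URL, or None if it is unknown."""
    for item in spine:
        if item.get("hrefsrc") == canonical_url:
            return item["href"]
    return None


print(local_href_for("https://www.publisher-P/new/Orlando/c001.html", PACKAGED_SPINE))
```

A locator that stores the canonical URL can then be mapped onto whatever file name the package happens to use.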

If, as I maintain, the file hierarchy should not change for a specific WP, then I do not think this is all necessary. If the manifest is allowed to change, then the only change that might be necessary for some of the use cases is to have, maybe, a change in the metadata, something like:

{
  "metadata": {
    "identifier": "https://www.publisher-P/new/Orlando/",
    "breadcrumb": [
      "https://www.library-L/stacks/fiction/Orlando/",
      "https://www.library-L/user-U-bookshelf/Orlando/"
    ],
    "title": "Orlando"
  }
  …
}

which allows the reconstruction of all kinds of URLs.
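As a rough sketch of what such a reconstruction could look like, under the assumption that every "breadcrumb" entry is a root under which the same file hierarchy has lived (the function name `candidate_urls` is invented for this illustration):

```python
from urllib.parse import urljoin

# Metadata as in the example above; "breadcrumb" is only a suggested property.
metadata = {
    "identifier": "https://www.publisher-P/new/Orlando/",
    "breadcrumb": [
        "https://www.library-L/stacks/fiction/Orlando/",
        "https://www.library-L/user-U-bookshelf/Orlando/",
    ],
    "title": "Orlando",
}


def candidate_urls(metadata, relative_href):
    """List the URLs a resource may have had: canonical root first, then each breadcrumb root."""
    roots = [metadata["identifier"], *metadata.get("breadcrumb", [])]
    return [urljoin(root, relative_href) for root in roots]


for url in candidate_urls(metadata, "c001.html"):
    print(url)
```

This only works because the relative path ("c001.html") is the same under every root, which is the invariant argued for above.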

But I am beginning to think that anything we would do in this area may become way too complex in practice, i.e., we may have to shy away from these...


> It doesn't really matter if one packaged version renamed "c001.html" to "chapter1.html" or moved it to a different folder in the package, I can still recover the original reference by using "hrefsrc".
> 
> 
> Hadrien – I have been looking towards the Selector model of the Web Annotation specification as ways that we might be able to specify text locations in a reliable fashion (and without re-inventing the wheel).  It provides a number of well-defined (and implemented/tested) models for how to refer to either specific semantic text pieces, arbitrary text or ranges of both/either.  This should hopefully remove the need for fragments – at least in the context of a larger environment.
> 
> Leonard, I don't really think it does, but the selector model is a good starting point.
> The main issue that I have with it is that there are many different options for such selectors. Some are quite stable but lack precision, others are quite the opposite.
> We might not need (or want) to define a brand new fragment identifier, but we'll need to take a close look at the selectors that are available and have a clear decision about the ones that we'll actually use.

We have to be very careful about this. Simply because, alas!, the definition of new fragment identifiers is, officially, quite a mess.

Formally, a fragment identifier in a URL is defined _for a specific media type_. Also, in general, a fragment identifier is defined when the media type is specified. If we decided to define a fragment identifier for WP, that would force us to define a media type for WP, which opens up a bunch of questions, like acceptance by browsers (I expect they would push back on this). That is also the weakness of the fragment identifiers defined in the selector spec: in many respects, they are an intellectual exercise, but it is difficult to put them into practice. The selector model bypasses this restriction by avoiding the usage of fragment identifiers...
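To illustrate the last point, a small sketch of a target that uses a TextQuoteSelector from the Web Annotation data model: the source URL carries no fragment, and the location is expressed in a separate selector object. The quoted text, prefix and suffix are placeholders, and the helper name `text_quote_target` is invented for this illustration.

```python
import json


def text_quote_target(source, exact, prefix="", suffix=""):
    """Build a target whose location is a TextQuoteSelector rather than a URL fragment."""
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {"source": source, "selector": selector}


# Placeholder strings only; they are not taken from the actual book.
target = text_quote_target(
    "https://www.publisher-P/new/Orlando/c001.html",
    exact="placeholder passage",
    prefix="text before ",
    suffix=" text after",
)
print(json.dumps(target, indent=2))
```

Since nothing is appended to the URL, no media-type-specific fragment syntax has to be defined for this to work.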

Ivan


> 
> Hadrien


----
Ivan Herman, W3C
Digital Publishing Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 23 November 2016 08:56:49 UTC