Re: For the discussion on the PWP

Hadrien – I have been looking at the Selector model of the Web Annotation specification as a way to specify text locations reliably (and without reinventing the wheel). It provides a number of well-defined (and implemented/tested) models for referring to specific semantic pieces of text, arbitrary text, or ranges of either. This should hopefully remove the need for fragments – at least in the context of a larger environment.

However, there are cases where we may need a fragment identifier, but I think that will be more specific to the packaging format, so we won’t know what it will look like until we are further down that path.
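
To make the Selector idea concrete, here is a minimal sketch of resolving a text-quote selector against plain text. The dictionary uses the TextQuoteSelector shape from the Web Annotation Data Model ("exact", "prefix", "suffix"); the sample sentence, the values and the resolve_text_quote helper are assumptions made up for illustration, and a real implementation would also need to handle normalization and ambiguous matches.

```python
from typing import Optional, Tuple

# Shape follows the Web Annotation Data Model's TextQuoteSelector.
# The values are invented for this example.
selector = {
    "type": "TextQuoteSelector",
    "exact": "annotation",
    "prefix": "made an ",
    "suffix": " in a browser",
}


def resolve_text_quote(text: str, sel: dict) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first match of sel in text, or None.

    Hypothetical helper: does a plain substring search for
    prefix + exact + suffix and reports the offsets of 'exact' only.
    """
    exact = sel["exact"]
    prefix = sel.get("prefix", "")
    suffix = sel.get("suffix", "")
    pos = text.find(prefix + exact + suffix)
    if pos == -1:
        return None
    start = pos + len(prefix)
    return start, start + len(exact)


text = "User A has made an annotation in a browser."
span = resolve_text_quote(text, selector)
```

Because the selector carries its own context (prefix and suffix), it does not depend on a fragment syntax defined by the packaging format.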

Leonard

From: Hadrien Gardeur <hadrien.gardeur@feedbooks.com>
Date: Thursday, November 17, 2016 at 4:47 AM
To: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
Cc: Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ric Wright <rkwright@geofx.com>, Laurent Le Meur <laurent.lemeur@edrlab.org>
Subject: Re: For the discussion on the PWP
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Thursday, November 17, 2016 at 4:48 AM

Hello Tzviya,

Thanks for the additional context, this is really useful.

If the main requirement is for a locator that can be referenced (similar to CFI), then IMO there are three parts to consider:

  *   how do you link to a specific resource?
  *   how do you express the fragment identifier for that resource?
  *   how do you express the context (for instance, which publication does the resource belong to)?
If you take the example of CFI, it covers the first two points but not the third. That last point was explored for a while by the EPUB 3 WG, but was abandoned because we couldn't make real progress on the issue.

In CFI, the left-most part points to the resource by going through the spine of an EPUB. On the Web we have a much easier and more reliable mechanism for that: URIs.
This is where canonical URIs can be useful: PWP could require the use of canonical URIs when link@rel="canonical" is present in HTTP or HTML headers.

For the right-most part (the fragment identifier), there are a number of media fragments for images, video or audio but nothing that we can truly rely on for text.

Finally, no matter if the publication is packaged or not, we'll always have a canonical URI for the manifest that could be used to reference the publication itself.

By adding an "hrefsrc" element to a link object in the manifest, we could then resolve a locator for both the packaged and live version of the same manifest.

Let me provide a quick example.

First of all, here's a locator:
{
  "locator": "http://example.org/books/1/video/cutecats.mp4#t=120",
  "publication": "http://example.org/books/1/manifest.json"
}

One might wonder why two URIs are explicitly needed: since a resource can be included in any number of publications, you need this additional context to know exactly which publication your locator belongs to.

User A has made an annotation in a browser; now user B wants to use that annotation in the packaged variant of that Web Publication.

The manifest for the packaged variant has the following element:
{"href": "video/cutecats.mp4", "hrefsrc": "http://example.org/books/1/video/cutecats.mp4", "type": "video/mp4"}

Thanks to "hrefsrc", I can actually understand that the locator applies to "video/cutecats.mp4" in the package, and that the annotation starts at 120s in the video.
I can also double check the URI in "publication" with the link@rel="self" from the manifest, to make sure that this is actually the same publication.
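
The resolution steps above can be sketched as follows. This is only an illustration under assumptions: the "links" and "resources" key names and the resolve_locator helper are hypothetical, and only "href", "hrefsrc", "type", "locator" and "publication" come from the example itself.

```python
from typing import Optional
from urllib.parse import urldefrag

# Packaged manifest reduced to what the example needs.
# "links" and "resources" are assumed key names, not part of any draft.
packaged_manifest = {
    "links": [
        {"rel": "self", "href": "http://example.org/books/1/manifest.json"},
    ],
    "resources": [
        {
            "href": "video/cutecats.mp4",
            "hrefsrc": "http://example.org/books/1/video/cutecats.mp4",
            "type": "video/mp4",
        },
    ],
}

locator = {
    "locator": "http://example.org/books/1/video/cutecats.mp4#t=120",
    "publication": "http://example.org/books/1/manifest.json",
}


def resolve_locator(loc: dict, manifest: dict) -> Optional[str]:
    """Map a locator to a package-relative reference, or None if it does not apply."""
    # Step 1: check that the locator targets this publication (rel="self").
    self_links = [l["href"] for l in manifest["links"] if l.get("rel") == "self"]
    if loc["publication"] not in self_links:
        return None
    # Step 2: split off the fragment, then match the URL against "hrefsrc".
    url, fragment = urldefrag(loc["locator"])
    for res in manifest["resources"]:
        if res.get("hrefsrc") == url:
            return res["href"] + ("#" + fragment if fragment else "")
    return None


resolved = resolve_locator(locator, packaged_manifest)
```

The fragment ("t=120") is carried over unchanged, so its interpretation stays with whatever fragment scheme applies to the media type.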

Hope that this is useful,

Cheers,
Hadrien


2016-11-16 14:59 GMT+01:00 Siegman, Tzviya - Hoboken <tsiegman@wiley.com>:
Hi Hadrien,

Ivan is travelling today, so I will attempt to answer this. I also require a bit of clarification, please.


1.       We may need to reconsider the use of the term “Canonical Locator”, given its implications in the world of HTML/HTTP. We all but removed the term from our Use Cases & Requirements, and I am going to log an issue to clarify the usage of the term in this document. We may come back to using rel=”canonical” for (P)WP, but we may not. I believe the usage intended here has more to do with identification, as explained in Addressing and Identification [1]. That being said, this document needs a lot of work, and we need to work out many details.



2.       One of the things we would very much like to accomplish is the ability to address components of the package using the same URI, regardless of package state (packaged, unpackaged, whatever terms we create in the future). Essentially, we need to create a fragment identifier. If I’m understanding your  proposal correctly, you are suggesting creating a link attribute “hrefsrc”. Is that correct? Will that work on an <a> element? I think your proposal will do well for the manifest, as it’s discussed in this document, but that is a work-in-progress as well.  Further, we need something that works for those who are not already inside the “package”, such as citations. This can be viewed as a replacement/improvement of CFI.  We looked at Packaging on the Web [2], which has not been pursued in the W3C, but the work that they did is worth reviewing.  Looking at HTTP with all its flexibility is an excellent suggestion.



3.       I would love to hear more from you, Ric, Laurent, Daniel about what requirements a reading system has and how we can make this work.

[1] https://w3c.github.io/dpub-pwp/#identification

[2] https://www.w3.org/TR/web-packaging/


Tzviya Siegman
Information Standards Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: Hadrien Gardeur [mailto:hadrien.gardeur@feedbooks.com]
Sent: Wednesday, November 16, 2016 5:37 AM
To: Ivan Herman
Cc: W3C Digital Publishing IG; Ric Wright; Laurent Le Meur
Subject: Re: For the discussion on the PWP

Hello Ivan,

I'm adding Ric & Laurent since this also concerns reading systems directly.

It's not entirely clear to me what canonical locators are used for; a few things come to mind:

  *   on the Web, a link@rel="canonical" is used when multiple URIs return the same resource, the URI referenced in that link is then considered to be the canonical location for that resource
  *   it seems that there's also a use case for packaged publications, where you might want to update individual resources by keeping the original URI in the manifest
  *   and finally you seem to describe a use case based around redirections, if I'm not misunderstanding your previous email
When the term "canonical locator" is used, I tend to think strictly about the first use case, for which I'm not sure that we need to do much. Shouldn't we simply let HTTP do its job and eventually provide a Link header with rel="canonical"?
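
As a rough illustration of that first use case, a client could read the canonical URI from a Link header like this. It is a simplified sketch, not a complete Link header parser: the canonical_from_link_header helper is hypothetical, and it assumes no "<" or ">" characters appear inside parameter values.

```python
import re
from typing import Optional


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first link whose rel includes 'canonical'."""
    # Each link looks like: <target>; param=value; param="value"
    for match in re.finditer(r'<([^>]*)>([^<]*)', value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";,]*)"?', params, re.IGNORECASE)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None


header = (
    '<https://example.org/a>; rel="next", '
    '<https://example.org/books/1/img/mona_lisa.jpg>; rel="canonical"'
)
canonical = canonical_from_link_header(header)
```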

For the second use case, this involves packaged publications and reading systems and becomes potentially complex:

  *   first of all, using "hrefsrc" or a similar key should work fairly well for that purpose
  *   in order to update individual resources in the package, we could then rely on the URI in "hrefsrc", but I don't know when/how this content should be updated and what happens if the original version is deleted/modified without proper HTTP status codes being returned
  *   overall, I think it's much easier to update the package as a whole, by keeping a link to the original manifest in the packaged version
  *   but intercepting "https://example.org/books/1/img/mona_lisa.jpg" and serving "/img/mona_lisa.jpg" from the package instead isn't necessarily easy for a reading system. Since most of them are based on a webview, you would either have to:

     *   dynamically generate a Service Worker for each publication based on the info that you extract from the manifest, and then inject that SW in the publication's resources. This means that you need to serve all your local resources using HTTPS and a webview that supports SW, which might be tricky on some platforms (iOS for instance). I also need to double check whether SW work on localhost and on any port.
     *   the other option would be to rewrite all URIs referenced in the manifest and used in the publication's resources, which is something that IMO we'd like to avoid with Readium for instance

  *   same problem the other way around if we'd like to say something like "prioritize the resources available on the Web vs those in the package"
While I can understand the potential benefits if we can figure this out, this might be a very challenging problem to solve for people building reading systems.
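
Leaving aside how the interception itself is done (Service Worker, URI rewriting or something else), the decision a reading system would have to make can be sketched like this. The choose_source and build_lookups helpers and the prefer_web flag are assumptions for illustration; only the "href"/"hrefsrc" pair comes from the discussion.

```python
from typing import Dict, List, Tuple

manifest_links: List[dict] = [
    {
        "href": "img/mona_lisa.jpg",
        "hrefsrc": "https://example.org/books/1/img/mona_lisa.jpg",
        "type": "image/jpeg",
    },
]


def build_lookups(links: List[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build Web-to-package and package-to-Web lookup tables from "hrefsrc"."""
    to_package = {l["hrefsrc"]: l["href"] for l in links if "hrefsrc" in l}
    to_web = {href: src for src, href in to_package.items()}
    return to_package, to_web


def choose_source(requested: str, links: List[dict], prefer_web: bool = False) -> str:
    """Return the reference to actually load for a requested URI or path."""
    to_package, to_web = build_lookups(links)
    if requested in to_package:
        # A Web URI that is also available inside the package.
        return requested if prefer_web else to_package[requested]
    if requested in to_web:
        # A package path that also has a known Web location.
        return to_web[requested] if prefer_web else requested
    # Unknown to the manifest: leave the request untouched.
    return requested


served = choose_source("https://example.org/books/1/img/mona_lisa.jpg", manifest_links)
```

The lookup is the easy part; the difficulty described above is getting every request made by the webview to pass through such a function in the first place.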

Am I missing anything or misunderstanding the use cases for canonical locators?

Thanks,
Hadrien

2016-11-15 18:00 GMT+01:00 Ivan Herman <ivan@w3.org>:
Hi Hadrien,

On 15 Nov 2016, at 17:34, Hadrien Gardeur <hadrien.gardeur@feedbooks.com> wrote:

Hello Ivan,

Just a quick note: this document uses "pwp_manifest" as the rel value to discover a manifest, but I believe that we should actually use the same rel value ("manifest") as the Web App Manifest, just with a different media type.
We don't really need a dedicated relationship for PWP since the relationship isn't affected by the format of the manifest.

Probably. To be honest, the document did not really go into these details, nor am I sure it should (this may just be an input to a possible WG, and the details will have to be clarified at that point).

But I am fine changing it right now. Can you make a pull request?


For the canonical locator, I'm still not sure that I understand fully what this will be used for (there are potentially a lot of use cases), but could this behave slightly like "hreflang", by providing a hint on a link?

For example:
{"href": "img/mona_lisa.jpg", "hrefsrc": "https://example.org/books/1/img/mona_lisa.jpg", "type": "image/jpeg"}


Yes, except that it is probably two-directional. (But all this is still/again a bit in the air.) Two-directional in the sense that if a renderer receives https://example.org/books/1/img/mona_lisa.jpg then it should get to "img/mona_lisa.jpg". A functionality that may be covered by a SW in the background, actually; that text was written before we _really_ dived into the SW world. Let alone the fact that SW may not be the _only_ implementation vehicle.

Cheers

Ivan



Hadrien

2016-11-15 17:16 GMT+01:00 Ivan Herman <ivan@w3.org>:
I made a first re-shuffling of the PWP draft

http://w3c.github.io/dpub-pwp/


mostly along the lines of

https://github.com/w3c/dpub-pwp/blob/gh-pages/TODO.md


'Mostly', because I made copy-pastes from the previous version and some of the items in the TODO are in a single section now. But, I believe, the content is there.

I will not touch this document until next week and, afaik, a more detailed discussion will happen on the call on Monday. Until then, have a look at it and, of course, feel free to contribute to the text!

Ivan

----
Ivan Herman, W3C
Digital Publishing Technical Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704






--
Hadrien Gardeur
Co-founder, Feedbooks
http://www.feedbooks.com
T: +33.6.63.28.59.69
E: hadrien.gardeur@feedbooks.com
54, rue de Paradis
75010 Paris, France



Received on Tuesday, 22 November 2016 21:29:22 UTC