Re: Musings on PWP Offline/Online Modes from Leonard Rosenthol on 2016-01-05 (public-digipub-ig@w3.org from January 2016)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 5 Jan 2016 12:58:08 +0000
To: Ivan Herman <ivan@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
CC: Charles LaPierre <charlesl@benetech.org>, Nick Ruffilo <nickruffilo@gmail.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <173B672C-ED7D-48CC-AC3F-3742130C8A83@adobe.com>
I agree completely with Ivan here: the problem is not the URL itself but which URL the RS (reading system) uses at any given time (online, offline, cached, etc.).

In my opinion, that is entirely up to either the author (human or machine) OR the RS.

In our example of the Mona Lisa, I would expect the leonardo.html file simply to refer to the http://www.louvre.org/monalisa URL, for the reasons Ivan mentions in #1 below, and any RS processing that content would be expected to resolve that URL and use the content found there. Nothing prevents a given RS from caching that material, but caching is NOT a requirement for an RS, nor is any specific caching implementation mandated (this is also consistent with Ivan’s position).

When this Mona Lisa content is taken offline, the process doing so should be able to take whatever resources it wishes and include them in the offline representation. Any process that required modifying the original material would be a serious flaw for the PWP use cases, so changing the URLs would not work. (NOTE: this also means that if capturing arbitrary web content is an important use case, we also need to consider other issues such as scripting.)

Since we can’t/won’t change the content, we will need to provide a mapping facility for the RS to use (as described in Ivan’s #3). Where this lives in the PWP, and what it looks like, seems to be tied to the specific format of a PWP, which is out of scope for the PWP definition. We would just need to declare that such a concept is necessary and then let other groups (like the EPUB folks) define what it looks like in their format.

Leonard

From: Ivan Herman <ivan@w3.org>
Date: Tuesday, January 5, 2016 at 7:31 AM
To: Tzviya Siegman <tsiegman@wiley.com>
Cc: Charles LaPierre <charlesl@benetech.org>, Nick Ruffilo <nickruffilo@gmail.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: Musings on PWP Offline/Online Modes
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Tue, 5 Jan 2016 12:31:51 +0000

Hey everyone…

On 4 Jan 2016, at 22:07, Siegman, Tzviya - Hoboken <tsiegman@wiley.com> wrote:


The various use cases that Nick, Charles, Bill, and Heather refer to on the thread are perfect examples of the kind of use cases that triggered this discussion on PWP. However...

The paragraph that Charles highlighted is exactly the point that we spent so much time discussing today.

Let’s say that the Louvre’s digital rendition of the Mona Lisa is the one that is used in all books. When I include the Mona Lisa in my PWP, be it An Introduction to Renaissance Art, a Doctoral Thesis on Leonardo’s Expression of Smiles in Oils, or The Da Vinci Code, what is the locator in <img src="">?

Should it be www.louvre.org/monalisa (the original locator for the Mona Lisa) or should it be https://pwp.server.com/publication1/f01 (or whatever PWP locators look like) (the local locator for the Mona Lisa in this publication)? If it is the latter, and we are talking not just about images (which have a defined mechanism for locators on the web), how do we define this thing we’ve been calling a package?

…this is indeed the fundamental question we have been fighting with.

(In the musing below I consider it a requirement that the image of the Mona Lisa be available for offline reading, i.e., conceptually, it is not an "external" resource. Also, I think we should stick to the offline vs. online versions of the PWP and leave the term package out of it. Whether the offline version is packaged in some way or other is, I believe, a detail at this point.)

The problem is not with the URL per se. The problem is: how would the book chapter on Leonardo (say, leonardo.html) refer to the Mona Lisa? The natural way for the PWP instance on the Web is to use the URL managed by the Louvre. The problem or, shall we say, the missing bit in Nick's scheme, and the issue PWP has to answer, is: how does the system know that, when the publication is taken offline, the PWP processor should use the pwp.server.com URL instead of the Louvre's URL?

I can see several approaches:

1. We do not want/have to solve this issue in PWP. Using the http://www.louvre.org/monalisa URL works on the Web and, when the offline version is created, the system should "cache" the resource, as well as its URL, and do the mapping behind the scenes if needed. It is not up to the PWP author to deal with this. That also means that the leonardo.html file does not change; it can safely keep referring to http://www.louvre.org/monalisa.


Well, this is exactly what Service Workers do! They cache a resource and intercept any request to its URL on the fly, serving the cached resource instead of going out to the network. That is why it works offline.

There are two issues with this, though. First, it is a bit uncomfortable to define PWP in terms of a particular technology in vogue, although we may be able to define the general offline behaviour of PWP along those lines. Second, we have to take Bill's warning into account: we may not want to cache all resources; some of them should be considered external, and if the reader is offline then, well, one gets a 404; one has to live with this, and that is fine. I.e., one may want to list the to-be-cached resources in the manifest, and the implementation may have to decide whether a resource should or should not be cached.

2. The other extreme solution is to say: when the offline version is created somehow (e.g., via packaging), not only does one have to create a local version of the Mona Lisa image, but all references to it must be updated in the files, e.g., in the leonardo.html file. I think this would be terribly error prone (we are not only talking about the values of href attributes but also, say, URLs generated on the fly by a script…). I think we should keep away from that approach.

3. The third approach is somewhere in between. Resources like leonardo.html should not be changed. Instead, the offline version of the PWP would include information on the

http://www.louvre.org/monalisa -> https://pwp.server.com/publication1/f01


mapping, and the PWP processor should work, roughly, on a "try the first URI; if it is unreachable, try the second one" basis (or something like that). Note that, conceptually, this is "simply" surfacing what the caching behavior would have to implement. Obviously, one place to put this information is the manifest.
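The "try the first URI, then the second" behaviour of approach 3 can be sketched as a small resolver that leaves leonardo.html untouched and consults a mapping table instead. This is a minimal Python sketch under stated assumptions: the mapping is a plain dictionary of the kind that might be stored in the manifest, and `fetch` is a hypothetical stand-in for the reading system's network/storage access that raises `OSError` when a URL is unreachable. None of these names come from any PWP specification.

```python
from typing import Callable, Dict, List

# Hypothetical mapping, as it might be recorded in the offline PWP's manifest:
# URL used in the content -> fallback URLs, tried in order.
URL_MAP: Dict[str, List[str]] = {
    "http://www.louvre.org/monalisa": ["https://pwp.server.com/publication1/f01"],
}


def resolve(url: str, fetch: Callable[[str], bytes],
            url_map: Dict[str, List[str]] = URL_MAP) -> bytes:
    """Fetch `url` as written in the content; if unreachable, try its mapped fallbacks.

    `fetch` is assumed to raise OSError when a URL cannot be reached.
    """
    candidates = [url] + url_map.get(url, [])
    errors: List[OSError] = []
    for candidate in candidates:
        try:
            return fetch(candidate)
        except OSError as err:
            errors.append(err)  # unreachable: move on to the next candidate
    raise OSError(f"no reachable copy of {url}: {errors}")


# Simulated offline reading system: only the packaged copy is reachable.
PACKAGED = {"https://pwp.server.com/publication1/f01": b"<mona lisa bytes>"}


def offline_fetch(url: str) -> bytes:
    if url not in PACKAGED:
        raise OSError(f"unreachable: {url}")
    return PACKAGED[url]


# leonardo.html still says http://www.louvre.org/monalisa; the mapping does the rest.
print(resolve("http://www.louvre.org/monalisa", offline_fetch))  # b'<mona lisa bytes>'
```

The content file is never rewritten, and a cache-based implementation of approach 1 would, in effect, maintain the same table implicitly. The sketch deliberately says nothing about who fills in `URL_MAP`.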

But who would have to create this information? The author? Well, I do not think that is realistic: can we expect, to use Nick's example, the Wikipedia people to add this type of information to all Wiki pages? I do not think this will happen.

Does anybody see another alternative?

Ivan




This is really just a manifest: a list of locators. Online, that is a method of organizing websites, not too different from what we see on the web today, just a little more structured. To get to the offline state, something has to happen to put all the things at the other ends of the URLs into a package. So, we arrive again at the importance of the manifest.


(Maybe this is what all those people who were talking about books as APIs should have meant 3 years ago when I was rolling my eyes because I was tired of the term.)
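Tzviya's "just a manifest" observation, together with Ivan's remark above that the to-be-cached resources might be listed in the manifest, can be sketched as a list of locators with a per-entry flag telling a packager what to capture. This is a minimal Python sketch; the class names, the `offline` flag, and the example.org entry are hypothetical and are not a proposed manifest format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestEntry:
    url: str
    offline: bool  # hypothetical flag: capture this resource when going offline?


@dataclass
class Manifest:
    entries: List[ManifestEntry]

    def to_package(self) -> List[str]:
        """Locators whose targets a packager would copy into the offline version."""
        return [e.url for e in self.entries if e.offline]

    def left_external(self) -> List[str]:
        """Locators that stay online-only; offline, these simply fail to load."""
        return [e.url for e in self.entries if not e.offline]


manifest = Manifest([
    ManifestEntry("https://pwp.server.com/publication1/leonardo.html", offline=True),
    ManifestEntry("http://www.louvre.org/monalisa", offline=True),
    ManifestEntry("https://example.org/louvre-visitor-info", offline=False),  # hypothetical
])

print(manifest.to_package())
print(manifest.left_external())
```

Online, the same list is just an organized set of links; the flag only matters at the moment something produces the offline state.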

Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: Charles LaPierre [mailto:charlesl@benetech.org]
Sent: Monday, January 04, 2016 3:45 PM
To: Nick Ruffilo
Cc: DPUB mailing list (public-digipub-ig@w3.org)
Subject: Re: Musings on PWP Offline/Online Modes

I like this idea, Nick, especially the part about:

This could have many benefits.  Imagine that there are a bunch of scholarly publications that all reference a single image/diagram.  The web-based PWP version can reference a single online canonical URL, whereas the offline PWP can have its own local instance (meaning less duplication, and the ability to update all the online PWPs at once if that image is updated).  This is OPTIONAL, so if someone wanted to do a snapshot, they would just reference a local image.


Now let's say there are extended descriptions for this image, a 3D model of this image, and/or a tactile representation of this image with a tour description explaining what the tactile image is.  This work is done only once, and all PWPs would point to this image with its attached extended descriptions.  The packager that creates the offline version could also grab these extended descriptions.  Custom Elements could be used here to interact with these alternative representations of the image.
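The packager behaviour described here (grab the image plus everything attached to it) amounts to a walk over a small link graph. A minimal Python sketch, assuming the packager can see a table from each resource to its alternative representations; the `ALTERNATIVES` table, the example.org URLs, and the `collect_for_offline` helper are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical link graph: resource URL -> URLs of its alternative representations.
ALTERNATIVES: Dict[str, List[str]] = {
    "http://www.louvre.org/monalisa": [
        "https://example.org/monalisa/extended-description.html",
        "https://example.org/monalisa/model.glb",
        "https://example.org/monalisa/tactile.svg",
    ],
    "https://example.org/monalisa/tactile.svg": [
        "https://example.org/monalisa/tactile-tour.html",
    ],
}


def collect_for_offline(roots: List[str], links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk: each root plus everything reachable via alternative links."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque(roots)
    while queue:
        url = queue.popleft()
        if url in seen:  # guards against cycles and shared descriptions
            continue
        seen.add(url)
        order.append(url)
        queue.extend(links.get(url, []))
    return order


for url in collect_for_offline(["http://www.louvre.org/monalisa"], ALTERNATIVES):
    print(url)
```

Because the descriptions are authored once and linked, every PWP that packages the image picks up the same set without listing each file itself.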

Thanks.

Charles LaPierre
Sr. Software Engineer
charlesl@benetech.org


On Jan 4, 2016, at 9:28 AM, Nick Ruffilo <nickruffilo@gmail.com> wrote:



The conversation today got me thinking (maybe it's the new-year crazies) about the unique value a PWP "engine" could provide.  Below are some use cases and what I feel is an interesting way to handle them:

The "vanilla" fully-offline package
This is probably closest to what EPUB is today.  All the files for the PWP are located under the same base, and apart from the occasional <a href=""> link that points to an external resource, all items are contained within a package.  With little effort, the package can live on a server, and as long as there is a reading system that can handle the manifest, the content can be read linearly or in whatever other way we end up defining.

I think we're all in agreement here, setting aside word choices like "manifest".
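The "vanilla" case implies a property that a packager or validator could check mechanically: everything embedded resolves under the package's base, and only plain hyperlinks leave it. A minimal Python sketch; the base URL, the `(tag, url)` input shape, and the `embeds_outside_base` name are assumptions for illustration, and a string-prefix test is a simplification of real containment rules.

```python
from typing import List, Tuple
from urllib.parse import urljoin


def embeds_outside_base(base: str, refs: List[Tuple[str, str]]) -> List[str]:
    """Return embedded references that resolve outside the package base.

    `refs` holds (tag, url) pairs taken from the content, e.g. ("img", "f01.png").
    Plain <a> links may point anywhere; everything else must stay under `base`.
    """
    outside = []
    for tag, url in refs:
        absolute = urljoin(base, url)  # resolve relative references against the base
        if tag != "a" and not absolute.startswith(base):
            outside.append(absolute)
    return outside


BASE = "https://pwp.server.com/publication1/"
refs = [
    ("img", "images/f01.png"),
    ("link", "css/style.css"),
    ("a", "http://www.louvre.org/"),            # external hyperlink: allowed
    ("img", "http://www.louvre.org/monalisa"),  # external embed: not "vanilla"
]
print(embeds_outside_base(BASE, refs))  # ['http://www.louvre.org/monalisa']
```

Anything this check reports is exactly what the next case, the web page in a box, has to deal with.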


The web-page-in-a-box
Fonts live on other servers, images live on other servers, CSS frameworks live on a CDN; it's a beautiful (messy) web.  How does this become offline?  This would require heavy lifting on the part of the browser or the server (whatever generates the document), but imagine if the packager could take these resources offline.

Example: I'm reading a Wikipedia article, and I want to download it as a PWP.  Wikipedia could specify a list of resources (heck, even a hyper-minified version of their CSS) as well as all the images related to that article.  All of those get packaged into a PWP that I can download and read whenever I like.  YES, IT WILL BE A SNAPSHOT of the page at that time, but that isn't necessarily a bad thing...  It could even have update instructions (or an update URL).

External resources get added under the package root in some way, e.g.: /http/somedomaincom/path/to/external/file.css
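This naming scheme can be read as a deterministic function from an external URL to a path under the package root, which would also give a reading system a fixed place to look for the offline copy. A minimal Python sketch; `local_path` is a hypothetical helper, and keeping the host's dots (the example above writes `somedomaincom`) is an assumption made here to avoid name collisions.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url: str) -> str:
    """Map an absolute external URL to a path under the package root.

    Follows the /<scheme>/<host>/<path> shape of the example. Assumptions:
    the host keeps its dots (so a.bc and ab.c cannot collide), and any query
    string or fragment is ignored.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url}")
    # normpath collapses "." and ".." so a path cannot climb out of its host directory
    path = posixpath.normpath(parts.path or "/")
    return f"/{parts.scheme}/{parts.netloc}{path}"


print(local_path("http://somedomain.com/path/to/external/file.css"))
# /http/somedomain.com/path/to/external/file.css
```

Dropping the query string is a real simplification: resources that differ only by query (e.g. versioned CSS) would map to the same path.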

This could have many benefits.  Imagine that there are a bunch of scholarly publications that all reference a single image/diagram.  The web-based PWP version can reference a single online canonical URL, whereas the offline PWP can have its own local instance (meaning less duplication, and the ability to update all the online PWPs at once if that image is updated).  This is OPTIONAL, so if someone wanted to do a snapshot, they would just reference a local image.

For publishers: they could maintain a common CSS framework and keep it up to date, so that if they found a bug, or decided that they wanted the body color to be bright orange, they could update it once, and all offline PWPs generated from then on would pick up the change.

Since this is 100% optional, those who want full control can simply create their content entirely within a single root.  The ability to mark certain online resources as "critical" to an offline package could create production benefits (and yes, I realize it could also create some headaches).



--
- Nick Ruffilo
@NickRuffilo
http://Aerbook.com
http://twitch.tv/TheWizardLlewyn

http://ZenOfTechnology.com


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 5 January 2016 12:58:45 UTC