Re: Musings on PWP Offline/Online Modes from Nick Ruffilo on 2016-01-05 (public-digipub-ig@w3.org from January 2016)

From: Nick Ruffilo <nickruffilo@gmail.com>
Date: Tue, 5 Jan 2016 09:26:27 -0500
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Ivan Herman <ivan@w3.org>, Tzviya Siegman <tsiegman@wiley.com>, Charles LaPierre <charlesl@benetech.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CA+Dds59TXv++0EjVj+CyJ70JwryDMNR83hPGCjrWZvspvDAe+g@mail.gmail.com>
The notes about simply caching and having "http://louvre.com/monalisa"
point to the cached version seem a much simpler way of achieving what I
was originally looking to do - I wish I had thought of it at first.

The one thing I have noticed from this conversation is that the only way we
will really answer this question is to:

1) List use cases
2) Discuss the possible solutions to individual situations
3) Slowly cross off the unreasonable solutions

Service workers seem like they can most certainly be part of the solution -
but I'm unsure if service workers - out of the box - are the entire
solution, so the question really becomes: what do we need to do to augment
service workers (or whatever the answer is) to achieve our goals?  From what
I see, it seems like we need a mechanism to define which resources get
cached and which do not.  Do service workers do this?
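
To make the question concrete, here is a minimal sketch in Python of such a
mechanism. It is not service worker code. The manifest shape, the "cacheable"
key, and the names fake_fetch, build_offline_cache, and read_offline are all
assumptions invented for illustration, not part of any PWP draft.

```python
from typing import Callable, Dict, Tuple

# Hypothetical manifest: only the listed URLs are meant to be available offline.
manifest = {
    "cacheable": [
        "http://www.louvre.org/monalisa",
        "https://pwp.server.com/publication1/leonardo.html",
    ]
}


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real network fetch (assumption: returns the body as bytes)."""
    return ("content of " + url).encode("utf-8")


def build_offline_cache(
    manifest: Dict[str, list], fetch: Callable[[str], bytes]
) -> Dict[str, bytes]:
    """Fetch and store only the resources the manifest marks as cacheable."""
    return {url: fetch(url) for url in manifest["cacheable"]}


def read_offline(url: str, cache: Dict[str, bytes]) -> Tuple[int, bytes]:
    """Answer a request made while offline: cached body, or a 404 otherwise."""
    if url in cache:
        return 200, cache[url]
    return 404, b""


cache = build_offline_cache(manifest, fake_fetch)
```

With this policy, leonardo.html keeps its original reference to the Louvre
URL, and anything not listed in the manifest simply fails when offline.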

-Nick

On Tue, Jan 5, 2016 at 7:58 AM, Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> I agree completely with Ivan here - the problem is not the URL but what
> URL does the RS use at any given time (online, offline, cached, etc.)
>
> In my opinion, that is entirely up to either the author (human or machine)
> OR the RS.
>
> In our example of the Mona Lisa, I would expect that the leonardo.html
> file simply refers to the http://www.louvre.org/monalisa URL, for the
> reasons that Ivan mentions in #1 below.  And any RS processing that content
> would be expected to resolve that URL and use that content there.  However,
> there is nothing that would prevent a given RS from caching that material –
> but that is NOT a requirement for an RS nor is the specific implementation
> of the caching mandated in any way. (and this is also consistent with
> Ivan’s position)
>
> When this Mona Lisa content is taken offline, the process that is doing
> so, should be able to take whatever resources it wishes and include them in
> the offline representation.  In doing so, any process that requires
> modification of original material would be a serious flaw in the use cases
> for PWP, so changing the URLs would not work.   (NOTE: this also means that
> if capturing arbitrary web content is an important use case, we also need
> to consider other issues like scripting, etc.)
>
> Since we can’t/won’t change the content, we will need to provide a mapping
> facility for the RS to use (as described in Ivan’s #3).  Where this lives
> in the PWP or what it looks like seems to be tied to the specific format of
> a PWP – which is something out of scope for the PWP definition.   We would
> just need to declare that such a concept is necessary and then let other
> groups (like the EPUB folks) define what it would look like in their format.
>
> Leonard
>
> *From:*  Ivan Herman <ivan@w3.org>
> Date: Tuesday, January 5, 2016 at 7:31 AM
> To: Tzviya Siegman <tsiegman@wiley.com>
> Cc: Charles LaPierre <charlesl@benetech.org>, Nick Ruffilo <
> nickruffilo@gmail.com>, W3C Digital Publishing IG <
> public-digipub-ig@w3.org>
> Subject: Re: Musings on PWP Offline/Online Modes
> Resent-From: <public-digipub-ig@w3.org>
> Resent-Date: Tue, 5 Jan 2016 12:31:51 +0000
>
> Hey everyone…
>
> On 4 Jan 2016, at 22:07, Siegman, Tzviya - Hoboken <tsiegman@wiley.com>
> wrote:
>
>
> The various use cases that Nick, Charles, Bill, or Heather refer to on the
> thread are the perfect examples of the kind of use cases that triggered
> this discussion on PWP. However...
>
> The paragraph that Charles highlighted is exactly the point that we spent
> so much time discussing today.
>
> Let’s say that the Louvre’s digital rendition of the Mona Lisa is the one
> that is used in all books. When I include the Mona Lisa in my PWP, be it An
> Introduction of Renaissance Art,  Doctoral Thesis on Leonardo’s expression
> of Smiles in Oils, or the Da Vinci Code, what is the locator in <img
> src="">?
>
> Should it be www.louvre.org/monalisa (the original locator for Mona Lisa)
> or should it be https://pwp.server.com/publication1/f01 (or whatever pwp
> locators look like) (the local locator for Mona Lisa in this publication)?
> If it is the latter, and we are talking not just about images (which have a
> defined mechanism for locators on the web), how do we define this thing
> we’ve been calling a package?
>
>
> …this *is* indeed the fundamental question which we have been fighting with.
>
> (In the musing below I consider it as a requirement that the image of the
> Mona Lisa *should* be available for offline reading, ie, conceptually, it
> is not an "external" resource. Also, I think we should keep to the offline
> vs. online version of the PWP and leave the term package out of it. Whether
> the offline version is packaged in some way or other is a detail at this
> point I believe.)
>
> The problem is not with the URL per se. The problem is: how would the
> book chapter on Leonardo (say, leonardo.html) refer to the Mona Lisa? The
> natural way for the PWP instance on the Web is to use the URL managed by
> the Louvre. The problem or, shall we say, the missing bit in Nick's schema,
> and which is *the* issue to be answered for PWP, is: how does the system
> know, when it is turned offline, that the PWP processor should use the
> pwp.server.com URL instead of the Louvre's URL?
>
> I can see several approaches
>
> 1. We do not want/have to solve this issue in PWP. Using the
> http://www.louvre.org/monalisa URL works on the Web and, when going
> offline, the system should "cache" the resource, as well as its
> URL, and make the mapping behind the scenes if needed. It is not up to the
> PWP author to deal with this. That also means that the leonardo.html file
> does not change, it can safely issue a reference to
> http://www.louvre.org/monalisa.
>
> Well, this is exactly what Service Workers do! They cache a resource and
> catch any request to its URL on the fly to serve the cached resource
> instead of going out to the network. That is why it works off line.
>
> There are two issues with this, though. On the one hand, it is a bit
> uncomfortable to define a PWP depending on a particular technology in
> vogue, although we may be able to define the offline behaviour of PWP in
> general accordingly. Also, we have to take into account Bill's warning: we
> may not want to cache *all* resources; some of them should be considered
> as external and if the reader is offline then, well, one gets a 404; one
> has to live with this, and that is fine. Ie, one may want to list the
> to-be-cached resources in the manifest, and the implementation may have to
> decide whether a resource should or should not be cached.
>
> 2. The other extreme solution is to say: when the offline version is
> created somehow (eg, via packaging) not only does one have to create a
> local version of the Mona Lisa image, but *all references thereof should
> be updated in the files*, eg, in the leonardo.html file. I think this
> would be terribly error prone (we are not only talking about the value of the
> href attributes, but, say, URL-s generated on-the-fly by a script…). I
> think we should keep away from that approach.
>
> 3. The third approach is somewhere in between. Resources like
> leonardo.html should not be changed. Instead, the offline version of the
> PWP would include information on the
>
> http://www.louvre.org/monalisa -> https://pwp.server.com/publication1/f01
>
> mapping, and the PWP processor should work, roughly, through "try the
> first URI, if it is unreachable, try the second one" (or something like
> that). Note that, conceptually, this is "simply" surfacing what the caching
> behavior would have to implement. Obviously, one place to put this
> information is the manifest.
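
The "try the first URI, if it is unreachable, try the second one" rule can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the mapping is modelled as a plain dictionary, and resolve and
the is_reachable callback are hypothetical names, not part of any PWP or
manifest specification.

```python
from typing import Callable, Dict, Optional

# The mapping from the message above, as it might be read from a manifest.
mapping: Dict[str, str] = {
    "http://www.louvre.org/monalisa": "https://pwp.server.com/publication1/f01",
}


def resolve(
    url: str,
    mapping: Dict[str, str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the URL to load: the original if reachable, else its mapped copy."""
    if is_reachable(url):
        return url
    fallback = mapping.get(url)
    if fallback is not None and is_reachable(fallback):
        return fallback
    # Neither location can be reached; the caller has to report a failure.
    return None


def offline_only(url: str) -> bool:
    """Pretend only the local copies are reachable (assumption for the demo)."""
    return url.startswith("https://pwp.server.com/")


chosen = resolve("http://www.louvre.org/monalisa", mapping, offline_only)
```

The content itself (leonardo.html) is never rewritten; only the processor
consults the mapping.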
>
> But who would have to create this information? The author? Well, I do not
> think that is realistic: can we expect, to use Nick's example, the
> Wikipedia people to add this type of information to all Wiki pages? I do
> not think this will happen.
>
> Does anybody see another alternative?
>
> Ivan
>
>
>
>
> This is really just a manifest – a list of locators. Online, that is a
> method of organizing websites, not too different from what we see on the
> web today, just a little more structured. To get to the offline state,
> something has to happen to put all the things at the other end of the URLs
> in a package. So, we arrive again at the importance of the manifest.
>
>
> (Maybe this is what all those people who were talking about books as APIs
> should have meant 3 years ago when I was rolling my eyes because I was
> tired of the term.)
>
> *Tzviya Siegman*
> Digital Book Standards & Capabilities Lead
> Wiley
> 201-748-6884
> tsiegman@wiley.com
>
> *From:* Charles LaPierre [mailto:charlesl@benetech.org
> <charlesl@benetech.org>]
> *Sent:* Monday, January 04, 2016 3:45 PM
> *To:* Nick Ruffilo
> *Cc:* DPUB mailing list (public-digipub-ig@w3.org)
> *Subject:* Re: Musings on PWP Offline/Online Modes
>
> I like this idea Nick, especially the part about
>
>
> This could have many benefits.  Imagine that there are a bunch of
> scholarly publications that all reference a single image/diagram.  The
> web-based PWP version can reference a single online canonical URL, whereas
> the offline PWP can have its own local instance (meaning less duplication,
> and the ability to update all the online PWPs at once if there is an update
> to that image).  This is OPTIONAL, so if someone wanted to do a snapshot,
> they just reference a local image.
>
>
>
> Now let's say there are extended descriptions for this image, a 3D model
> of this image, and/or a Tactile representation of this image with a Tour
> description explaining what the tactile image is.  Now this is done only
> once and all PWPs would point to this image with its attached extended
> descriptions.  The packager which would create the offline version could
> also grab these extended descriptions as well.  Custom Elements could be
> used here to interact with these alternative representations of the image.
>
> Thanks.
>
>
> Charles LaPierre
> Sr. Software Engineer
> charlesl@benetech.org
>
>
>
> On Jan 4, 2016, at 9:28 AM, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>
>
>
>
> The conversation today got me thinking - and maybe it's the new year
> crazies, but I got to thinking of the true value of having something of a
> PWP "engine" that would provide unique value.  Below are some use cases and
> what I feel is an interesting way to handle those cases:
>
> *The "vanilla" fully-offline package*
> This is probably closest to what EPUB is today.  All the files for the PWP
> are located in the same base, and besides the occasional <a href=""> link
> that points to an external resource, all items are contained within a
> package.  With little effort, the package can exist on a server and as long
> as there is a reading system that can handle the manifest, the content can
> be read linearly or by whatever method we end up with.
>
> I think we're all in agreement here - ignoring word choice like manifest,
> etc.
>
>
> *The web-page-in-a-box*
> Fonts live on other servers, images live on other servers, CSS frameworks
> live on a CDN. It's a beautiful (messy) web.  How does this become
> offline?  This would require heavy lifting on the part of the browser or
> the server (whatever generates the document) but imagine if the packager
> could take these resources offline.
>
> *Example*: I'm reading a Wikipedia article, and I want to download it as
> a PWP.  Wikipedia could specify a list of resources (heck, even a
> hyper-minified version of their CSS) as well as all the images related to
> that Wikipedia article.  All of those get packaged into a PWP that I can
> download and read whenever.  YES IT WILL BE A SNAPSHOT of the page at that
> time, but that isn't necessarily a bad thing...  It could even have update
> instructions (or an update URL).
>
> External resources get added to the root path in some way like:
> /http/somedomaincom/path/to/external/file.css
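
One possible reading of that path scheme, sketched in Python. The rule is
inferred from the single example above and is an assumption: the scheme
becomes the first path segment, the host has its dots removed, and the
original path follows. The function name local_path is hypothetical.

```python
from urllib.parse import urlsplit


def local_path(url: str) -> str:
    """Map an external URL to a package-local path under the root."""
    parts = urlsplit(url)
    # Drop the dots from the host, as in the "somedomaincom" example.
    host = (parts.hostname or "").replace(".", "")
    return "/" + parts.scheme + "/" + host + parts.path


example = local_path("http://somedomain.com/path/to/external/file.css")
```

Note that dropping the dots can make two different hosts collide, and this
sketch ignores ports and query strings; a real packager would need to decide
how to handle those.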
>
> This could have many benefits.  Imagine that there are a bunch of
> scholarly publications that all reference a single image/diagram.  The
> web-based PWP version can reference a single online canonical URL, whereas
> the offline PWP can have its own local instance (meaning less duplication,
> and the ability to update all the online PWPs at once if there is an update
> to that image).  This is OPTIONAL, so if someone wanted to do a snapshot,
> they just reference a local image.
>
> For publishers - they could have a common CSS framework that they could
> keep up-to-date, so that if they found a bug, or decided that they wanted
> body color to be bright orange, they could update it once, and all new
> offline PWPs that are generated get that.
>
> Since this is 100% optional, those who wanted full control can simply opt
> to create their content fully within a single root.  The ability to be able
> to specify certain online resources to be "critical" to an offline package
> could create production benefits (and yes, I realize it could also create
> some headaches).
>
>
>
> --
> - Nick Ruffilo
> @NickRuffilo
> http://Aerbook.com <http://aerbook.com/>
> http://twitch.tv/TheWizardLlewyn
> http://ZenOfTechnology.com <http://zenoftechnology.com/>
>
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
>


-- 
- Nick Ruffilo
@NickRuffilo
http://Aerbook.com
http://twitch.tv/TheWizardLlewyn
http://ZenOfTechnology.com <http://zenoftechnology.com/>
Received on Tuesday, 5 January 2016 14:27:01 UTC