Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics from Ivan Herman on 2016-09-22 (public-digipub-ig@w3.org from September 2016)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 22 Sep 2016 06:17:32 +0100
To: Marcos Caceres <marcos@marcosc.com>
Cc: Peter Krautzberger <peter.krautzberger@mathjax.org>, Michael Smith <mike@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Dave Cramer <dave.cramer@hbgusa.com>
Message-Id: <FD40F704-CD16-4C19-A050-C7F7C1D47F4A@w3.org>
Hey Marcos,

just a few comments; Dave knows the details much better than I do, though.

> On 21 Sep 2016, at 06:18, Marcos Caceres <marcos@marcosc.com> wrote:
> 
> On September 21, 2016 at 1:37:59 PM, Ivan Herman (ivan@w3.org) wrote:
>> Terminology issues, I guess… (I hope!). We still have to define what response the server
>> would return on a URL for a WP, right (in terms of mime type, etc).
> 
> We don't. It's just HTML. We don't need to define anything else. WP
> are not a concrete thing: they are just web applications that want to
> be displayed inside browsers in some particular way (i.e., the webview
> has a slightly different set of UI buttons... but it's still just a
> browser). Some WPs will just want the standard browser toolbar...
> others may request full screen, and maybe an orientation lock, etc.
> It's up to the publication/application - and this would be done via
> web manifest (or the appropriate low-level API).
> 
> Also, there are two classes of web applications that we need to cater for:
> 
>  1. Libraries: like Safari Books, an academic journal website, or a
> magazine (current and back issues, like the Economist) - those are
> applications that allow access to 1 to many "publications". The
> solution must cater for switching in and out of the particular display
> mode.

Oh absolutely. It has always been my dream to drag scholarly publishing further into the 21st century; what is out there is a disgrace. There are lots of social issues, though; some of these came up during an ad-hoc meeting yesterday:

https://www.w3.org/2016/09/21-collection-minutes.html

> 
>  2. Standalone publication: a website that is itself "a book" or
> similar that wants this special UI (which the user selects and has
> full control switching in and out of!).

Yep.

> 
>>  If I use a URL for a HTML
>> or an SVG page, and I issue a HTTP GET, the server would return the corresponding mime type.
>> The same should be known for the WP case.
> 
> That's handled by fetch. We don't need to do or define anything.

From the client side: yes. But what the publisher should produce and how the server is set up has, I presume, to be defined: what does it mean to 'publish' a book? After all, it is the publisher's job to make content correspond to a URL.

That being said, I do take your point below that it is better if the URL refers to a "well known" file type for a browser, ie, not a JSON file in the wild.

> 
> (Also, it's not even worth talking about SVG being served as an
> application: No one does that, so let's not even bother talking about
> it. Let's focus on the 99.999% case, which is HTML - SVG is an image
> format embedded in HTML.)

Well… we may have to be careful here. In EPUB, an SVG document can be used as a 'top level' document, just like HTML.

There is a large market for full-screen SVGs in publishing, unrelated to any HTML content, namely cartoons/mangas. Mangas are huge in Japan (I do not have the exact figures but, afaik, for some Japanese digital book publishers mangas represent the majority of their income); other types of cartoons have a significant market in a number of countries like France or Belgium.

That being said, I do not know whether those books use SVG as standalone content, or whether the SVG is embedded in an otherwise empty HTML page. Somebody on the list might know. But, at this moment, we should not dismiss the possibility that SVG is on par with HTML, at least in this area.

(There are, actually, very SVG specific issues that are raised by these applications. But that is for another day…)


> 
>> (What I would probably expect is that the return would be something like an (extended)
>> Web Manifest, or a (HTML) page with a reference to a manifest somewhere. But that is to
>> be defined.)
> 
> It would be a HTML page with a link rel manifest in it.

Or, I presume, a LINK header in the HTTP response. For example, we can imagine libraries preferring to set up their own alternative manifest for a publication (eg, with a different, library specific unique id or other metadata) without having the right to change the content of the publication. Using a LINK header is a good way of doing so.
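
A rough sketch of what that could look like, in Python for brevity. The URLs are made up for illustration, and whether user agents would honor a manifest link delivered in a header rather than in the HTML is an assumption here, not something the thread has settled.

```python
from typing import List, Tuple


def manifest_link_header(manifest_url: str) -> Tuple[str, str]:
    """Return an HTTP header (name, value) pointing at a manifest.

    Uses the generic Web Linking syntax: <target>; rel="relation".
    """
    if any(ch in manifest_url for ch in '<>" '):
        raise ValueError("manifest URL must be percent-encoded")
    return ("Link", f'<{manifest_url}>; rel="manifest"')


# Hypothetical case: a library serves the publisher's HTML untouched,
# but attaches its own manifest (with library-specific metadata).
publisher_headers: List[Tuple[str, str]] = [
    ("Content-Type", "text/html; charset=utf-8"),
]
library_headers = publisher_headers + [
    manifest_link_header("https://library.example/manifests/moby-dick.json")
]
```

The point of the sketch is only that the publication's bytes stay identical; the library's variation lives entirely in the response headers.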

> The manifest
> need not be "extended" - if we, the web community, work together, we
> can get everything standardized.

One can always hope:-)

But, more seriously, there may be a need for very publication specific things. See below.

> 
> It would NOT be a manifest: that would break the web for users and
> would not degrade gracefully (e.g., in a non-supporting user agent).
> Thus, we should never pass around URLs that dereference to some form a
> user can't work with. We, humans, only share URLs that dereference to
> HTML. A supporting user agent would then pick up the link rel=manifest
> and do the right thing.

As I said, I see the point.
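
To make the "pick up the link rel=manifest" step concrete, here is a minimal sketch in Python, standing in for what a supporting user agent would do internally. The page content and URLs are invented for the example; a non-supporting user agent would simply render the same HTML and ignore the link.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Records the href of the first <link rel="manifest"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated list of link types, matched case-insensitively
        rels = (attr.get("rel") or "").lower().split()
        if "manifest" in rels and attr.get("href"):
            self.href = attr["href"]


def find_manifest(page_url: str, html: str) -> Optional[str]:
    """Return the absolute manifest URL, or None if the page has none."""
    finder = ManifestLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None


page = (
    '<html><head><link rel="manifest" href="manifest.json"></head>'
    "<body>Call me Ishmael.</body></html>"
)
manifest_url = find_manifest("https://example.org/moby-dick/", page)
```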

> 
>> I seem to be absolutely old skool here, but what would be, in your view, the right terminology?
> 
> Don't fret about terminology (I have no idea about it either, so let's
> try to avoid fancy jargon and focus on simple concepts)... I think we
> are all still all percolating what this will look like, but my **very
> rough** sketch:
> 
> 1. A WP is a web app whose manifest optionally has its "display" mode
> set to "publication". This allows the browser to offer a
> publication-specific set of UI controls to the end-user (the ones we
> know and love from ebook readers: page numbers, switch between
> dark/night mode, maybe the browser also changes the dimming timeout,
> etc). The user would switch into this mode, as they do today in, for
> example, Safari's reader mode - or by "installing" these
> "publications" into the browser (similar to bookmarking, but purpose
> built for publications)... see also how "progressive web apps" are
> installed, same thing.
> 

Sounds good...
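
A small sketch of how such a display mode could degrade gracefully. The fallback chain for the existing modes follows the Web App Manifest spec; the "publication" value is the hypothetical one from the sketch above, and where it would fall back to is purely a guess.

```python
# Fallback chain for display modes. "publication" is hypothetical; here it
# is assumed to degrade straight to the plain browser view.
FALLBACK = {
    "publication": "browser",   # hypothetical, not in any spec
    "fullscreen": "standalone",
    "standalone": "minimal-ui",
    "minimal-ui": "browser",
}


def resolve_display_mode(manifest: dict, supported: set) -> str:
    """Walk the fallback chain until a supported mode (or "browser") is hit."""
    mode = manifest.get("display", "browser")
    while mode != "browser" and mode not in supported:
        mode = FALLBACK.get(mode, "browser")
    return mode


manifest = {"name": "Moby Dick", "display": "publication"}
```

A user agent that knows nothing about publications would end up in "browser" mode, which is the graceful degradation argued for earlier in the thread.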

> 2. A WP optionally includes metadata that users would want to find
> these things on... this set would be extremely limited at first and
> there would need to be precedence for this, so maybe only author and
> category would make the cut! Though category is dubious because it
> doesn't internationalize well (so it's pretty garbage). I'm still
> somewhat skeptical if "id" would make the cut (e.g., {type: "ISBN",
> id: "..."}), as ISBN, etc. can be included into the actual HTML of the
> publication.

Because the publication is not one HTML but, potentially, many, I think such an identifier should be in the manifest.

We have to be careful what we mean by 'limited' metadata. I agree that adding lots of metadata into the manifest file would be a mistake (there is a limited set of metadata, mostly derived from Dublin Core, as part of the EPUB 'package' definition, too; we should look at that). However, the publishing world uses lots of metadata, related to many different things (provenance, marketing facts, copyright, you name it). Some of these metadata specifications (like ONIX) are huge and, unfortunately, if we take into account the metadata used by trade publishers, libraries, scholarly publishers, magazines, etc, then the "one standard is good, more is better" approach seems to prevail:-) But the important point is: metadata handling, definition, usage, etc, is a hugely important aspect of the business. (As an example, when you look at a page for a book on a site like Amazon, all the data you see there comes, afaik, from the metadata that is provided by the publishers of those books. The distributors, I presume, rarely produce that by themselves, and surely not manually.)

What this means is that there should be a slot in the manifest (and I think that _is_ very publication specific; I do not expect it to make all that much sense for manifests in general) that would refer to an external file (or probably files) containing the detailed metadata. The manifest would be silent about the format of those files (XML, JSON, specific formats like BibTeX, Turtle,…); handling them should really be the job of specialized consumers.
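
Something along these lines, as a very rough sketch. The "metadata" member, its shape, the file names, and the media types are all invented for illustration; nothing like this exists in the manifest spec today.

```python
from typing import List

# Hypothetical manifest: it only points at metadata files and says nothing
# about how to parse them.
manifest = {
    "name": "Moby Dick",
    "metadata": [
        {"href": "meta/onix.xml", "type": "application/xml"},
        {"href": "meta/record.jsonld", "type": "application/ld+json"},
        {"href": "meta/cite.bib", "type": "application/x-bibtex"},
    ],
}


def metadata_for(manifest: dict, accepted_types: set) -> List[str]:
    """Return the hrefs that a specialized consumer knows how to read."""
    return [
        entry["href"]
        for entry in manifest.get("metadata", [])
        if entry.get("type") in accepted_types
    ]


# A consumer that only understands JSON-LD picks out its own file.
jsonld_files = metadata_for(manifest, {"application/ld+json"})
```

A browser would be free to ignore the slot entirely, while a library system or a distributor would pick the entries it understands.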

B.t.w., it is conceivable that some of this metadata would be embedded into a content HTML file (eg, by adding JSON-LD content in a <script> element), but it may be way too large to make this practical.

> 
> 3. A WP would have a (likely) Service Worker API that allows the apps
> to optionally say, "this object hierarchy represents the related
> documents - and how each should be represented in the ToC".
> 
> 4. A WP would have a (likely) Service Worker API to indicate which
> resources are searchable - probably as part of 4, to create the book
> search index.

I must admit that when I read these two items yesterday, I did not really understand them. However, at the aforementioned meeting yesterday afternoon Kenneth referred to SW APIs as a new entry in the manifest spec. I will have to read up on those but, details aside, I think we are on the same page here.

> 
> 5. A WP can be sync'ed across multiple devices via the forthcoming
> manifest "service_worker" member. This is just a normal service worker
> that handles all the synchronization of offline content, d/ls
> annotations to put into IndexedDB, etc. and whatever other things need
> to happen to get everything into the right synchronized state (e.g.,
> matching document location).

I have only a rough understanding of what I read here:-) but, I guess, this is perfectly fine for now. We are in a rough brainstorming phase anyway…

Just for my understanding, though: would that mean that the WP's manifest would carry actual JavaScript code, or that it would contain information that generic code in the browser would use? Kenneth referred to the possibility that the manifest would list all the resources in the WP that a service worker should 'check in' at startup; that seems to refer to the latter.
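
The second reading (the manifest carries data, the browser carries generic code) could be sketched as follows. The "resources" member is hypothetical, and Python merely stands in for whatever generic code a user agent or service worker would actually run.

```python
from typing import List
from urllib.parse import urljoin


def resources_to_cache(manifest_url: str, manifest: dict) -> List[str]:
    """Resolve every listed resource against the manifest's own URL.

    Generic code like this could hand the result to a cache at startup;
    the manifest itself contains no script, only a list.
    """
    seen = set()
    urls = []
    for href in manifest.get("resources", []):  # hypothetical member
        url = urljoin(manifest_url, href)
        if url not in seen:  # skip duplicates, keep the original order
            seen.add(url)
            urls.append(url)
    return urls


manifest = {
    "name": "Moby Dick",
    "resources": ["index.html", "ch1.html", "css/book.css", "ch1.html"],
}
urls = resources_to_cache("https://example.org/moby-dick/manifest.json", manifest)
```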


> 
> That's it for now - what am I missing?

That is where our friends on the list who have a much deeper knowledge of EPUB should comment…


> I'd love to see other short
> rough sketches of what people are thinking…

Dave had an experimental setup with a few books (obviously, Moby Dick among them; that is the 'hello world' of the digital book world:-). This was based on a small SW implementation that Jake Archibald did last year after a discussion at TPAC, but it has to be refreshed. I think he and Kenneth agreed to look into this. It would make things more tangible…

Thanks again!

Cheers

Ivan



----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 22 September 2016 05:17:51 UTC