
Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Wed, 21 Sep 2016 20:23:35 +0000
To: Bill McCoy <whmccoy@gmail.com>, Mike Perlman <perlmanm@me.com>
CC: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Bill McCoy <bmccoy@idpf.org>, Marcos Caceres <marcos@marcosc.com>, "Cramer, Dave" <Dave.Cramer@hbgusa.com>
Message-ID: <118B012D-7D50-44D1-9DC0-7130AEA416D5@adobe.com>
I agree that for web publications, our goal should indeed be to figure out which pieces of the OWP (beyond the obvious HTML, CSS, JS, etc.) we need in order to address as many of our use cases as we can.  I believe that this will include some of the web and origin manifest specifications, service workers, etc.  This collection of technologies is what authors/publishers will use to publish material in an “unpacked” form that works natively in browsers.  (NOTE: I am assuming that there will exist an “RS”, in the form of other OWP materials, that will handle the UX.)

Our secondary goal will be to figure out what is necessary to “package up” some or all of the same collection of material into a “file system compatible” format that can be exchanged “off the web.”  For this, we also have a set of use cases that will need to be addressed, and this is where, I believe, more of the technical work will take place, as we are entering areas not currently part of the OWP.


From: Bill McCoy <whmccoy@gmail.com>
Date: Wednesday, September 21, 2016 at 3:00 PM
To: Mike Perlman <perlmanm@me.com>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Bill McCoy <bmccoy@idpf.org>, Marcos Caceres <marcos@marcosc.com>, "Cramer, Dave" <Dave.Cramer@hbgusa.com>
Subject: Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Wednesday, September 21, 2016 at 3:01 PM

I don't want to be bikeshedding here but I think it's an important point.

One criticism of EPUB is that it built too much superstructure on top of HTML.

We want, as much as possible, to explore the other end of that continuum: what is the minimum we need, beyond the HTML5/OWP we already have, to meet the use cases identified for portable documents on the Web?

In this regard I'm with Dave Cramer - we don't need much and shouldn't add much.

So the "language" is just HTML5 which as a brand already signifies much more than just a "markup language".  And from the beginning (you know, Tim's physics papers) it was a document language, not (especially at the beginning) just a way to preload the initial facade for an app running in a JavaScript VM.

I could see using "WebPub" (for things conformant with some future specification for the *minimum* extra stuff needed) as long as we also accept that "web publication" (lower case) has a broader meaning and can be implemented in a variety of ways, not just whatever way the WebPub spec mandates.

Again, I don't see that a WebPub spec necessarily has to do more than define a reliable way to find and machine-process a designated nav element, basically as Dave Cramer proposes. Everything else (like manifests) is optional and should be defined more generally for the overall Web Platform (not just for WebPubs).


On Wed, Sep 21, 2016 at 2:06 PM, Mike Perlman <perlmanm@me.com> wrote:
I wanted to offer suggestions to help the discussion.

I think “Language” is an open-ended concept. Basically it’s what’s defined by the spec.
The HTML “language” encompasses many different component languages that are not always all used.
The HTML “language” has evolved to define abstract markup relationships.

My thought was this could apply to a “document language”.
We have engaged in many discussions about needs and requirements.
But as I have seen it written, the idea is to build on HTML.
Add a “document language” on top of the “markup language”.

Regarding “hyper”, HyperPub’s linkage to hypertext makes sense. Evoking a “physical entity” by connecting it to HyperCard is even better.


On 21 Sep 2016, at 02:42, Bill McCoy <whmccoy@gmail.com> wrote:

I hope "it" is not a separate language or format but just a way to structure HTML5, imposing the absolute minimum. So to me the name HPDL could nudge us down a wrong track. HyperPub could be OK, but to the Web community I'm afraid "Hyper" connotes "hypertext" more than "HyperCard", and hypertext is part and parcel of HTML and the Web in general, not just a document specialization thereof.

On Sep 21, 2016 1:05 PM, "Mike Perlman" <perlmanm@me.com> wrote:
Regarding the name I suggest:

HyperPub or HPDL - Hyper Publication Document Language

This links its evolution to HyperCard, its concept to the modularity of HTML5, and its paradigm to an entity without defining the location.

On 21 Sep 2016, at 01:21, Bill McCoy <bmccoy@idpf.org> wrote:

I support Dave's direction but wanted to throw out two possible generalizations, one terminological and one substantive.

On Wed, Sep 21, 2016 at 10:36 AM, Cramer, Dave <Dave.Cramer@hbgusa.com> wrote:
Hi Marcos,

Apologies for top-posting, but it seemed to be the format that suited my
content :)

First of all, I wanted to thank you for your contributions to this
discussion. This is exactly what the publishing community (or at least
me!) has hoped for over the last three years: full engagement from the web
community. This is awesome!

For quite a few years, some of us have been kicking around a set of ideas
which I've called "EPUB Zero". Having worked with ebooks for fifteen
years, the complexity and insane duplication of content in EPUB have
driven me mad. So I've been wondering what the simplest possible ebook
format might look like. The goal was to use only HTML, instead of all the
custom XML vocabularies in EPUB.

A publication is, at its core, a bounded sequence of content documents.
How could we express such an idea in HTML? It's almost like we need an
ordered list of links to documents. Hmmm. If only there were an HTML
structure that was an ordered list of links... wait, there is!

I think we should stop getting stuck on terminology by agreeing that the term "Web Publication" is as per Dave's implicit definition here and describes any specialization of an arbitrary web of interlinked content (regardless of how such a bounded sequence is expressed). For example, a set of content documents sequenced via link rel="next"/"prev" is also a "Web Publication". So is a single HTML file with a structure that can be inferred from the outline (as Mike Smith was discussing yesterday). I think we cannot credibly claim that "Web Publication" can apply only if some specific expression syntax or requirement is met. Yes, this means that a "Web Publication" in the general sense may not have its sequence (TOC) reliably or conveniently discoverable, may as a result not be ideally accessible, etc., but this is just a fact.
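To make the rel="next"/"prev" case concrete, here is a minimal Python sketch of how a reading order could be imputed from such links. It is only an illustration under stated assumptions: the `PAGES` dictionary, its file names, and the helper names are hypothetical, pages are held in memory rather than fetched, and no URL resolution is attempted.

```python
from html.parser import HTMLParser


class NextLinkParser(HTMLParser):
    """Records the href of the first <link> or <a> carrying rel="next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel is a space-separated token list, so split before testing.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag in ("link", "a") and "next" in rel_tokens and self.next_href is None:
            self.next_href = attrs.get("href")


def sequence_from(start, pages):
    """Follow rel="next" links from `start`; stop at a dead end or a cycle."""
    order, current = [], start
    while current in pages and current not in order:
        order.append(current)
        parser = NextLinkParser()
        parser.feed(pages[current])
        current = parser.next_href
    return order


# Hypothetical three-document publication, keyed by (assumed) relative URL.
PAGES = {
    "ch1.html": '<link rel="next" href="ch2.html">',
    "ch2.html": '<link rel="prev" href="ch1.html"><link rel="next" href="ch3.html">',
    "ch3.html": '<link rel="prev" href="ch2.html">',
}

print(sequence_from("ch1.html", PAGES))
```

Note that the sequence is only discoverable by walking the documents one at a time, which is one reason it may be less convenient than an explicit table of contents.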

And, so, we need a more specific name for what Dave later proposes (whether that's EPUB0, EPUB 4, PWP, or whatever is only a detail; I will use "PWP" below). And of course we can quibble also about whether it's really a "sequence" vs. a "tree" or some other graph structure - to me the requirement is only that there is a more specific structure relating the content documents than just a bag, and from such structure one could impute a linear order, but that's a detail.

EPUB has always required a navigation document, as it was deemed to be
critical for accessibility, and EPUB (to the benefit of all of us) has
always been very focused on accessibility. And so a table of contents, in
the form of a HTML document with a nav element, could serve to define the
sequence of content documents. And it would have major advantages over a
TOC generated from content (a subject for another email).

I think this is fine, but for generality there is no reason to assume or require that the nav element is its own separate HTML document. It could be so (as is usual in EPUB 3), and referenced by e.g. link rel="toc" or rel="nav" or whatever from actual content, but it could as easily be within the actual content document. In the important special case where the publication content is a single HTML document, insisting on a separate navigation document is overkill.

Such a nav doc could contain the link to the web app manifest.

It could, but this seems orthogonal. Web Publication-ness in general, and PWP-ness in particular, should not be conflated with offline-usability (which applies as well to Web content that is not document-like in nature).

So to me we can simply say that any Web Publication which includes or references in a predefined discoverable way (TBD) a designated navigation element that provides a complete ordered "toc" that comprises its structure, is a "PWP".

(for the TBD part, I observe that the simple requirement for the designated nav element to have id="toc" would have the charm of making PWP fully EPUB compatible.)

A User Agent could then (whether through built-in functionality, extension, or polyfill delivered with the content as a backstop) reliably locate the "toc nav" (as it's called in EPUB) and provide an augmented reading experience in a similar but more reliable way than for example Chrome Reading Mode.
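As a rough illustration of what "reliably locate the toc nav" might involve, the following Python sketch pulls the ordered link targets out of a nav element with id="toc". The id convention is only the TBD suggestion made above, not an agreed rule, and the class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TocNavParser(HTMLParser):
    """Collects link targets inside <nav id="toc">, in document order."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <nav> nesting depth inside the toc nav; 0 means outside it
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav":
            if self.depth > 0:
                self.depth += 1          # nested nav inside the toc nav
            elif attrs.get("id") == "toc":
                self.depth = 1           # entering the designated toc nav
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "nav" and self.depth > 0:
            self.depth -= 1


def reading_order(html):
    """Return the toc link targets, or an empty list if no toc nav exists."""
    parser = TocNavParser()
    parser.feed(html)
    return parser.hrefs


# Hypothetical content: the toc nav may sit in its own document or inline.
SAMPLE = """
<nav id="other"><a href="x.html">X</a></nav>
<nav id="toc"><ol>
  <li><a href="ch1.html">One</a></li>
  <li><a href="ch2.html">Two</a></li>
</ol></nav>
<a href="outside.html">Outside</a>
"""

print(reading_order(SAMPLE))
```

An empty result would be the signal that the page is a "Web Publication" only in the general sense, so a user agent would fall back to ordinary browsing.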

It could be
used as a TOC in a browser reading mode optimized for publications (as
defined in the manifest). It could be the container for metadata that
applies to the publication as a whole. It would allow easy access to any
part of the publication, even in the absence of any more sophisticated
code in the browser, manifest, or service worker. It's the ultimate


On 9/21/16, 6:18 AM, "Marcos Caceres" <marcos@marcosc.com> wrote:

>On September 21, 2016 at 1:37:59 PM, Ivan Herman (ivan@w3.org) wrote:
>> Terminology issues, I guess... (I hope!). We still have to define what
>>response the server
>> would return on a URL for a WP, right (in terms of mime type, etc).
>We don't. It's just HTML. We don't need to define anything else. WP
>are not a concrete thing: they are just web applications that want to
>be displayed inside browsers in some particular way (i.e., the webview
>has a slightly different set of UI buttons... but it's still just a
>browser). Some WPs will just want the standard browser toolbar...
>others may request full screen, and maybe an orientation lock, etc.
>It's up to the publication/application - and this would be done via
>web manifest (or the appropriate low-level API).
>Also, there are two classes of web applications that we need to cater for:
> 1. Libraries: like Safari Books, an academic journal website, or a
>magazine (current and back issues, like the Economist) - those are
>applications that allow access to 1 to many "publications". The
>solution must cater for switching in and out of the particular display
> 2. Standalone publication: a website that is itself "a book" or
>similar that wants this special UI (which the user selects and has
>full control switching in and out of!).
>>  If I use a URL for a HTML
>> or an SVG page, and I issue a HTTP GET, the server would return the
>>corresponding mime type.
>> The same should be known for the WP case.
>That's handled by fetch. We don't need to do or define anything.
>(Also, it's not even worth talking about SVG being served as an
>application: No one does that, so let's not even bother talking about
>it. Let's focus on the 99.999% case, which is HTML - SVG is an image
>format embedded in HTML.)
>> (What I would probably expect is that the return would be something
>>like an (extended)
>> Web Manifest, or a (HTML) page with a reference to a manifest
>>somewhere. But that is to
>> be defined.)
>It would be a HTML page with a link rel manifest in it. The manifest
>need not be "extended" - if we, the web community, work together, we
>can get everything standardized.
>It would NOT be a manifest: that would break the web for users and
>would not degrade gracefully (e.g., in a non-supporting user agent).
>Thus, we should never pass around URLs that dereference to some form a
>user can't work with. We, humans, only share URLs that dereference to
>HTML. A supporting user agent would then pick up the link rel=manifest
>and do the right thing.
>> I seem to be absolutely old skool here, but what would be, in your
>>view, the right terminology?
>Don't fret about terminology (I have no idea about it either, so let's
>try to avoid fancy jargon and focus on simple concepts)... I think we
>are all still percolating what this will look like, but my **very
>rough** sketch:
>1. A WP is a web app whose manifest optionally has its "display" mode
>set to "publication". This allows the browser to offer a
>publication-specific set of UI controls to the end-user (the ones we
>know and love from ebook readers: page numbers, switch between
>dark/night mode, maybe the browser also changes the dimming timeout,
>etc). The user would switch into this mode, as they do today in, for
>example, Safari's reader mode - or by "installing" these
>"publications" into the browser (similar to bookmarking, but purpose
>built for publications)... see also how "progressive web apps" are
>installed, same thing.
>2. A WP optionally includes metadata that users would want to find
>these things on... this set would be extremely limited at first and
>there would need to be precedence for this, so maybe only author and
>category would make the cut! Though category is dubious because it
>doesn't internationalize well (so it's pretty garbage). I'm still
>somewhat skeptical if "id" would make the cut (e.g., {type: "ISBN",
>id: "..."}), as ISBN, etc. can be included into the actual HTML of the
>3. A WP would have a (likely) Service Worker API that allows the apps
>to optionally say, "this object hierarchy represents the related
>documents - and how each should be represented in the ToC".
>4. A WP would have a (likely) Service Worker API to indicate which
>resources are searchable - probably as part of 4, to create the book
>search index.
>5. A WP can be sync'ed across multiple devices via the forthcoming
>manifest "service_worker" member. This is just a normal service worker
>that handles all the synchronization of offline content, d/ls
>annotations to put into IndexedDB, etc. and whatever other things need
>to happen to get everything into the right synchronized state (e.g.,
>matching document location).
>That's it for now - what am I missing? I'd love to see other short
>rough sketches of what people are thinking...
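The "HTML page with a link rel manifest" flow in the sketch above could look roughly like this in Python. Everything here is an assumption for illustration: the "publication" display value is only the proposal from this thread and is not a standardized display mode, the helper names and sample data are hypothetical, and the fallback is deliberately simplified to "use plain browser mode when the requested mode is unsupported".

```python
import json
from html.parser import HTMLParser


class ManifestLinkParser(HTMLParser):
    """Records the href of the first <link rel="manifest"> in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "manifest" in rel_tokens and self.href is None:
            self.href = attrs.get("href")


def find_manifest_href(html):
    """Return the manifest URL referenced by the page, or None."""
    parser = ManifestLinkParser()
    parser.feed(html)
    return parser.href


def effective_display(manifest_text, supported_modes):
    """Pick the requested display mode if supported, else plain 'browser'."""
    manifest = json.loads(manifest_text)
    requested = manifest.get("display", "browser")
    return requested if requested in supported_modes else "browser"


# Hypothetical page and manifest; the URL always dereferences to HTML.
PAGE = '<html><head><link rel="manifest" href="/book/manifest.json"></head><body><h1>A Book</h1></body></html>'
MANIFEST = '{"name": "A Book", "display": "publication"}'

print(find_manifest_href(PAGE))
print(effective_display(MANIFEST, {"browser", "fullscreen"}))
print(effective_display(MANIFEST, {"browser", "publication"}))
```

The point of the fallback is the graceful degradation argued for above: a non-supporting user agent still gets an ordinary HTML page, and only a supporting one switches into the publication-specific UI.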


Bill McCoy
Executive Director
International Digital Publishing Forum (IDPF)
email: bmccoy@idpf.org
mobile: +1 206 353 0233

Received on Wednesday, 21 September 2016 20:24:08 UTC
