Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics from Bill McCoy on 2016-09-21 (public-digipub-ig@w3.org from September 2016)

From: Bill McCoy <whmccoy@gmail.com>
Date: Wed, 21 Sep 2016 13:42:47 +0100
To: Mike Perlman <perlmanm@me.com>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Bill McCoy <bmccoy@idpf.org>, Marcos Caceres <marcos@marcosc.com>, "Cramer, Dave" <Dave.Cramer@hbgusa.com>
Message-ID: <CAJ0DDbCOvO=0v0psU_cCKbMAQcGagBKU2YbM=UsC2pYauX+S+Q@mail.gmail.com>
I hope "it" is not a separate language or format but just a way to
structure HTML5, imposing the absolute minimum. So to me the name HPDL
could nudge us down a wrong track. HyperPub could be OK, but to the Web
community I'm afraid "Hyper" connotes "hypertext" more than "HyperCard", and
hypertext is part and parcel of HTML and the Web in general, not just a
document specialization thereof.

--Bill
On Sep 21, 2016 1:05 PM, "Mike Perlman" <perlmanm@me.com> wrote:

> Regarding the name I suggest:
>
> HyperPub or HPDL - Hyper Publication Document Language
>
> This links its evolution to HyperCard, its concept to the modularity of
> HTML5, and its paradigm to an entity without defining the location.
>
> On 21 Sep 2016, at 01:21, Bill McCoy <bmccoy@idpf.org> wrote:
>
> I support Dave's direction but wanted to throw out two possible
> generalizations, one terminological and one substantive.
>
> On Wed, Sep 21, 2016 at 10:36 AM, Cramer, Dave <Dave.Cramer@hbgusa.com>
> wrote:
>
>> Hi Marcos,
>>
>> Apologies for top-posting, but it seemed to be the format that suited my
>> content :)
>>
>> First of all, I wanted to thank you for your contributions to this
>> discussion. This is exactly what the publishing community (or at least
>> me!) has hoped for over the last three years: full engagement from the web
>> community. This is awesome!
>>
>> For quite a few years, some of us have been kicking around a set of ideas
>> which I've called "EPUB Zero". Having worked with ebooks for fifteen
>> years, the complexity and insane duplication of content in EPUB have
>> driven me mad. So I've been wondering what the simplest possible ebook
>> format might look like. The goal was to use only HTML, instead of all the
>> custom XML vocabularies in EPUB.
>>
>> A publication is, at its core, a bounded sequence of content documents.
>> How could we express such an idea in HTML? It's almost like we need an
>> ordered list of links to documents. Hmmm. If only there were an HTML
>> structure that was an ordered list of links... wait, there is!
>>
>
> I think we should stop getting stuck on terminology by agreeing that the
> term "Web Publication" is as per Dave's implicit definition here and
> describes any such specialization of an arbitrary web of interlinked
> content (regardless of how the bounded sequence is expressed). For example,
> a set of content documents sequenced via link rel="next"/"prev" is also a
> "Web Publication". So is a single HTML file with a structure that can be
> inferred from the outline (as Mike Smith was discussing yesterday). I think
> we cannot credibly claim that "Web Publication" can apply only if some
> specific expression syntax or requirement is met. Yes, this means that a
> "Web Publication" in the general sense may not have its sequence (TOC)
> reliably or conveniently discoverable, may as a result not be ideally
> accessible, etc., but this is just a fact.
>
> And, so, we need a more specific name for what Dave later proposes
> (whether that's EPUB0, EPUB 4, PWP, or whatever is only a detail; I will
> use "PWP" below). And of course we can quibble also about whether it's
> really a "sequence" vs. a "tree" or some other graph structure - to me the
> requirement is only that there is a more specific structure relating the
> content documents than just a bag, and from such structure one could impute
> a linear order, but that's a detail.
>
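To illustrate how a linear order could be imputed from link rel="next" alone, here is a minimal sketch. It assumes the content documents are already available as strings keyed by URL; the names NextLinkFinder, reading_order, and the sample file names are hypothetical and not part of any proposal in this thread.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Records the href of the first <link rel="next"> in a document."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "next" in rels and self.next_href is None:
            self.next_href = attr.get("href")


def reading_order(documents, start):
    """Follow rel="next" links from `start`; stop at the end or on a cycle."""
    order, current = [], start
    while current in documents and current not in order:
        order.append(current)
        finder = NextLinkFinder()
        finder.feed(documents[current])
        current = finder.next_href
    return order


# Hypothetical three-chapter publication held in memory, keyed by URL.
docs = {
    "ch1.html": '<head><link rel="next" href="ch2.html"></head>',
    "ch2.html": '<head><link rel="prev" href="ch1.html">'
                '<link rel="next" href="ch3.html"></head>',
    "ch3.html": '<head><link rel="prev" href="ch2.html"></head>',
}
print(reading_order(docs, "ch1.html"))
```

Note that the sequence is only discoverable by visiting every document in turn, which is the "not reliably or conveniently discoverable" cost mentioned above.
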
>> EPUB has always required a navigation document, as it was deemed to be
>> critical for accessibility, and EPUB (to the benefit of all of us) has
>> always been very focused on accessibility. And so a table of contents, in
>> the form of a HTML document with a nav element, could serve to define the
>> sequence of content documents. And it would have major advantages over a
>> TOC generated from content (a subject for another email).
>>
>
> I think this is fine, but for generality there is no reason to assume or
> require that the nav element is its own separate HTML document. It could
> be so (as is usual in EPUB 3), and referenced by e.g. link rel="toc" or
> rel="nav" or whatever from actual content, but it could as easily be within
> the actual content document. In the important special case where the
> publication content is a single HTML document, insisting on a separate
> navigation document would be overkill.
>
>
>>
>> Such a nav doc could contain the link to the web app manifest.
>
>
> It could, but this seems orthogonal. Web Publication-ness in general, and
> PWP-ness in particular, should not be conflated with offline-usability
> (which applies as well to Web content that is not document-like in nature).
>
> So to me we can simply say that any Web Publication which includes or
> references in a predefined discoverable way (TBD) a designated navigation
> element that provides a complete ordered "toc" that comprises its
> structure, is a "PWP".
>
> (for the TBD part, I observe that the simple requirement for the
> designated nav element to have id="toc" would have the charm of making PWP
> fully EPUB compatible.)
>
> A User Agent could then (whether through built-in functionality,
> extension, or polyfill delivered with the content as a backstop) reliably
> locate the "toc nav" (as it's called in EPUB) and provide an augmented
> reading experience in a similar but more reliable way than for example
> Chrome Reading Mode.
>
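As a rough sketch of how a user agent or polyfill might locate the designated nav element, the following assumes the id="toc" convention suggested above (still marked TBD) and uses only Python's standard html.parser. The names TocNavParser, find_toc, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TocNavParser(HTMLParser):
    """Collects (href, label) pairs from links inside <nav id="toc">."""

    def __init__(self):
        super().__init__()
        self.entries = []      # ordered list of (href, label)
        self._nav_depth = 0    # greater than 0 while inside the toc nav
        self._href = None      # href of the <a> currently being read
        self._label = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "nav" and (self._nav_depth or attr.get("id") == "toc"):
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth and attr.get("href"):
            self._href, self._label = attr["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append((self._href, "".join(self._label).strip()))
            self._href = None
        elif tag == "nav" and self._nav_depth:
            self._nav_depth -= 1


def find_toc(html):
    """Return the toc links in document order, or [] if there is no toc nav."""
    parser = TocNavParser()
    parser.feed(html)
    return parser.entries


# Hypothetical page: only the nav with id="toc" should be picked up.
page = """
<nav id="other"><a href="x.html">Not part of the toc</a></nav>
<nav id="toc"><ol>
  <li><a href="ch1.html">Chapter 1</a></li>
  <li><a href="ch2.html">Chapter 2</a></li>
</ol></nav>
"""
print(find_toc(page))
```

The same function works whether the nav lives in its own document or inside a single-file publication, since it only looks at the markup it is given.
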
>> It could be used as a TOC in a browser reading mode optimized for
>> publications (as defined in the manifest). It could be the container for
>> metadata that applies to the publication as a whole. It would allow easy
>> access to any part of the publication, even in the absence of any more
>> sophisticated code in the browser, manifest, or service worker. It's the
>> ultimate fallback.
>>
>> Dave
>>
>>
>> On 9/21/16, 6:18 AM, "Marcos Caceres" <marcos@marcosc.com> wrote:
>>
>> >On September 21, 2016 at 1:37:59 PM, Ivan Herman (ivan@w3.org) wrote:
>> >> Terminology issues, I guess... (I hope!). We still have to define what
>> >>response the server
>> >> would return on a URL for a WP, right (in terms of mime type, etc).
>> >
>> >We don't. It's just HTML. We don't need to define anything else. WP
>> >are not a concrete thing: they are just web applications that want to
>> >be displayed inside browsers in some particular way (i.e., the webview
>> >has a slightly different set of UI buttons... but it's still just a
>> >browser). Some WPs will just want the standard browser toolbar...
>> >others may request full screen, and maybe an orientation lock, etc.
>> >It's up to the publication/application - and this would be done via
>> >web manifest (or the appropriate low-level API).
>> >
>> >Also, there are two classes of web applications that we need to cater
>> >for:
>> >
>> > 1. Libraries: like Safari Books, an academic journal website, or a
>> >magazine (current and back issues, like the Economist) - those are
>> >applications that allow access to 1 to many "publications". The
>> >solution must cater for switching in and out of the particular display
>> >mode.
>> >
>> > 2. Standalone publication: a website that is itself "a book" or
>> >similar that wants this special UI (which the user selects and has
>> >full control switching in and out of!).
>> >
>> >>  If I use a URL for a HTML
>> >> or an SVG page, and I issue a HTTP GET, the server would return the
>> >>corresponding mime type.
>> >> The same should be known for the WP case.
>> >
>> >That's handled by fetch. We don't need to do or define anything.
>> >
>> >(Also, it's not even worth talking about SVG being served as an
>> >application: No one does that, so let's not even bother talking about
>> >it. Let's focus on the 99.999% case, which is HTML - SVG is an image
>> >format embedded in HTML.)
>> >
>> >> (What I would probably expect is that the return would be something
>> >>like an (extended)
>> >> Web Manifest, or a (HTML) page with a reference to a manifest
>> >>somewhere. But that is to
>> >> be defined.)
>> >
>> >It would be a HTML page with a link rel manifest in it. The manifest
>> >need not be "extended" - if we, the web community, work together, we
>> >can get everything standardized.
>> >
>> >It would NOT be a manifest: that would break the web for users and
>> >would not degrade gracefully (e.g., in a non-supporting user agent).
>> >Thus, we should never pass around URLs that dereference to some form a
>> >user can't work with. We, humans, only share URLs that dereference to
>> >HTML. A supporting user agent would then pick up the link rel=manifest
>> >and do the right thing.
>> >
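A minimal sketch of the "pick up the link rel=manifest" step described above: the URL that gets shared dereferences to HTML, and a supporting user agent resolves the manifest from it. The names ManifestLinkFinder, manifest_url, and the example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Records the href of the first <link rel="manifest"> in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "manifest" in rels and self.href is None:
            self.href = attr.get("href")


def manifest_url(page_url, html):
    """Return the absolute manifest URL, or None if the page declares none."""
    finder = ManifestLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None


# Hypothetical publication entry page.
html = '<head><link rel="manifest" href="manifest.json"></head><p>Chapter 1</p>'
print(manifest_url("https://example.org/book/index.html", html))
```

A non-supporting user agent simply ignores the link element and renders the HTML, which is the graceful degradation argued for above.
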
>> >> I seem to be absolutely old skool here, but what would be, in your
>> >>view, the right terminology?
>> >
>> >Don't fret about terminology (I have no idea about it either, so let's
>> >try to avoid fancy jargon and focus on simple concepts)... I think we
>> >are all still percolating on what this will look like, but my **very
>> >rough** sketch:
>> >
>> >1. A WP is a web app whose manifest optionally has its "display" mode
>> >set to "publication". This allows the browser to offer a
>> >publication-specific set of UI controls to the end-user (the ones we
>> >know and love from ebook readers: page numbers, switch between
>> >dark/night mode, maybe the browser also changes the dimming timeout,
>> >etc). The user would switch into this mode, as they do today in, for
>> >example, Safari's reader mode - or by "installing" these
>> >"publications" into the browser (similar to bookmarking, but purpose
>> >built for publications)... see also how "progressive web apps" are
>> >installed, same thing.
>> >
>> >2. A WP optionally includes metadata that users would want to search
>> >for these things by... this set would be extremely limited at first and
>> >there would need to be precedent for this, so maybe only author and
>> >category would make the cut! Though category is dubious because it
>> >doesn't internationalize well (so it's pretty garbage). I'm still
>> >somewhat skeptical whether "id" would make the cut (e.g., {type: "ISBN",
>> >id: "..."}), as ISBN, etc. can be included in the actual HTML of the
>> >publication.
>> >
>> >3. A WP would have a (likely) Service Worker API that allows the apps
>> >to optionally say, "this object hierarchy represents the related
>> >documents - and how each should be represented in the ToC".
>> >
>> >4. A WP would have a (likely) Service Worker API to indicate which
>> >resources are searchable - probably as part of 3, to create the book
>> >search index.
>> >
>> >5. A WP can be sync'ed across multiple devices via the forthcoming
>> >manifest "service_worker" member. This is just a normal service worker
>> >that handles all the synchronization of offline content, downloads
>> >annotations to put into IndexedDB, etc. and does whatever else needs
>> >to happen to get everything into the right synchronized state (e.g.,
>> >matching document location).
>> >
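To make point 1 of the sketch concrete, here is a small illustration of how a user agent might decide which UI to offer. It assumes the proposed "publication" value for the manifest "display" member, which is only an idea floated in this thread; the fallback to "browser" is a simplification, and effective_display and the sample manifest are hypothetical.

```python
import json

# Display modes a user agent already understands; "publication" is the
# hypothetical addition proposed in this thread.
KNOWN_DISPLAY_MODES = {"fullscreen", "standalone", "minimal-ui", "browser"}


def effective_display(manifest_text, supports_publication):
    """Pick the display mode to use, degrading to "browser" when unsure."""
    manifest = json.loads(manifest_text)
    requested = manifest.get("display", "browser")
    if requested == "publication":
        return "publication" if supports_publication else "browser"
    return requested if requested in KNOWN_DISPLAY_MODES else "browser"


# Hypothetical manifest for a standalone publication.
manifest_text = json.dumps({"name": "Example Book", "display": "publication"})
print(effective_display(manifest_text, supports_publication=True))
print(effective_display(manifest_text, supports_publication=False))
```

A user agent that does not know the new mode still shows the content in an ordinary browser view, so nothing breaks for the reader.
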
>> >That's it for now - what am I missing? I'd love to see other short
>> >rough sketches of what people are thinking...
>> >
>>
>
>
> --
>
> Bill McCoy
> Executive Director
> International Digital Publishing Forum (IDPF)
> email: bmccoy@idpf.org
> mobile: +1 206 353 0233
>
>
>
Received on Wednesday, 21 September 2016 12:43:24 UTC