Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics from Marcos Caceres on 2016-09-22 (public-digipub-ig@w3.org from September 2016)

From: Marcos Caceres <marcos@marcosc.com>
Date: Wed, 21 Sep 2016 23:44:50 -0700
To: Ivan Herman <ivan@w3.org>
Cc: Michael Smith <mike@w3.org>, Dave Cramer <dave.cramer@hbgusa.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Peter Krautzberger <peter.krautzberger@mathjax.org>
Message-ID: <CAAci2aC=UNog5mxmPTczhE9akG5WKHDydfS-A+XeLdjWQTHNhA@mail.gmail.com>
On September 22, 2016 at 3:17:33 PM, Ivan Herman (ivan@w3.org) wrote:
> Hey Marcos,
>
> >> If I use a URL for a HTML
> >> or an SVG page, and I issue a HTTP GET, the server would return the corresponding
> >> mime type.
> >> The same should be known for the WP case.
> >
> > That's handled by fetch. We don't need to do or define anything.
>
> From the client side: yes. But what the publisher should produce, how the server is set
> up: I presume that has to be defined; what does it mean to 'publish' a book. After all, it
> is the publisher's job to make a content correspond to a URL.

Fortunately, it doesn't need to be defined (just like HTML doesn't care
what server you use to serve HTML to the client).

Server-side processes will come and go: some will use Apache + PHP,
others Node and ReactJS... it doesn't matter. That's what different
publishers will compete on: providing fast responses and delightful
user experiences... but that's outside the scope.

> >
> > (Also, it's not even worth talking about SVG being served as an
> > application: No one does that, so let's not even bother talking about
> > it. Let's focus on the 99.999% case, which is HTML - SVG is an image
> > format embedded in HTML.)
>
> Well… we may have to be careful here. An SVG document can be used as the same 'top level'
> document as HTML in EPUB.

Yes, you could do the same on the Web - but no one in their right mind
would do that. That would be crazy.

> There is a large market for using full screen SVG-s in publishing, unrelated to an HTML
> content, namely cartoons/mangas. Mangas are huge in Japan (I do not have the exact figures,
> but afaik, for some Japanese digital book publishers mangas represent the majority
> of their income), other types of cartoons have a significant market in a number of countries
> like France or Belgium.

Sure, but are those linked HTML files or SVG files? I'm going to go
out on a limb and say they are HTML files with embedded SVG images.

> That being said, I do not know whether those books are using SVG as a standalone content,
> or whether they are embedded in an otherwise empty HTML. Somebody on the list might know.
> But, at this moment, we should not dismiss SVG to be on par with HTML at least in this area.

Ok... I've been way wrong before... and know nothing of this space...
so, sure, proof needed here.

But the burden of proof is on publishers and we should assume it's
false until evidence proves otherwise.

> (There are, actually, very SVG specific issues that are raised by these applications.
> But that is for another day…)

Agree... and hopefully SVG.next will fix some of those issues.


> Or, I presume, a LINK header in the HTTP response.

This is currently not supported. We had it in the manifest spec a very
long time ago, but took it out. Only Firefox has ever supported Link:
stylesheet, for instance. There are a few other specs starting to
experiment with Link: headers... but Link: hasn't really been a thing
on the Web (setting headers is hard).

>  For example we can imagine libraries
> preferring to set up their alternative manifest for a publication (eg, a different,
> library specific unique id or other metadata) but not having the right to change the content
> of the publication. Using a LINK header is a good way of doing so.

Yep. The use cases are compelling... but the lack of browser support
over the years, plus the challenge of setting HTTP headers, makes
this... not so appealing right now.
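
For illustration only, this is roughly what a client-side lookup of such a
header could look like. It is a sketch under assumptions: the use of
rel="manifest" in a Link: header, the function name findManifestLink, and the
example URLs are all hypothetical, and no browser is known to act on this
today.

```javascript
// Hypothetical sketch: find a manifest URL advertised in an HTTP Link: header.
// Handles only the simple form <url>; rel="..." and assumes no commas appear
// inside URLs or parameter values.
function findManifestLink(linkHeader, baseUrl) {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(",")) {
    // Split each entry into the <target> and its parameter string.
    const match = part.match(/^\s*<([^>]*)>\s*(.*)$/);
    if (!match) continue;
    const [, target, params] = match;
    const rel = params.match(/;\s*rel\s*=\s*"?([^";]*)"?/i);
    if (!rel) continue;
    // rel may hold several space-separated relation types.
    const rels = rel[1].trim().toLowerCase().split(/\s+/);
    if (rels.includes("manifest")) {
      // Resolve the target against the URL of the response.
      return new URL(target, baseUrl).href;
    }
  }
  return null;
}
```

A library could then, in principle, point at its own manifest with a header
such as `Link: </library/moby-dick.json>; rel="manifest"` without touching the
publication itself.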


> > 2. A WP optionally includes metadata that users would want to find
> > these things on... this set would be extremely limited at first and
> > there would need to be precedent for this, so maybe only author and
> > category would make the cut! Though category is dubious because it
> > doesn't internationalize well (so it's pretty garbage). I'm still
> > somewhat skeptical if "id" would make the cut (e.g., {type: "ISBN",
> > id: "..."}), as ISBN, etc. can be included into the actual HTML of the
> > publication.
>
> Because the publication is not one HTML but, potentially, many, I think such an identifier
> should be in the manifest.

I'm still not sure who benefits from the identifier in the manifest
(the manifest content should only benefit the end-user through the
user agent)? Why can't it just be in the HTML? Search engines
(including Google Scholar, etc.) know how to find these things
already.

Put differently, can anyone show:

* how I get to the ISBN of a book downloaded in an eBook reader today?
* how an end-user would then use this identifier from within an eBook reader?

(these are honest questions, I don't know... I've only used iBooks and
an old Kindle)

If the answer to the above is "you can't" (or the answer is, "they
copy/paste it from one of the pages"), then the identifier is kinda
useless in the manifest as it doesn't need to be surfaced by the user agent.

> We have to be careful what we mean by 'limited' metadata. I agree that adding lots of metadata
> into the manifest file would be a mistake (there is a limited set of metadata, mostly derived
> from Dublin Core, as part of the EPUB 'package' definition, too, we should look at that).
> However, the publication world has lots of metadata, related to many different things (provenance,
> marketing facts, copyright, you name it). Some of these metadata specifications (like
> ONIX) are huge and, unfortunately, if we take into account the metadata used by trade
> publishers, libraries, scholarly publishers, magazines, etc, then the "one standard
> is good, more is better" approach seems to prevail:-)

We don't need to worry about those... we only need to worry about what
benefits end-users (and whatever can live in the publication as HTML
just lives in the HTML... like, say, copyright).

> But the important point is: metadata
> handling, definition, usage, etc, is a hugely important aspect of the business. (As
> an example, when you look at a page on a book on a site like Amazon, all the data you see there
> comes, afaik, from the metadata that is provided by the publishers of those books. The
> distributors, I presume, rarely do that by themselves, and surely not manually.)

Sure, but it's not of relevance to end users. I'll again echo Mike
Smith to keep this end-user focused and focused on what the browser
needs to process and work with to provide a great user experience
(again, look at iBooks or the Kindle, for instance... they don't display
any such metadata - they just provide a great reading experience).

And for those wanting to surface metadata for, say, a specialized
community, they can just do that using
fetch("metadata.xml").then(displayItUsingHTML).
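
Spelled out a little further, that one-liner might look like the sketch
below. The names showMetadata and escapeHtml, the injectable fetchImpl
parameter, and the choice to show the raw XML as preformatted text are
assumptions made for illustration; a real community tool would presumably
parse the metadata and render something nicer.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: show the raw metadata document as preformatted text.
function displayItUsingHTML(xmlText, container) {
  container.innerHTML = "<pre>" + escapeHtml(xmlText) + "</pre>";
}

// fetchImpl defaults to the global fetch; passing it in keeps the sketch testable.
function showMetadata(url, container, fetchImpl = fetch) {
  return fetchImpl(url)
    .then((response) => {
      if (!response.ok) {
        throw new Error("Could not load " + url + ": " + response.status);
      }
      return response.text();
    })
    .then((xmlText) => displayItUsingHTML(xmlText, container));
}
```

In a page this could be called as
`showMetadata("metadata.xml", document.querySelector("#metadata"))`, where the
`#metadata` element is again hypothetical. Nothing here needs support from the
manifest or the browser beyond fetch itself.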

> What this means is that there should be a slot (and I think that _is_ very publication specific,
> I do not expect that to make all that much sense for manifest in general) in the manifest
> that would refer to an external file (or probably files) containing the detailed metadata.
> The manifest would be silent as for the format of those files (XML, JSON, specific formats
> like BibTex, Turtle,…); that should be really the job of specialized consumers.

If it's application/publication specific - it could just be in an
external file... no need for it to be in the manifest: the manifest is
only concerned with things the browser can understand and work with.
If the browser can't process it, it should not be there.

> B.t.w., it is conceivable that some of these metadata would be embedded into a content
> HTML file (eg, adding a JSON-LD content into a  <script> tag), but they may be way too
> large to make this practical.

That's probably a pretty strong indicator that a lot of that metadata
might not be of value to end-users, and thus should not be shipped
with a publication.

> Just for my understanding, though: would that mean that the WP's
> manifest would carry effective Javascript code,

No. It's just JSON.
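
To make the "just data" point concrete, here is a small sketch. The manifest
content is hypothetical: "name" and "start_url" are existing Web App Manifest
members, while "author" is only a candidate floated in this thread, and the
helper isValidManifestText is made up for the example.

```javascript
// Hypothetical manifest text. "name" and "start_url" exist in the Web App
// Manifest; "author" is only a candidate discussed in this thread.
const manifestText = `{
  "name": "Moby-Dick",
  "start_url": "index.html",
  "author": "Herman Melville"
}`;

// JSON.parse only ever produces data (strings, numbers, booleans, null,
// arrays, plain objects), so nothing in the manifest can be executed.
const manifest = JSON.parse(manifestText);

// Script-like text is rejected outright because it is not valid JSON.
function isValidManifestText(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (error) {
    return false;
  }
}
```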

>  or that it would
> contain information that a generic code in the browser would
> use? Kenneth referred to the possibility that the manifest would
> list all the resources in the WP that a service worker should 'check
> in' at startup; that seems to refer to the latter.

Absolutely not. The manifest is always as simple as humanly possible
and would never contain any such listing.

Further, the Service Worker is specifically designed to NEVER do
anything on its own: it's a simple event catcher. The developer would
list resources inside the service worker's script - never in the
manifest.
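
As a minimal sketch of that division of labour, a publication's service
worker script could look something like this. The cache name, the resource
list, and the handler names onInstall and onFetch are assumptions for
illustration; the cache-first strategy is just one choice a developer might
make.

```javascript
// Hypothetical cache name and resource list; the developer maintains this
// list in the service worker script itself, not in the manifest.
const CACHE_NAME = "moby-dick-v1";
const RESOURCES = ["index.html", "chapter-001.html", "style.css"];

// On install: store every listed resource in the cache.
function onInstall(event, cacheStorage) {
  event.waitUntil(
    cacheStorage.open(CACHE_NAME).then((cache) => cache.addAll(RESOURCES))
  );
}

// On fetch: answer from the cache, falling back to the network.
function onFetch(event, cacheStorage, fetchImpl) {
  event.respondWith(
    cacheStorage
      .match(event.request)
      .then((cached) => cached || fetchImpl(event.request))
  );
}

// Register the handlers only when actually running as a service worker.
if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  self.addEventListener("install", (event) => onInstall(event, caches));
  self.addEventListener("fetch", (event) => onFetch(event, caches, fetch));
}
```

The worker only reacts to the install and fetch events; which resources get
cached, and when, is entirely the developer's decision.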

Rule of thumb: the browser or SW will never do any work that the
developer can do on their own.  If you ever think "the browser
could..." or "the service worker could"... then just stop... and
rephrase it as, "a developer would".

If it's impossible for the developer to do something, for privacy or
security reasons, then we can talk about "the SW or Browser could..."
- but never otherwise.

> > I'd love to see other short
> > rough sketches of what people are thinking…
>
> Dave had an experimental setup with a few books (obviously, Moby
> Dick among them; that is the 'hello world' of the digital book
> world:-). This was based on a small SW implementation that Jake
> Archibald did last year after a discussion at last year's TPAC,
> but has to be refreshed. I think he and Kenneth agreed to look into
> this. It would make things more tangible…

Dave, please put it up on GH :)  What are you waiting for!?!!?!11one!!
Received on Thursday, 22 September 2016 06:45:22 UTC