Re: Rough sketch for WP, was Re: Dereferencing, was Re: Jotting down some discussion topics from Ivan Herman on 2016-09-22 (public-digipub-ig@w3.org from September 2016)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 22 Sep 2016 08:54:09 +0100
To: Marcos Caceres <marcos@marcosc.com>
Cc: Michael Smith <mike@w3.org>, Dave Cramer <dave.cramer@hbgusa.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Peter Krautzberger <peter.krautzberger@mathjax.org>
Message-Id: <930EA9A7-A926-4CF2-A652-311E5974F01B@w3.org>
> On 22 Sep 2016, at 07:44, Marcos Caceres <marcos@marcosc.com> wrote:
> 
> <skip/>

>>> 
>>> (Also, it's not even worth talking about SVG being served as an
>>> application: No one does that, so let's not even bother talking about
>>> it. Let's focus on the 99.999% case, which is HTML - SVG is an image
>>> format embedded in HTML.)
>> 
>> Well… we may have to be careful here. An SVG document can be used as the same 'top level'
>> document as HTML in EPUB.
> 
> Yes, you could do the same on the Web - but no one in their right mind
> would do that. That would be crazy.
> 
>> There is a large market for using full screen SVG-s in publishing, unrelated to an HTML
>> content, namely cartoons/mangas. Mangas are huge in Japan (I do not have the exact figures,
>> but afaik, for some Japanese digital book publishers mangas represent the majority
>> of their income), other types of cartoons have a significant market in a number of countries
>> like France or Belgium.
> 
> Sure, but are those linked HTML files or SVG files? I'm going to go
> out on a limb and say they are HTML files with embedded SVG images.

As I said: I do not know, we will have to find out.

> 
>> That being said, I do not know whether those books are using SVG as a standalone content,
>> or whether they are embedded in an otherwise empty HTML. Somebody on the list might know.
>> But, at this moment, we should not dismiss SVG to be on par with HTML at least in this area.
> 
> Ok... I've been way wrong before... and know nothing of this space...
> so, sure, proof needed here.
> 
> But the burden of proof is on publishers and we should assume it's
> false until evidence proves otherwise.
> 
>> (There are, actually, very SVG specific issues that are raised by these applications.
>> But that is for another day…)
> 
> Agree... and hopefully SVG.next will fix some of those issues.

> 
> 
>> Or, I presume, a LINK header in the HTTP response.
> 
> This is currently not supported. We had it in the manifest spec a very
> long time ago, but took it out. Only Firefox has ever supported Link:
> stylesheet, for instance. There are few other specs starting to
> experiment with Link: headers... but Link: hasn't really been a thing
> on the Web (setting headers is hard).
> 
>>  For example we can imagine libraries
>> preferring to set up their alternative manifest for a publication (eg, a different,
>> library specific unique id or other metadata) but not having the right to change the content
>> of the publication. Using a LINK header is a good way of doing so.
> 
> Yep. The use cases are compelling... but, browser support over the
> years, plus the challenge of serving HTTP headers makes this... not so
> appealing right now.
> 

That is something that we may have to discuss, eventually.
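To make the library use case a bit more tangible, here is a minimal sketch of how a reading system script (not the browser itself) might look for an alternative manifest in a Link header. The function name, the example URLs, and the use of rel="manifest" in an HTTP header are all assumptions for illustration; as Marcos notes, browsers do not support this today, and the parsing is deliberately simplified.

```typescript
// Simplified Link header parsing: handles `<url>; rel="a b"` entries separated
// by commas. It does not cover every corner of the Link header syntax
// (for example, commas inside quoted parameter values).
function findLinkedManifest(linkHeader: string, baseUrl: string): string | null {
  const entry = /<([^>]*)>([^,]*)/g;
  let match: RegExpExecArray | null;
  while ((match = entry.exec(linkHeader)) !== null) {
    const target = match[1];
    const params = match[2];
    // Look for a rel parameter and check whether "manifest" is among its values.
    const rel = /;\s*rel\s*=\s*"?([^";]*)"?/i.exec(params);
    if (rel !== null && rel[1].toLowerCase().split(/\s+/).indexOf("manifest") !== -1) {
      // Resolve the link target against the URL of the response that carried the header.
      return new URL(target, baseUrl).href;
    }
  }
  return null;
}

// Hypothetical header value, as a library's server might send it.
const exampleHeader = '</style.css>; rel="stylesheet", <library-manifest.json>; rel="manifest"';
const linkedManifest = findLinkedManifest(
  exampleHeader,
  "https://library.example/books/moby-dick/index.html"
);
```

The point of the sketch is only that the publication content stays untouched: the library's own manifest is announced next to the content, not inside it.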


> 
>>> 2. A WP optionally includes metadata that users would want to find
>>> these things on... this set would be extremely limited at first and
>>> there would need to be precedence for this, so maybe only author and
>>> category would make the cut! Though category is dubious because it
>>> doesn't internationalize well (so it's pretty garbage). I'm still
>>> somewhat skeptical if "id" would make the cut (e.g., {type: "ISBN",
>>> id: "..."}), as ISBN, etc. can be included into the actual HTML of the
>>> publication.
>> 
>> Because the publication is not one HTML but, potentially, many, I think such an identifier
>> should be in the manifest.
> 
> I'm still not sure who benefits from the identifier in the manifest
> (the manifest content should only benefit the end-user through the
> user agent)? Why can't it just be in the HTML? Search engines
> (including Google Scholar, etc.) know how to find these things
> already.
> 
> Put differently, can anyone show:
> 
> * how I get to the ISBN of a book downloaded in an eBook reader today?
> * how an end-user would then use this identifier from within an eBook reader?

The ISBN (or equivalent) is in what EPUB calls the package file which, in our parlance, is the manifest. It is one of the metadata items that EPUB requires.

The term 'end-user' is a bit vague. A reader like you and me may not be interested in this, just as we would not look at the ISBN of a paper book. But the various reader software, catalogues, etc., that offer a 'bookshelf'-like interface to their users may choose to display more information about a book, including its ISBN. Amazon is not a good example, because it uses a proprietary format for ebooks, but similar sites rely on the metadata in the EPUB manifest.

> 
> (these are honest questions, I don't know... I've only used iBooks and
> an old Kindle)
> 
> If the answer to the above is no (or the answer is, "they copy/paste
> it from one of the pages"), then identifier is kinda useless in the
> manifest as it doesn't need to be surfaced by the user agent.

It is definitely not copy/paste. It is part of the required metadata.

> 
>> We have to be careful what we mean by 'limited' metadata. I agree that adding lots of metadata
>> into the manifest file would be a mistake (there is a limited set of metadata, mostly derived
>> from Dublin Core, as part of the EPUB 'package' definition, too, we should look at that).
>> However, the publication world has lots of metadata, related to many different things (provenance,
>> marketing facts, copyright, you name it). Some of these metadata specifications (like
>> ONIX) are huge and, unfortunately, if we take into account the metadata used by trade
>> publishers, libraries, scholarly publishers, magazines, etc, then the "one standard
>> is good, more is better" approach seems to prevail:-)
> 
> We don't need to worry about those... we only need to worry about what
> benefits end-users (and whatever can live in the publication as HTML
> just lives in the HTML... like, say, copyright).

See above. We should not only consider the readers as end users, but also the various systems that "consume" the books.


> 
>> But the important point is: metadata
>> handling, definition, usage, etc, is a hugely important aspect of the business. (As
>> an example, when you look at a page on a book on a site like Amazon, all the data you see there
>> comes, afaik, from the metadata that is provided by the publishers of those books. The
>> distributors, I presume, rarely do that by themselves, and surely not manually.)
> 
> Sure, but it's not of relevance to end users. I'll again echo Mike
> Smith to keep this end-user focused and focused on what the browser needs
> to process and work with to provide a great user experience (again,
> look at iBooks or the Kindle, for instance... it doesn't display any
> such metadata - just provide a great reading experience).
> 
> And for those wanting to surface metadata for, say, a specialized
> community, they can just do that using
> fetch("metadata.xml").then(displayItUsingHTML).

Yes, that is essentially what I said, except that 'metadata.xml' should not be a fixed name at a fixed position, but a file that I have a reference to. And there may be several metadata files in different formats.
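As a rough sketch of what such a reference could look like: assume the manifest has a "metadata" key listing files with a location and a media type. The key name, the field names, and the example files below are hypothetical and not taken from any manifest spec; the manifest itself stays silent on what is inside the files.

```typescript
// Hypothetical shape: neither the "metadata" key nor its fields come from any spec.
interface MetadataLink {
  href: string; // location of the metadata file, relative to the manifest
  type: string; // media type, e.g. "application/xml" or "application/ld+json"
}

interface PublicationManifest {
  name: string;
  metadata?: MetadataLink[];
}

// Resolve every metadata reference against the manifest's own URL.
function resolveMetadataLinks(manifest: PublicationManifest, manifestUrl: string): MetadataLink[] {
  return (manifest.metadata ?? []).map((link) => ({
    href: new URL(link.href, manifestUrl).href,
    type: link.type,
  }));
}

// Pick the first metadata file in a format this particular consumer understands.
function pickMetadata(links: MetadataLink[], supportedTypes: string[]): MetadataLink | undefined {
  return links.filter((link) => supportedTypes.indexOf(link.type) !== -1)[0];
}

// Example manifest with two metadata files in different formats (made-up paths).
const exampleManifest: PublicationManifest = {
  name: "Moby Dick",
  metadata: [
    { href: "meta/onix.xml", type: "application/xml" },
    { href: "meta/record.jsonld", type: "application/ld+json" },
  ],
};

const resolvedLinks = resolveMetadataLinks(
  exampleManifest,
  "https://example.org/books/moby-dick/manifest.json"
);
const chosenLink = pickMetadata(resolvedLinks, ["application/ld+json"]);
```

A specialized consumer would then fetch chosenLink.href and display or process it as it sees fit, which is the fetch(...).then(...) pattern quoted above with the file name looked up instead of hard-coded.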

> 
>> What this means is that there should be a slot (and I think that _is_ very publication specific,
>> I do not expect that to make all that much sense for manifest in general) in the manifest
>> that would refer to an external file (or probably files) containing the detailed metadata.
>> The manifest would be silent as for the format of those files (XML, JSON, specific formats
>> like BibTex, Turtle,…); that should be really the job of specialized consumers.
> 
> If it's application/publication specific - it could just be in an
> external file... no need for it to be in the manifest: the manifest is
> only concerned with things the browser can understand and work with.
> If the browser can't process it, it should not be there.

That is exactly what I said. It should not be in the manifest; but there should be a pointer to the metadata.


> 
>> B.t.w., it is conceivable that some of these metadata would be embedded into a content
>> HTML file (e.g., adding JSON-LD content in a <script> tag), but they may be way too
>> large to make this practical.
> 
> That's probably a pretty strong indicator that a lot of that metadata
> might not be of value to end-users, and thus should not be shipped
> with a publication.
> 
>> Just for my understanding, though: would that mean that the WP's
>> manifest would carry effective Javascript code,
> 
> No. It's just json.

O.k.

> 
>>  or that it would
>> contain information that a generic code in the browser would
>> use? Kenneth referred to the possibility that the manifest would
>> list all the resources in the WP that a service worker should 'check
>> in' at startup; that seems to refer to the latter.
> 
> Absolutely no. The manifest is always as simple as humanly possible
> and would never contain any such listing.
> 
> Further, the Service Worker is specifically designed to NEVER do
> anything on its own: it's a simple event catcher. The developer would
> list resources inside the service worker's script - never in the
> manifest.
> 
> Rule of thumb: the browser or SW will never do any work that the
> developer can do on their own.  If you ever think "the browser
> could..." or "the service worker could"... then just stop... and
> rephrase it as, "a developer would".
> 
> If it's impossible for the developer to do something, for privacy or
> security reasons, then we can talk about "the SW or Browser could..."
> - but never otherwise.

Terminology again ;-( I absolutely think that a layer on top of SW would or could. SW is the basic infrastructure.

But what I understood yesterday is that the list of files that a developer has to check in for SW is (or may be) part of a manifest, under some specific key. (That is what I heard…)
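To illustrate the reading I have in mind, here is a minimal sketch in which the developer's own code (the "layer on top of SW") reads a list of files from the manifest. The "resources" key, the function name, and the example paths are assumptions for illustration only; nothing here is defined by a manifest spec, and the browser does no work on its own.

```typescript
// Hypothetical manifest shape: the "resources" key is an assumption made
// for illustration, not something defined by a manifest spec.
interface ManifestWithResources {
  name: string;
  resources?: string[];
}

// Turn the manifest's resource list into absolute, de-duplicated URLs
// that a service worker script could pass to cache.addAll().
function resourcesToCache(manifest: ManifestWithResources, manifestUrl: string): string[] {
  const seen: { [url: string]: boolean } = {};
  const result: string[] = [];
  for (const entry of manifest.resources ?? []) {
    const absolute = new URL(entry, manifestUrl).href;
    if (!seen[absolute]) {
      seen[absolute] = true;
      result.push(absolute);
    }
  }
  return result;
}

// Inside the developer's service worker script this could be used roughly as:
//   self.addEventListener("install", (event) => {
//     event.waitUntil(
//       fetch(MANIFEST_URL)
//         .then((response) => response.json())
//         .then((manifest) => caches.open("wp-v1")
//           .then((cache) => cache.addAll(resourcesToCache(manifest, MANIFEST_URL)))));
//   });
// MANIFEST_URL and the cache name "wp-v1" are placeholders.

const urlsToCache = resourcesToCache(
  { name: "Moby Dick", resources: ["index.html", "chapter-1.html", "./index.html", "css/book.css"] },
  "https://example.org/books/moby-dick/manifest.json"
);
```

This is consistent with Marcos's rule of thumb: the service worker only caches what the developer's script tells it to; the manifest merely happens to be where that script finds the list.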

Cheers

Ivan

> 
>>> I'd love to see other short
>>> rough sketches of what people are thinking…
>> 
>> Dave had an experimental setup with a few books (obviously, Moby
>> Dick among them; that is the 'hello world' of the digital book
>> world:-). This was based on a small SW implementation that Jake
>> Archibald did last year after a discussion at last year's TPAC,
>> but has to be refreshed. I think he and Kenneth agreed to look into
>> this. It would make things more tangible…
> 
> Dave, please put it up on GH :)  What are you waiting for!?!!?!11one!!
> 


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 22 September 2016 07:54:31 UTC