Re: EPUB and XML [was: The non-polyglot elephant in the room] from Henri Sivonen on 2013-01-28 (public-html@w3.org from January 2013)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 28 Jan 2013 13:51:39 +0200
To: "Michael[tm] Smith" <mike@w3.org>
Cc: Bill McCoy <whmccoy@gmail.com>, public-html@w3.org
Message-ID: <CAJQvAucpGaKFLDThevcaR5wsbSvQCpU2eEVBeH=kQjfumi8czA@mail.gmail.com>
(So it appears that we are discussing this on public-html anyway and
so far the Chairs haven't shut this tangent off after all. I
previously sent
http://lists.w3.org/Archives/Public/www-archive/2013Jan/0022.html
to www-archive to adhere to the policies of this list.)

On Sat, Jan 26, 2013 at 2:29 PM, Michael[tm] Smith <mike@w3.org> wrote:
>> But there is no fundamental requirement that EPUB version x+1 content be
>> compatible with reading systems for EPUB version x,

I'd be pretty unhappy if my EPUB Reading System for version x was
rendered obsolete by EPUB x+1 (unless Adobe makes sure all their OEMs
push a new version of Reader Mobile SDK everywhere, which probably is
not going to happen). I think the Degrade Gracefully design principle
should apply to EPUB in addition to applying to the Web. Due to
historical reasons, EPUB is on an XML track and the Web is not. Trying
to unify them at this point would cause pain.

(FWIW, the value for x is still 2 for ADE and Reader Mobile SDK-based
systems. I think you are way too optimistic when it comes to updates
to EPUB Reading Systems. It’s like IE6 all over again. These things
don’t update every 6 weeks.)

>> & I see EPUB as simply the publication (portable document) packaging of
>> the Open Web Platform.

Weren't the W3C Widgets supposed to be that? As far as I can tell,
characterizing EPUB like that is unhelpful scope creep, the way
adding videos, 3D and interactivity was for PDF.

The user expectation is that an EPUB represents a work similar to a
printed book except that it's digital and re-flowable. That's why
Reading Systems default to paged media rendering rather than
continuous media rendering, and you'd expect the latter from a system
whose goal is just to be a Web site in a box. (Likewise, the user
expectation is that PDF represents the pages of a print-oriented work
as vector graphics.)

> I think that's another excellent point. If somebody wants to take their
> existing Web content and package it up as a portable document for viewing
> in an EPUB reading system, I think they ideally should not be required to
> transform it into well-formed XML to do that.

Ideally, yes, if we could go back in time and change the nature of
EPUB. However, existing Reading Systems being what they are, I think
it would be the wrong optimization to allow people to put HTML in EPUB
(thereby rendering the resulting EPUB file incompatible with existing
Reading Systems). Instead, I think EPUB should optimize for backwards
compatibility and require authors to run a tool like my HTML2XML.
Running such a tool is a much lesser chore than generating all the
metadata that EPUB requires, making sure that the mimetype file has
been special cased in the zip container the right way and making sure
that fonts have been mangled using the IDPF mangling algorithm.
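
(To illustrate the mimetype special-casing mentioned above: a minimal
sketch in Python, assuming the usual OCF convention that the "mimetype"
entry comes first in the zip and is stored uncompressed. The function name
write_epub_skeleton and the placeholder container.xml content are
hypothetical, and this does not produce a complete or valid EPUB.)

```python
import io
import zipfile


def write_epub_skeleton(fileobj, members):
    """Write a zip whose first entry is an uncompressed 'mimetype' file.

    `members` maps archive names to contents; everything other than the
    mimetype entry is deflated as usual.
    """
    with zipfile.ZipFile(fileobj, "w") as zf:
        # The special case: mimetype goes in first and is STORED,
        # so its bytes appear verbatim near the start of the archive.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        zf.writestr(info, "application/epub+zip")

        # Remaining entries (placeholders here) can be compressed.
        for name, data in members.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
write_epub_skeleton(buf, {"META-INF/container.xml": "<container/>"})
```

(A generic "zip this folder" command typically compresses every entry and
does not guarantee ordering, which is why this step needs special care.)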

>> That said I do think there are benefits to EPUB having only one
>> serialization for content,
>
> I think we could say the same about there being benefits to having HTML
> itself have only one serialization. But the reality is that it has two, and
> Web user agents that want to consume and process actual Web content
> properly need to have support for content in both serializations.

Web agents need to consume both. At present, EPUB Reading Systems
don’t. And they can be kept walled off enough that they don’t need to
start supporting another serialization.
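
(To make the difference between the serializations concrete: a minimal
sketch in Python, using the standard library's XML parser as a stand-in
for an XML-only Reading System. The helper name is_well_formed_xml and the
two sample strings are made up for illustration; no actual Reading System
is claimed to work this way.)

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical text/html content: unclosed <br> and an omitted </p> end tag.
tag_soup = "<p>First line<br>second line"

# The same content in the XML serialization.
xhtml = '<p xmlns="http://www.w3.org/1999/xhtml">First line<br/>second line</p>'

print(is_well_formed_xml(tag_soup))
print(is_well_formed_xml(xhtml))
```

(An HTML parser would accept both strings; an XML-only consumer can only
take the second.)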

> But as far as I understand it the case with current EPUB reading systems is
> that they're all using browser engines to parse and render EPUB HTML
> content, and those browser engines all already have HTML parsers.

As far as I can tell, this is not true for the EPUB engine supplied as
part of the Adobe Reader Mobile SDK which is currently either the most
important or second most important EPUB engine (depending on how you
judge importance relative to Apple’s engine). (Assuming you don’t
count KindleGen + Kindle as a split architecture EPUB Reading System.)

And Reader Mobile SDK is current in the sense that it ships on devices
introduced late last year. (It’s not current in the sense that it
doesn’t appear to support EPUB3.)

> I think there are actually some very strong reasons that EPUB reading
> systems and processing tools should change to being able to handle
> text/html content, as I hope I've made clear in my comments above.

Do you expect all the vendors who’ve shipped an embedded system with
Reader Mobile SDK on it over the last 5 years or so to get a new
engine from Adobe and push it to the customers whose money has already
been taken? I think it would be pretty uncool to obsolete devices
people have bought just to save authors the trouble of running an HTML
to XHTML converter, when they need to jump through more difficult
hoops anyway to put together an EPUB file.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 28 January 2013 11:52:10 UTC