Re: The non-polyglot elephant in the room

(www-archive so that the HTML WG Chairs don’t need to remind us to
stay on topic.)

On Tue, Jan 22, 2013 at 7:18 AM, Michael[tm] Smith <mike@w3.org> wrote:
> Henri Sivonen <hsivonen@iki.fi>, 2013-01-21 16:48 +0200:
>
>> On Mon, Jan 21, 2013 at 4:24 PM, Michael[tm] Smith <mike@w3.org> wrote:
>> > Anyway, about EPUB in particular, it's worth noting that there's nothing
>> > inherent in the technology of EPUB that necessitates the use of well-formed
>> > XML/XHTML in it rather than not-necessarily-well-formed text/html.
>>
>> Nothing inherent, sure, but in practice, backwards-compatibility with
>> existing Reading Systems is a pretty big deal,
>
> Really?

Yes. Adobe’s Reader Mobile SDK enforces well-formedness and is
embedded in pretty much every non-Amazon E Ink-based e-reading device.
Also, those devices are still within their useful life, but it’s virtually
certain that many of them will never see another software update.

> To be clear, I was referring to the current EPUB spec, EPUB3. And as far as
> I understand it at least, EPUB3 documents are not backward-compatible with
> prior EPUB2 Reading Systems.

EPUB3 books that use fixed layout or scripting get their layout messed
up or their scripting rendered non-operational in EPUB2-only Reading
Systems.

However, if you have an actual book-like thing (no video, audio or
scripting) that is a pure-EPUB3 file, you can read the book on an
EPUB2 Reading System with the following caveats:
 1) fixed layout gets messed up (but you can still read the text)
 2) MathML gets messed up
 3) the table of contents appears empty.

And in practice, today as well as in the future, there will be plenty
of publications that don't use math or fixed layout. Also, I expect
that EPUB3 publications will include an NCX table of contents for
compatibility with legacy Reading Systems for a long time, since
generating the NCX can be automated and publishing workflows written
for EPUB2 already have that automation.

So even if one of the selling points of EPUB3 is that you no longer
need to deal with NCX, in practice you will deal with it anyway for
compatibility.

> And I think EPUB2 documents are not
> necessarily forward-compatible with EPUB3 systems, because EPUB2 documents
> may have some markup features that are not supported in existing browsers.

In practice, EPUB2 books out there use XHTML—not DTB, so if you meant
DTB, the issue is only theoretical.

> I mean, EPUB3 systems are mostly just UI wrappers around an existing
> browser engine, right?

If you read the spec carefully, it's clear that it's been designed in
such a way that EPUB3 reading systems can decide on a per-XHTML file
basis whether to render the file in a paginating-capable non-browser
engine-based renderer originally developed for EPUB2 or in a
hastily ported bolt-on copy of WebKit that doesn't even support
pagination. (Of course, serious WebKit-based Reading Systems already
support pagination in WebKit.)

> It's not clear to me that you'd actually lose anything at all, since as I
> said I don't think there's actually an expectation for backwards-
> compatibility of EPUB3 documents with existing EPUB2 systems. They're
> already known to be incompatible in other ways.

There's a huge difference between losing the table of contents and
being unable to read the book content at all.

>> > The reason EPUB requires XHTML is that the EPUB working group made an
>> > explicit choice to require it. They could have chosen to allow text/html
>> > EPUB books but they chose not to. And I think some of the people who
>> > advocated for requiring XHTML didn't understand that existing XML-based
>> > toolchains could be made to handle text/html content just by putting an
>> > HTML parser in front of them.
>>
>> In fairness, when the decision was made for EPUB, text/html parsing
>> had not been defined.
>
> What version of EPUB do you mean?

I mean EPUB in general, because you don’t get to throw away
compatibility when you increment the spec version number.

> When the decision was made for EPUB3, the
> HTML (HTML5) spec already included a definition for text/html parsing. I
> think in fact at that time you had even already implemented it for the
> validator and maybe even landed it in Gecko too.

Sure, but they have legacy, too. Reader Mobile SDK is to EPUB what IE
on XP is to the Web.

> Among other
> problems that gives vendors very little incentive to compete on the quality
> of their Reading Systems.

Yet Reader Mobile SDK, which sells on the strength of its DRM
entanglements rather than its implementation quality, is better at
intra-paragraph typography than, e.g., Kobo’s own WebKit-based EPUB
engine. And Kobo ships both. So you
get better typography for books bought from the Kobo store by
exporting them as Adobe DRM and loading them onto your Kobo device via
ADE so that Adobe DRM forces the device to use the Adobe engine
instead of the Kobo engine!

>> The main annoyances are needless indirection (Why do you need to be
>> able to locate the OPF
>
> OK so yeah I see you are talking about EPUB2. I think things are much
> better for EPUB3 authors.

No, they aren’t, really. The only improvement is the lack of NCX. Except
you still need NCX for compat with old Reading Systems. So now you
need NCX and then something else, too!

>> wherever you want and have a pointer to it in a
>> well-known location? Why not just put the OPF in the well-known
>> location?),
>
>> reinventing ways to express many things that HTML can already express
>> (stating book title and authorship without XHTML <title> and <meta
>> name=author>, declaring the order of XHTML files using <spine> instead of
>> <link rel=next> in the files themselves).
>
> I expect those particular discrepancies no longer exist in EPUB3.

They do.
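To make the indirection concrete, here is a minimal sketch, assuming Python's standard library, of what a consumer has to do just to find the reading order: read the pointer in META-INF/container.xml, open the OPF it names, map spine idrefs through the manifest, and resolve hrefs relative to the OPF. The function name `spine_files` is hypothetical, and the sketch deliberately skips percent-decoding of hrefs, `../` normalization, and multiple rootfiles.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_files(epub) -> list[str]:
    """Return the content documents of an EPUB in spine (reading) order.

    `epub` may be a filesystem path or a binary file-like object.
    Hypothetical helper: simplified, see the caveats above.
    """
    with zipfile.ZipFile(epub) as zf:
        # Step 1: the well-known location only holds a pointer to the OPF.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        # Step 2: the OPF manifest maps ids to hrefs relative to the OPF.
        package = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)
        hrefs = {
            item.get("id"): item.get("href")
            for item in package.iterfind("opf:manifest/opf:item", OPF_NS)
        }
        # Step 3: <spine>, not <link rel=next> in the files, defines order.
        return [
            posixpath.join(base, hrefs[ref.get("idref")])
            for ref in package.iterfind("opf:spine/opf:itemref", OPF_NS)
        ]
```

None of these three hops carries information the XHTML files themselves could not express with `<title>`, `<meta name=author>`, and `<link rel=next>`.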

>> The annoyances mentioned in the previous paragraph make EPUB authoring
>> by hand terrible enough that you need a tool, and once you have a
>> tool you might as well throw HTML to XHTML conversion into the tool.
>
> All of which is true for EPUB2 I guess. But my point was that there is no
> strong technical reason to perpetuate the XHTML-only requirement further
> into EPUB3 -- and (to repeat something for emphasis) that text/html HTML
> could be made usable in EPUB3 books simply by having the EPUB3 spec state
> that HTML is usable in EPUB3 books.

Considering what I said in this email and in the one you replied to, I
think it made sense to limit EPUB3 to XHTML-only, too.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 22 January 2013 07:53:15 UTC