Re: The non-polyglot elephant in the room from Henri Sivonen on 2013-01-21 (public-html@w3.org from January 2013)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 21 Jan 2013 16:48:17 +0200
To: "Michael[tm] Smith" <mike@w3.org>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Anne van Kesteren <annevk@annevk.nl>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <CAJQvAufWq9GFeDvohJFH56FuhXrTXMkcDFpm+8GHfFDCmawkPQ@mail.gmail.com>

On Mon, Jan 21, 2013 at 4:24 PM, Michael[tm] Smith <mike@w3.org> wrote:
> Anyway, about EPUB in particular, it's worth noting that there's nothing
> inherent in the technology of EPUB that necessitates the use of well-formed
> XML/XHTML in it rather than not-necessarily-well-formed text/html.

Nothing inherent, sure, but in practice, backwards-compatibility with
existing Reading Systems is a pretty big deal, and you’d lose that for
very little gain by allowing text/html in EPUB.

> The reason EPUB requires XHTML is that the EPUB working group made an
> explicit choice to require it. They could have chosen to allow text/html
> EPUB books but they chose not to. And I think some of the people who
> advocated for requiring XHTML didn't understand that existing XML-based
> toolchains could be made to handle text/html content just by putting an
> HTML parser in front of them.

In fairness, when the decision was made for EPUB, text/html parsing
had not been defined.

> So HTML could be made usable in EPUB books simply by having the EPUB spec
> state that HTML is usable in EPUB books -- instead of having it impose a
> technically unnecessary requirement that they must be XHTML.

I think requiring XHTML is the least of EPUB’s problems when it comes
to author ergonomics.

The main annoyances are needless indirection (Why do you need to be
able to locate the OPF wherever you want and have a pointer to it in a
well-known location? Why not just put the OPF in the well-known
location?), the dependency on an XML vocabulary even worse than OPML
(NCX), the requirement to declare various things that the reading
system could easily inspect itself and cache for later use (e.g.
whether a given file uses scripting or has MathML) and reinventing
ways to express many things that HTML can already express (stating
book title and authorship without XHTML <title> and <meta
name=author>, declaring the order of XHTML files using <spine> instead
of <link rel=next> in the files themselves).
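
To make the indirection concrete, here is a rough Python sketch of what a
reading system has to do just to learn the reading order: open the
well-known META-INF/container.xml, follow its pointer to the OPF, then map
the <spine> idrefs through the <manifest>. The function names and the
sample file layout (OEBPS/content.opf, ch1.xhtml, ch2.xhtml) are made up
for illustration, and real packages carry more metadata than this.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_order(epub_bytes):
    """Return (opf_path, hrefs in reading order) for an EPUB given as bytes."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as z:
        # Hop 1: the well-known location only holds a pointer to the OPF.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        # Hop 2: the manifest maps ids to files; the spine lists ids in order.
        opf = ET.fromstring(z.read(opf_path))
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        hrefs = [
            manifest[ref.get("idref")]
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)
        ]
        return opf_path, hrefs


def make_sample_epub():
    """Build a tiny, hypothetical EPUB-like zip in memory for the demo."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        "<manifest>"
        '<item id="c2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="c1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
        "</manifest>"
        '<spine><itemref idref="c1"/><itemref idref="c2"/></spine>'
        "</package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/epub+zip")
        z.writestr("META-INF/container.xml", container)
        z.writestr("OEBPS/content.opf", opf)
    return buf.getvalue()


if __name__ == "__main__":
    print(spine_order(make_sample_epub()))
```

With <link rel=next> in the files themselves, both hops and the id-to-href
mapping would be unnecessary.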

The annoyances mentioned in the previous paragraph make EPUB authoring
by hand terrible enough that you need a tool, and once you have a tool
you might as well throw HTML-to-XHTML conversion into it.
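
As a very rough illustration of how small that conversion step can be, here
is a sketch using Python's standard-library html.parser. Big caveat:
html.parser is not a conforming HTML5 parser and does not infer omitted
tags, so this assumes the input already has balanced tags; it also drops
comments and the doctype. A real tool would put a spec-compliant HTML
parser in front instead. The names XhtmlSerializer and to_xhtml are
invented for this example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content or an end tag in HTML.
VOID = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class XhtmlSerializer(HTMLParser):
    """Re-serialize parsed HTML events with XML-style syntax."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # A valueless HTML attribute has the empty string as its value.
        rendered = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # In text/html the trailing slash carries no meaning, so treat
        # <x/> exactly like <x>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Void elements were already emitted in self-closed form.
        if tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references arrive decoded; re-escape them for XML.
        self.out.append(escape(data))


def to_xhtml(html_fragment):
    """Convert a tag-balanced HTML fragment to XML-style markup."""
    serializer = XhtmlSerializer()
    serializer.feed(html_fragment)
    serializer.close()
    return "".join(serializer.out)


if __name__ == "__main__":
    print(to_xhtml("<p class=x>a &amp; b<br><img src=a.png alt></p>"))
```

The output of a step like this could then be handed to whatever writes the
package, so the author never has to type XHTML by hand.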

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 21 January 2013 14:48:47 UTC