Re: The non-polyglot elephant in the room from Michael[tm] Smith on 2013-01-22 (public-html@w3.org from January 2013)

From: Michael[tm] Smith <mike@w3.org>
Date: Tue, 22 Jan 2013 14:18:22 +0900
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <20130122051820.GL46651@sideshowbarker>

Henri Sivonen <hsivonen@iki.fi>, 2013-01-21 16:48 +0200:

> On Mon, Jan 21, 2013 at 4:24 PM, Michael[tm] Smith <mike@w3.org> wrote:
> > Anyway, about EPUB in particular, it's worth noting that there's nothing
> > inherent in the technology of EPUB that necessitates the use of well-formed
> > XML/XHTML in it rather than not-necessarily-well-formed text/html.
> 
> Nothing inherent, sure, but in practice, backwards-compatibility with
> existing Reading Systems is a pretty big deal,

Really?

To be clear, I was referring to the current EPUB spec, EPUB3. And as far as
I understand it at least, EPUB3 documents are not backward-compatible with
prior EPUB2 Reading Systems. And I think EPUB2 documents are not
necessarily forward-compatible with EPUB3 systems, because EPUB2 documents
may have some markup features that are not supported in existing browsers.

I mean, EPUB3 systems are mostly just UI wrappers around an existing
browser engine, right? (Or in practice, as you know, specifically around
WebKit.) And of course existing browser engines already have HTML parsers.

I don't really know the details of EPUB2 Reading System implementations, but
I guess it's possible that some or most may not even use existing browser
engines at all.

> and you’d lose that for very little gain by allowing text/html in EPUB.

It's not clear to me that you'd actually lose anything at all, since, as I
said, I don't think there's actually an expectation of backwards
compatibility for EPUB3 documents with existing EPUB2 systems. They're
already known to be incompatible in other ways.

> > The reason EPUB requires XHTML is that the EPUB working group made an
> > explicit choice to require it. They could have chosen to allow text/html
> > EPUB books but they chose not to. And I think some of the people who
> > advocated for requiring XHTML didn't understand that existing XML-based
> > toolchains could be made to handle text/html content just by putting an
> > HTML parser in front of them.
> 
> In fairness, when the decision was made for EPUB, text/html parsing
> had not been defined.

What version of EPUB do you mean? When the decision was made for EPUB3, the
HTML (HTML5) spec already included a definition for text/html parsing. I
think in fact at that time you had even already implemented it for the
validator and maybe even landed it in Gecko too.

I know about the timing because I briefly participated in the EPUB3
discussions and suggested that they not make XHTML a requirement. (By the
way, I also don't remember anybody saying then that they needed the XHTML
requirement for backwards-compatibility reasons; they gave other reasons.)

> I think requiring XHTML is the least of EPUB’s problems when it comes
> to author ergonomics.

I think the author ergonomics for EPUB3 are not so bad -- but the user
ergonomics still are. Given that browsers are as capable of rendering
"books" as they are of rendering any other class of document, it's still a
massive usability shortcoming that I as a user/e-book purchaser need to
install a "Reading System" in order to read a book that I've bought. Worse
yet, I don't need to install just one Reading System, I need to install
multiple ones -- because the books that I purchase from a particular vendor
are for non-technical reasons limited to only being viewable in that
vendor's particular Reading System, instead of being portable. Among other
problems, that gives vendors very little incentive to compete on the quality
of their Reading Systems.

> The main annoyances are needless indirection (Why do you need to be
> able to locate the OPF

OK, so yeah, I see you're talking about EPUB2. I think things are much
better for EPUB3 authors.

> wherever you want and have a pointer to it in a
> well-known location? Why can't you just put the OPF in the well-known
> location?),

> reinventing ways to express many things that HTML can already express
> (stating book title and authorship without XHTML <title> and <meta
> name=author>, declaring the order of XHTML files using <spine> instead of
> <link rel=next> in the files themselves).

I expect those particular discrepancies no longer exist in EPUB3.
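To make the quoted alternative concrete (declaring the order of files with
<link rel=next> in the files themselves rather than in a separate <spine>),
here is a minimal Python sketch of recovering a reading order from those
links alone. It is a hypothetical illustration, not how any actual Reading
System works: the names NextLinkFinder, find_next and reading_order are made
up, hrefs are compared as plain strings (no relative-URL resolution), and it
assumes the links form a single chain with one first file, no branches and no
cycles. It uses only the standard library's html.parser, which is enough for
spotting <link> elements.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Records the href of the first <link rel=next> in one document."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.next_href is not None:
            return
        attrs = dict(attrs)
        # rel holds space-separated tokens, e.g. rel="next prefetch"
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "next" in rel_tokens and attrs.get("href"):
            self.next_href = attrs["href"]


def find_next(html_text):
    finder = NextLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.next_href


def reading_order(docs):
    """Derive a linear reading order from rel=next links alone.

    docs maps a file name to that file's HTML source. Assumes the links
    form one chain: a single first file, no branches, no cycles, and
    every href naming another key of docs.
    """
    next_of = {name: find_next(text) for name, text in docs.items()}
    targets = {href for href in next_of.values() if href is not None}
    starts = [name for name in docs if name not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one file with no predecessor")
    order, current = [], starts[0]
    while current is not None:
        if current not in docs:
            raise ValueError(f"rel=next points to unknown file {current!r}")
        if current in order:
            raise ValueError("rel=next links form a cycle")
        order.append(current)
        current = next_of[current]
    if len(order) != len(docs):
        raise ValueError("some files are not on the rel=next chain")
    return order
```

For example, given three hypothetical chapter files supplied in scrambled
order, where ch1.html links to ch2.html and ch2.html links to ch3.html,
reading_order returns ["ch1.html", "ch2.html", "ch3.html"]. The order lives
entirely in the HTML files, so nothing outside them needs to repeat it.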

> The annoyances mentioned in the previous paragraph make EPUB authoring
> by hand terrible enough that you need a tool, and once you have a
> tool you might as well throw HTML to XHTML conversion into the tool.

All of which is true for EPUB2, I guess. But my point was that there is no
strong technical reason to perpetuate the XHTML-only requirement further
into EPUB3 -- and (to repeat something for emphasis) that text/html HTML
could be made usable in EPUB3 books simply by having the EPUB3 spec state
that HTML is usable in EPUB3 books.
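The idea of making an existing XML-based toolchain accept text/html just by
putting an HTML parser in front of it can be sketched in a few lines of
Python. This is a minimal sketch under loud assumptions: SimpleTreeBuilder and
html_to_xml are hypothetical names, and the tree-building rules below (void
elements, an implied </p>, end-tag recovery) are only a tiny subset of the
HTML5 tree-construction algorithm, since the standard library's html.parser
only tokenizes. A real pipeline would use a conforming parser such as
html5lib instead. The point is only the shape: parse text/html, get a tree,
serialize well-formed XML, and hand that to the unchanged XML tools.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that have no content and no end tag in text/html.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class SimpleTreeBuilder(HTMLParser):
    """Builds an ElementTree from text/html input.

    Simplified on purpose: it knows about void elements, an implied </p>
    and end-tag recovery, but it is NOT the HTML5 tree-construction
    algorithm and will mis-handle many real-world documents.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the root element already exists
        if tag == "p":
            self.handle_endtag("p")  # a new <p> implicitly closes an open one
        # Boolean attributes (e.g. <input disabled>) arrive with value None,
        # but XML requires a value, so reuse the attribute name.
        attrib = {name: (value if value is not None else name)
                  for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything still open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child goes in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Parse text/html and re-serialize it as well-formed XML."""
    builder = SimpleTreeBuilder()
    builder.feed(html_text)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")
```

For instance, html_to_xml('<!DOCTYPE html><title>Book</title><p class=intro>'
'Hello<br>world<p>Second') yields
'<html><title>Book</title><p class="intro">Hello<br />world</p><p>Second</p></html>',
which ET.fromstring (or any other XML tool) accepts as-is, even though the
input had an unquoted attribute, a void <br> and no end tags for either <p>.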

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike
Received on Tuesday, 22 January 2013 05:18:36 UTC