- From: Bill McCoy <whmccoy@gmail.com>
- Date: Wed, 23 Jan 2013 13:25:07 -0800
- To: public-html@w3.org
- Message-ID: <CAJ0DDbD4qvibO=8sn4V9Rp01o=fkiPdNfUuuYixiwOZeEJghFA@mail.gmail.com>
Regarding EPUB: one issue not mentioned in the thread, and a primary consideration in EPUB 3.0 requiring the XML serialization of HTML (XHTML), was backwards compatibility with the widely adopted previous version of EPUB, which was based on XHTML 1.1. Allowing "tag soup" HTML would have eliminated the possibility of creating EPUB 3 files that can "fall back", i.e. be gracefully handled (minus new capabilities of course) on EPUB 2 Reading Systems. Whether it was a good idea to base EPUB on XHTML 1.1 back in the Dark Ages of Year 2000 is kind of a moot point, since so much has changed since then, including the HTML roadmap, which at that time was quite XHTML-centric.
I will also agree with the statements that EPUB content creation is not best accomplished by raw human authoring of source markup but rather with some assistance from tooling. Most publications/documents are created with authoring tools, and other prevalent formats range from rather baroque (.docx) to downright opaque (.pdf). It was a design goal for EPUB to be as simple as possible, but interoperability of tool-based workflows definitely trumped hand-coding-friendliness. Certain bits of EPUB 3 markup (e.g. canonical fragment identifiers) are clearly not intended for humans to author.
But there is no fundamental requirement that EPUB version x+1 content be compatible with reading systems for EPUB version x, and if W3C continues to move farther away from XML-based encodings, that should IMO be taken into consideration in the development of future versions of EPUB. It is a goal of IDPF to increase the alignment of EPUB with other W3C specifications, and I see EPUB as simply the publication (portable document) packaging of the Open Web Platform. And it's true that supporting "tag soup" HTML in EPUB would have some benefits, especially when the same content is used by both websites and publications.
That said, I do think there are benefits to EPUB having only one serialization for content, one which is well formed and validatable: the algorithm for parsing "tag soup" may now be well defined in HTML5, but such documents are not necessarily going to be valid against any schema. And using a serialization (XML) that's widely supported, with tools built in to essentially every software development environment and runtime platform in existence, makes things simpler for those developing tools and conversion workflows. I'm not aware of every implementation of HTML to XHTML conversion, but the C-based XHTML2XHTML library contains dozens of modules comprising over 600KB of source code. That's a pretty hefty add-on to any workflow, and it's not clear whether versions of this conversion exist for every development environment, or what their level of quality and robustness is. Whereas XML parsing comes for free on every platform.
And EPUB publications - like websites - are not solely made up of HTML content. SVG and MathML are first-class citizens as well, for example, and AFAIK they are defined as XML-based markup languages, lacking an algorithm like HTML5's for processing "tag soup" variants. Is W3C going to move away from XML altogether and define "tag soup" parsing for every specification that's part of the Open Web Platform? If not, then it seems that HTML more than EPUB could be considered the special case, and that being due to HTML's own backwards-compatibility reality.
I'm not suggesting we go back to the days when we tried to ignore this reality and pursued the quixotic goal of pushing everything to XHTML. But EPUB is about structured, packaged content - data that's generated, consumed and manipulated by a variety of tools, not only rendered in a browser - and it doesn't have the same backwards compatibility issues as web pages, but in fact the opposite, due to EPUB's XML beginnings. So I'm not sure I personally see a strong reason that this should change.
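To make the "XML parsing comes for free" point concrete, here is a minimal sketch using only Python's standard library. The sample markup and the helper names (is_well_formed, paragraph_texts) are made up for illustration and are not taken from any EPUB tool. Note that it only checks well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample content: a tiny well-formed XHTML document...
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Chapter 1</title></head>"
    "<body><p>First line<br/>second line</p></body>"
    "</html>"
)

# ...and a "tag soup" variant with an unclosed <br> and <p>.
tag_soup = "<html><body><p>First line<br>second line</body></html>"


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def paragraph_texts(markup: str) -> list[str]:
    """Collect the text of every XHTML <p> element in the document."""
    root = ET.fromstring(markup)
    return ["".join(p.itertext()) for p in root.iter(f"{{{XHTML_NS}}}p")]


if __name__ == "__main__":
    print(is_well_formed(xhtml))      # the XML serialization parses
    print(is_well_formed(tag_soup))   # the stock XML parser rejects tag soup
    print(paragraph_texts(xhtml))
```

The same stock parser would handle SVG or MathML content, whereas the tag soup input needs a dedicated HTML5 parser or a conversion step before a generic XML toolchain can touch it.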
But again I think this ultimately will likely depend more on what W3C does around XML in general, rather than anything else. --Bill
Received on Thursday, 24 January 2013 22:21:06 UTC