Re: XHTML character entity support from Henri Sivonen on 2009-11-04 (public-xml-core-wg@w3.org from November 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 04 Nov 2009 07:58:47 +0000
To: "public-html@w3.org WG" <public-html@w3.org>
Cc: John Cowan <cowan@ccil.org>, Boris Zbarsky <bzbarsky@MIT.EDU>, Shelley Powers <shelley.just@gmail.com>, Simon Pieters <simonp@opera.com>, Geoffrey Sneddon <gsneddon@opera.com>, public-xml-core-wg@w3.org
Message-Id: <8398D873-80AA-488E-9FFD-87C80D9719F2@iki.fi>

On Nov 3, 2009, at 21:44, John Cowan wrote:

> But that does not have to be so, if the HTML5 group decides otherwise.
> A DTD could be provided, and if it had a standard public or system
> identifier, standard XML catalog software could be used to cache the DTD
> (or indeed certain DTDs could be hardwired).  This would be particular
> to XHTML software, not something general to all XML parsers.

Minting a new public or system id would be against the Degrade
Gracefully Design Principle[1], since it would lead to a fatal XML
parse error in shipped Gecko and in shipped WebKit.

The backwards-compatible way of using the XHTML 1.0 entity set in
application/xhtml+xml is using the doctype
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">.

One can use
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
and trade compatibility with WebKit and Opera for the ability to use
the MathML entities in shipped Gecko. (Here's a point where interop
between browsers is lacking, BTW.)

Since XHTML5 would violate layering if it didn't permit any and all
doctypes that XML permits, these doctypes are already permitted in
conforming XHTML5 documents. Thus, defining entity resolver behavior
for these legacy doctypes does have relevance to XHTML5.

[1] http://www.w3.org/TR/html-design-principles/#degrade-gracefully

On Nov 3, 2009, at 20:08, Shelley Powers wrote:

> On Tue, Nov 3, 2009 at 10:34 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>> Not in my opinion. If predictably uniform behavior between UAs is wanted
>> and if we want to make it non-mysterious for implementors how to
>> performantly parse application/xhtml+xml content written for browsers,
>> this WG should specify normative entity resolver behavior (i.e. mappings
>> from public id and system id pairs onto streams).
>
> First, application/xhtml+xml is not written for browsers. Browsers may
> be the biggest implementors, but they're not the only implementor.

I didn't mean to suggest that all application/xhtml+xml is written for
browsers. I meant to qualify my statement to be only about the subset
of application/xhtml+xml content that is written for browsers.

> Second, what you're discussing seems to be something that's general to
> all XML parsers, not just browsers.

Not all XML parsers--only applications that want to consume
application/xhtml+xml content in a browser-compatible way.
Validator.nu would fall into this class.

> The job of this working group is not to normalize all of the browser
> quirks and differences. Don't you agree?

Maybe not all but a sizable number thereof.

>> As a practical matter, if I'm using SAX in Java, I can't get a
>> browser-style EntityResolver off-the-shelf as part of a common
>> org.apache package. (Or maybe I could but I'm unaware.)
>>
>
> What exactly would you expect the EntityResolver to do with XHTML that
> it wouldn't do with other flavors of XML?

I'd expect it to map the public ids listed at
http://mxr.mozilla.org/mozilla-central/source/parser/htmlparser/src/nsExpatDriver.cpp#287
to a bogo-DTD that defines either the XHTML 1.0 entities or the
*latest* MathML entity set (depending on which one of the two DTD
files is named in nsExpatDriver.cpp), and I'd expect it to map other
public ids and lone system ids to the empty stream.
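
To make the mapping above concrete, here is a rough sketch of such a
resolver against the standard org.xml.sax.EntityResolver interface. It
is an illustration under assumptions, not what any browser ships: the
class name BrowserStyleEntityResolver and the helper textOf are made up
for this example, the set of public ids is a small illustrative subset
rather than the list in nsExpatDriver.cpp, the bogo-DTD declares only
two of the XHTML 1.0 entities, and the MathML branch is left out.

```java
import java.io.StringReader;
import java.util.Set;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical name; a sketch of a browser-style resolver, not a shipped one.
final class BrowserStyleEntityResolver implements EntityResolver {

    // Assumption: illustrative subset only. The list browsers actually
    // use is the one referenced in nsExpatDriver.cpp.
    private static final Set<String> XHTML_PUBLIC_IDS = Set.of(
            "-//W3C//DTD XHTML 1.0 Strict//EN",
            "-//W3C//DTD XHTML 1.0 Transitional//EN",
            "-//W3C//DTD XHTML 1.0 Frameset//EN");

    // Assumption: a stand-in bogo-DTD with just two entity declarations.
    private static final String BOGO_DTD =
            "<!ENTITY nbsp \"&#160;\">\n"
          + "<!ENTITY eacute \"&#233;\">\n";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Known public ids get the bogo-DTD; other public ids and lone
        // system ids get the empty stream. Nothing is fetched from the network.
        boolean known = publicId != null && XHTML_PUBLIC_IDS.contains(publicId);
        InputSource source = new InputSource(new StringReader(known ? BOGO_DTD : ""));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Hypothetical helper: parse a document with the resolver installed
    // and return the concatenated character data.
    static String textOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new BrowserStyleEntityResolver());
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body>caf&eacute;&nbsp;!</body></html>";
        System.out.println(textOf(doc));
    }
}
```

Returning an InputSource that wraps an empty reader (rather than null)
is what keeps the parser from falling back to dereferencing the system
id. How an undeclared entity is then reported is up to the parser in
use and is not something this sketch tries to make browser-compatible.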

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 4 November 2009 17:05:37 UTC