Re: XHTML character entity support from Alexey Proskuryakov on 2009-10-31 (public-html@w3.org from October 2009)

From: Alexey Proskuryakov <ap@webkit.org>
Date: Sat, 31 Oct 2009 13:24:51 -0700
To: Shelley Powers <shelley.just@gmail.com>
Cc: HTML WG <public-html@w3.org>
Message-id: <56B27BE5-0EFF-4A77-AE42-3C2646675B8A@webkit.org>

On 31.10.2009, at 11:29, Shelley Powers wrote:

>> As noted in
>> <http://www.whatwg.org/specs/web-apps/current-work/#writing-xhtml-documents>,
>> there is no guarantee that authors can use character entity references such
>> as &nbsp; in XHTML, because XML parsers are not required to process external
>> DTD subsets. This works in at least Firefox, Safari and Opera, but it's
>> depressing that such a major feature is not interoperable per the spec.
>>
>
> Actually it is interoperable -- there is no guarantee that an XML
> parser will process an external DTD. What kind of change do you want
> to make it interoperable?

I do not understand your comment. "No guarantee" is a synonym for "not
interoperable", isn't it?

Authors have long been using &nbsp; and friends in XHTML, and no
browser engine can practically ship without support for them (as long
as it supports XHTML at all, of course). To me, this means that this
requirement needs to be present in HTML5 - a spec that says an engine
is not required to support these entities would be misleading and
unhelpful.

>> I think that it's important to guarantee that character entity references
>> work in XHTML (even when parsing fragments, e.g. with innerHTML - which
>> doesn't currently work in Firefox or Safari, and is confusing to authors).
>>
>
> True, named entities don't work with innerHTML with Firefox, Safari,
> and Chrome. But numeric references do work, regardless of DTD. One can
> use &#163; instead of &pound; with consistent results regardless of
> browser, XHTML or HTML, and DTD.

I know that authors get confused by this limitation of innerHTML - and
I do not think the limitation is necessary from any point of view. It
would be trivial to fix in WebKit, for instance, and that wouldn't
violate any spec besides the current draft of HTML5.
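
To make the fragment case concrete, here is a minimal sketch using
Python's xml.etree.ElementTree as a stand-in for a non-validating XML
parser that never fetches an external DTD. It is an illustration only,
not how any browser engine implements innerHTML; the helper name
parses() is made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(fragment):
    """Return True if the fragment is accepted by the XML parser."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A numeric character reference needs no DTD at all.
numeric_ok = parses("<p>&#163;5</p>")

# A named reference with no declaration in scope is a fatal error.
named_ok = parses("<p>&pound;5</p>")

# The same named reference works once the internal subset declares it.
declared_ok = parses('<!DOCTYPE p [<!ENTITY pound "&#163;">]><p>&pound;5</p>')

print(numeric_ok, named_ok, declared_ok)
```

A fragment has no DOCTYPE of its own, so the second case is the one
authors run into.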

> Can we do what you ask and ensure the document will still parse as
> XML, without errors?

I doubt that there is a beautiful way to do so. Since I'm basically
asking to decouple XHTML named entity support from validation,
whatever we do would likely go against the spirit of the original XML
specs. There are non-beautiful solutions - for example, a UA can
recognize XHTML DTDs by name, and enable named entity support without
fetching and parsing that DTD.
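
A rough sketch of the "recognize the DTD by name" idea, in Python
rather than in any real UA's code. Everything here is an assumption
made for illustration: the function parse_xhtml(), the list of public
identifiers, and the shortcut of rewriting named references into
numeric ones before parsing (a real engine would more likely preload
its entity table). The regular expression is naive and would also
rewrite text inside comments and CDATA sections.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Hypothetical list of public identifiers treated as "XHTML DTDs".
KNOWN_XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}
# These five are built into XML and need no help.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"')
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _to_numeric(match):
    """Turn &name; into &#NNN; when the name is a known HTML entity."""
    name = match.group(1)
    if name in XML_PREDEFINED or name not in name2codepoint:
        return match.group(0)
    return "&#%d;" % name2codepoint[name]


def parse_xhtml(source):
    """Parse source as XML; the DTD itself is never fetched or parsed."""
    doctype = DOCTYPE_RE.search(source)
    if doctype and doctype.group(1) in KNOWN_XHTML_PUBLIC_IDS:
        source = ENTITY_RE.sub(_to_numeric, source)
    return ET.fromstring(source)
```

With a recognized DOCTYPE, a document containing &nbsp; or &pound;
parses; without one, the same input is still rejected as an undefined
entity, which is the decoupling from validation described above.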

On the other hand, I'm not sure that compatibility with non-UA XML
parsers should be maintained at all costs. There is precedent for Web
content that claims to be XML without strictly being so - RSS feeds
- and the sky hasn't fallen.

- WBR, Alexey Proskuryakov

Received on Saturday, 31 October 2009 20:25:27 UTC