Re: XHTML character entity support from Shelley Powers on 2009-10-31 (public-html@w3.org from October 2009)

From: Shelley Powers <shelley.just@gmail.com>
Date: Sat, 31 Oct 2009 16:02:52 -0500
To: Alexey Proskuryakov <ap@webkit.org>
Cc: HTML WG <public-html@w3.org>
Message-ID: <643cc0270910311402x77625fbeh7bfa00ef46b5f866@mail.gmail.com>
On Sat, Oct 31, 2009 at 3:24 PM, Alexey Proskuryakov <ap@webkit.org> wrote:
>
> 31.10.2009, at 11:29, Shelley Powers wrote:
>
>>> As noted in
>>> <http://www.whatwg.org/specs/web-apps/current-work/#writing-xhtml-documents>,
>>> there is no guarantee that authors can use character entity references
>>> such as &nbsp; in XHTML, because XML parsers are not required to process
>>> external DTD subsets. This works in at least Firefox, Safari and Opera,
>>> but it's depressing that such a major feature is not interoperable per
>>> the spec.
>>>
>>
>> Actually it is interoperable -- there is no guarantee that an XML
>> parser will process an external DTD. What kind of change do you want
>> to make it interoperable?
>
> I do not understand your comment. "No guarantee" is a synonym for "not
> interoperable", isn't it?

Interoperable means, to me, that when parsers do support external
DTDs, they do so consistently, and according to established rules.
However, rather than put what could be an onerous burden on parsers,
the XML specification doesn't require them to read external DTDs.

That is consistent -- a parser that doesn't support external DTDs is
an expected possibility, not a surprise.
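
To make this concrete, here is a minimal sketch that uses Python's
standard-library xml.etree.ElementTree (a non-validating parser built
on expat) as a stand-in for an XML processor that, as XML 1.0 permits,
does not fetch the external DTD subset. The DOCTYPE is the real XHTML
1.0 Strict one; the helper name is_well_formed is only illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)

def is_well_formed(xml_text):
    """Return True if ElementTree (non-validating) accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# ElementTree never downloads the external DTD, so &nbsp; stays
# undeclared even though the DOCTYPE points at a DTD that defines it.
page = (
    XHTML_DOCTYPE
    + '<html xmlns="http://www.w3.org/1999/xhtml">'
    + '<body><p>a&nbsp;b</p></body></html>'
)
print(is_well_formed(page))  # False

# The same page with a numeric character reference parses, DTD or not.
print(is_well_formed(page.replace("&nbsp;", "&#160;")))  # True
```

A browser that does read the XHTML DTD would accept the first page too;
the point is only that the spec allows either behavior.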

>
> Authors have long been using &nbsp; and friends in XHTML, and no browser
> engine can practically ship without support for such (as long as it supports
> XHTML at all, of course). To me, this means that this requirement needs to
> be present in HTML5 - a spec that says an engine is not required to support
> these entities would be misleading and unhelpful.
>

The named entities are in HTML5, but only in the HTML serialization
(which I guess has its own challenges[1]). The XHTML serialization
follows the requirements of its XML format.

This state of affairs is not unusual; in fact, it is how the web has
been working for several years now. I would think that doing anything
different would introduce an inconsistency between past
implementations and the present.

If we didn't have an alternative, then yes, I agree, we have a
problem. But we have numeric references which work well, consistently,
and across serializations and aren't dependent on DTDs.
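
As a rough illustration of the numeric-reference route, here is a
minimal sketch, assuming Python's html.entities.name2codepoint table
(the HTML 4 named entities) is an acceptable source for the mapping.
The function name to_numeric_refs is hypothetical. It rewrites named
references before the markup reaches an XML parser, so the result no
longer depends on any DTD.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML 1.0 predeclares these five, so they never need a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_numeric_refs(markup):
    """Rewrite HTML named references (e.g. &pound;) as numeric ones (&#163;).

    Unknown names are left untouched, so a later parse still reports them.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return NAMED_REF.sub(replace, markup)

source = "<p>&pound;5 &amp; a&nbsp;tip</p>"
converted = to_numeric_refs(source)
print(converted)                            # <p>&#163;5 &amp; a&#160;tip</p>
print(repr(ET.fromstring(converted).text))  # '£5 & a\xa0tip'
```

This naive regex would also rewrite text inside comments or CDATA
sections; a real tool would need to skip those.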

>>> I think that it's important to guarantee that character entity references
>>> work in XHTML (even when parsing fragments, e.g. with innerHTML - which
>>> doesn't currently work in Firefox or Safari, and is confusing to
>>> authors).
>>>
>>
>> True, named entities don't work with innerHTML in Firefox, Safari,
>> and Chrome. But numeric references do work, regardless of DTD. One can
>> use &#163; instead of &pound; with consistent results regardless of
>> browser, XHTML or HTML, and DTD.
>
> I know that authors get confused by this limitation of innerHTML - and I do
> not think it's necessary from any point of view. It would be trivial to fix
> in WebKit, for instance, and that wouldn't violate any spec besides the
> current draft of HTML5.
>

True, if WebKit parses external DTDs, and the DTD in use provides the
named entities, it would be more consistent for WebKit to support
named entities in page fragments (innerHTML) as it does for the
document. But I would assume, then, that it would do so by parsing the
external DTDs, not by hard-coding the named entities?


>> Can we do what you ask and ensure the document will still parse as
>> XML, without errors?
>
> I doubt that there is a beautiful way to do so.

I never judge technology by "beauty", only by consistency. If
XHTML5 is XML-based, then it must follow XML rules.

> Since I'm basically asking to decouple XHTML named entity support from
> validation, whatever we do would likely go against the spirit of the
> original XML specs. There are non-beautiful solutions - for example, a UA
> can recognize XHTML DTDs by name, and enable named entity support without
> fetching and parsing that DTD.
>

True, but then what happens if a person uses an internal DTD, or
provides a custom DTD? Wouldn't that produce inconsistent results
that would be extremely difficult for web developers to understand?
From my own playing around, custom named entities are supported in
the document, but not in a page fragment. You're saying we should make
an exception for one set of named entities from an existing DTD --
hard-code the references. Without knowing what's happened, the web
developer won't know why these named entities work in a page
fragment, while her own, properly declared, entities don't.
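
The document-versus-fragment difference can be reproduced outside a
browser. In this sketch, Python's xml.etree.ElementTree again stands in
for a browser's XML parser, and parsing the markup without its DOCTYPE
is an assumed approximation of how an innerHTML-style fragment is
handled, not a description of any engine's actual code path. The
entity name "author" and the helper parse_text are illustrative.

```python
import xml.etree.ElementTree as ET

# A document that declares its own entity in the internal DTD subset.
document = (
    '<!DOCTYPE p [<!ENTITY author "Shelley">]>'
    '<p>Written by &author;</p>'
)
# The same content parsed on its own, with no DOCTYPE in scope.
fragment = '<p>Written by &author;</p>'

def parse_text(markup):
    """Return the root element's text, or None if the parser rejects markup."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None

print(parse_text(document))  # Written by Shelley
print(parse_text(fragment))  # None
```

Hard-coding the XHTML entities for fragments would make &nbsp; work in
the second case while &author; still failed, which is the inconsistency
described above.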

Right now, we do have a consistent implementation. What you're asking
for would make the XHTML implementation inconsistent. At least,
that's my understanding.

> On the other hand, I'm not sure that compatibility with non-UA XML parsers
> should be maintained at all costs. There is precedent for Web content that
> claims to be XML without strictly being such - RSS feeds - and the sky
> hasn't fallen.
>

I believe that all RSS 2.0 and Atom 1.0 feeds must conform to the XML
1.0 specification. I'm not aware of feeds that conform less strictly,
and I'd be surprised if aggregators didn't have problems with them.
I'd have to defer to Sam Ruby on this one; he's the foremost expert on
feeds that I know of.

I don't think named entities are enough justification to loosen our
standards, not when there are other options, and not when people have
been dealing with this issue, successfully, for many years now.

Again, though, perhaps others have a different viewpoint, or a way to
provide what you need without loosening our standards.

> - WBR, Alexey Proskuryakov
>
>

Shelley
Received on Saturday, 31 October 2009 21:03:26 UTC