
Re: XHTML character entity support

From: Shelley Powers <shelley.just@gmail.com>
Date: Tue, 03 Nov 2009 21:14:25 +0000
Message-ID: <643cc0270911031313q5c5c3e6ehdd2b59278701ceae@mail.gmail.com>
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: Henri Sivonen <hsivonen@iki.fi>, Simon Pieters <simonp@opera.com>, Geoffrey Sneddon <gsneddon@opera.com>, John Cowan <cowan@ccil.org>, "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>, "public-html@w3.org" <public-html@w3.org>
On Tue, Nov 3, 2009 at 2:04 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 11/3/09 2:39 PM, Shelley Powers wrote:
>>
>> According to the XML specification, XML processors are not guaranteed
>> to process the external DTD subset referenced in the DOCTYPE. This
>> means, for example, that using entity references for characters in
>> XHTML documents is unsafe if they are defined in an external file
>> (except for &lt;, &gt;, &amp;, &quot; and &apos;)
>>
>> That is all a web author needs to know.
>
> That's fine with me.  It's not all an XHTML processor needs to know to be
> compatible with XHTML-as-she-is-spoke.

But it handles the situation currently under discussion: external entities.
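To make the external-entity case concrete, here is a minimal sketch in Python. It assumes the standard-library `xml.etree.ElementTree` module, which sits on the non-validating expat parser and does not fetch the external DTD subset. The helper name `parse_text` and the sample markup are hypothetical. The markup is not valid XHTML, but only well-formedness matters here.

```python
import xml.etree.ElementTree as ET

# XHTML 1.0 Strict DOCTYPE: &nbsp; and the other named entities are
# declared only in the external DTD named by the system identifier.
XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def parse_text(body_text):
    """Hypothetical helper: wrap body_text in a minimal document and
    return the root's text, or None if the parser rejects it."""
    doc = (
        XHTML_DOCTYPE
        + '<html xmlns="http://www.w3.org/1999/xhtml">'
        + body_text
        + "</html>"
    )
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


print(parse_text("&lt;&gt;&amp;&quot;&apos;"))  # <>&"'
print(parse_text("a&nbsp;b"))                    # None
```

The five predefined entities work because XML itself defines them; `&nbsp;` fails because its declaration lives only in a DTD file this parser never reads.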

>
>>> In the area of "html" (whatever that might mean), this does in fact seem
>>> like the job of this working group, fundamentally.
>>
>> No, the charter for this group is to provide a DOM, an evolution of
>> HTML4, an XML serialization, some APIs, and some whizzy gee-whiz
>> graphical "stuff".
>
> If we're going to charter-lawyer, the charter says (in the Scope section):
>
>  This group will maintain and produce incremental revisions
>  to the HTML specification, which includes the series of
>  specifications previously published as XHTML version 1.
>

And since we're no longer following SGML, and technically no longer
support the HTML4 DTD, we shouldn't even allow the named entities.
What Ian has done is hard-code them into HTML5. And since the HTML
serialization is now a law unto itself, following no model, we're not
validating anything; there is nothing to validate against.

What we're talking about now is the XHTML serialization. That has
always had doctypes, which have defined the external entities. It's
only with the XHTML+RDFa doctype, or unknown doctypes, that we run
into inconsistency issues, and that's only because three browsers throw
an error while the fourth just outputs the original entity text.
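A rough model of that divergence, as a Python sketch built on the standard-library expat bindings rather than on anything taken from browser source. The XHTML+RDFa 1.0 DOCTYPE stands in for a doctype the UA has no built-in entity table for. The function `collect_text` and its `strict` flag are hypothetical names for the two observed policies.

```python
import xml.parsers.expat

# The XHTML+RDFa 1.0 DOCTYPE names an external DTD that a
# non-validating parser will not read, so &nbsp; stays undeclared.
RDFA_DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
    '"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</html>'
)


def collect_text(xml_text, strict):
    """Return the document's character data.

    strict=True  raises on an unresolved entity (the "throw an error" UAs);
    strict=False keeps the literal "&name;" text (the pass-through UA).
    """
    parser = xml.parsers.expat.ParserCreate()
    chunks = []

    def on_text(data):
        chunks.append(data)

    def on_default(data):
        # Because the DOCTYPE names an external subset that is never read,
        # expat passes references to undeclared entities here verbatim
        # (e.g. "&nbsp;"). Other unhandled markup such as tags is ignored.
        if not (data.startswith("&") and data.endswith(";")):
            return
        if strict:
            raise ValueError("undefined entity " + data)
        chunks.append(data)

    parser.CharacterDataHandler = on_text
    parser.DefaultHandlerExpand = on_default
    parser.Parse(xml_text, True)
    return "".join(chunks)


print(collect_text(RDFA_DOC, strict=False))  # a&nbsp;b
try:
    collect_text(RDFA_DOC, strict=True)
except ValueError as err:
    print(err)  # undefined entity &nbsp;
```

Both outcomes are conformant for a non-validating processor, which is exactly why the results differ between UAs.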

Hard-coding information about how UAs are supposed to respond to
entities for a doctype not in the blessed list presupposes several
things: first, that the UA is non-validating (and that's not always
true); second, that all UAs will want to emulate browser behavior
(tell that to the ePub folks); and lastly, that this is an issue that
should be resolved in HTML, rather than in XML Core.


> Of course there's nothing in the Deliverables section directly addressing
> this part of the scope...
>

Of course not, because browsers are not the only HTML UAs.

> So we're not _required_ to define this, but defining it is certainly within
> our scope, if my charter-lawyering is not off.
>

No, disagree: not in our scope. We're not browser nannies.


>> I think it's a mistake to include the areas already included in the
>> document whose sole purpose seems to be to normalize browser behavior.
>
> And I think those are the most important parts of the document and a higher
> priority than pretty much anything else this group is doing. Probably
> something to do with us having slightly different backgrounds here.  ;)
>

And that's the whole purpose of having people from different
interest groups involved in these specifications: HTML5 has to meet
the needs of a diverse audience.

>> For certain doctypes, the browsers support the entities via catalog.
>> This is consistent with validating parsers.
>
> Yes.
>
> My concern is that if someone decides to write a new browser tomorrow they
> should be able to do so by reading the spec and implementing, without having
> to reverse-engineer existing browsers.  I realize you don't care about this,
> presumably because you're happy with the browser competitive landscape, past
> and present.   I'm not happy with the past, and I'm only marginally more
> happy with the present.  I do want to make sure we do NOT ever go back to
> the competitive landscape of the past for browsers.  That involves it being
> as easy as possible to create a browser.
>

I don't see it as a concern. If someone really wants to create a whole
new browser, they should be able to read all the specifications and
determine what they need to do. They shouldn't have to follow a
codified path, if there's another that works as well, and is still
compliant with the relevant standards.

We should avoid codifying extraneous, non-standards-based behavior
as much as possible. How external entities are defined and used is
the responsibility of XML Core, not HTML5.
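For comparison, the catalog approach mentioned earlier in the thread (a UA supplying entity definitions for a short list of recognised public identifiers, without fetching the DTD) might look like the following Python sketch. The `CATALOG` contents and `parse_with_catalog` are assumptions for illustration. `html.entities.name2codepoint` is the standard library's table of HTML 4 named entities, essentially the set the XHTML 1.0 DTDs declare.

```python
import html.entities
import xml.parsers.expat

# Hypothetical catalog: public identifiers this UA recognises, each mapped
# to an entity table it supplies itself instead of fetching the DTD.
HTML4_ENTITIES = {
    name: chr(codepoint)
    for name, codepoint in html.entities.name2codepoint.items()
}
CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": HTML4_ENTITIES,
    "-//W3C//DTD XHTML 1.0 Transitional//EN": HTML4_ENTITIES,
    "-//W3C//DTD XHTML 1.0 Frameset//EN": HTML4_ENTITIES,
}


def parse_with_catalog(xml_text):
    """Return character data, resolving entities only for cataloged doctypes."""
    parser = xml.parsers.expat.ParserCreate()
    entities = {}  # filled in once the DOCTYPE has been seen
    chunks = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Unrecognised public IDs (including a DOCTYPE with no public ID)
        # contribute no entities.
        entities.update(CATALOG.get(public_id, {}))

    def on_text(data):
        chunks.append(data)

    def on_default(data):
        # Undeclared entity references arrive here verbatim, e.g. "&nbsp;".
        if data.startswith("&") and data.endswith(";"):
            name = data[1:-1]
            if name not in entities:
                raise ValueError("undefined entity " + data)
            chunks.append(entities[name])

    parser.StartDoctypeDeclHandler = on_doctype
    parser.CharacterDataHandler = on_text
    parser.DefaultHandlerExpand = on_default
    parser.Parse(xml_text, True)
    return "".join(chunks)


STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
RDFA_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
    '"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">'
)
BODY = '<html xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</html>'

print(repr(parse_with_catalog(STRICT_DOCTYPE + BODY)))  # 'a\xa0b'
```

With the RDFa doctype the same call raises, because its public ID is not in the catalog; which identifiers belong on such a list is precisely the policy question under debate.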



>> For XHTML5, which has no DTD, the behavior is consistent: only the
>> five predefined entities are available; anything else is an error. And
>> from what I can see, the behavior with this is consistent.
>
> Sure.  I'm not worrying about the no-DTD case here.
>
> -Boris
>
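The no-DTD case is easy to check directly. In this Python sketch (the helper `check_no_dtd` and the sample markup are hypothetical), expat treats a reference to an undeclared entity in a document with no DOCTYPE as a fatal well-formedness error, so only the five predefined entities get through.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def check_no_dtd(xml_text):
    """Hypothetical helper: return None if expat accepts xml_text,
    otherwise expat's description of the first error."""
    parser = expat.ParserCreate()
    # A default handler is installed on purpose: without any DTD, expat
    # rejects an undeclared entity as a fatal error instead of passing
    # the reference to this handler.
    parser.DefaultHandlerExpand = lambda data: None
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError as err:
        return errors.messages[err.code]
    return None


BODY_OK = '<html xmlns="http://www.w3.org/1999/xhtml">&lt;&gt;&amp;&quot;&apos;</html>'
BODY_NBSP = '<html xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</html>'

print(check_no_dtd(BODY_OK))    # None
print(check_no_dtd(BODY_NBSP))  # undefined entity
```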

There has been some confusion about this in this thread, though I
think we're all on the same page now. I know it took some time to
figure out the different viewpoints involved. But then, I'm not a
browser developer, so I read things from a different perspective.

Of course, we may not be on the same thread as the original author, Alexey, now.

Shelley
Received on Wednesday, 4 November 2009 17:05:30 UTC
