Re: Named character entities from Henri Sivonen on 2003-05-18 (www-html@w3.org from May 2003)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 18 May 2003 14:37:22 +0300
To: www-html@w3.org
Message-Id: <16D2A72A-8925-11D7-80D6-003065B8CF0E@iki.fi>
On Friday, May 16, 2003, at 20:04 Europe/Helsinki, John Lewis wrote:

> Henri wrote on Friday, May 16, 2003 at 9:41:05 AM:
>
>> Not supporting character entities in XHTML is not a bug in Opera if
>> 1) XHTML is an application of XML
>> and
>> 2) XHTML user agents aren't required to use validating XML
>> processors.
>
> Actually, Opera Software internally decides what's a bug in Opera and
> what's not. Whether or not they technically need to do something is
> irrelevant; although I'm sure it's a consideration, there are
> practical concerns as well.

I guess that depends on the definition of "bug". Your approach would 
make it very easy to publish bug-free software. :-)

The concerns here that may seem practical to document authors are very 
impractical from the implementor's point of view.

>> Do you classify TextEdit (bundled with Mac OS X) and WordPad
>> (bundled with Windows XP) as "special advanced authoring tools"?
>
> I've never used TextEdit. I'd hesitate to even call WordPad an
> authoring tool, but perhaps the Windows XP version is different.
>
> People make use of named entities daily, and you've not provided any
> evidence that they're harmful;

I have said this a couple of times already, but I'll try to make it 
clearer this time.
1) In XML, except for lt, gt, amp, apos and quot, character entities
    have to be declared in the formal part of the DTD in order to be
    referenceable in the places where character data may occur.
2) Document authors would find it inconvenient to paste the character
    entity declarations into the internal DTD subset of each document,
    so I've implicitly assumed that the character entities would have
    to be declared in the external DTD subset.
3) The XML spec defines the concepts of well-formedness, standalone
    document and non-validating XML processor in order to accommodate
    the needs of interactive document browsers and applications that
    are used in a network context.
4) Given 3), it would be harmful to make requirements elsewhere that
    would force Web browsers not to use non-validating XML processors.
    (Also, it would be harmful to make it impractical to use standalone
    documents.)
5) Non-validating XML processors are not required to process external
    entities. That is, they are not required to process the external
    DTD subset.
6) Processing external entities of the size of typical W3C DTDs along
    with the document entity would incur runtime performance penalties
    compared to only processing a tag soup document entity or an XML
    document entity.
7) Given 6), processing external entities is undesirable in interactive
    applications even though non-validating XML processors are allowed
    to process external entities.

Hence, interactive user agents should not be expected to process 
external entities and, therefore, character entities declared in the 
external DTD subset should not be expected to be available.
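
Points 1), 2) and 5) can be observed with an ordinary non-validating 
processor. The following is a minimal sketch, assuming Python's 
xml.etree.ElementTree (which is built on Expat and does not fetch the 
external DTD subset). The helper name try_parse and the example.org DTD 
URL are made up for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(doc):
    """Return the root element's text, or the ParseError the parser raised."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return err


# One of the five predefined entities: no declaration needed.
predefined = "<p>a &amp; b</p>"

# nbsp declared in the internal DTD subset: the processor must read it.
internal = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'

# nbsp supposedly declared in an external subset (hypothetical URL) that
# this processor never retrieves, so the reference cannot be resolved.
external = (
    '<!DOCTYPE p SYSTEM "http://example.org/entities.dtd">'
    "<p>a&nbsp;b</p>"
)

# No DTD at all: the reference is simply undeclared.
undeclared = "<p>a&nbsp;b</p>"

for label, doc in [
    ("predefined", predefined),
    ("internal", internal),
    ("external", external),
    ("undeclared", undeclared),
]:
    print(label, repr(try_parse(doc)))
```

Only the first two documents yield text. The last two end in a parse 
error, which is the situation an author hits when relying on entities 
declared only in the external subset.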

Wiggling out of point 1) would mean descending into tag soupness by 
violating the rules of the language framework. Wiggling out of point 6) 
would mean either pretending to process a W3C DTD while actually 
processing something else (this is what Mozilla does but doing so is 
dirty and has the danger of compelling others to follow) or caching the 
data structures that get built when the DTD is parsed (which would be 
unduly difficult because the declarations made in the external subset 
may change depending on parameter entities in the internal subset).
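
The first way of wiggling out of point 6) can be sketched roughly as 
follows. This is not Mozilla's code, only an illustration of the idea 
under stated assumptions: the parser never reads the DTD named in the 
doctype and instead resolves entity references from a built-in table. 
It assumes Python's xml.etree.ElementTree, whose XMLParser exposes an 
entity dictionary, and uses the HTML 4 entity names from html.entities 
as a stand-in for the XHTML 1.0 entity sets. The function name 
parse_with_builtin_entities is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def parse_with_builtin_entities(doc):
    """Parse doc, resolving named entities from a built-in table.

    The external DTD subset is never fetched or parsed; the table below
    stands in for whatever the DTD would have declared.
    """
    parser = ET.XMLParser()
    parser.entity.update(
        {name: chr(codepoint) for name, codepoint in name2codepoint.items()}
    )
    return ET.fromstring(doc, parser=parser)


doc = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>a&nbsp;b</p></body></html>"
)

root = parse_with_builtin_entities(doc)
print(repr("".join(root.itertext())))
```

Note what the sketch ignores: declarations in the internal subset could 
change what the real external subset declares, and the table is applied 
no matter which DTD the document actually names. That is the "dirty" 
part.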

> The original
> poster suggested they be made optional, which is something I'm okay
> with since market forces will compel the major UAs to support them
> anyway.

They are optional already in XHTML 1 given point 5) above. I think 
avoiding the related issues of external entity processing by 
dismissing the entities as "optional" is bad in the Web context, 
because either some browsers don't support character entities (in which 
case they could just as well not exist and a lot of confusion could be 
avoided) or all browsers are forced to support them (which would have 
ugly implications [see above]).

> As a practical matter, the major UAs already support HTML,
> which means they support the named entities already, which means
> there's little reason to not support them for XHTML (unless they're
> removed).

That reasoning is flawed. Character entity support for HTML is 
implemented in tag soup processors--not in XML processors. Tag soup 
processors by their nature don't play by the rules of SGML or XML. 
Entity support in HTML parsed as tag soup and entity support in XHTML 
parsed as real XML share no implementation.
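
To make the difference concrete, here is a small sketch of how a 
tag-soup-style parser gets its entities. It assumes Python's 
html.parser.HTMLParser as a stand-in for a browser's HTML parser; the 
class and function names are invented for the example. The entity names 
come from a table compiled into the parser, no DTD is consulted, and 
unclosed elements are tolerated.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data, with entity references already resolved."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tag_soup_text(doc):
    """Return all text in doc as seen by the lenient HTML parser."""
    collector = TextCollector()
    collector.feed(doc)
    collector.close()  # flush any text still buffered at end of input
    return "".join(collector.chunks)


# No doctype, no entity declarations, and the second <p> is never closed.
print(repr(tag_soup_text("<p>a&nbsp;b<p>unclosed")))
```

Nothing in this code path is reusable by an XML processor, which has to 
obtain entity definitions from declarations in the DTD.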

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
Received on Sunday, 18 May 2003 07:37:30 UTC