Re: XHTML2 MIME type from Henri Sivonen on 2003-05-11 (www-html@w3.org from May 2003)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 11 May 2003 20:25:36 +0300
To: www-html@w3.org
Message-Id: <943F66D4-83D5-11D7-80D6-003065B8CF0E@iki.fi>

On Sunday, May 11, 2003, at 06:27 Europe/Helsinki, Jelks Cabaniss wrote:

> I don't think an XHTML-aware UA should be downloading the XHTML entities
> when it's dealing with the fixed vocabulary it already knows about!

An application (in the software program sense) using an XML processor 
to parse a document marked up in an XML-based language needs to know 
about the elements and attributes, but the entity names should be of no
concern to the application. The character entities should be dealt with 
at the XML processor level and a conformant XML processor only knows 
about lt, gt, quot, apos and amp a priori.
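A small illustration of this split, sketched with Python's standard-library ElementTree (an expat-based, non-validating parser) standing in for "an XML processor". The helper name text_of is purely illustrative. With no DTD, the five predefined entities and numeric character references are resolved by the processor, and the application only ever sees the resulting characters. A named entity such as &nbsp; is a fatal error:

```python
import xml.etree.ElementTree as ET

def text_of(markup):
    """Parse markup that has no DOCTYPE and return the root element's text."""
    return ET.fromstring(markup).text

# Predefined entities and numeric character references need no DTD;
# the application receives plain characters, not entity names.
print(repr(text_of("<p>&lt;a&gt; &amp; &quot;q&quot; &apos;s&apos; &#160;</p>")))

# Without a DTD declaring it, &nbsp; is an undefined entity.
try:
    text_of("<p>&nbsp;</p>")
except ET.ParseError as err:
    print("ParseError:", err)
```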

If external entities are to be processed, the alternative to 
downloading is maintaining a local DTD catalog for a finite set of 
public ids. This approach breaks when someone comes up with a homegrown 
combination of the XHTML modules and uses a public id that isn't 
well-known.
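To make the failure mode concrete, here is a minimal sketch of a catalog lookup, assuming a resolver built on Python's xml.sax.handler.EntityResolver. The local file paths and the "homegrown" public id below are hypothetical. A known public id maps to a local copy, while an unknown one can only fall back to its system id, i.e. a download:

```python
from xml.sax.handler import EntityResolver

# Hypothetical local catalog: public id -> path of a locally stored DTD.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "/usr/share/xml/xhtml/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.1//EN": "/usr/share/xml/xhtml/xhtml11.dtd",
}

class CatalogResolver(EntityResolver):
    """Resolve external entities from a finite catalog of public ids."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)
        self.misses = []  # public ids the catalog could not satisfy

    def resolveEntity(self, publicId, systemId):
        local = self.catalog.get(publicId)
        if local is not None:
            return local
        # Unknown (e.g. homegrown) public id: record the miss and fall back
        # to the system id, which means fetching the DTD over the network.
        self.misses.append(publicId)
        return systemId
```

Such a resolver could be installed with a SAX parser's setEntityResolver(); the catalog only helps for the public ids someone thought to put in it.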

Mozilla has a DTD catalog, but it cheats: the DTDs in Mozilla's catalog are 
so heavily abridged that they only contain the character entity 
declarations. The character entities work as if Mozilla were reading the 
real DTD, but other things, such as attribute defaulting and the #IDness of 
attributes, don't work.
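The same "entities only" idea can be imitated outside Mozilla. The sketch below is not Mozilla's code: it assumes Python's ElementTree and a hypothetical helper that prepends an internal DTD subset containing nothing but character entity declarations. Named entities then resolve, but since no ATTLIST declarations are present, nothing in this subset could supply attribute defaults or ID types:

```python
import xml.etree.ElementTree as ET

# Hypothetical abridged "DTD": character entity declarations only,
# with no element or ATTLIST declarations.
ABRIDGED_ENTITIES = {"nbsp": 160, "eacute": 233, "copy": 169}

def parse_with_abridged_dtd(markup, root_name="html", entities=ABRIDGED_ENTITIES):
    """Parse markup after prepending an internal subset that declares
    only character entities (name -> Unicode code point)."""
    decls = "".join(f'<!ENTITY {name} "&#{code};">' for name, code in entities.items())
    doctype = f"<!DOCTYPE {root_name} [{decls}]>"
    return ET.fromstring(doctype + markup)

root = parse_with_abridged_dtd("<html><p>caf&eacute;&nbsp;&copy;</p></html>")
print(repr(root.find("p").text))
```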

The DTDs are *huge* compared to the content length of typical Web pages. 
Parsing an entire DTD every time a document is loaded makes no sense in 
an interactive application. The concepts of XML well-formedness and 
non-validating parsers were supposed to relieve interactive 
applications of dealing with external DTD subsets.
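As a rough illustration, assuming Python's ElementTree as the non-validating parser: a document that names the XHTML 1.0 Strict DTD in its DOCTYPE still parses, because the parser checks well-formedness and never fetches the external subset. The helper title_of is hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>T &amp; C</title></head>
  <body><p>Hello</p></body>
</html>"""

def title_of(markup):
    # ElementTree does not download the external DTD subset named in the
    # DOCTYPE; only well-formedness and the predefined entities matter here.
    root = ET.fromstring(markup)
    return root.find("x:head/x:title", XHTML_NS).text

print(title_of(XHTML_DOC))
```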

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

Received on Sunday, 11 May 2003 13:25:45 UTC