Re: [author-guide] Character Entity References Chart from Henri Sivonen on 2008-07-22 (public-html@w3.org from July 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 22 Jul 2008 13:49:36 +0300
To: Jirka Kosek <jirka@kosek.cz>
Cc: Karl Dubost <karl@w3.org>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, public-html WG <public-html@w3.org>
Message-Id: <DB5FF2AD-7095-46AB-A6C2-2F8D780FC2C5@iki.fi>

On Jul 22, 2008, at 12:10, Jirka Kosek wrote:

> Henri Sivonen wrote:
>
>> How do catalogs address:
>> * older browsers performing a DDoS on www.w3.org when a new DTD  
>> that isn't in the catalog is published
>
> How often will a new DTD be published? I suppose at a much lower rate
> than browsers have to be updated for security bugs.

Using security updates to deliver something other than security
fixes would violate the trust of users who expect security updates
not to modify program behavior in any way that isn't a security fix.
Violating that trust is bad, because users will then be less eager to
apply security updates in a timely manner.

>> * www.w3.org as a single point of failure when a DTD isn't in the  
>> catalog
>
> Why should W3C DTDs not be in the catalog?

A browser whose catalog predates a given W3C DTD won't have it in the  
catalog.

(Also, a comprehensive catalog isn't exactly small. Consider the  
browser download footprint and mobile browser flash footprint.)

>> * parsing performance other than network
>
> There is nothing that prevents an XML parser from loading a precompiled
> memory dump of entity definitions when it reaches a known DTD.

There's the opportunity cost. Browser developers could be working on
more attractive features, or on performance work that matters more
than entities in XML, instead of spending time trying to make an
inherently inefficient macro scheme perform better.

>> * colons in PI targets
>
> Is this really a problem or just a virtual problem?

It's a real problem in the sense that there's real (XHTML+MathML)  
content out there that references a DTD that has colons in PIs.

> Is there any parser which really fails on such a construct?

Any XML parser that makes violations of Namespaces in XML fatal.
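
To make the failure mode concrete: Namespaces in XML requires processing instruction targets to be colon-free, so a parser that enforces it has to reject a DTD containing such a PI. Below is a minimal sketch of that check, assuming a simplified regex for PI syntax. The function name and the pattern are hypothetical and are not taken from any real parser.

```python
import re

# Simplified pattern for a processing instruction: "<?", a target, optional
# data, then "?>". Real XML Name rules are stricter; this is an approximation.
PI_PATTERN = re.compile(r"<\?([^\s?]+)(?:\s.*?)?\?>", re.DOTALL)

def colon_pi_targets(text):
    """Return the PI targets found in `text` that contain a colon."""
    return [target for target in PI_PATTERN.findall(text) if ":" in target]
```

A parser that treats namespace violations as fatal would stop at the first target this function reports; a more lenient one could carry on.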

>> * ungraceful behavior in existing Gecko and WebKit instances
>
> Not sure what you mean.

A fatal error.

>> * the fact that the very reason why the XML spec doesn't require
>> XML processors to process the DTD is to cater for browsers (see
>> http://www.xml.com/axml/testaxml.htm), so failing to exercise the
>> opportunity not to process DTDs would defeat the intent of the XML
>> spec?
>
> Without DTDs it will not be possible to use entities other than the
> five predefined ones in the XML serialization. This is perfectly fine
> with me. But if there is a real push for entities in the XML
> serialization, there are two possibilities:
>
> - use current XML features combined with XML catalogs to get a working
> solution

Shipping full, ever-expanding catalogs is not good. Standardizing Gecko's
pseudo-DTD catalog could work.
http://groups.google.com/group/mozilla.dev.tech.mathml/browse_thread/thread/e7f7efbb5e161348/8d64a935fe730de7
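
A minimal sketch of the pseudo-DTD idea, assuming a fixed table that maps known public identifiers to one bundled entity set and never touches the network. The names, the choice of identifiers and the three sample entities are illustrative assumptions, not Gecko's actual data or code.

```python
# Hypothetical fixed table: every known public identifier maps to the same
# bundled entity set, so the list never needs to grow with new DTD versions.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}

# Tiny illustrative subset of a bundled entity set.
BUNDLED_ENTITIES = {"nbsp": "\u00a0", "eacute": "\u00e9", "copy": "\u00a9"}

# The five entities predefined by XML itself.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

def entities_for(public_id):
    """Return the entity table for a doctype without any network access."""
    table = dict(PREDEFINED)
    if public_id in KNOWN_PUBLIC_IDS:
        table.update(BUNDLED_ENTITIES)
    return table
```

An unknown public identifier simply yields the five predefined entities, so a newly published DTD never causes a fetch from www.w3.org.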

> - create an XML V.next which will predefine the HTML + MathML + ISO
> entities, so there will be no need for entity definitions in a DTD
>
> I think that the latter solution is much better, but it is not
> realistic now.

I guess at some point the need for an "XML5" becomes so pressing that  
it actually gets done. At that point, there'd be an opportunity to  
kill DTDs, make error handling non-Draconian, predefine all the HTML5  
entities, make the Name production allow anything except '>', '=' or  
whitespace, make text content allow all of Unicode, etc.
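
For illustration only, the relaxed Name production could look like the following sketch. No such "XML5" grammar exists in this thread, so the regex and the function name are assumptions; whitespace is taken to mean the four XML whitespace characters.

```python
import re

# Relaxed Name: one or more characters that are not '>', '=', or XML
# whitespace (space, tab, carriage return, line feed).
RELAXED_NAME = re.compile(r"[^>= \t\r\n]+")

def is_relaxed_name(s):
    """Return True if `s` would be a Name under the relaxed rule."""
    return RELAXED_NAME.fullmatch(s) is not None
```

Under this rule, names that XML 1.0 rejects, such as ones starting with a digit, would be accepted.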

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 22 July 2008 10:50:20 UTC