Re: the document character set for text/html serialization from Julian Reschke on 2007-09-10 (public-html@w3.org from September 2007)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 10 Sep 2007 10:44:25 +0200
To: Robert Burns <rob@robburns.com>
CC: Anne van Kesteren <annevk@opera.com>, HTML WG <public-html@w3.org>
Message-ID: <46E503E9.8090408@gmx.de>

Robert Burns wrote:
> I think Julian's question is not limited to serialization. The issue is 
> what meaning these characters have whether inserted into the DOM, or 
> inserted through XML, or inserted through the text/html serialization? 

Correct. In fact, XML DOM Level 1 already lets you insert characters
that are illegal in XML, so that serializers later either produce broken
XML or throw exceptions. That has also been a source of frustration for
many programmers.
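The failure mode described above can be reproduced in a few lines. This is a minimal sketch that uses Python's xml.dom.minidom as a stand-in DOM implementation. The DOM accepts a text node containing U+0007 (BEL). minidom's serializer writes the character out unchanged, and the expat-based parser then rejects the result as not well-formed. The helper first_illegal_xml10_char is hypothetical: it illustrates the check a strict serializer would have to run before writing.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    #                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_xml10_char(text: str):
    # Hypothetical helper: index of the first character a strict XML 1.0
    # serializer would have to reject, or None if the text is serializable.
    for i, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return i
    return None


# The DOM happily stores a C0 control character in a text node...
doc = xml.dom.minidom.Document()
root = doc.createElement("a")
root.appendChild(doc.createTextNode("bell:\x07"))
doc.appendChild(root)

# ...minidom escapes markup characters such as & and < but does not check
# the XML Char production, so the control character is written as-is...
serialized = root.toxml()
print(repr(serialized))

# ...and an XML 1.0 parser (expat) then rejects that output.
try:
    xml.dom.minidom.parseString(serialized)
    reparsed = True
except ExpatError as err:
    reparsed = False
    print("parse failed:", err)

print(first_illegal_xml10_char("bell:\x07"))
```

A serializer that ran a check like first_illegal_xml10_char would throw instead of emitting the broken document. These are the two outcomes programmers end up choosing between.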

> That in itself is an interoperability problem. If HTML doesn't specify 
> this and Unicode doesn't specify this then is there any specification we 
> can point to that would tell UAs what to do and authors what to expect?

Right.

> So we can't just say that the DOM supports it so the serialization 
> should support it because we're in the process of specifying the HTML5 
> DOM and one of the HTML5 serializations. Incidentally I've also added 
> this issue to the serialization differences wiki page. I included XML
> 1.1 in that table because, though Julian says it's a failure, the only
> requirement changes, as far as I can see, relate to these C0 and C1
> control characters and their meaning and serialization.
> ...

The failure is largely about interop. There are almost no benefits of
XML 1.1 over 1.0, but the transition is so expensive that, as far as I
can tell, it just hasn't happened.
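The following minimal sketch shows how the C0 and C1 control characters are treated differently in the two versions. It classifies a code point under the Char production of XML 1.0 and under the Char and RestrictedChar productions of XML 1.1. The names Status, xml10_status and xml11_status are hypothetical, and the ranges are copied from the two specifications. The sketch deliberately ignores the other XML 1.1 changes, such as the relaxed name characters and the treatment of NEL (U+0085) and U+2028 as line ends.

```python
from enum import Enum


class Status(Enum):
    LITERAL = "may appear literally"
    REFERENCE_ONLY = "only as a character reference"
    FORBIDDEN = "not allowed at all, not even as a reference"


def xml10_status(cp: int) -> Status:
    # XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    #                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    # Character references must also match Char, so nothing is "reference only".
    if (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF):
        return Status.LITERAL
    return Status.FORBIDDEN


def xml11_status(cp: int) -> Status:
    # XML 1.1 Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    if not (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF):
        return Status.FORBIDDEN
    # RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
    #                  | [#x7F-#x84] | [#x86-#x9F]
    # These may not appear literally, only as character references.
    if (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F):
        return Status.REFERENCE_ONLY
    return Status.LITERAL


for cp in (0x01, 0x09, 0x1B, 0x7F, 0x80, 0x85, 0x9F, 0xFFFE):
    print(f"U+{cp:04X}: 1.0 {xml10_status(cp).name}, 1.1 {xml11_status(cp).name}")
```

The table cuts both ways. XML 1.1 makes C0 controls expressible (as references), but it also makes literal C1 controls such as U+0080 ill-formed, even though they are legal in XML 1.0. As a result, relabeling an existing 1.0 document as 1.1 is not always safe.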

Best regards, Julian

Received on Monday, 10 September 2007 08:44:43 UTC