[whatwg] Getting .innerHTML in XML well-formedness issues from Henri Sivonen on 2006-10-28 (public-whatwg-archive@w3.org from October 2006)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sat, 28 Oct 2006 16:12:46 +0300
Message-ID: <893B2B77-448C-4B93-B562-EE76105B661A@iki.fi>

On Oct 28, 2006, at 13:35, Anne van Kesteren wrote:

> On Fri, 27 Oct 2006 22:09:20 +0200, Simon Pieters  
> <zcorpan at hotmail.com> wrote:
>> DOM3 Core says that they "must generate a fatal error during  
>> serialization" (or, for the CDATA case, "the cdata section must be  
>> splitted before the serialization"). Does that mean raise a  
>> SYNTAX_ERR exception?
>
> One idea would be to update DOM Level 3 Core to make sure you can  
> never get documents that are not serializable. I don't really know  
> if that's feasible though.

In that case, the HTML parsing section would need to be revised to
forbid element and attribute names that are not valid XML 1.0 +
Namespaces local names, to forbid non-XML characters in character
data and attribute values, and to forbid "--" in comments. Personally,
I'd welcome such a change, since it would truly make text/html an
alternative infoset serialization for a subset of XML 1.0.
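
To make the three constraints concrete, here is a rough Python sketch of
the checks a parser or serializer might apply. The function names are
hypothetical. The name ranges are taken from the XML 1.0 Fifth Edition
Name production with the colon removed, which is an assumption on my
part (earlier editions define names differently). The comment check also
rejects a trailing "-", which the XML Comment production forbids in
addition to "--".

```python
import re

# XML 1.0 "Char" production: anything outside these ranges is a non-XML character.
_NON_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Assumption: Fifth Edition NameStartChar / NameChar ranges, minus ":" so that
# the result is a Namespaces local name (NCName).
_NAME_START = (
    r"A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + r"\-.0-9\xB7\u0300-\u036F\u203F-\u2040"
_NCNAME = re.compile("[" + _NAME_START + "][" + _NAME_CHAR + "]*")


def is_xml_local_name(name: str) -> bool:
    """True if name could serve as an element or attribute local name."""
    return _NCNAME.fullmatch(name) is not None


def has_only_xml_chars(text: str) -> bool:
    """True if every character in text matches the XML 1.0 Char production."""
    return _NON_XML_CHAR.search(text) is None


def is_xml_comment_data(data: str) -> bool:
    """True if data can appear between '<!--' and '-->' in XML 1.0."""
    return has_only_xml_chars(data) and "--" not in data and not data.endswith("-")
```

For example, is_xml_local_name("my:tag") is False because of the colon,
has_only_xml_chars("a\x0cb") is False because U+000C is not an XML
character, and is_xml_comment_data("a -- b") is False.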

Non-browsers that use XML tools to process HTML5 will have to enforce  
those constraints anyway in one way or another. Current text/html  
browsers don't, though.

Or did you mean that browsers would not enforce XML 1.0  
serializability if the DOM was created by parsing text/html? Would  
you then throw an exception if a subtree is imported from such a DOM  
into a DOM that enforces serializability?

The exposure of CDATA sections in the DOM is, IMO, a design flaw in  
the DOM. I wouldn't mind serializing them as normal character data.
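
As a small illustration of what serializing a CDATA section as normal
character data could look like, here is a sketch using the escape()
helper from Python's standard library. The function name is
hypothetical, and this is only one possible approach, not what any DOM
implementation is required to do. Because ">" is escaped, a "]]>"
sequence in the content no longer needs special handling or splitting.

```python
from xml.sax.saxutils import escape


def cdata_as_character_data(cdata_text: str) -> str:
    """Serialize the content of a CDATA section as ordinary escaped text.

    escape() replaces "&", "<" and ">" with entity references, so the
    result is safe to emit as character data inside an element.
    """
    return escape(cdata_text)


if __name__ == "__main__":
    # A "]]>" in the content is harmless once ">" has been escaped.
    print(cdata_as_character_data("if (a < b && c) x = y[z[0]]>1;"))
```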

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Saturday, 28 October 2006 06:12:46 UTC