[whatwg] Getting .innerHTML in XML well-formedness issues

On Fri, 15 Jun 2007 01:02:49 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Fri, 27 Oct 2006, Simon Pieters wrote:
>>
>> The spec says that getting .innerHTML in XML must return a
>> namespace-well-formed XML representation of the element or document. [1]
>> But what should happen when the DOM isn't namespace-well-formed and it
>> can't be fixed by namespace prefix rewriting?
>>
>> E.g., when the DOM contains any of the following?:
>>
>>   * A ProcessingInstruction node containing ?>
>>   * A Comment node containing -- (or ending with -)
>>   * A CDATASection node containing ]]>
>> [ * A processing instruction with the target "xml"
>>     (in any case combination)? ]
>> [ * Or colons in local names or processing instruction targets? ]
>
> ...or a DOCTYPE whose publicId or systemId parts contain both " and '
> characters.
>
> I've made the spec say that you raise an exception in those six cases.
>
>
>> DOM3 Core says that they "must generate a fatal error during
>> serialization" (or, for the CDATA case, "the cdata section must be
>> splitted before the serialization"). Does that mean raise a SYNTAX_ERR
>> exception?
>
> I used INVALID_STATE_ERR, not SYNTAX_ERR (it's the reverse of a syntax
> error).
>
>
>> What about when there are illegal characters?
>
> The DOM doesn't let you create those cases.

Sure it does. For example, the DOM allows control characters in various
places where XML doesn't. I haven't checked every XML production to see
whether it differs from the DOM, but I guess you could spec a catch-all,
like "if the node contains a character that isn't allowed according to the
corresponding XML production" or some such... though listing all cases
would be nicer.
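
A rough sketch of what such a catch-all check might look like for character
data, assuming the relevant production is XML 1.0 "Char". The function names
are hypothetical, and ValueError only stands in for whatever DOM exception
the spec ends up requiring:

```python
import re

# Complement of the XML 1.0 "Char" production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_illegal_xml_char(text):
    """Return the index of the first character outside Char, or None."""
    match = _NOT_XML_CHAR.search(text)
    return match.start() if match else None


def check_serializable_text(text):
    """Return text unchanged, or raise if it cannot appear in XML 1.0.

    ValueError is a placeholder (an assumption of this sketch) for the
    exception a serializer would actually raise.
    """
    index = first_illegal_xml_char(text)
    if index is not None:
        raise ValueError(
            "character U+%04X at index %d is not allowed in XML 1.0"
            % (ord(text[index]), index)
        )
    return text
```

Names, PI targets and the like have stricter productions than Char, so a real
check would need one pattern per production rather than this single one.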

> I'm tempted to allow the serialisation of PIs with the name "xml", and to
> allow the splitting of CDATA blocks with ]]>. Opinions?

The former wouldn't result in well-formed XML, since "xml" (in any case
combination) is a reserved PI target, but the latter is cool.
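
For illustration, a minimal sketch of both points, assuming the usual trick
of breaking each "]]>" across two adjacent CDATA sections. The function names
are hypothetical and not taken from any spec:

```python
def is_reserved_pi_target(target):
    """True if target is "xml" in any case combination.

    XML 1.0 excludes these from PITarget, so a PI with such a target
    cannot be serialized as a well-formed processing instruction.
    """
    return target.lower() == "xml"


def serialize_cdata(data):
    """Serialize data as one or more CDATA sections.

    Each "]]>" is split so that "]]" closes one section and ">" opens
    the next; concatenating the parsed sections yields data again.
    """
    return "<![CDATA[" + data.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```

With this, serialize_cdata("a]]>b") gives "<![CDATA[a]]]]><![CDATA[>b]]>",
i.e. two sections holding "a]]" and ">b". The cost is that one CDATASection
node no longer round-trips to a single node.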

-- 
Simon Pieters

Received on Tuesday, 10 July 2007 08:09:45 UTC