
Re: [Bug 25168] New: Should XML Serialization be allowed to produce invalid XML?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 27 Mar 2014 01:17:38 +0100
To: bugzilla@jessica.w3.org
Cc: www-dom@w3.org
Message-ID: <0pq6j91umh1bbgdb5qv4lsbdvhknebhna8@hive.bjoern.hoehrmann.de>

* bugzilla@jessica.w3.org wrote:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=25168

>Today in an HTML document,
>  createElement("first:last")
>Will create an HTMLElement node with prefix = null, and localName =
>"first:last".
>
>An XML Serialization according to the spec today (and matching IE/Firefox and
>soon Chrome) will generate the following invalid XML:
>  <first:last xmlns="http://www.w3.org/1999/xhtml"/>
>
>This is invalid (when round-tripped through DOMParser) because the prefix
>"first" is not defined. The XML parser does not know that "first:last" should
>be interpreted as a localName only.

It would be good to use the proper terminology here. There is nothing
wrong with using a colon in element type names under the rules of XML
1.0 or the DOM Level 1 methods as originally defined. The constraint
here is a namespace well-formedness requirement. DOM Level 3 L&S has a
switch to enable or disable namespace processing. That would allow the
processor to read such documents.
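
To illustrate the distinction, here is a minimal sketch in Python. It is not
part of the bug report, and it uses the Expat bindings from the standard
library as a stand-in for a DOM Level 3 L&S processor. The names `DOC` and
`parses` are made up for this example. Expat enables namespace processing only
when a `namespace_separator` is passed to `ParserCreate`.

```python
import xml.parsers.expat

# The serialization quoted in the bug report.
DOC = '<first:last xmlns="http://www.w3.org/1999/xhtml"/>'


def parses(doc: str, namespaces: bool) -> bool:
    """Return True if Expat accepts doc, with or without namespace processing."""
    if namespaces:
        # Passing a separator switches Expat into namespace-aware mode.
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Plain XML 1.0 mode: a colon is just another name character.
        parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


plain_ok = parses(DOC, namespaces=False)
namespace_ok = parses(DOC, namespaces=True)
```

With namespace processing off, the document is accepted as well-formed XML 1.0.
With it on, Expat rejects the same text because the prefix "first" is unbound.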

>There are two ways to avoid serializing invalid XML fragments:
>1) Not allow the Serializer to emit localNames (for elements or attributes)
>that would not have been possible to create in an XML environment. This would
>involve changing the actual element or attribute localNames which would have a
>web compatibility problem. For example, "first:last" could be Serialized as
>"first_last" instead. (Underscore is preferred to a hyphen since hyphens are
>the character delineating a Custom Element for a web component.)

Silent data corruption like this should be out of the question except
under the most extraordinary circumstances. I do not see those here.
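
A small sketch of why such a rewrite loses information, assuming the
underscore substitution from the quoted proposal. The function `escape_colon`
is hypothetical and only models that proposal; no specification defines it.

```python
def escape_colon(local_name: str) -> str:
    """Hypothetical rewrite rule from the quoted proposal: ':' becomes '_'."""
    return local_name.replace(":", "_")


# Two distinct element names that can coexist in one HTML document.
names = ["first:last", "first_last"]
escaped = [escape_colon(name) for name in names]

# After escaping, the two names can no longer be told apart.
collided = escaped[0] == escaped[1]
```

Once both names map to "first_last", nothing in the output records which
element originally carried the colon, so the change cannot be undone on the
way back in.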

>2) Fail to serialize on potential invalid output.
>
>#2 above seems like it would have too great a potential to break web
>compatibility--it's a pretty big hammer to apply to the API in the event of a
>validation issue. Though it could be useful for programmatic validation of a
>DOM. Personally, I don't prefer this option.

It is unlikely that much otherwise functional web content relies on
serialisation succeeding in the case under consideration. As you note,
people would run into trouble reading the content back into ordinary
XML processing environments.

>If, in fact, we think that the XMLSerializer should always produce valid XML,
>then I would prefer an escaping approach to minimize back-compat on calling
>APIs. Otherwise, we should agree to allow the serializer to produce invalid XML
>and have that understanding.

Again, colons in element type names are fine. This is a question of
neither validity nor well-formedness, but of XML namespace requirements,
which have been bolted onto the XML rules. I see little reason not to
offer an option to produce namespace-ill-formed content.
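
One way such an option could look, sketched in Python with `xml.dom.minidom`.
The `serialize` wrapper and its `require_ns_well_formed` flag are hypothetical
and not taken from any specification. By default the wrapper emits whatever
the DOM contains. When the flag is set, it refuses output that a
namespace-aware parser would reject.

```python
import xml.parsers.expat
from xml.dom import minidom


def serialize(element, require_ns_well_formed: bool = False) -> str:
    """Serialize element; optionally fail on namespace-ill-formed output."""
    text = element.toxml()
    if require_ns_well_formed:
        # Re-parse with namespace processing on to detect unbound prefixes.
        checker = xml.parsers.expat.ParserCreate(namespace_separator=" ")
        try:
            checker.Parse(text, True)
        except xml.parsers.expat.ExpatError as exc:
            raise ValueError(f"not namespace-well-formed: {exc}") from exc
    return text


doc = minidom.Document()
# DOM Level 1 style creation: the whole string is the element type name.
elem = doc.createElement("first:last")
elem.setAttribute("xmlns", "http://www.w3.org/1999/xhtml")

lenient = serialize(elem)  # succeeds; output is well-formed XML 1.0
```

Calling `serialize(elem, require_ns_well_formed=True)` raises `ValueError`
instead, which corresponds to the "fail to serialize" behaviour as an opt-in
rather than as the default.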
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Thursday, 27 March 2014 00:18:05 UTC
