- From: Sam Ruby <rubys@intertwingly.net>
- Date: Sat, 2 Dec 2006 20:47:15 -0500
On 12/2/06, Henri Sivonen <hsivonen at iki.fi> wrote:
> On Dec 2, 2006, at 18:24, Sam Ruby wrote:
>
> > It would not be wise for HTML5 to limit itself to the more constrained
> > character set of XML. In particular, the form feed character is
> > pretty popular,

BTW, I copied and pasted the wrong table. The characters I mentioned were discouraged (and include such things as Microsoft smart quotes mislabeled as iso-8859-1).

The actual allowed set in XML 1.0 is as follows:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

For XML 1.1 the list is as follows:

[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

> > This is yet another case where "take HTML5, read it into a DOM, and
> > serialize it as XML, and voilà: you have valid XHTML" doesn't work.
>
> What I am advocating is making sure that *conforming* HTML5 documents
> can be serialized as XHTML5 without dataloss.

Then you will also need to disallow newlines in attribute values.

In any case, I understand the desire; my read is that the WG's desire for backwards compatibility is higher. Limiting the character set to the allowable XML 1.1 character set should not be a problem for backwards compatibility purposes.

- Sam Ruby
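
For illustration only, the two character ranges quoted above can be written as a pair of predicates. This is a minimal Python sketch, not anything taken from the HTML5 or XML specifications; the names is_xml10_char, is_xml11_char and disallowed are made up for this example, and the ranges are copied directly from the two lists in the message.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch falls in the XML 1.0 set quoted in the message."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(ch: str) -> bool:
    """True if ch falls in the XML 1.1 set quoted in the message."""
    cp = ord(ch)
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed(text: str, allowed=is_xml11_char):
    """Return (index, code point) pairs for characters outside the given set.

    Hypothetical helper: `allowed` is whichever predicate above applies.
    """
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if not allowed(ch)]


if __name__ == "__main__":
    sample = "page one\x0cpage two"  # contains a form feed (U+000C)
    print(disallowed(sample, is_xml10_char))  # form feed is outside the XML 1.0 set
    print(disallowed(sample, is_xml11_char))  # form feed is inside the XML 1.1 set
```

Under these ranges the form feed is rejected by the XML 1.0 check and accepted by the XML 1.1 check, which is the difference the thread turns on.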
Received on Saturday, 2 December 2006 17:47:15 UTC