- From: Robert Burns <rob@robburns.com>
- Date: Sun, 9 Sep 2007 15:40:13 -0500
- To: Anne van Kesteren <annevk@opera.com>
- Cc: "HTML WG" <public-html@w3.org>
Hi Anne,

On Sep 9, 2007, at 1:26 PM, Anne van Kesteren wrote:

> On Sun, 09 Sep 2007 18:20:03 +0200, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>> Anne van Kesteren wrote:
>>> On Sun, 09 Sep 2007 16:11:34 +0200, Julian Reschke
>>> <julian.reschke@gmx.de> wrote:
>>>> We really should answer the question we asked before: why would
>>>> it be conforming to include those characters in the first place?
>>> I can see a good reason to prohibit U+0000 (and that's done),
>>> but what is the reason for making these other characters
>>> non-conforming? They are not posing any interoperability problem and
>>> are also supported by the DOM. I'm not sure why we should limit
>>> the HTML serialization here.
>>
>> So what's the semantics of these characters when they occur inside
>> HTML? What is a recipient supposed to do with them, for instance,
>> when they appear inside <p> or a <pre> element?
>
> They should do the same as whenever someone inserts them through
> the DOM. Seems that browsers display some type of placeholder
> character:
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%3Cscript%3Ew(%22%01%22%20%3D%3D%20%22%5C1%22)%3C%2Fscript%3E
>
> It's not entirely clear to me whether that's in scope of HTML
> though. We just need to define the "byte stream -> tree" mapping.
> Although maybe it could be part of the rendering chapter, dunno.

I think Julian's question is not limited to serialization. The issue is what meaning these characters have, whether they are inserted through the DOM, through XML, or through the text/html serialization. Leaving that undefined is in itself an interoperability problem. If HTML doesn't specify this and Unicode doesn't specify this, then is there any specification we can point to that would tell UAs what to do and authors what to expect? So we can't just say that because the DOM supports these characters the serialization should support them too: we're in the process of specifying both the HTML5 DOM and one of the HTML5 serializations.

Incidentally, I've also added this issue to the serialization differences wiki page [1]. I included XML 1.1 in that table because, though Julian says it's a failure, the only requirement changes, as far as I can see, relate to these C0 and C1 control characters and their meaning and serialization.

Take care,
Rob

[1]: <http://esw.w3.org/topic/HTML/SerializationDependentProcessingDifferences#head-325bab981d9fb34bc566af12b58e423352491705>
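
For readers who want the C0/C1 difference between XML 1.0 and XML 1.1 spelled out, here is a rough sketch. It assumes my reading of the Char production in XML 1.0 and the Char and RestrictedChar productions in XML 1.1; the function names are made up for illustration, and it says nothing about what text/html or the DOM should do with these characters, which is the open question in this thread.

```python
# Sketch only: ranges follow my reading of the XML 1.0 and XML 1.1
# "Char" / "RestrictedChar" productions. All names here are hypothetical.

def is_c0_control(cp: int) -> bool:
    """C0 control characters: U+0000 to U+001F."""
    return 0x00 <= cp <= 0x1F

def is_c1_control(cp: int) -> bool:
    """C1 control characters: U+0080 to U+009F."""
    return 0x80 <= cp <= 0x9F

def _common_upper_ranges(cp: int) -> bool:
    # Upper ranges shared by the Char production of both XML versions.
    return 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF

def xml10_status(cp: int) -> str:
    """'literal' if XML 1.0 Char matches the code point, else 'forbidden'."""
    if cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF or _common_upper_ranges(cp):
        return "literal"
    return "forbidden"

def xml11_status(cp: int) -> str:
    """'forbidden', 'reference-only' (RestrictedChar), or 'literal'."""
    if not (0x1 <= cp <= 0xD7FF or _common_upper_ranges(cp)):
        return "forbidden"
    restricted = (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )
    return "reference-only" if restricted else "literal"

def describe(cp: int) -> tuple:
    """Return (XML 1.0 status, XML 1.1 status) for one code point."""
    return (xml10_status(cp), xml11_status(cp))

if __name__ == "__main__":
    for cp in (0x00, 0x01, 0x09, 0x80, 0x85):
        kind = "C0" if is_c0_control(cp) else "C1" if is_c1_control(cp) else "other"
        print(f"U+{cp:04X} ({kind}): {describe(cp)}")
```

Under those assumptions, U+0001 (the character in Anne's test case) is not allowed at all in XML 1.0 and is allowed only as a character reference in XML 1.1, while U+0000 is excluded by both.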
Received on Sunday, 9 September 2007 20:40:28 UTC