- From: Maciej Stachowiak <mjs@apple.com>
- Date: Wed, 24 Mar 2010 02:54:03 -0700
- To: Philip Taylor <pjt47@cam.ac.uk>
- Cc: public-html@w3.org
On Mar 24, 2010, at 2:29 AM, Philip Taylor wrote:

> Maciej Stachowiak wrote:
>> On Mar 22, 2010, at 5:14 PM, Ian Hickson wrote:
>>> On Thu, 18 Mar 2010, Philip Taylor wrote:
>>>> Anne van Kesteren wrote:
>>>>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>>>>> <julian.reschke@gmx.de> wrote:
>>>>>> Replace the last sentence by:
>>>>>>
>>>>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>>>>> LESS-THAN SIGN (<) needs be escaped as well."
>>>>>
>>>>> That seems incomplete. The sequence ]]> comes to mind.
>>>>
>>>> That's not an issue in attribute values, as far as I'm aware.
>>>>
>>>> But in attribute values, U+000D and U+000A and U+0009 must be
>>>> escaped too.
>>>> (Depending on DTD you might also need to escape any leading or
>>>> trailing U+0020 and at least one of any adjacent pair of U+0020s,
>>>> I think.)
>>>
>>> This discussion is exactly the reason why including this in the
>>> spec is a bad idea.
>>
>> Julian & Philip, how confident are you that the full set of
>> characters that need escaping is U+003C, U+000D, U+000A, U+0009 and
>> U+0020? Does & need to be escaped?
>
> It needs these characters "as well" as the ones already mentioned in
> the previous paragraph in the spec (quotes and &s).
>
> I can't think of any other characters that have particularly special
> behaviour, but what is the purpose of this note? If it is aimed at
> people writing software that emits XML syntax fragments given an
> arbitrary string of Unicode codepoints, attempting to tell them
> everything they need to know in order to serialise safely (i.e.
> without allowing the content to break their entire page), then it
> would probably also have to say that U+FFFE and U+FFFF and other
> characters in U+0000..U+001F aren't allowed, and that they must be
> encoded in the same character encoding as the rest of the document,
> etc.
>
> It seems silly to duplicate the XML spec in that much detail here -
> if someone's correctly implementing XML then they should already
> have an XML serialiser that deals with all these issues, and
> repeating the information here will be a source of bugs and a waste
> of time.
>
> If it's aimed at people writing XHTML by hand, telling them about
> common things to be careful of, it probably doesn't need to bother
> mentioning U+0020 because (as far as I can see) that only matters in
> obscure cases when the DTD has set srcdoc to be non-CDATA. But
> <iframe srcdoc> is not a useful feature when writing markup by hand
> - the use cases were things like sandboxing untrusted user comments,
> and the whole point is that people will write software to serialise
> these values, so it's not useful to give advice intended for
> hand-authoring.

What spec change (if any) would you recommend on this issue? (I'm not
sure from the above if you are arguing for a detailed note, a shorter
but partially incomplete note, no note, or something else.)

Regards,
Maciej
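
For illustration only, the escaping rules discussed in the quoted thread can be sketched in Python. This is a rough sketch under stated assumptions, not text proposed for the spec: the names `is_xml_char` and `escape_xml_attr` are hypothetical, the attribute is assumed to be delimited by double quotes and to be of type CDATA (so U+0020 is left alone), and the allowed-character check follows the `Char` production of XML 1.0. A real serialiser would also have to handle the document's character encoding.

```python
def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD       # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


# Characters replaced by references inside a double-quoted attribute value.
# Tab, LF and CR are written as character references so that attribute-value
# normalisation does not turn them into spaces.
_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    '"': "&quot;",
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
}


def escape_xml_attr(value: str) -> str:
    """Escape `value` for use in a double-quoted XML attribute (assumed CDATA)."""
    out = []
    for ch in value:
        if not is_xml_char(ord(ch)):
            # Such characters cannot be represented in XML 1.0 at all,
            # not even as character references.
            raise ValueError(f"U+{ord(ch):04X} is not allowed in XML 1.0")
        out.append(_ESCAPES.get(ch, ch))
    return "".join(out)
```

For example, `escape_xml_attr('<p>a & "b"</p>\n')` yields `&lt;p>a &amp; &quot;b&quot;&lt;/p>&#xA;`, while a value containing U+0000 or U+FFFF raises `ValueError` rather than being emitted.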
Received on Wednesday, 24 March 2010 09:54:36 UTC