- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 18 Mar 2010 13:09:36 +0100
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Anne van Kesteren <annevk@opera.com>, public-html@w3.org
On 18.03.2010 11:47, Philip Taylor wrote:
> Anne van Kesteren wrote:
>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>> <julian.reschke@gmx.de> wrote:
>>> Replace the last sentence by:
>>>
>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>> LESS-THAN SIGN (<) needs be escaped as well."
>>
>> That seems incomplete. The sequence ]]> comes to mind.
>
> That's not an issue in attribute values, as far as I'm aware.
>
> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)

Ah, good catch. Updated proposal below.

BR, Julian

-- snip --

SUMMARY

The specification is needlessly vague about XML escaping requirements when discussing iframe/@srcdoc.

RATIONALE

The spec should properly balance considerations for text/html and application/xhtml+xml. If the requirements are spelled out for the former, the same should be done for the latter.

DETAILS

The spec currently says:

"Note: In the HTML syntax, authors need only remember to use U+0022 QUOTATION MARK characters (") to wrap the attribute contents and then to escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) characters, and to specify the sandbox attribute, to ensure safe embedding of content.

Note: Due to restrictions of the XML syntax, in XML a number of other characters need to be escaped also to ensure correctness."

Replace the last sentence by:

"Note: Due to restrictions of the XML syntax, in XML the U+003C LESS-THAN SIGN (<) needs to be escaped as well. Also, XML's whitespace characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be escaped in order to prevent attribute-value normalization ([XML], Section 3.3.3)."

IMPACT

1. Positive Effects

More clarity about the XML syntax; equal treatment of both formats.

2. Negative Effects

Repeats information that is already defined elsewhere, but this applies to the paragraph about HTML as well.

3. Conformance Classes Changes

None.

4. Risks

The statement might not be totally accurate, in which case we can use the regular review and bug-fixing process to get it right. That being said, I believe it is accurate, as it's not about encoding characters in XML in general, but only about *additional* requirements for attribute values.

REFERENCES

None.
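The difference between the HTML note and the proposed XML note can be illustrated with a small Python sketch. The helper names (`escape_srcdoc_html`, `escape_srcdoc_xml`, `parse_srcdoc`) are hypothetical and only for illustration. The XML check relies on the attribute-value normalization performed by the standard library's expat-based `xml.etree.ElementTree` parser. Escaping U+0020 is left optional, following Philip's remark that it only matters when a DTD gives the attribute a non-CDATA type.

```python
import xml.etree.ElementTree as ET


def escape_srcdoc_html(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc value in the HTML syntax.

    Only '&' and '"' are escaped. '&' goes first so the entities
    introduced here are not escaped a second time.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


def escape_srcdoc_xml(markup: str, escape_spaces: bool = False) -> str:
    """Escape markup for a double-quoted attribute value in the XML syntax.

    On top of the HTML escapes, a literal '<' is not allowed in XML
    attribute values, and attribute-value normalization (XML 1.0,
    Section 3.3.3) turns literal TAB, LF and CR into spaces, so those
    are written as character references, which survive normalization.
    Escaping U+0020 (hypothetical flag) only matters for DTD-declared
    non-CDATA attribute types.
    """
    out = escape_srcdoc_html(markup).replace("<", "&lt;")
    for ch, ref in (("\t", "&#9;"), ("\n", "&#10;"), ("\r", "&#13;")):
        out = out.replace(ch, ref)
    if escape_spaces:
        out = out.replace(" ", "&#32;")
    return out


def parse_srcdoc(attr_value: str) -> str:
    """Parse <iframe srcdoc="..."/> as XML and return the attribute value."""
    return ET.fromstring('<iframe srcdoc="%s"/>' % attr_value).get("srcdoc")


if __name__ == "__main__":
    doc = 'line 1\nline 2\t<em>"quoted" & more</em>'
    # The XML-escaped value round-trips unchanged.
    print(parse_srcdoc(escape_srcdoc_xml(doc)) == doc)  # True
    # With only the HTML escapes, the literal '<' makes the document not well-formed.
    try:
        parse_srcdoc(escape_srcdoc_html(doc))
    except ET.ParseError as err:
        print("not well-formed:", err)
    # Without '<' it parses, but the literal TAB and LF are normalized to spaces.
    print(repr(parse_srcdoc(escape_srcdoc_html("a\tb\nc"))))  # 'a b c'
```

The last case shows why the HT/LF/CR part of the proposal matters: the HTML-only escaping yields a well-formed XML document whose srcdoc value has silently changed.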
Received on Thursday, 18 March 2010 12:10:14 UTC