- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 18 Mar 2010 13:09:36 +0100
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Anne van Kesteren <annevk@opera.com>, public-html@w3.org
On 18.03.2010 11:47, Philip Taylor wrote:
> Anne van Kesteren wrote:
>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>> <julian.reschke@gmx.de> wrote:
>>> Replace the last sentence by:
>>>
>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>> LESS-THAN SIGN (<) needs be escaped as well."
>>
>> That seems incomplete. The sequence ]]> comes to mind.
>
> That's not an issue in attribute values, as far as I'm aware.
>
> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)
Ah, good catch. Updated proposal below.
BR, Julian
-- snip --
SUMMARY
Specification is needlessly vague about XML escaping requirements when
discussing iframe/@srcdoc.
RATIONALE
Spec should properly balance considerations for text/html and
application/xhtml+xml. If the requirements are spelled out for the
former the same should be done for the latter.
DETAILS
Spec currently says:
"Note: In the HTML syntax, authors need only remember to use U+0022
QUOTATION MARK characters (") to wrap the attribute contents and then to
escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&)
characters, and to specify the sandbox attribute, to ensure safe
embedding of content.
Note: Due to restrictions of the XML syntax, in XML a number of other
characters need to be escaped also to ensure correctness."
Replace the last sentence by:
"Note: Due to restrictions of the XML syntax, in XML the U+003C
LESS-THAN SIGN (<) needs to be escaped as well. Also, XML's whitespace
characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF),
U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be escaped in
order to prevent attribute-value normalization ([XML], Section 3.3.3)."
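To make the difference between the two syntaxes concrete, here is a
minimal Python sketch of the escaping the two notes describe. The
function names and the escape_space flag are hypothetical and appear
neither in the spec nor in this proposal. Escaping U+0020 by default
follows the proposed wording above; whether that is strictly required
may depend on the DTD, as Philip pointed out.

```python
import xml.etree.ElementTree as ET

# Whitespace characters named in the proposed note ([XML], Section 3.3.3).
XML_WHITESPACE = ("\t", "\n", "\r", " ")


def escape_srcdoc_html(content: str) -> str:
    """Escape content for a double-quoted srcdoc attribute in text/html."""
    # Escape "&" first so the "&" in "&quot;" is not escaped a second time.
    return content.replace("&", "&amp;").replace('"', "&quot;")


def escape_srcdoc_xml(content: str, escape_space: bool = True) -> str:
    """Escape content for a double-quoted srcdoc attribute in XML."""
    # Start from the HTML rules, then add the XML-only requirements.
    escaped = escape_srcdoc_html(content).replace("<", "&lt;")
    for ch in XML_WHITESPACE:
        if ch == " " and not escape_space:
            continue  # assumption: caller knows the attribute type is CDATA
        # Written as character references so the parser does not treat
        # them as literal whitespace during normalization.
        escaped = escaped.replace(ch, f"&#x{ord(ch):X};")
    return escaped


def parsed_srcdoc(attribute_text: str) -> str:
    """Parse an <iframe> carrying the escaped text; return the value seen."""
    element = ET.fromstring(f'<iframe srcdoc="{attribute_text}"/>')
    return element.get("srcdoc")
```

Passing the output of escape_srcdoc_xml() through parsed_srcdoc() should
give back the original string, which is a quick way to check that
nothing was lost to normalization.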
IMPACT
1. Positive Effects
More clarity about the XML syntax; equal treatment of both formats.
2. Negative Effects
Repeats information that is already defined elsewhere, but this
applies to the paragraph about HTML as well.
3. Conformance Classes Changes
None.
4. Risks
The statement might not be totally accurate, in which case we can use
the regular review and bug-fixing process to get it right. That being
said, I believe it is accurate, as it's not about encoding characters in
XML in general, but just about *additional* requirements for attribute
values.
REFERENCES
None.
Received on Thursday, 18 March 2010 12:10:14 UTC