Re: Change proposal for issue 103, was: ISSUE-103 change proposal

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 18 Mar 2010 13:09:36 +0100
Message-ID: <4BA21800.5080104@gmx.de>
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Anne van Kesteren <annevk@opera.com>, public-html@w3.org

On 18.03.2010 11:47, Philip Taylor wrote:
> Anne van Kesteren wrote:
>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>> <julian.reschke@gmx.de> wrote:
>>> Replace the last sentence by:
>>>
>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>> LESS-THAN SIGN (<) needs to be escaped as well."
>>
>> That seems incomplete. The sequence ]]> comes to mind.
>
> That's not an issue in attribute values, as far as I'm aware.
>
> But in attribute values, U+000D and U+000A and U+0009 must be escaped
> too. (Depending on DTD you might also need to escape any leading or
> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)

Ah, good catch. Updated proposal below.

BR, Julian

-- snip --

SUMMARY

The specification is needlessly vague about XML escaping requirements when
discussing iframe/@srcdoc.

RATIONALE

The spec should properly balance considerations for text/html and
application/xhtml+xml. If the requirements are spelled out for the
former, the same should be done for the latter.

DETAILS

Spec currently says:

"Note: In the HTML syntax, authors need only remember to use U+0022
QUOTATION MARK characters (") to wrap the attribute contents and then to
escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&)
characters, and to specify the sandbox attribute, to ensure safe
embedding of content.

Note: Due to restrictions of the XML syntax, in XML a number of other 
characters need to be escaped also to ensure correctness."

Replace the second note with:

"Note: Due to restrictions of the XML syntax, in XML the U+003C
LESS-THAN SIGN (<) needs to be escaped as well. Also, XML's whitespace
characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF),
U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be escaped in
order to prevent attribute-value normalization ([XML], Section 3.3.3)."
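
To illustrate what the proposed note asks of authors, here is a minimal
Python sketch. It is not part of the proposed spec text. The helper names
(escape_srcdoc_xml, srcdoc_roundtrip) and the sample content are made up
for this example, and the choice of numeric character references is just
one possible way to escape. It uses the standard library's ElementTree
parser to show that an escaped value survives parsing, while literal tabs
and line feeds are normalized to spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical escape table for this sketch. It covers & and " (already
# required in the HTML syntax) plus the characters named in the proposed note.
_XML_ATTR_ESCAPES = {
    "&": "&amp;",
    '"': "&quot;",
    "<": "&lt;",
    "\t": "&#9;",
    "\n": "&#10;",
    "\r": "&#13;",
    # Listed in the proposed note. For a CDATA-typed attribute a literal
    # space would survive normalization anyway.
    " ": "&#32;",
}


def escape_srcdoc_xml(content: str) -> str:
    """Escape content for use inside a double-quoted XML attribute value."""
    return "".join(_XML_ATTR_ESCAPES.get(ch, ch) for ch in content)


def srcdoc_roundtrip(attr_text: str) -> str:
    """Parse an iframe element carrying attr_text and return the srcdoc value."""
    doc = '<iframe sandbox="" srcdoc="' + attr_text + '"/>'
    return ET.fromstring(doc).get("srcdoc")


content = '<p>a & "b"</p>\n\t<p>c</p>'

# Fully escaped: the parser hands back exactly the original content.
assert srcdoc_roundtrip(escape_srcdoc_xml(content)) == content

# Escaping only &, " and <: the literal line feed and tab each come back
# as a single space because of attribute-value normalization.
partial = (content.replace("&", "&amp;")
                  .replace('"', "&quot;")
                  .replace("<", "&lt;"))
assert srcdoc_roundtrip(partial) == '<p>a & "b"</p>  <p>c</p>'
```

The second assertion is the case the note is meant to warn about: the
document is still well-formed, but the whitespace the author wrote is not
what the embedded document receives.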

IMPACT

1. Positive Effects

More clarity about the XML syntax; equal treatment of both formats.

2. Negative Effects

Repeats information that is already defined elsewhere, but the same is
true of the existing paragraph about HTML.

3. Conformance Classes Changes

None.

4. Risks

The statement might not be totally accurate, in which case we can use
the regular review and bug fixing process to get it right. That being
said, I believe it is accurate, as it is not about escaping characters in
XML in general, but just about the *additional* requirements for attribute
values.

REFERENCES

None.
Received on Thursday, 18 March 2010 12:10:14 GMT
