
Re: Change proposal for issue 103, was: ISSUE-103 change proposal

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 23 Mar 2010 20:43:09 -0700
Cc: Philip Taylor <pjt47@cam.ac.uk>, Anne van Kesteren <annevk@opera.com>, public-html@w3.org
Message-id: <48A5559B-0492-46F7-B9E8-A1BB4FDF0E48@apple.com>
To: Julian Reschke <julian.reschke@gmx.de>

On Mar 18, 2010, at 5:09 AM, Julian Reschke wrote:

> On 18.03.2010 11:47, Philip Taylor wrote:
>> Anne van Kesteren wrote:
>>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>>> <julian.reschke@gmx.de> wrote:
>>>> Replace the last sentence by:
>>>>
>>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>>> LESS-THAN SIGN (<) needs to be escaped as well."
>>>
>>> That seems incomplete. The sequence ]]> comes to mind.
>>
>> That's not an issue in attribute values, as far as I'm aware.
>>
>> But in attribute values, U+000D and U+000A and U+0009 must be escaped
>> too. (Depending on DTD you might also need to escape any leading or
>> trailing U+0020 and at least one of any adjacent pair of U+0020s, I think.)
>
> Ah, good catch. Updated proposal below.

Thanks for the Change Proposal. Recorded:

http://dev.w3.org/html5/status/issue-status.html#ISSUE-0103

Regards,
Maciej

>
> BR, Julian
>
> -- snip --
>
> SUMMARY
>
> Specification is needlessly vague about XML escaping requirements  
> when discussing iframe/@srcdoc.
>
> RATIONALE
>
> Spec should properly balance considerations for text/html and  
> application/xhtml+xml. If the requirements are spelled out for the  
> former the same should be done for the latter.
>
> DETAILS
>
> Spec currently says:
>
> "Note: In the HTML syntax, authors need only remember to use U+0022
> QUOTATION MARK characters (") to wrap the attribute contents and
> then to escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND
> (&) characters, and to specify the sandbox attribute, to ensure
> safe embedding of content.
>
> Note: Due to restrictions of the XML syntax, in XML a number of  
> other characters need to be escaped also to ensure correctness."
>
> Replace the last sentence by:
>
> "Note: Due to restrictions of the XML syntax, in XML the U+003C
> LESS-THAN SIGN (<) needs to be escaped as well. Also, XML's whitespace
> characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED
> (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be
> escaped in order to prevent attribute-value normalization ([XML],
> Section 3.3.3)."
>
> IMPACT
>
> 1. Positive Effects
>
> More clarity about the XML syntax; equal treatment of both formats.
>
> 2. Negative Effects
>
> Repeats information that already is defined somewhere else, but this  
> applies to the paragraph about HTML as well.
>
> 3. Conformance Classes Changes
>
> None.
>
> 4. Risks
>
> The statement might not be totally accurate, in which case we can  
> use the regular review and bug fixing process to get it right. That  
> being said I believe it is accurate, as it's not about encoding  
> characters in XML in general, but just about *additional*  
> requirements for attribute values.
>
> REFERENCES
>
> None.
>
>
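
For illustration only, the escaping rules in the quoted proposal could be
sketched as below. This is not part of the Change Proposal or of the spec;
the function names and the escape_space flag are hypothetical. Escaping of
U+0020 is left optional because, as noted earlier in the thread, whether
spaces are affected appears to depend on the DTD.

```python
import xml.etree.ElementTree as ET


def escape_srcdoc_html(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc value in the HTML syntax."""
    # Ampersand goes first so the entities added afterwards are not re-escaped.
    return markup.replace("&", "&amp;").replace('"', "&quot;")


def escape_srcdoc_xml(markup: str, escape_space: bool = False) -> str:
    """Escape markup for a double-quoted srcdoc value in the XML syntax."""
    # Start from the HTML rules, then add the XML-only requirements.
    escaped = escape_srcdoc_html(markup).replace("<", "&lt;")
    # Character references survive attribute-value normalization
    # (XML 1.0, Section 3.3.3); literal tab/LF/CR would become spaces.
    whitespace = ["\t", "\n", "\r"]
    if escape_space:
        # Assumption: only wanted when following the proposal text literally.
        whitespace.append(" ")
    for ch in whitespace:
        escaped = escaped.replace(ch, f"&#x{ord(ch):X};")
    return escaped


def round_trip(markup: str) -> str:
    """Embed markup in an XML iframe element and read the attribute back."""
    document = f'<iframe srcdoc="{escape_srcdoc_xml(markup)}"/>'
    return ET.fromstring(document).get("srcdoc")
```

Calling round_trip() on a string containing quotes, ampersands, tags and
line breaks should return the string unchanged, which is the property the
proposed note is meant to guarantee for XML authors.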
Received on Wednesday, 24 March 2010 03:43:44 UTC
