
Re: Change proposal for issue 103, was: ISSUE-103 change proposal

From: Shelley Powers <shelley.just@gmail.com>
Date: Thu, 18 Mar 2010 15:45:31 -0500
Message-ID: <643cc0271003181345m25f8c747qf6ae2edf522cd548@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: public-html@w3.org
On Thu, Mar 18, 2010 at 7:09 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 18.03.2010 11:47, Philip Taylor wrote:
>
>> Anne van Kesteren wrote:
>>
>>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>>> <julian.reschke@gmx.de> wrote:
>>>
>>>> Replace the last sentence by:
>>>>
>>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>>> LESS-THAN SIGN (<) needs to be escaped as well."
>>>>
>>>
>>> That seems incomplete. The sequence ]]> comes to mind.
>>>
>>
>> That's not an issue in attribute values, as far as I'm aware.
>>
>> But in attribute values, U+000D and U+000A and U+0009 must be escaped
>> too. (Depending on DTD you might also need to escape any leading or
>> trailing U+0020 and at least one of any adjacent pair of U+0020s, I
>> think.)
>>
>
> Ah, good catch. Updated proposal below.
>
> BR, Julian
>
>
> -- snip --
>
> SUMMARY
>
> Specification is needlessly vague about XML escaping requirements when
> discussing iframe/@srcdoc.
>
> RATIONALE
>
> Spec should properly balance considerations for text/html and
> application/xhtml+xml. If the requirements are spelled out for the former
> the same should be done for the latter.
>
> DETAILS
>
> Spec currently says:
>
> "Note: In the HTML syntax, authors need only remember to use U+0022
> QUOTATION MARK characters (") to wrap the attribute contents and then to
> escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) characters,
> and to specify the sandbox attribute, to ensure safe embedding of content.
>
> Note: Due to restrictions of the XML syntax, in XML a number of other
> characters need to be escaped also to ensure correctness."
>
> Replace the last sentence by:
>
> "Note: Due to restrictions of the XML syntax, in XML the U+003C LESS-THAN
> SIGN (<) needs to be escaped as well. Also, XML's whitespace characters --
> U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE
> RETURN (CR) and U+0020 SPACE -- need to be escaped in order to prevent
> attribute-value normalization ([XML], Section 3.3.3)."
>
>
> IMPACT
>
> 1. Positive Effects
>
> More clarity about the XML syntax; equal treatment of both formats.
>
> 2. Negative Effects
>
> Repeats information that is already defined elsewhere, but this
> applies to the paragraph about HTML as well.
>
> 3. Conformance Classes Changes
>
> None.
>
> 4. Risks
>
> The statement might not be totally accurate, in which case we can use the
> regular review and bug-fixing process to get it right. That said, I
> believe it is accurate, as it's not about encoding characters in XML in
> general, but just about *additional* requirements for attribute values.
>
> REFERENCES
>
> None.
>
>
>
Thanks, Julian, for writing this proposal when I had to drop it.

Shelley
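The escaping rules discussed in this thread can be illustrated with a short Python sketch. It is not part of the proposal or of the spec, and the function names (`escape_srcdoc_html`, `escape_srcdoc_xml`, `roundtrips`) are hypothetical. It assumes the srcdoc value is wrapped in double quotes and that no DTD declares the attribute, so an XML parser treats it as CDATA. Under that assumption the sketch escapes U+003C, U+0009, U+000A and U+000D for XML but leaves U+0020 alone; Philip's DTD-dependent caveat about spaces is not handled.

```python
import xml.etree.ElementTree as ET


def escape_srcdoc_html(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute (HTML syntax).

    Following the spec note: only & and " need escaping. & is replaced
    first so the &quot; references added afterwards are not re-escaped.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


# Extra replacements for the XML syntax (assumption: attribute is CDATA).
# '<' is not allowed literally in an XML attribute value, and literal
# TAB/LF/CR would be turned into spaces by attribute-value normalization
# (XML 1.0, Section 3.3.3), whereas character references to them survive.
_XML_EXTRA = {
    "<": "&lt;",
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
}


def escape_srcdoc_xml(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute (XML syntax)."""
    escaped = escape_srcdoc_html(markup)
    for char, ref in _XML_EXTRA.items():
        escaped = escaped.replace(char, ref)
    return escaped


def roundtrips(markup: str) -> bool:
    """Embed markup in an XML iframe element and check it parses back unchanged."""
    doc = f'<iframe sandbox="" srcdoc="{escape_srcdoc_xml(markup)}"/>'
    return ET.fromstring(doc).get("srcdoc") == markup


if __name__ == "__main__":
    sample = '<p title="a & b">line one\n\tline two</p>'
    print(escape_srcdoc_xml(sample))
    print(roundtrips(sample))  # True
    # Without the whitespace escapes, normalization turns TAB and LF into spaces:
    print(repr(ET.fromstring('<iframe srcdoc="a\n\tb"/>').get("srcdoc")))  # 'a  b'
```

Replacing `&` before anything else is the one ordering constraint: doing it later would turn the references just inserted (`&quot;`, `&lt;`, `&#xA;`) into literal text. Feeding the HTML-only escaping to an XML parser fails outright, because the raw `<` makes the attribute value not well-formed.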
Received on Thursday, 18 March 2010 20:46:04 UTC
