- From: Shelley Powers <shelley.just@gmail.com>
- Date: Thu, 18 Mar 2010 15:45:31 -0500
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: public-html@w3.org
- Message-ID: <643cc0271003181345m25f8c747qf6ae2edf522cd548@mail.gmail.com>
On Thu, Mar 18, 2010 at 7:09 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 18.03.2010 11:47, Philip Taylor wrote:
>
>> Anne van Kesteren wrote:
>>
>>> On Thu, 18 Mar 2010 11:26:48 +0100, Julian Reschke
>>> <julian.reschke@gmx.de> wrote:
>>>
>>>> Replace the last sentence by:
>>>>
>>>> "Note: Due to restrictions of the XML syntax, in XML the U+003C
>>>> LESS-THAN SIGN (<) needs be escaped as well."
>>>>
>>>
>>> That seems incomplete. The sequence ]]> comes to mind.
>>>
>>
>> That's not an issue in attribute values, as far as I'm aware.
>>
>> But in attribute values, U+000D and U+000A and U+0009 must be escaped
>> too. (Depending on DTD you might also need to escape any leading or
>> trailing U+0020 and at least one of any adjacent pair of U+0020s, I
>> think.)
>>
>
> Ah, good catch. Updated proposal below.
>
> BR, Julian
>
>
> -- snip --
>
> SUMMARY
>
> Specification is needlessly vague about XML escaping requirements when
> discussing iframe/@srcdoc.
>
> RATIONALE
>
> Spec should properly balance considerations for text/html and
> application/xhtml+xml. If the requirements are spelled out for the former
> the same should be done for the latter.
>
> DETAILS
>
> Spec currently says:
>
> "Note: In the HTML syntax, authors need only remember to use U+0022
> QUOTATION MARK characters (") to wrap the attribute contents and then to
> escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) characters,
> and to specify the sandbox attribute, to ensure safe embedding of content.
>
> Note: Due to restrictions of the XML syntax, in XML a number of other
> characters need to be escaped also to ensure correctness."
>
> Replace the last sentence by:
>
> "Note: Due to restrictions of the XML syntax, in XML the U+003C LESS-THAN
> SIGN (<) needs be escaped as well. Also, XML's whitespace characters --
> U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE
> RETURN (CR) and U+0020 SPACE -- need to be escaped in order to prevent
> attribute-value normalization ([XML], Section 3.3.3)."
>
>
> IMPACT
>
> 1. Positive Effects
>
> More clarity about the XML syntax; equal treatment of both formats.
>
> 2. Negative Effects
>
> Repeats information that already is defined somewhere else, but this
> applies to the paragraph about HTML as well.
>
> 3. Conformance Classes Changes
>
> None.
>
> 4. Risks
>
> The statement might not be totally accurate, in which case we can use the
> regular review and bug fixing process to get it right. That being said I
> believe it is accurate, as it's not about encoding characters in XML in
> general, but just about *additional* requirements for attribute values.
>
> REFERENCES
>
> None.
>
>

Thanks for writing this proposal, Julian, when I had to drop it.

Shelley
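
For illustration only, the escaping rules in the quoted proposal could be sketched in Python roughly as follows. This is not part of the spec text or of the proposal: the function names and tables are hypothetical, and escaping every U+0020 follows the proposed wording literally, even though, as Philip notes, whether spaces actually need escaping depends on the DTD.

```python
# Hypothetical helpers illustrating the srcdoc escaping rules discussed above.
# HTML syntax: only U+0026 (&) and U+0022 (") need escaping inside "...".
HTML_ESCAPES = {
    "&": "&amp;",
    '"': "&quot;",
}

# XML syntax (per the proposed note): additionally U+003C (<) and the
# whitespace characters subject to attribute-value normalization.
XML_ESCAPES = {
    **HTML_ESCAPES,
    "<": "&lt;",
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
    " ": "&#x20;",  # assumption: escape all spaces, as the proposal words it
}


def _escape(text: str, table: dict) -> str:
    # Single pass over the input, so the "&" in an inserted reference
    # is never escaped a second time.
    return "".join(table.get(ch, ch) for ch in text)


def escape_srcdoc_html(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute in text/html."""
    return _escape(markup, HTML_ESCAPES)


def escape_srcdoc_xml(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute in XML."""
    return _escape(markup, XML_ESCAPES)


if __name__ == "__main__":
    sample = '<p class="a">x & y</p>\n'
    print('<iframe sandbox srcdoc="%s"></iframe>' % escape_srcdoc_html(sample))
    print('<iframe sandbox="" srcdoc="%s"/>' % escape_srcdoc_xml(sample))
```

The XML variant is a strict superset of the HTML one, which mirrors the proposal's framing of these as *additional* requirements for attribute values.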
Received on Thursday, 18 March 2010 20:46:04 UTC