Re: Working Group Decision on ISSUE-103 srcdoc-xml-escaping

On Wed, 13 Oct 2010 12:22:48 +0200, Sam Ruby <rubys@intertwingly.net>  
wrote:

> Here is the decision.  The chairs made an effort to explicitly address  
> all arguments presented in the Change Proposals on this topic in  
> addition to arguments posted as objections in the poll.
>
> *** Question before the Working Group ***
>
> The current HTML5 draft defers to the XML specification for the  
> documentation of the srcdoc attribute.  Some Working Group members have  
> questioned whether or not it would be helpful to spell out the  
> characters that need to be escaped.
>
> The scope of this decision is specifically on how the existing  
> attribute is to be specified in the HTML5 specification.  A section  
> below details what arguments were not considered, a number of which were  
> not considered due to scope reasons.
>
> == Uncontested observations:
>
> * Escaping in XML is more complex
> * Target audience is for people who are authoring serializers
> * Referencing the XML specification makes sense
>
> None of these were decisive.  There were people who supported either of  
> these proposals even after taking these facts into consideration.
>
> == Summary of Arguments:
>
> Once we put aside the uncontested observations, what is left is whether  
> or not the rules are long or useful.
>
> For Documenting: this proposal contains two sentences containing 54  
> words, and includes a specific reference to the XML specification, and  
> asserts that the addition of this note would provide "more clarity".
>
> For Deferring: this proposal contains a single sentence containing 23  
> words, contains a generic reference to the XML specification, and  
> asserts that the target audience would be "less likely" to need this  
> guidance.
>
> In the context of the HTML5 specification, an addition of 31 words is  
> not found to be long, and a specific reference to be more helpful than a  
> general reference.
>
> "Less likely" was found to be a weak argument.  Unlike "parity" which  
> was eliminated as a consideration (see below) "more clarity" was also  
> found to be a weak argument, just a slightly less weak one than "less  
> likely".
>
> *** Decision of the Working Group ***
>
> Therefore, the HTML Working Group hereby adopts the Change Proposal to  
> document the characters that must be escaped for XML in the srcdoc  
> attribute.  Of the two Change Proposals before us, this one has drawn  
> the weaker objections.
>
> Bug 8806 is to be reopened and marked as WGDecision.
>
> == Next Steps ==
>
> The editor of the HTML5 specification is directed to make this change.
>
> == Appealing this Decision ==
>
> If anyone strongly disagrees with the content of the decision and would  
> like to raise a Formal Objection, they may do so at this time. Formal  
> Objections are reviewed by the Director in consultation with the Team.  
> Ordinarily, Formal Objections are only reviewed as part of a transition  
> request.
>
> == Revisiting this Issue ==
>
> As this issue is narrow, the documentation for XML is both available and  
> stable, and ample opportunities have been provided for people to review  
> the various proposals, it is difficult to imagine cases where new  
> information could be provided which would cause this issue to be  
> revisited.

The new text says that U+0020 needs to be escaped.

    <p class="note">Due to restrictions of <span>the XML syntax</span>,
-  in XML a number of other characters need to be escaped also to
-  ensure correctness.</p>
+  in XML the U+003C LESS-THAN SIGN character (&lt;) needs to be
+  escaped as well. In order to prevent <a
+  href="http://www.w3.org/TR/REC-xml/#AVNormalize">attribute-value
+  normalization</a>, XML's whitespace characters &mdash; U+0009
+  CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE
+  RETURN (CR) and U+0020 SPACE &mdash; also need to be escaped. <a
+  href="#refsXML">[XML]</a></p>

My reading of the XML spec suggests space does not need to be escaped.

http://www.w3.org/TR/REC-xml/#AVNormalize

"For a white space character (#x20, #xD, #xA, #x9), append a space  
character (#x20) to the normalized value."

I.e., a literal space and an escaped space result in the same thing.
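
A minimal sketch to check this, assuming Python's expat-based
xml.etree.ElementTree behaves like a conforming non-validating XML
processor. The helper name parsed_srcdoc and the sample values are
made up for illustration; they are not taken from the spec or the
decision.

```python
import xml.etree.ElementTree as ET


def parsed_srcdoc(raw_attribute_text: str) -> str:
    """Parse <iframe srcdoc="..."/> and return the value the parser reports.

    raw_attribute_text is placed into the attribute verbatim, so it must
    already be valid XML attribute content (hypothetical helper).
    """
    document = '<iframe srcdoc="' + raw_attribute_text + '"/>'
    return ET.fromstring(document).get("srcdoc")


# A literal space and a character reference to U+0020 come out identical.
literal_space = parsed_srcdoc("a b")
escaped_space = parsed_srcdoc("a&#x20;b")

# Tab and line feed differ: the literal character is normalized to a
# space, while the character reference survives as the real character.
literal_tab = parsed_srcdoc("a\tb")
escaped_tab = parsed_srcdoc("a&#x9;b")
literal_lf = parsed_srcdoc("a\nb")
escaped_lf = parsed_srcdoc("a&#xA;b")

print(repr(literal_space), repr(escaped_space))
print(repr(literal_tab), repr(escaped_tab))
print(repr(literal_lf), repr(escaped_lf))
```

If that assumption holds, escaping matters for HT, LF and CR but
changes nothing for U+0020.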

The paragraph "If the attribute type is not CDATA, then the XML processor  
MUST further process the normalized attribute value by discarding any  
leading and trailing space (#x20) characters, and by replacing sequences  
of space (#x20) characters by a single space (#x20) character." does not  
apply since srcdoc is a CDATA attribute.
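
The CDATA point can be sketched the same way, again assuming
Python's expat-based xml.etree.ElementTree is representative. The
internal DTD subset below, which declares srcdoc as NMTOKENS, is purely
hypothetical and exists only to show the contrast; nothing in HTML5
declares the attribute that way.

```python
import xml.etree.ElementTree as ET

PADDED_VALUE = "  a   b  "

# No DTD: the attribute is treated as CDATA, so runs of spaces and
# leading/trailing spaces are kept as written.
cdata_document = '<iframe srcdoc="' + PADDED_VALUE + '"/>'
cdata_result = ET.fromstring(cdata_document).get("srcdoc")

# Hypothetical DTD declaring srcdoc as a non-CDATA type: the processor
# additionally trims and collapses the spaces.
tokenized_document = (
    "<!DOCTYPE iframe [<!ATTLIST iframe srcdoc NMTOKENS #IMPLIED>]>"
    '<iframe srcdoc="' + PADDED_VALUE + '"/>'
)
tokenized_result = ET.fromstring(tokenized_document).get("srcdoc")

print(repr(cdata_result))
print(repr(tokenized_result))
```

Only the second case loses spaces, which is why the extra
normalization step is irrelevant for srcdoc.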

Should I file a bug report?


> However, as the markup syntax was not considered to be within scope of
> this issue, concrete suggestions for alternate syntaxes, in the form of  
> bug reports, continue to be welcome.
>
> == Arguments not considered ==
>
> a) Whether or not there is a use case for this attribute
>
>     That is the subject of ISSUE-100
>
> b) Whether an alternate syntax would be better for this use case
>
>     No concrete proposal has been put forward
>
> c) Balancing considerations for text/html and application/xhtml+xml.
>
>     The spec currently contains significant descriptions of text/html at
>     a syntax level for which "balanced" descriptions are not provided.
>     No evidence was presented "balancing" is necessary or represents the
>     consensus of the working group.
>
> d) The note may be incomplete or misleading.
>
>     The spec currently contains numerous notes that may be incomplete or
>     misleading.  No evidence was provided that all such helpful
>     descriptions should all be removed, and the evidence that this
>     particular case is to be considered "remarkably complicated" was not
>     found to be strong.  Anyone who has any reason why this or any other
>     note is "incomplete or misleading" is encouraged to file bug
>     reports.
>
> e) CDATA attributes
>
>     No concrete proposal was put forward.  If the description is
>     incomplete or misleading, bug reports should be filed.
>
> f) "define the extra rules that applies to HTML DOM equivalent XHTML"
>
>     That's the subject of a separate document.
>


-- 
Simon Pieters
Opera Software

Received on Thursday, 14 October 2010 09:36:23 UTC