Re: Change proposal for ISSUE-103 from Julian Reschke on 2010-06-06 (public-html@w3.org from June 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 06 Jun 2010 11:11:39 +0200
To: Ian Hickson <ian@hixie.ch>
CC: public-html@w3.org
Message-ID: <4C0B664B.1080307@gmx.de>

On 06.06.2010 03:12, Ian Hickson wrote:
>
> ISSUE-103
> =========
>
> SUMMARY
>
> Defer to the XML specification.
>
>
> RATIONALE
>
> HTML and XML differ in the requirements for escaping the value of
> "srcdoc". While the HTML specification defines the HTML syntax, it doesn't
> define the XML syntax and therefore should not attempt to redundantly
> repeat the rules in the XML specification. This is especially important in

Doesn't compute.

We're talking about two paragraphs here:

"Note: In the HTML syntax, authors need only remember to use U+0022 
QUOTATION MARK characters (") to wrap the attribute contents and then to 
escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) 
characters, and to specify the sandbox attribute, to ensure safe 
embedding of content.

Note: Due to restrictions of the XML syntax, in XML a number of other 
characters need to be escaped also to ensure correctness."

If the HTML specification defines the HTML syntax, why is the first 
paragraph needed then?

Both are just non-normative advice, and there's no good reason why the 
same rules shouldn't apply to both.
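
For illustration, the rule in the first note can be sketched in a few 
lines of Python. This is only a sketch under assumptions: the function 
names are hypothetical and do not come from the spec, and the input is 
assumed to be an already serialized HTML string.

```python
def escape_srcdoc_html(markup: str) -> str:
    """Escape a serialized HTML string for a double-quoted srcdoc attribute.

    Follows the first note: only U+0026 (&) and U+0022 (") are escaped.
    The ampersand is replaced first so the "&quot;" added afterwards is
    not escaped a second time.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


def iframe_html(markup: str) -> str:
    # Hypothetical helper: wrap the escaped value in double quotes and
    # add the sandbox attribute, as the note advises.
    return '<iframe sandbox srcdoc="' + escape_srcdoc_html(markup) + '"></iframe>'
```

The order of the two replacements matters: escaping the quotation mark 
first would turn the "&" of "&quot;" into "&amp;".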

> this case because the rules are remarkably complicated, and it is highly
> likely that any description we include here will be incomplete or
> misleading in some way.

Ah, <http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt> :-)

There's a proposal on the table. Did you spot a problem in it?

Anyway, if it makes it into the spec, it will be subject to spec review 
like any other text in the spec (which might be "incomplete or misleading" 
as well).

> The target audience for this section is people writing serialisers for
> HTML and XML documents. (The target audience isn't people who hand-author
> their documents, since srcdoc="" isn't especially useful for hand
> authoring -- the whole point is to provide defense in depth for documents
> that embed content automatically.) People writing HTML serialisers can

Incorrect.

This is not about serializing XML or HTML. This is about taking a string 
that already *is* serialized XML/HTML and putting it into an attribute 
value.

(Putting markup into attribute values is a very bad design pattern, and 
it's only because of this that we have this discussion in the first 
place; there's a related issue -- 
<http://www.w3.org/html/wg/tracker/issues/100> -- which is about 
removing this "feature" which of course would make *this* discussion 
irrelevant).


> legitimately be expected to be writing their software using string
> concatenation, despite this being a suboptimal design: this indeed is the
> common way for such software to be written. For these authors, therefore,
> it is helpful for the specification to describe the HTML escaping rules
> relevant here. People writing XHTML tools, though, are much more likely to
> be using tool chains that already have XML serialisers, and thus they are
> less likely to need guidance as to what to escape -- the authors of XML
> serialisers are more likely to use the XML specification than the HTML
> specification in writing their software.
>
> The paragraph should therefore have the following qualities:
>
>   * It should say that the situation with XML is more complex, because the
>     situation in XML is more complex.

No problem with that, as long as this doesn't result in exaggerations or 
pure hand-waving.

>   * It should not attempt to describe these rules, because the rules are
>     long, and not useful to readers of this specification.

The rules aren't long. We're not talking about the *whole* set of rules, 
only what needs to be escaped when moving an already syntactically 
correct XML fragment into an attribute value.

The proposal is:

"Note: Due to restrictions of the XML syntax, in XML the U+003C
LESS-THAN SIGN (<) needs to be escaped as well. Also, XML's whitespace
characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF),
U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be escaped in
order to prevent attribute-value normalization ([XML], Section 3.3.3)." 
(see <http://lists.w3.org/Archives/Public/public-html/2010Mar/0431.html>)
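
To show how short the rule is, here is a minimal Python sketch of the 
proposed text combined with the two characters from the HTML note. The 
names are hypothetical, and the choice of numeric character references 
for the whitespace characters is an assumption; the proposal only says 
that they need to be escaped.

```python
import xml.etree.ElementTree as ET

# Characters to escape when placing an already well-formed XML fragment
# into a double-quoted attribute value: & and " (as in the HTML note),
# < (not allowed in XML attribute values), and the four XML whitespace
# characters, written as character references so that attribute-value
# normalization leaves them alone.
XML_ATTR_ESCAPES = {
    "&": "&amp;",
    '"': "&quot;",
    "<": "&lt;",
    "\t": "&#9;",
    "\n": "&#10;",
    "\r": "&#13;",
    " ": "&#32;",
}


def escape_srcdoc_xml(markup: str) -> str:
    # Single pass over the input, so nothing is escaped twice.
    return "".join(XML_ATTR_ESCAPES.get(ch, ch) for ch in markup)


# Round trip through an XML parser: the attribute value comes back unchanged.
fragment = '<p title="x">a &amp; b</p>\n\t<p>c  d</p>\r'
document = '<iframe srcdoc="' + escape_srcdoc_xml(fragment) + '"/>'
parsed = ET.fromstring(document).get("srcdoc")
```

In this sketch `parsed` should equal `fragment`, line breaks and tab 
included, because a character reference in an attribute value is not 
subject to the whitespace replacement that a literal character is.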

>   * It should defer to and reference the XML specification, because that
>     is the relevant specification for this issue.

This is true for both the HTML and XML rules, so the conclusion would be 
to add specific references to both paragraphs.

> DETAILS
>
> Change the following paragraph:
>
>     Due to restrictions of the XML syntax, in XML a number of other
>     characters need to be escaped also to ensure correctness.
>
> ...to:
>
>     In the XHTML syntax, a number of other characters also need to be
>     escaped, as defined by the XML specification. [XML]

That would be much more helpful if you cited the actual section 
containing the answer.

>...

Best regards, Julian
Received on Sunday, 6 June 2010 09:18:55 UTC