- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 06 Jun 2010 11:11:39 +0200
- To: Ian Hickson <ian@hixie.ch>
- CC: public-html@w3.org
On 06.06.2010 03:12, Ian Hickson wrote:
>
> ISSUE-103
> =========
>
> SUMMARY
>
> Defer to the XML specification.
>
>
> RATIONALE
>
> HTML and XML differ in the requirements for escaping the value of
> "srcdoc". While the HTML specification defines the HTML syntax, it doesn't
> define the XML syntax and therefore should not attempt to redundantly
> repeat the rules in the XML specification. This is especially important in

Doesn't compute. We're talking about two paragraphs here:

"Note: In the HTML syntax, authors need only remember to use U+0022 QUOTATION MARK characters (") to wrap the attribute contents and then to escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) characters, and to specify the sandbox attribute, to ensure safe embedding of content.

Note: Due to restrictions of the XML syntax, in XML a number of other characters need to be escaped also to ensure correctness."

If the HTML specification defines the HTML syntax, why is the first paragraph needed then? Both are just non-normative advice, and there's no good reason why the same rules shouldn't apply to both.

> this case because the rules are remarkably complicated, and it is highly
> likely that any description we include here will be incomplete or
> misleading in some way.

Ah, <http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt> :-)

There's a proposal on the table. Did you spot a problem in it? Anyway, if it makes it into the spec, it will be subject to spec review as any other text in the spec (which might be "incomplete or misleading" as well).

> The target audience for this section is people writing serialisers for
> HTML and XML documents. (The target audience isn't people who hand-author
> their documents, since srcdoc="" isn't especially useful for hand
> authoring -- the whole point is to provide defense in depth for documents
> that embed content automatically.) People writing HTML serialisers can

Incorrect. This is not about serializing XML or HTML.
This is about taking a string that already *is* serialized XML/HTML and putting it into an attribute value. (Putting markup into attribute values is a very bad design pattern, and it's only because of this that we have this discussion in the first place; there's a related issue -- <http://www.w3.org/html/wg/tracker/issues/100> -- which is about removing this "feature" which of course would make *this* discussion irrelevant).

> legitimately be expected to be writing their software using string
> concatenation, despite this being a suboptimal design: this indeed is the
> common way for such software to be written. For these authors, therefore,
> it is helpful for the specification to describe the HTML escaping rules
> relevant here. People writing XHTML tools, though, are much more likely to
> be using tool chains that already have XML serialisers, and thus they are
> less likely to need guidance as to what to escape -- the authors of XML
> serialisers are more likely to use the XML specification than the HTML
> specification in writing their software.
>
> The paragraph should therefore have the following qualities:
>
> * It should say that the situation with XML is more complex, because the
>   situation in XML is more complex.

No problem with that, as long as this doesn't result in exaggerations or pure hand-waving.

> * It should not attempt to describe these rules, because the rules are
>   long, and not useful to readers of this specification.

The rules aren't long. We're not talking about the *whole* set of rules, only what needs to be escaped when moving an already syntactically correct XML fragment into an attribute value. The proposal is:

"Note: Due to restrictions of the XML syntax, in XML the U+003C LESS-THAN SIGN (<) needs be escaped as well.
Also, XML's whitespace characters -- U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR) and U+0020 SPACE -- need to be escaped in order to prevent attribute-value normalization ([XML], Section 3.3.3)."

(see <http://lists.w3.org/Archives/Public/public-html/2010Mar/0431.html>)

> * It should defer to and reference the XML specification, because that
>   is the relevant specification for this issue.

This is true for both the HTML and XML rules, so the conclusion would be to add specific references to both paragraphs.

> DETAILS
>
> Change the following paragraph:
>
>    Due to restrictions of the XML syntax, in XML a number of other
>    characters need to be escaped also to ensure correctness.
>
> ...to:
>
>    In the XHTML syntax, a number of other characters also need to be
>    escaped, as defined by the XML specification. [XML]

That would be much more helpful if you cited the actual section containing the answer.

>...

Best regards, Julian
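
For illustration only, the two sets of escaping rules discussed in this message can be sketched in Python. This is a minimal sketch and not part of the proposal text: the function names `escape_srcdoc_html` and `escape_srcdoc_xml` are hypothetical, and the use of hexadecimal character references for `<` and the whitespace characters is an assumption (the proposed note only says these characters need to be escaped, not how). It assumes the input is an already serialized markup string that will be wrapped in U+0022 QUOTATION MARK characters.

```python
def escape_srcdoc_html(markup: str) -> str:
    """Escape a serialized markup string for a double-quoted srcdoc
    attribute in the HTML syntax: only U+0026 (&) and U+0022 (")."""
    # Escape the ampersand first so the "&quot;" added next is not re-escaped.
    return markup.replace("&", "&amp;").replace('"', "&quot;")


# Extra characters named in the proposed XML note. The choice of character
# references here is an assumption made for this sketch.
_XML_EXTRA = {
    "<": "&lt;",      # U+003C LESS-THAN SIGN
    "\t": "&#x9;",    # U+0009 CHARACTER TABULATION
    "\n": "&#xA;",    # U+000A LINE FEED
    "\r": "&#xD;",    # U+000D CARRIAGE RETURN
    " ": "&#x20;",    # U+0020 SPACE
}


def escape_srcdoc_xml(markup: str) -> str:
    """Escape a serialized markup string for a double-quoted srcdoc
    attribute in the XML syntax, following the proposed note: the HTML
    rules plus '<' and the four XML whitespace characters."""
    escaped = escape_srcdoc_html(markup)
    # The entities produced above contain neither '<' nor whitespace,
    # so a second character-by-character pass cannot double-escape them.
    return "".join(_XML_EXTRA.get(ch, ch) for ch in escaped)


if __name__ == "__main__":
    fragment = '<p class="a">R&D</p>'
    print(escape_srcdoc_html(fragment))
    print(escape_srcdoc_xml(fragment))
```

The XML variant builds on the HTML one, which mirrors the point made above: the additional XML requirements amount to a short list of characters on top of the HTML rules.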
Received on Sunday, 6 June 2010 09:18:55 UTC