- From: Kevin Rodgers <kevin.rodgers@ihs.com>
- Date: Tue, 13 Jan 2004 16:10:41 -0700
- To: www-html-editor@w3.org
I would just like to add my voice to Björn Höhrmann's regarding entity
references:

http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/0084.html

My complaint is with the following:

| In both SGML and XML, the ampersand character ("&") declares the
| beginning of an entity reference (e.g., &reg; for the registered
| trademark symbol "®"). Unfortunately, many HTML user agents have
| silently ignored incorrect usage of the ampersand character in HTML
| documents - treating ampersands that do not look like entity
| references as literal ampersands.

HTML 2.0 - 4.01 are all defined as SGML applications, and the SGML spec
clearly states that "&" is not recognized as an entity reference open
delimiter in element content and attribute value literals unless it is
immediately followed by a name start character (see ISO 8879:1986 [9.6]
Delimiter Recognition and [Figure 3] Reference Delimiter Set: General).

So "&" not immediately followed by a name start character should indeed
be treated as a literal ampersand by HTML user agents (and "&#" not
immediately followed by a name start character or a digit should be
treated as a literal ampersand-octothorpe sequence).

The example in [XHTML 1.0] C.12 is a URL for a CGI script invocation in
which "&" is actually followed by a name start character:

| http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user

So that ampersand must be expressed as an entity reference as claimed.
But in general that claim is not true, and this incompatibility between
HTML and XHTML is due to the facts that [1] XML requires "&" to be
interpreted as a markup delimiter (except within comments, processing
instructions, and CDATA sections) and [2] XML adds "_" and ":" to the
set of name start characters anyway.

[1] http://www.w3.org/TR/REC-xml#syntax
[2] http://www.w3.org/TR/REC-xml#NT-Name

Thanks,
--
Kevin Rodgers
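
The delimiter recognition rule described in the message can be sketched in
a few lines of Python. This is only an illustration, not a conforming SGML
parser: it assumes that name start characters are the ASCII letters (as in
the reference concrete syntax), and the function name
sgml_reference_opens is hypothetical.

```python
import re

# Assumption: name start characters are the ASCII letters only.
# "&" opens an entity reference only when a letter follows it;
# "&#" opens a character reference only when a letter or digit follows it.
_REFERENCE_OPEN = re.compile(r"&(?:#[0-9A-Za-z]|[A-Za-z])")


def sgml_reference_opens(text: str) -> list[int]:
    """Return the offsets of each "&" that would be recognized as a
    reference open delimiter; every other "&" is literal data."""
    return [m.start() for m in _REFERENCE_OPEN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "fish & chips",                      # "&" followed by a space
        "x=1&2",                             # "&" followed by a digit
        "myscript.pl?class=guest&name=user", # "&" followed by a letter
        "&#38; and &# alone",                # one character reference, one literal
    ]
    for s in samples:
        print(repr(s), sgml_reference_opens(s))
```

Under this rule the ampersand in the C.12 URL is recognized as a delimiter
and needs escaping, while the ampersands in the first two samples are
plain data. An XML processor would instead treat every one of them as
markup.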
Received on Tuesday, 13 January 2004 18:14:56 UTC