[XHTML 1.0] C.12 Using Ampersands in Attribute Values (and Elsewhere)

I would just like to add my voice to Björn Höhrmann's regarding entity
references:

http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/0084.html

My complaint is with the following:

| In both SGML and XML, the ampersand character ("&") declares the
| beginning of an entity reference (e.g., &reg; for the registered
| trademark symbol "®").  Unfortunately, many HTML user agents have
| silently ignored incorrect usage of the ampersand character in HTML
| documents - treating ampersands that do not look like entity
| references as literal ampersands.

HTML 2.0 through 4.01 are all defined as SGML applications, and the SGML spec
clearly states that "&" is not recognized as an entity reference open
delimiter in element content and attribute value literals unless it is
immediately followed by a name start character (see ISO 8879:1986 [9.6]
Delimiter Recognition and [Figure 3] Reference Delimiter Set: General).

So "&" not immediately followed by a name start character should indeed
be treated as a literal ampersand by HTML user agents (and "&#" not
immediately followed by a name start character or a digit should be
treated as a literal ampersand-octothorpe sequence).
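Here is a rough Python sketch of those two recognition rules. It is my own
simplification, not code from either spec. It assumes the name start
characters are just the ASCII letters, which I believe is what the HTML
SGML declarations specify. It looks only at the characters following the
"&", and it ignores contexts such as CDATA declared content, where no
references are recognized at all. The function name classify_sgml_ampersand
is made up for illustration.

```python
import string

# Assumption: name start characters are the ASCII letters only.
SGML_NAME_START = set(string.ascii_letters)
DIGITS = set(string.digits)


def classify_sgml_ampersand(text: str, i: int) -> str:
    """Classify the "&" at text[i] under (simplified) SGML delimiter recognition.

    Returns "entity-ref" if "&" is recognized as an entity reference open
    delimiter, "char-ref" if "&#" is recognized as a character reference
    open delimiter, and "literal" otherwise.
    """
    if text[i] != "&":
        raise ValueError("text[i] must be '&'")
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt in SGML_NAME_START:
        return "entity-ref"
    if nxt == "#":
        after = text[i + 2] if i + 2 < len(text) else ""
        # "&#" opens a character reference only before a name start
        # character (e.g. &#RE;) or a digit (e.g. &#38;).
        if after in SGML_NAME_START or after in DIGITS:
            return "char-ref"
    # Otherwise both "&" and any following "#" are plain data.
    return "literal"


print(classify_sgml_ampersand("class=guest&name=user", 11))  # entity-ref
print(classify_sgml_ampersand("Tom & Jerry", 4))             # literal
print(classify_sgml_ampersand("&#38;", 0))                   # char-ref
print(classify_sgml_ampersand("&#;", 0))                     # literal
```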

The example in [XHTML 1.0] C.12 is a URL for a CGI script invocation in
which "&" is actually followed by a name start character:

| http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user

So that ampersand must be expressed as an entity reference, as claimed.
But in general the claim that HTML user agents are ignoring "incorrect
usage" is not true: treating such ampersands as literal is what SGML
requires.  The incompatibility between HTML and XHTML arises because [1]
XML requires "&" to be interpreted as a markup delimiter (except within
comments, processing instructions, and CDATA sections) and [2] XML adds
"_" and ":" to the set of name start characters anyway.
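To illustrate the XML side, here is a small Python sketch using the
standard library's expat-based xml.etree.ElementTree and
xml.sax.saxutils.quoteattr. The helper xml_accepts_attr is a hypothetical
name of my own. A parse error only shows that the value is not well-formed
XML. It says nothing about what any particular HTML user agent does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def xml_accepts_attr(value: str) -> bool:
    """Return True if value is well-formed as a double-quoted XML attribute value."""
    try:
        ET.fromstring(f'<e a="{value}"/>')
        return True
    except ET.ParseError:
        return False


# The C.12 URL: "&" is followed by a letter, so it opens an entity
# reference in SGML and XML alike, and must be written as "&amp;".
url = "http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user"
print(xml_accepts_attr(url))                               # False
escaped = quoteattr(url)  # escapes "&" as "&amp;" and adds surrounding quotes
print(ET.fromstring(f"<e a={escaped}/>").get("a") == url)  # True

# Ampersands NOT followed by an ASCII letter are literal under the SGML
# rule, but XML treats every "&" as markup, so each value is rejected.
samples = ["Tom & Jerry", "a&_b=1", "x&:y", "50&#"]
for s in samples:
    print(repr(s), xml_accepts_attr(s))                    # False each time
```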

[1] http://www.w3.org/TR/REC-xml#syntax
[2] http://www.w3.org/TR/REC-xml#NT-Name

Thanks,
-- 
Kevin Rodgers

Received on Tuesday, 13 January 2004 18:14:56 UTC