
Re: HTML entities and the validator...

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 23 Apr 2011 21:28:29 +0300
Message-ID: <92950FB4CBFB4DD99CE51ECC079ACDA7@JukanPC>
To: "public-qa-dev Dev" <public-qa-dev@w3.org>
Cc: "www-validator Community" <www-validator@w3.org>
sierkb@gmx.de wrote:

> Question: is there, by any means, anywhere, a definition, if in HTML
> (specifically: HTML 4.01) and/or its parent, SGML, it's allowed and
> valid to shorten an entity (like &amp;) to "&;"

It is not.

> in a given <a href="URL">

The context does not matter.

> so that the W3C Markup Validator is right in NOT
> labeling it as an error and letting it pass as valid?

Non sequitur. "&;" is not a shortened notation for an entity reference. Whether 
it is valid is a different question.

> When NOT valid, why does
> the W3C Markup Validator say so, while parsing/validating against
> HTML 4.01 Strict

In HTML 4.01, by the formal specifications, SGML rules apply, so an "&" 
character simply denotes itself when it is not followed by a name start 
character (a letter) or "#", and ";" is neither.
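To illustrate the SGML recognition rule, here is a minimal sketch in Python. It assumes the HTML 4.01 SGML declaration, where name start characters are the ASCII letters; strictly, the entity reference open delimiter "&" is recognized only when followed by a name start character, and "&#" only when followed by a digit or a name start character. The function name classify_ampersand is hypothetical, and this is a simplification, not a full SGML parser.

```python
import string

# Assumption: HTML 4.01's SGML declaration, where name start characters
# are just the ASCII letters. Simplified sketch, not a full SGML parser.
NAME_START = set(string.ascii_letters)
DIGITS = set(string.digits)


def classify_ampersand(text: str, i: int) -> str:
    """Classify the '&' at text[i] under simplified SGML recognition rules.

    Returns 'entity-ref' if '&' opens an entity reference, 'char-ref' if
    '&#' opens a character reference, and 'data' if the '&' is simply a
    literal ampersand character.
    """
    if text[i] != "&":
        raise ValueError("text[i] must be '&'")
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt in NAME_START:
        return "entity-ref"  # ERO followed by a name start character
    if nxt == "#":
        after = text[i + 2] if i + 2 < len(text) else ""
        if after in DIGITS or after in NAME_START:
            return "char-ref"  # CRO followed by a digit or name start char
    return "data"  # not a delimiter in context: '&' denotes itself
```

For example, classify_ampersand("a?x=1&;y=2", 5) gives "data", so "&;" is just a literal ampersand followed by a semicolon, while "&amp;" at position 0 gives "entity-ref".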

>  -- and in contrast (and my expectation) throws an
> (from my point of view expected and correct) error, when
> parsing/validating against XHTML 1.0 Strict

In XHTML, XML rules apply, and XML never allows an "&" character except as 
the initial character of a character reference or an entity reference.
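The XML rule can be checked with any conforming XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree (which uses expat) as a stand-in for an XML processor; it tests well-formedness only, not validity against the XHTML DTD, and it knows only the predefined XML entities such as &amp;. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        # Raised, e.g., for an '&' that does not start a valid reference
        return False
    return True


if __name__ == "__main__":
    for frag in ['<a href="page?x=1&amp;y=2">link</a>',
                 '<a href="page?x=1&;y=2">link</a>',
                 '<a href="page?x=1& ;y=2">link</a>']:
        print(is_well_formed(frag), frag)
```

Only the first fragment parses; both "&;" and "& ;" are well-formedness errors in XML, regardless of the MIME type the document is later served with.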

> while sticking to the
> Mimetype text/html)?

The MIME type does not matter here.

> Validating such an URL against HTML5 also throws an error message.

HTML5 plays by its own rules. HTML5 itself is a work in progress, and its 
processing by the W3C Markup Validator is experimental and does not 
necessarily reflect the current status of HTML5 in all respects.

The HTML5 rules more or less reflect the SGML rules. You can see this if you 
try "& ;" (i.e., ampersand, space, semicolon): it passes. The W3C Markup 
Validator rejects "&;" in HTML5 mode for some reason that I cannot figure 
out, as I can find no prohibition against it. Perhaps the software reflects 
an older version of HTML5; perhaps it is just an artifact of the underlying 
software.
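The parsing side of this can be seen with Python's html.unescape, which is documented to decode character references using the rules of the HTML5 standard. This is only a sketch of how an HTML5-style parser treats the text, assuming text-content rules (html.unescape does not apply the special attribute-value handling); it says nothing about what the validator's conformance checker reports. The names SAMPLES and show_decoding are hypothetical.

```python
import html

SAMPLES = ["a?x=1&amp;y=2", "a?x=1&;y=2", "a?x=1& ;y=2"]


def show_decoding(samples):
    """Return (raw, decoded) pairs using Python's HTML5-based unescaping."""
    return [(s, html.unescape(s)) for s in samples]


if __name__ == "__main__":
    for raw, decoded in show_decoding(SAMPLES):
        print(f"{raw!r:>18} -> {decoded!r}")
```

Only "&amp;" is decoded; "&;" and "& ;" are left as literal characters, consistent with the observation that neither starts a character reference.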

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 
Received on Saturday, 23 April 2011 18:30:44 GMT
