
Re: HTML entities and the validator...

From: <sierkb@gmx.de>
Date: Sun, 24 Apr 2011 14:48:51 +0200
Cc: "public-qa-dev Dev" <public-qa-dev@w3.org>, "www-validator Community" <www-validator@w3.org>
Message-Id: <1CDF2AE9-5C34-48A0-B33D-278B04057DF5@gmx.de>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
On 24 April 2011 at 06:45, Jukka K. Korpela wrote:
> 
> You asked whether it is allowed and valid to shorten an entity to "&;". Similarly, it is not allowed to shorten the entity (more correctly, entity reference) "&amp;" to ";" or "amp" or "a". This does not imply that any of those strings would not be permitted.

OK.

> Correct, as there is nothing in SGML rules that would disallow it. The statement in HTML 4.01 spec that says that authors should use "&amp;" instead of "&" in text and in attribute values is not part of the formalized syntax that determines what is valid. (And it is not even a prose requirement, just a recommendation; "should", not "shall".)

OK.
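
A minimal sketch of what following that recommendation looks like in practice, assuming the markup is generated with Python's standard `html.escape`. The helper name `href_attribute` and the sample URL are hypothetical and used only for illustration:

```python
from html import escape


def href_attribute(url: str) -> str:
    """Build an href attribute with "&" written as "&amp;" (hypothetical helper)."""
    # escape() replaces "&", "<" and ">"; quote=True also escapes quote characters,
    # so the result can be placed inside a double-quoted attribute value.
    return 'href="' + escape(url, quote=True) + '"'


# Sample URL is an assumption, not taken from the thread.
print(href_attribute("page.cgi?a=1&b=2"))
# href="page.cgi?a=1&amp;b=2"
```

Written this way, the attribute value reads the same under SGML-based HTML 4.01 rules and under XML rules.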

> Validity does not depend on MIME types. If you have an XHTML document, then its validity is decided by XML rules and the document type definition, without any SGML rules stepping in. Serving an XHTML 1.0 document as text/html may well make browsers process it as if it were legacy tag-soup HTML, but that's an entirely different thing.

OK.

> I'm not sure whether it helps to repeat the answers, but "&;" is valid HTML 4.01 (not a "notation" really, just two characters), invalid in XHTML 1.0 (because "&" is only allowed as beginning an entity reference or a character reference), and presumably "valid" in HTML5 but mistakenly rejected by the experimental HTML5 checker built into W3C Markup Validator. Regarding HTML5, I'm not sure about the status, as I last checked it yesterday. And as there is no formalized description of HTML5 syntax, there is no concept of "valid" in the same sense as with SGML and XML.

OK.
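
The XML side of that point can be sketched as follows, assuming Python's standard `xml.etree.ElementTree` parser as a stand-in for the XML checks. It tests well-formedness only, not validity against the XHTML 1.0 DTD, and the function name `is_well_formed` and the sample fragments are hypothetical:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # In XML, "&" may only start an entity reference or a character reference.
        return False


print(is_well_formed("<p>a &; b</p>"))      # False
print(is_well_formed("<p>a &amp; b</p>"))   # True
```

An SGML-based HTML 4.01 parser treats the same "&;" as two ordinary data characters, which is why the two document types give different answers.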

> Regarding SGML and XML validation, it is a bug in a validator if it does not report a markup error that violates the syntax (general SGML/XML rules or DTD rules). There is no such thing as "tolerating" such errors.

OK. And what conclusion and lesson should I now take from all this and pass on to my customer? Should I tell him that when he wrote "&;" as part of his URL instead of "&amp;", he made no mistake by writing such a character string, but should not do so? Why should he not do so and write "&amp;" instead, when the validator is telling him "valid"?
Received on Sunday, 24 April 2011 12:49:25 GMT
