
Re: HTML entities and the validator...

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sun, 24 Apr 2011 16:08:08 +0300
Message-ID: <E2A26F7ECE7D4E9E8C1151A4A52E804C@JukanPC>
To: <sierkb@gmx.de>
Cc: "public-qa-dev Dev" <public-qa-dev@w3.org>, "www-validator Community" <www-validator@w3.org>
sierkb@gmx.de wrote:

> And what essence and lesson should I now take from this all and
> tell my customer?

In all versions of HTML, it is _advisable_ to write any occurrence of 
"&" in text content or in an attribute value as "&amp;". Using a bare "&" 
is formally forbidden in some versions and formally allowed, but 
discouraged in prose, in other versions.

> Tell him that when he wrote "&;" as part of his URL
> instead of "&amp;", he did no mistake by writing such a character
> string construct, but should not do so?

If he wanted the URL to contain the "&" character between name=value pairs, 
which is the most common scenario of ampersands in URLs, then it was a 
double mistake: using an unencoded "&" when writing HTML, and adding an 
extra ";" (which may or may not cause trouble).

> Why should he not do so and
> instead should better write "&amp;", while the validator is saying to
> him "valid"?

First, because "&;" has a different meaning: it is two literal characters, 
"&" followed by ";", not a reference to "&". Second, because unencoded "&" 
characters cause confusion, and it is simpler to always encode them in text 
and attribute values in HTML than to learn and remember the rules that 
allow them in specific contexts in some HTML versions.
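
To illustrate the "always encode" rule, here is a minimal sketch in Python, 
using only the standard library. The URL and parameter names are made up for 
the example. Note that html.unescape is Python's own decoder, not a browser 
parser, so it only approximates what a browser does with the attribute value.

```python
from html import escape, unescape
from urllib.parse import urlencode

# Hypothetical query parameters, for illustration only.
params = {"name": "value", "lang": "en"}
url = "http://example.com/search?" + urlencode(params)

# Encode the URL before placing it in an attribute value:
# every "&" becomes "&amp;".
href = escape(url, quote=True)
link = f'<a href="{href}">search</a>'

print(url)   # the URL as it should be requested
print(link)  # the URL as it should appear in HTML source

# Decoding the attribute value gives back the intended URL.
assert unescape(href) == url

# "&;" is not a character reference, so it decodes to itself:
# the URL would contain a stray ";" after the "&".
broken = "name=value&;lang=en"
assert unescape(broken) == broken
```

The point of the round trip is that "&amp;" in the markup yields a plain "&" 
in the URL the browser actually requests, whereas "&;" leaves an extra ";" 
in it.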

A markup validator proper analyzes a document for conformance to a 
formalized syntax specification, no more, no less. Being "valid" means such 
conformance, no more, no less. So it's just a minor, though relevant, aspect 
of conforming to specifications, which is just one part of being a good 
page. It's remotely comparable to a spelling checker. You wouldn't expect a 
"pass" report from a spelling checker to indicate that the text is 
well-written, would you? It would only indicate lack of detectable spelling 
errors, which is fine but does not mean that the text is grammatically 
correct, or in good style, or makes sense, or is factually correct.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 
Received on Sunday, 24 April 2011 13:11:07 GMT
