
Re: Ampersand [was: Re: [VE] [325] Add Subject Here]

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 18 Apr 2013 16:17:15 +0300
Message-ID: <516FF25B.40705@cs.tut.fi>
To: www-validator@w3.org
2013-04-18 14:51, Fred H Olson wrote:

> On 17 Apr 2013 Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
>> Alternatively, ignore the messages.
>> The use of "&", when followed by a name, which is not followed by a
>> semicolon, is a violation of HTML 4.01 rules. But it does not cause any
>> actual harm. Browsers and search engines take the "&" as a data
>> character, not as starting an entity reference, provided that the name
>> is not an entity name as per HTML 4.01. This error recovery has been
>> formally described and made mandatory in the HTML5 CR.
>> Yucca
> So should non-entity uses of "&" (e.g. "Sam & Mary") be replaced
> with "&amp;"?

By HTML 4.01 rules, they need not be replaced when followed by a space 
or other non-name character. The HTML 4.01 spec might be seen as 
recommending that "&" be always encoded, but this is not a validity 
constraint or otherwise a conformance requirement.

If it were "Sam&Mary", things would be different: for validity, "&" 
must be escaped here (but browsers don't actually require that).
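To make the HTML 4.01 distinction concrete, here is a minimal Python sketch, assuming a simplified reading of the SGML rules: a literal "&" is treated as opening a reference only when it is followed by an ASCII letter, or by "#" plus a letter or digit. The function names are hypothetical and not part of any validator.

```python
import re

# Simplified assumption: under HTML 4.01 (SGML) rules, "&" opens an entity
# reference when followed by a letter, and "&#" opens a character reference
# when followed by a letter or digit. Any other "&" is plain data.
_REFERENCE_START = re.compile(r"&(?=[A-Za-z]|#[A-Za-z0-9])")


def escape_amp_html401_minimal(text: str) -> str:
    """Escape only the ampersands an HTML 4.01 parser would read as references."""
    return _REFERENCE_START.sub("&amp;", text)


def escape_amp_always(text: str) -> str:
    """Escape every ampersand: allowed in HTML 4.01, required in XML/XHTML."""
    return text.replace("&", "&amp;")


for sample in ["Sam & Mary", "Sam&Mary"]:
    print(escape_amp_html401_minimal(sample), "|", escape_amp_always(sample))
```

Escaping every "&" is the simpler policy, and it is the one XHTML needs anyway.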

In XHTML, as well as in XML in general, "&" as a data character must 
always be escaped - otherwise the document isn't even well-formed.
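The well-formedness rule is easy to observe by feeding both forms to an XML parser. This sketch uses Python's standard-library xml.etree.ElementTree as a stand-in for any conforming XML parser; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Sam &amp; Mary</p>"))  # True
print(is_well_formed("<p>Sam & Mary</p>"))      # False: bare "&"
print(is_well_formed("<p>Sam&Mary</p>"))        # False: bare "&"
```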

> Are there performance penalties if they are left in?


> If it becomes common to just leave them, should the validator
> have the option to be configured not to report them?

An SGML or XML validator is required to report all markup errors that 
are reportable according to the SGML standard or the XML specification. 
But this might be interpreted liberally, so that a validator may 
suppress some of them if the user so requests.

Anyway, I think nobody is working on that part of the W3C Markup 
Validator. Development work is directed towards the HTML5 linter, called 
"HTML5 validator", which is what you actually use when you use the W3C 
Markup Validator in HTML5 mode. Currently, in that mode, Sam&Mary is 
reported as an error, even though it is allowed in HTML5; this has been 
fixed in a development version of the validator
(which also has an option for filtering out messages by type, but it 
won't be needed in this case).
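The HTML5 error recovery for a bare "&" can be approximated with Python's html.unescape, which the Python documentation says follows the HTML5 rules for both valid and invalid character references. This is only an approximation of a browser's parser: html.unescape does not distinguish text content from attribute values, where HTML5 treats references without a semicolon slightly differently.

```python
import html

# A bare "&" followed by a space, or by a name that is not an HTML5 entity
# name, stays a data character; "&amp;" decodes to "&". "copy" is a legacy
# entity name, so "&copy" is decoded even without a semicolon.
samples = ["Sam & Mary", "Sam&Mary", "AT&T", "Sam&amp;Mary", "&copy 2013"]
decoded = {s: html.unescape(s) for s in samples}

for original, text in decoded.items():
    print(f"{original!r} -> {text!r}")
```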

Received on Thursday, 18 April 2013 13:17:50 UTC
