- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 24 Apr 2011 07:45:07 +0300
- To: "public-qa-dev Dev" <public-qa-dev@w3.org>
- Cc: "www-validator Community" <www-validator@w3.org>
sierkb@gmx.de wrote:

> On 23.04.2011 at 20:28, Jukka K. Korpela wrote:
>> sierkb@gmx.de wrote:
>>
>>> Question: is there, by any means, anywhere, a definition, if in HTML
>>> (concrete: HTML 4.01) and/or its parent, SGML, it's allowed and
>>> valid to shorten an entity (like &amp;) to "&;"
>>
>> It is not.
>
> That's what is the essence of my question. If it's not allowed and
> not valid, why does the W3C validator let pass it and says "valid"?

You asked whether it is allowed and valid to shorten an entity to "&;".
Similarly, it is not allowed to shorten the entity (more correctly, entity
reference) "&amp;" to ";" or "amp" or "a". This does not imply that any of
those strings is itself prohibited; each simply stands for itself rather
than for an ampersand.

>> In HTML 4.01, by the formal specifications, SGML rules apply,
>
> Yes. That's clear and not in question.
>
>> so an "&" character simply denotes itself when it is not followed by
>> a NAME character, and ";" is not a NAME character.
>
> Again my question: is the validator correct or wrong in letting pass
> such a "&;" construct concerning HTML 4.01?

Correct, as there is nothing in SGML rules that would disallow it. The
statement in the HTML 4.01 spec that says that authors should use "&amp;"
instead of "&" in text and in attribute values is not part of the
formalized syntax that determines what is valid. (And it is not even a
prose requirement, just a recommendation; "should", not "shall".)

> Yes. If parsed as XML by the XML parser and not as SGML by the SGML
> parser, affected by the Mime type which has the role as a switch. Am
> I right?

Validity does not depend on MIME types. If you have an XHTML document, then
its validity is decided by XML rules and the document type definition,
without any SGML rules stepping in. Serving an XHTML 1.0 document as
text/html may well make browsers process it as if it were legacy tag-soup
HTML, but that's an entirely different thing.
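To make the difference between the two rule sets concrete, here is a minimal Python sketch. It is not how any validator is implemented. The function names are hypothetical, and the SGML check is a simplification that assumes a name start character is an ASCII letter (as I understand the HTML 4.01 SGML declaration to define it) and that "&#" opens a character reference only before a letter or digit. The XML check simply asks Python's standard XML parser whether the text is well-formed inside a dummy element.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: "&" opens a reference only before an ASCII letter, and "&#"
# only before a letter or digit. Anything else leaves "&" as plain data.
REFERENCE_OPEN = re.compile(r"&(?:[A-Za-z]|#[A-Za-z0-9])")


def sgml_ampersand_is_data(text: str, pos: int) -> bool:
    """Return True if the "&" at text[pos] merely denotes itself (simplified SGML rule)."""
    if text[pos] != "&":
        raise ValueError("no ampersand at the given position")
    return REFERENCE_OPEN.match(text, pos) is None


def xml_accepts(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML content of a dummy <p> element."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for sample in ["&;", "&amp;", "a & b"]:
        pos = sample.index("&")
        print(repr(sample), sgml_ampersand_is_data(sample, pos), xml_accepts(sample))
```

Under these assumptions, "&;" counts as plain data by the SGML-style rule but is rejected by the XML parser, while "&amp;" is a reference under both.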
>> The W3C Markup validator rejects "&;" in HTML5 mode for some reason
>> that I cannot figure out, as I can find no prohibition against it.
>
> So, "&;" is a valid notation in HTML 4.01, XHTML 1.0 and HTML 5? Or
> not valid? Or is it not valid but tolerated? If valid, then why does
> the validator differ in the results? And if not valid, why does it
> either differ in the results?

I'm not sure whether it helps to repeat the answers, but "&;" is valid in
HTML 4.01 (not a "notation" really, just two characters), invalid in
XHTML 1.0 (because "&" is only allowed as beginning an entity reference or
a character reference), and presumably "valid" in HTML5 but mistakenly
rejected by the experimental HTML5 checker built into the W3C Markup
Validator. Regarding HTML5, I'm not sure about the status, as I last
checked it yesterday. And as there is no formalized description of HTML5
syntax, there is no concept of "valid" in the same sense as with SGML and
XML.

> _Is_ it a bug of the validator to present different results in
> handling "&;" (ampersand, semicolon), when validating against HTML
> 4.01, XHTML 1.0 or HTML5?

Why would it be? The specifications differ.

> Or is it not valid and no bug of the validator but tolerated
> by the validator? That's the main question.

That's an odd question. Regarding SGML and XML validation, it is a bug in a
validator if it does not report a markup error that violates the syntax
(general SGML/XML rules or DTD rules). There is no such thing as
"tolerating" such errors.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Received on Sunday, 24 April 2011 04:45:42 UTC