Re: some things that w3c validator could warn/complain about (mostly XHTML interoperability issues)

olivier Thereaux wrote:

> On 8 nov. 07, at 00:02, Christian Steinert wrote:
>> - empty class="" attribute does not seem to be allowed, but is not
>> reported by the validator
>
> Empty class attribute wouldn't make sense, but I don't see any clear
> mention, either in the prose or in the machine-readable schemas that
> would make that forbidden.

The declared content for the class attribute is CDATA, so a validator 
must accept class="" as valid. It is however meaningless, so a warning 
could be issued, if the document is treated as some version of HTML. The 
HTML 4.01 specification defines that the attribute "assigns a class name 
or set of class names to an element", so class="" does not assign 
anything or mean anything else either, and it cannot even be used to 
override any other assignment.
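
As a rough illustration only, here is what such a check could look like in Python, assuming the document is well-formed XML (i.e. XHTML) and is parsed with the standard xml.dom.minidom module. The function name elements_with_empty_class is made up for this sketch and does not correspond to anything in the validator.

```python
from xml.dom import minidom


def elements_with_empty_class(source):
    """Return tag names of elements whose class attribute names no class.

    Hypothetical helper: a class value that is empty or consists only of
    white space is reported, since it assigns no class name at all.
    """
    doc = minidom.parseString(source)
    return [
        elem.tagName
        for elem in doc.getElementsByTagName("*")
        if elem.hasAttribute("class") and not elem.getAttribute("class").split()
    ]


if __name__ == "__main__":
    sample = '<div><p class="">a</p><p class="note">b</p></div>'
    for tag in elements_with_empty_class(sample):
        print("Warning: <%s> has a class attribute that assigns no class" % tag)
```

Such a finding would be a warning, not an error, since the attribute value is valid as CDATA.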

On the other hand, should a markup validator consider such issues? This 
is a general problem that needs a general solution, rather than the 
addition of miscellaneous ad hoc checks. In fact, if a validator warns 
about _some_ semantically questionable constructs, people will start to 
wonder why it does not complain at all about quite comparable or more 
serious problems. (This also applies to constructs that are valid but 
violate syntactic requirements presented in prose, such as <td 
width="foo bar"> or <form action="I ain't no URL, really">.)

It would also be odd if all those warnings disappeared when the 
document type declaration is replaced by one that differs from those 
specified in the HTML specifications but refers to a DTD that is identical 
to (or very similar to) some HTML DTD.

>> - HTML-comments inside of CSS and script tags are legal XHTML, but
>> will cause content to be ignored by XHTML-conformant browsers
>
> I don't really see what's the problem with that. Could you give
> details or examples?

There is widespread superstition, supported by many HTML guides and 
tutorials, saying that style sheets and scripts should be placed inside 
<!-- ... --> to hide them from older browsers, e.g.
<style type="text/css"><!--
   CSS rules go here
--></style>
and _many_ authors still routinely use this approach.

This had some meaning long ago but became nonsense in the late 1990s. 
Note that in HTML 4.01, such constructs inside script and style elements 
are not comments. The content model is CDATA, so no comment conventions 
apply. (The original idea was that user 
agents that do not recognize script or style elements at all were 
assumed to ignore the <script> and <style> tags and the corresponding 
end tags and to process the content between them, so this would result 
in turning it into visible document content, unless "protected" by 
something that would be treated as comment delimiters by such a 
browser.)

The problem with this is that in XHTML, the content model is #PCDATA, 
and the <!--...--> construct _is_ a comment, and by XML rules, user 
agents are _allowed_ to remove all comments before otherwise processing 
the document. Thus, the entire style sheet or script code inside the 
element _may_ be ignored.
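
To make the risk concrete, here is a small sketch using one XML processor that happens to behave this way: the default parser of Python's xml.etree.ElementTree discards comments while building the tree. The helper name style_text_seen_by_xml_parser and the sample documents are invented for this illustration; other XML software may or may not keep the comment.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def style_text_seen_by_xml_parser(source):
    """Return the text content of the first XHTML style element.

    The default ElementTree parser drops comments, so a style sheet
    wrapped in <!-- ... --> does not show up in the returned text.
    """
    root = ET.fromstring(source)
    style = root.find(f".//{{{XHTML_NS}}}style")
    if style is None:
        return ""
    return style.text or ""


HIDDEN = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
    '<style type="text/css"><!--\np { color: red }\n--></style>'
    "</head><body><p>x</p></body></html>"
)
PLAIN = HIDDEN.replace("<!--", "").replace("-->", "")

if __name__ == "__main__":
    print(repr(style_text_seen_by_xml_parser(HIDDEN)))
    print(repr(style_text_seen_by_xml_parser(PLAIN)))
```

With the comment delimiters in place, the style sheet is simply gone from the parsed tree; without them, the rule is available as the element's text.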

So that's the problem. It's less obvious how to approach it. Should it 
be handled by ad hoc processing of script and style elements? Just in 
XHTML, where disaster may result, or also in HTML, where the 
superstition is just pointless?

Perhaps the strategy could be: _if_ in XML parsing mode _and_ the 
document has been recognized as an XHTML document, _then_ issue a 
warning about any XML comment inside a script or style element. This 
could be misleading if the author intentionally used a comment there, 
_intending_ it as a mere comment and _expecting_ it to be ignored by the 
software he uses, but this is unlikely, and it would probably be so 
unwise to _rely_ on the comment being ignored that a warning is actually 
useful here, too.
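
A minimal sketch of that strategy, in Python with the standard xml.dom.minidom module, might look as follows. It makes some assumptions: "recognized as an XHTML document" is approximated by checking that the root element is in the XHTML namespace, only comments that are direct children of script or style are reported, and the function name find_comments_in_script_or_style is hypothetical. It is not how the validator is implemented.

```python
from xml.dom import Node, minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def find_comments_in_script_or_style(source):
    """Return (element name, comment text) pairs worth a warning.

    Parses the source as XML. If the root element is not in the XHTML
    namespace, the document is not treated as XHTML and nothing is
    reported.
    """
    doc = minidom.parseString(source)
    if doc.documentElement.namespaceURI != XHTML_NS:
        return []
    findings = []
    for name in ("script", "style"):
        for elem in doc.getElementsByTagNameNS(XHTML_NS, name):
            for child in elem.childNodes:
                if child.nodeType == Node.COMMENT_NODE:
                    findings.append((name, child.data))
    return findings


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
        '<style type="text/css"><!--\np { color: red }\n--></style>'
        "</head><body><p>x</p></body></html>"
    )
    for name, _text in find_comments_in_script_or_style(sample):
        print("Warning: XML comment inside <%s> in an XHTML document" % name)
```

The check is deliberately tied to XML parsing mode: in an HTML 4.01 document the same characters are plain CDATA content, so there is no comment node to find.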

The warning could be something like
"An XML comment was detected inside a script or style element in a 
document expected to be XHTML. Beware that such comments may be removed 
by programs that process the document. Thus, the old habit of putting a 
style sheet or script code inside a comment-looking construct, to deal 
with old web browsers, has become very risky and should not be used in 
XHTML."
(of course, divided into some short message and a longer explanation).

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/ 

Received on Wednesday, 5 December 2007 08:14:32 UTC