Re: [VE][66] Add Subject Here from Jukka K. Korpela on 2007-05-16 (www-validator@w3.org from May 2007)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 16 May 2007 08:03:00 +0300 (EEST)
To: Muharrem Kaderli <m.rem@isnet.net.tr>
cc: www-validator@w3.org
Message-ID: <Pine.GSO.4.64.0705160735310.23915@hopeatilhi.cs.tut.fi>

On Tue, 15 May 2007, Muharrem Kaderli wrote:

> Validating http://www.habune.com/
> Error [66]: "document type does not allow element X here; assuming 
> missing Y start-tag"

First of all, I don't get such a message when I try to validate the 
document. Instead, I get a message saying that the document cannot be 
checked because it contains bytes that cannot be interpreted as UTF-8.
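
A minimal Python sketch of why such bytes are rejected. The sample word
and the helper name decodes_as_utf8 are assumptions made for illustration;
they are not taken from the page in question. A Turkish letter stored as a
single byte in windows-1254 (cp1254 in Python) is not a valid UTF-8
sequence on its own:

```python
# Hypothetical sample text: a Turkish word stored as windows-1254 bytes.
# Python names this codec "cp1254".
raw = "ışık".encode("cp1254")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same bytes are readable once the right encoding is named.
text = raw.decode("cp1254")
```

Here decodes_as_utf8(raw) is False, while decoding with cp1254 gives the
original word back.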

The document is in fact in the windows-1254 (Windows Turkish) encoding. The 
encoding should be declared in HTTP headers, or in the XML declaration,
<?xml version="1.0" encoding="windows-1254"?>
at the very start of the document, or both. Things get somewhat tricky, 
since an XML declaration throws IE into "Quirks Mode", and sometimes 
individual authors cannot control HTTP headers. So authors often resort to 
"meta Ersatz", i.e. a meta tag inside the document to specify the 
encoding. Although this does not comply with the specifications (XML rules 
specify the default and allow it to be overridden in the XML declaration or 
by a higher-level protocol such as HTTP, but not inside the document), it 
has been observed to "work" in contemporary browsers. But then you need to 
have the meta tag syntax right. You now have

<Meta http-equiv="Content-type" Content="text/html;" charset="windows-1254">

with two extra quotation marks; it should be

<meta http-equiv="Content-type" content="text/html;charset=windows-1254">
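
The difference between the two tags can be sketched in Python with the
standard html.parser module. The class MetaCharsetFinder and the function
meta_charset are hypothetical names, and the sketch only looks for
charset=... inside the content attribute; it does not claim to reproduce
what any particular browser or the validator does:

```python
import re
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset named inside a meta tag's content attribute."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        match = re.search(r"charset\s*=\s*([\w.:-]+)",
                          attr.get("content") or "", re.IGNORECASE)
        if match:
            self.charset = match.group(1)


def meta_charset(markup):
    """Return the charset found in a Content-type meta tag, or None."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset


broken = '<Meta http-equiv="Content-type" Content="text/html;" charset="windows-1254">'
fixed = '<meta http-equiv="Content-type" content="text/html;charset=windows-1254">'
```

With the extra quotation marks, the content value ends at "text/html;" and
charset becomes a separate attribute, so meta_charset(broken) finds
nothing. For the corrected tag it returns "windows-1254".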

If I manually set, in the validator's user interface, the encoding that 
the validator uses to interpret the document (as you have probably done, 
judging from the error message you mention), I get 229 error messages.
The first error message is:

"Error  Line 4 column 6: document type does not allow element "title" 
here; assuming missing "head" start-tag."

I think that's rather self-explanatory: the tag <head> is missing before 
the <title> tag. The reason is that in XHTML, the tag <head> is 
obligatory, i.e. it cannot be omitted as in previous versions of HTML.

Then there is a pile of error messages, largely caused by the use of 
"classic HTML" syntax - uppercase letters in tag names etc. - in a 
document purported to be XHTML 1.0.
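
Why letter case matters in XHTML can be sketched with Python's standard
XML parser. This shows only that XML treats names as case-sensitive; it
does not validate against the XHTML DTD as the validator does, and the
helper is_well_formed and the sample markup are assumptions made for
illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An XML parser keeps names exactly as written, so "HTML" and "TITLE"
# are different element names from the lowercase "html" and "title"
# that XHTML defines.
root = ET.fromstring("<HTML><HEAD><TITLE>x</TITLE></HEAD></HTML>")
root_name = root.tag

# Mixing cases between start and end tag is not even well-formed.
mixed_ok = is_well_formed("<p>text</P>")
lower_ok = is_well_formed("<p>text</p>")
```

Here root_name is "HTML" rather than "html", mixed_ok is False, and
lower_ok is True.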

Obviously, if you change the document type declaration to an HTML 4.01 
doctype,
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
           "http://www.w3.org/TR/html4/strict.dtd">
there will be far fewer problems to consider. (You'll still have to change
"&" to "&amp;" in many places, and things like that.)

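The "&" fix can be sketched in Python as follows. The pattern BARE_AMP and
the function escape_bare_ampersands are hypothetical names, and the
pattern is a simplification: it treats any "&name;" as an existing
reference without checking that the name is a defined entity, so results
should be reviewed by hand:

```python
import re

# An "&" that does not already start a character or entity reference
# such as "&amp;", "&#305;" or "&#x131;".
BARE_AMP = re.compile(
    r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)"
)


def escape_bare_ampersands(markup: str) -> str:
    """Replace each bare "&" with "&amp;", leaving references alone."""
    return BARE_AMP.sub("&amp;", markup)


# Hypothetical link of the kind that typically triggers the error.
link = '<a href="page.php?id=1&cat=2">'
escaped_link = escape_bare_ampersands(link)
```

The function leaves text that is already escaped unchanged, so it can be
run more than once on the same document.
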
There is no gain from using XHTML on the web now or in the foreseeable 
future, unless you have some special use case where you combine XHTML with 
other XML based languages and can deal with the fact that IE does not 
understand XHTML at all (except when you make it treat XHTML as "classic 
HTML"). But as you have seen, there are many traps and pitfalls that you 
will find if you try to use XHTML.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Wednesday, 16 May 2007 05:03:14 UTC