Re: encoding: not always required

From: Neil Zanella <nzanella@cs.mun.ca>
Date: Thu, 26 Jun 2003 23:10:43 -0230 (NDT)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.44.0306262238590.1671-100000@garfield.cs.mun.ca>


OK, I have left out an important bit of information in my bug report:

Consider the following file:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
    <p>...</p>
  </body>
</html>

If I name it hello.xml and run it through the validator, then everything
works fine. However, if I name it hello.html, then the validator complains.
I am not sure why this is so and would like an explanation. I think it has
something to do with the fact that web servers report the document's media
type in the HTTP Content-Type response header. The web server I used was
configured to return the text/html MIME type for files with the .html
extension and text/xml for files with the .xml extension. But the
parser still recognized the file as XML, since it stated:

--- begin quote ----------------------------------------------------------
 I was not able to extract a character encoding labeling from any of the 
valid sources for such information. Without encoding information it is 
impossible to validate the document. The sources I tried are:

    * The HTTP Content-Type field.
    * The XML Declaration.
    * The HTML "META" element.

And I even tried to autodetect it using the algorithm defined in Appendix 
F of the XML 1.0 Recommendation.
--- end quote ------------------------------------------------------------
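
To make the three sources named in that message concrete, here is a minimal
Python sketch that looks for an explicit encoding label in each of them, in
the order listed. It is only an illustration under stated assumptions, not
the validator's actual code: the function name find_encoding_label and the
regular expressions are hypothetical and simplified, and the Appendix F
autodetection step and any byte order mark handling are left out.

```python
import re
from typing import Optional, Tuple

# Simplified, hypothetical patterns; real parsers are stricter than this.
CHARSET_PARAM = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def find_encoding_label(
    content_type: Optional[str], body: bytes
) -> Optional[Tuple[str, str]]:
    """Return (source, label) for the first explicit encoding label found."""
    # 1. charset parameter of the HTTP Content-Type field
    if content_type:
        m = CHARSET_PARAM.search(content_type)
        if m:
            return ("HTTP Content-Type field", m.group(1))
    # 2. encoding pseudo-attribute of the XML declaration (start of body only)
    m = XML_DECL.match(body)
    if m:
        return ("XML declaration", m.group(1).decode("ascii"))
    # 3. charset given in an HTML META element
    m = META_CHARSET.search(body)
    if m:
        return ("HTML META element", m.group(1).decode("ascii"))
    return None


# The document from this message, byte for byte ASCII.
DOC = b'''<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
    <p>...</p>
  </body>
</html>
'''

print(find_encoding_label("text/html", DOC))                 # None
print(find_encoding_label("text/html; charset=utf-8", DOC))  # label found in HTTP
```

In this sketch the document carries no explicit label under either text/html
or text/xml, so whatever makes the .xml case pass must come from a fallback
that is not modeled here.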

I wonder whether XHTML documents should avoid the .html extension, since
otherwise the web server cannot tell them apart from ordinary text/html
documents. So what is the common convention? Should they have the .xml
extension?

Thanks!

Neil


In any case, from the 

On Thu, 26 Jun 2003, Bjoern Hoehrmann wrote:

> * Neil Zanella wrote:
> >I have a document encoded in ASCII (a subset of UTF-8).
> >The XML 1.0 specification states:
> >
> >It is also a fatal error if an XML entity contains no encoding declaration 
> >and its content is not legal UTF-8 or UTF-16.
> >
> >Therefore, the validator should correctly validate XHTML documents
> >starting with <?xml version="1.0"?> followed by a proper XHTML 1.0 DTD
> >followed by the actual content.
> >
> >However the validator.w3.org program insisted that I ought to specify it,
> >but that's not what the XML standard says, right?
> 
> http://validator.w3.org/check?uri=http://www.bjoernsworld.de/temp/foo3.xml
> 
Received on Thursday, 26 June 2003 21:40:48 GMT
