Re: Document without charset

Hi Andreas!

At 15:12 09.12.2005, you wrote:

>On Fri, 9 Dec 2005, Bjoern Hoehrmann wrote:
>
> >>  First, the validator *itself* chooses UTF-8.
> >>  Then the validator declares that UTF-8 is impossible.
> >
> > If your point is that the Validator should simply refuse to validate
> > documents for which the character encoding cannot be determined simply
> > by following the requirements of the applicable specifications then
> > this is not really an option.
>
>No, I'm not suggesting this.
>But *if* you guess an encoding *then* you should not (a few lines
>later) declare that this guessed encoding is impossible.

The validator doesn't declare that this guessed encoding is impossible.
I quote:
"Without encoding information it is impossible to reliably validate the 
document. I'm falling back to the "UTF-8" encoding and will attempt to 
perform the validation, but this is likely to fail for all non-trivial 
documents. "

This sentence simply means that validation without any encoding 
information at all is not reliable.
The validator needs to know what the document's encoding is.
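
To illustrate why guessing is unreliable, here is a minimal Python sketch.
It is not what the validator itself runs, and the helper name try_decode
is made up for this example. The same bytes are valid text in one
encoding and invalid (or silently wrong) in another:

```python
from typing import Optional


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# "Gruesse" with u-umlaut and sharp s, stored as ISO-8859-1 bytes.
raw = "Gr\u00fc\u00dfe".encode("iso-8859-1")

latin1_text = try_decode(raw, "iso-8859-1")  # decodes to the original text
utf8_text = try_decode(raw, "utf-8")         # None: 0xFC is not valid UTF-8

# The opposite mistake does not fail at all, it just yields wrong characters:
# UTF-8 bytes for u-umlaut read as ISO-8859-1 become two unrelated letters.
mojibake = "\u00fc".encode("utf-8").decode("iso-8859-1")
```

So a fallback to UTF-8 works for plain ASCII documents, but breaks as
soon as the document contains bytes that mean something else in its
real encoding.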

To avoid that situation, you should add the XML declaration (for XML 
documents this is the only place in the document where you can store 
encoding information for the whole document, and it is required 
whenever the encoding is neither UTF-8 nor UTF-16 and is not supplied 
externally), e.g.
<?xml version="1.0" encoding="UTF-8"?>

and/or, if you deliver your document as HTML (text/html) and not as 
XML (e.g. application/xhtml+xml),
you should add a meta element to your document that holds the 
encoding information, e.g.
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
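
If you want to check your own documents for such a declaration before
sending them to the validator, something like the following Python
sketch could do. Everything in it is an assumption made for
illustration: the function name declared_encoding, the regular
expressions, the 1024-byte window and the order of precedence (HTTP
charset first, then XML declaration, then meta element). It is a rough
approximation, not the validator's actual detection logic.

```python
import re
from typing import Optional

# Encoding pseudo-attribute of an XML declaration at the very start of the data.
XML_DECL = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)

# charset=... anywhere inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z][A-Za-z0-9._\-]*)',
    re.IGNORECASE,
)


def declared_encoding(data: bytes, http_charset: Optional[str] = None) -> Optional[str]:
    """Return the encoding name the document declares, or None if it declares none."""
    if http_charset:
        # A charset parameter from the HTTP Content-Type header, if the caller has one.
        return http_charset
    head = data[:1024]  # declarations are expected near the top of the document
    match = XML_DECL.match(head)
    if match:
        return match.group(1).decode("ascii")
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii")
    return None
```

A result of None corresponds to the situation described above: no
encoding information at all, so any consumer has to guess.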



Sierk Bornemann

Sierk Bornemann | Hannover | Germany
e-mail:  sierkb@gmx.de
URL:     http://sierkbornemann.de/ 

Received on Friday, 9 December 2005 14:43:45 UTC