RE: no encoding specified when creating XHTML documents

Dave J Woolley wrote:
> 	[DJW:]  I said served as text/html, which implies
> 	the use of an HTTP server and text/html content-type.

I don't see that in your message.  You wrote things such as "compatible with
the HTML default character set" and "For HTML compatibility, I would
therefore say that Amaya should be defaulting ISO 8859/1" and "Amaya is an
[X]HTML tool, it should default to 8859/1". I felt it was important to
debunk this purported HTML default charset, which doesn't exist.

> 	[DJW:]  That's not the same thing.  No encoding means
> 	UTF-8 or UTF-16.

If you mean no encoding *declaration* in XML, then yes (unless there is
external information, such as an HTTP header, that says otherwise).
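
To make that precedence concrete, here is a rough Python sketch. The function name and its http_charset parameter are made up for illustration, and it is a deliberate simplification of the autodetection notes in the XML 1.0 spec (no UCS-4, no EBCDIC), not a description of what Amaya does:

```python
import re
from typing import Optional

# Hypothetical helper: which encoding should a consumer assume for an XML entity?
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def effective_xml_encoding(data: bytes, http_charset: Optional[str] = None) -> str:
    # 1. External information (e.g. an HTTP charset parameter) takes precedence.
    if http_charset:
        return http_charset
    # 2. A UTF-16 byte order mark identifies UTF-16.
    if data.startswith(b"\xfe\xff") or data.startswith(b"\xff\xfe"):
        return "UTF-16"
    # Skip a UTF-8 byte order mark, if present, before looking for the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # 3. An explicit encoding declaration in the XML declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 4. Nothing said anywhere: the document must be UTF-8.
    return "UTF-8"
```

With no declaration and no external information this returns "UTF-8", which is exactly what a parser will assume for a file saved without an encoding declaration.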

>     Amaya should store documents as
> 	ISO 8859/1 unless told otherwise, as that is what most
> 	users will expect;

Well, that's what it does, no?  Defaulting to 8859-1 is not a very good idea
in the first place (a better default would be something based on the
platform's locale/charset, still better a user preference), but the real
problem is that Amaya currently saves XHTML with inconsistent encoding
declarations, unless you set a language.  If I save a new document, I now see:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>No title</title>
  <meta name="GENERATOR" content="amaya V3.2.1" />
  <meta http-equiv="Content-Type" content="text/html" />
</head>

The XML declaration, lacking an encoding declaration, implies that the
encoding is UTF-8, while the document is actually saved in 8859-1.  The <meta>
comes too late in the <head> (it should come before <title>) and is totally
useless anyway: it carries no charset parameter, and if I can parse that far,
I already know it's text/html!  As Vincent indicated, only if I set a language
do I get correct declarations.
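
The practical consequence is easy to reproduce outside Amaya. The following is only a sketch using Python's standard xml.etree.ElementTree; the sample element and the try_parse helper are invented for the illustration:

```python
import xml.etree.ElementTree as ET

# A body containing one non-ASCII character, stored as ISO-8859-1 bytes.
body = "<p>Fran\u00e7ois</p>".encode("iso-8859-1")

# Same bytes, with and without an encoding declaration.
undeclared = b'<?xml version="1.0"?>' + body
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

def try_parse(data):
    """Return the text of the root element, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

print(try_parse(undeclared))  # parser assumes UTF-8 and rejects the 8859-1 byte
print(try_parse(declared))    # declaration matches the bytes, so parsing succeeds
```

A document that happens to contain only ASCII parses either way, which is presumably why the mismatch goes unnoticed until the first accented character is typed.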

> 	What apparently is happening is that it is defaulting to
> 	UTF-8, although I suspect it only works correctly if the
> 	display also uses UTF-8.

I don't see that; how do you get this behaviour?  What I see is the doc
stored in 8859-1 but the XML declaration lying about it.  If I set a language
attribute on <html>, the doc is still stored in 8859-1 but the declarations
now correctly say so.

--
François Yergeau

Received on Thursday, 24 August 2000 13:01:15 UTC