Martin Duerst wrote:

>I was quite surprised to see that the main validator page (and probably
>others) are still iso-8859-1 (see e.g.
>2Fvalidator. This leads to problems when validating IRIs
>and, once we support that, IDNs.
>All validator output is UTF-8, and there is no reason to have the input
>be something else.

Well, that's a good point; but the main reason the pages are still iso-8859-1
is that older browsers had trouble with UTF-8. This was also the reason for
nuking some "fancy quotes" scattered around the site.
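
To make the IRI problem above concrete, here is a rough sketch in Python
(not validator code). It assumes that a browser submits form input in the
encoding of the page the form lives on; the example.org address and the
form_encode helper are made up for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

def form_encode(text, page_charset):
    """Percent-encode text roughly as a browser would when submitting a
    form from a page served in page_charset (hypothetical helper)."""
    return quote(text.encode(page_charset), safe="/:")

# Hypothetical IRI containing a non-ASCII character.
iri = "http://example.org/Björn"

latin1 = form_encode(iri, "iso-8859-1")  # 'ö' becomes the single byte F6
utf8 = form_encode(iri, "utf-8")         # 'ö' becomes the two bytes C3 B6

print(latin1)
print(utf8)

# A receiver that expects UTF-8 can decode the second form...
assert unquote_to_bytes(utf8).decode("utf-8") == iri

# ...but the bytes from the iso-8859-1 page are not valid UTF-8.
try:
    unquote_to_bytes(latin1).decode("utf-8")
    latin1_decodes_as_utf8 = True
except UnicodeDecodeError:
    latin1_decodes_as_utf8 = False
```

So the same address arrives as two different byte sequences depending on the
page encoding, and characters outside iso-8859-1 cannot be submitted from
such a page at all.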

What I've been thinking is that 0.6.x should keep compatibility with these
browsers -- which means Netscape 3.x, _not_ 4.x, BTW :-) -- and then revisit
the issue for 0.7. Most likely -- unless I'm persuaded otherwise again -- 0.7
will be all UTF-8 and will freely make use of "Unicode" features (such as the
mentioned typographical quote marks).

Part of the reason for this is that around 0.7 we need to take a long hard
look at what backwards compatibility we want to invest resources in
supporting, as well as what range of platforms to support: e.g. how old
"standard" Linux distros do we want to support, and whether or not we want to
support Win32, in which case we need to actually resolve all the issues with
that platform. Win32 in particular is difficult since we don't have a working
SGML/XML parser on that platform. Björn has made quite a bit of progress on
that, but I don't know how much time/opportunity he has to keep working at it
(and it's a bear of a task too).

( BTW, Björn, I can probably arrange for access to a Win32 box with the Visual
Foo tools for you if that would help. )

Received on Friday, 4 July 2003 15:03:15 UTC