Re: UTF-8 with Unicode line separator and BOM from Terje Bless on 2000-10-28 (www-validator@w3.org from October 2000)

From: Terje Bless <link@tss.no>
Date: Sat, 28 Oct 2000 07:49:42 +0200
To: W3C Validator <www-validator@w3.org>
Message-ID: <20001031114547-r01010600-b510f1ea@10.0.0.2>

On 23.10.00 at 13:41, Masayasu Ishikawa <mimasa@w3.org> wrote:

>We are planning to enhance support for various character encodings,
>by converting them to UTF-8 before validation.  Similarly, BOM in
>UTF-8 could be removed before validation so that SP won't be barfing
>on it.

Hmmm, I spy, with my little eye, something beginning with a Slippery Slope!
I'd be very wary of modifying the input before it's fed to the SGML
Parser. As a temporary workaround for a problem with the parser, sure, but
not as something to rely on for normal operation. If you alter the charset
you are no longer validating what's on the actual server but rather
"something" that only has meaning within the Validator. The BOM is the same
principle, but easier to do right in practice (I /think/).

It's sorta like running HTML through Tidy _before_ SP and then reporting
"All is Well" to the user. :-)
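
To make the BOM case concrete, here is a rough sketch in Python of the kind
of narrow, reportable workaround I could live with. It is illustration only:
the name strip_utf8_bom is made up and nothing like it is claimed to exist in
the Validator. The point is that it removes exactly the three BOM bytes,
touches nothing else, and tells the caller that it did so, so the result page
can still say the input was altered.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, had_bom) for a raw document body.

    Hypothetical helper: only a leading UTF-8 BOM (EF BB BF) is removed.
    The rest of the byte stream is passed through unchanged.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# The caller keeps the flag so the removal can be reported to the user
# instead of being silently hidden.
body, had_bom = strip_utf8_bom(b"\xef\xbb\xbf<html></html>")
```

Transcoding a whole document to another charset has no equally small and
obviously correct equivalent, which is why I'd treat the two cases
differently.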

-- 
As a cat owner, I know this for a fact...
Nothing says "I love you" like a decapitated gopher on your front porch.

Received on Tuesday, 31 October 2000 05:45:54 UTC