Re: hex FFFE at the start of UTF-16 stream

Jan Egil Kristiansen <janegil@landsbank.fo> wrote:

> http://lbk.olivant.fo/test/mini_x.html was validated OK when saved in 
> ISO-8859-1. But when I used Notepad to 'save as UNICODE', the validator 
> complains of "Missing DOCTYPE declaration at start". Could it be caused by 
> the hex FFFE ("ÿþ") inserted by Notepad to mark the file as UTF-16? 
> (http://validator.w3.org/check?uri=http%3A%2F%2Flbk.olivant.fo%2Ftest%2Fmini_x.html)
> 
> http://www.unicode.org/unicode/reports/tr6/index.html#Signature seems to 
> allow that kind of marking of the file. But maybe the HTTP server is 
> supposed to remove the signature, and replace it with a charset in the HTTP 
> header?

The validator's script does need to be updated to handle UTF-16
correctly (nsgmls can handle UTF-16 if it is configured appropriately,
and indeed the above page validates when I run nsgmls locally).
Independently of that, you have to configure your Web server to add a
correct charset parameter to the Content-Type HTTP response header,
i.e.

    Content-Type: text/html; charset=UTF-16

Your server only sends

    Content-Type: text/html

so the page cannot be handled correctly even by a validator that
supports UTF-16: without a charset parameter, a receiver has to assume
the HTTP default.  RFC 2616, "3.7.1 Canonicalization and Text Defaults"
says:

    The "charset" parameter is used with some media types to define the
    character set (section 3.4) of the data. When no explicit charset
    parameter is provided by the sender, media subtypes of the "text"
    type are defined to have a default charset value of "ISO-8859-1" when
    received via HTTP. Data in character sets other than "ISO-8859-1" or
    its subsets MUST be labeled with an appropriate charset value. See
    section 3.4.1 for compatibility problems.

    cf. http://www.ietf.org/rfc/rfc2616.txt
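
To illustrate the rule quoted above, here is a minimal Python sketch of
how a receiver that follows RFC 2616 strictly might pick the charset.
The function names and the sample body are hypothetical, made up for
this example; this is not the validator's actual code.

```python
import codecs
from email.message import Message

# Default for text/* received via HTTP, per RFC 2616 section 3.7.1.
HTTP_TEXT_DEFAULT = "ISO-8859-1"


def effective_charset(content_type: str) -> str:
    """Return the charset parameter, or the HTTP default if it is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None.
    return msg.get_content_charset() or HTTP_TEXT_DEFAULT


def decode_body(body: bytes, content_type: str) -> str:
    """Decode the response body the way the header tells us to."""
    return body.decode(effective_charset(content_type))


# Hypothetical sample: the FF FE signature followed by UTF-16LE text,
# which is assumed to match what Notepad's "save as Unicode" produces.
body = codecs.BOM_UTF16_LE + "<!DOCTYPE html>".encode("utf-16-le")

unlabeled = decode_body(body, "text/html")
labeled = decode_body(body, "text/html; charset=UTF-16")

print(repr(unlabeled[:3]))
print(repr(labeled))
```

Without the charset parameter the two signature bytes are read as the
ISO-8859-1 characters "ÿþ" and end up in front of the DOCTYPE, which
would be consistent with the "Missing DOCTYPE declaration at start"
message. With charset=UTF-16 the signature is consumed by the decoder
and the document starts with the DOCTYPE as intended.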

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Wednesday, 4 October 2000 15:27:18 UTC