- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Thu, 05 Oct 2000 04:27:11 +0900
- To: janegil@landsbank.fo
- Cc: www-validator@w3.org
Jan Egil Kristiansen <janegil@landsbank.fo> wrote:

> http://lbk.olivant.fo/test/mini_x.html was validated OK when saved in
> ISO-8859-1. But when I used Notepad to 'save as UNICODE', the validator
> complains of "Missing DOCTYPE declaration at start". Could it be caused by
> the hex FFFE ("ÿþ") inserted by Notepad to mark the file as UTF-16?
> (http://validator.w3.org/check?uri=http%3A%2F%2Flbk.olivant.fo%2Ftest%2Fmini_x.html)
>
> http://www.unicode.org/unicode/reports/tr6/index.html#Signature seems to
> allow that kind of marking of the file. But maybe the HTTP server is
> supposed to remove the signature, and replace it with a charset in the HTTP
> header?

While the validator's script needs to be updated to handle UTF-16
correctly (nsgmls can handle UTF-16 if it is configured appropriately,
and indeed the above page is validated if I run nsgmls locally), you
have to configure your Web server to add a correct charset parameter
to the Content-Type HTTP response header, i.e.

    Content-Type: text/html; charset=UTF-16

Your server only sends

    Content-Type: text/html

and it cannot be handled correctly even if the validator can handle
UTF-16. RFC 2616, "3.7.1 Canonicalization and Text Defaults" says:

    The "charset" parameter is used with some media types to define
    the character set (section 3.4) of the data. When no explicit
    charset parameter is provided by the sender, media subtypes of the
    "text" type are defined to have a default charset value of
    "ISO-8859-1" when received via HTTP. Data in character sets other
    than "ISO-8859-1" or its subsets MUST be labeled with an
    appropriate charset value. See section 3.4.1 for compatibility
    problems.

cf. http://www.ietf.org/rfc/rfc2616.txt

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
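
To illustrate the rule quoted from RFC 2616 section 3.7.1, here is a minimal Python sketch of how a recipient might pick a charset from the Content-Type header, and why the UTF-16 signature then shows up as "ÿþ". This is not the validator's actual code. The function names and the sample body are hypothetical, chosen only for this example.

```python
import codecs


def charset_from_content_type(content_type):
    """Return the charset a recipient following RFC 2616 3.7.1 would assume.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    text/* media types default to ISO-8859-1 when received via HTTP.
    """
    media_type, _, params = content_type.partition(";")
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    if media_type.strip().lower().startswith("text/"):
        return "ISO-8859-1"
    return None


def utf16_signature(body):
    """Report which UTF-16 byte order mark, if any, starts the body."""
    if body.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "UTF-16LE"
    if body.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "UTF-16BE"
    return None


# Hypothetical sample: a little-endian UTF-16 file with a signature,
# as an assumption about what a 'save as UNICODE' file looks like.
body = codecs.BOM_UTF16_LE + "<!DOCTYPE".encode("utf-16-le")

# Header without a charset parameter: the body is read as ISO-8859-1,
# so the signature bytes become two ordinary characters before the DOCTYPE.
assumed = charset_from_content_type("text/html")
misread = body.decode(assumed)

# Header with the charset parameter: the signature is consumed by the decoder.
declared = charset_from_content_type("text/html; charset=UTF-16")
correct = body.decode(declared)
```

With only "Content-Type: text/html", the decoded text begins with "ÿþ" rather than "<!DOCTYPE", which is consistent with the "Missing DOCTYPE declaration at start" message reported above.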
Received on Wednesday, 4 October 2000 15:27:18 UTC