- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Thu, 19 Aug 1999 00:52:46 +0900
- To: mrc@ChipChat.com
- Cc: www-validator@w3.org
Marty Cawthon <mrc@ChipChat.com> wrote:

> When the XHTML validator checks a document (a 'forum' page)
> that I am working on it reports error messages for some Japanese text:
> "non SGML character 130".

(snip)

> It may be that the document does contain non SGML characters, in which case
> I will appreciate a pointer to learn more to help me make the characters so that
> they conform to XHTML. Or it may be a bug in the validator when examining
> documents containing Japanese text.

That's not a bug. It happens because you didn't send correct charset information. To validate Japanese documents correctly, you MUST explicitly specify the character encoding of your documents.

In this case, the server only sends

    Content-Type: text/html

for <http://www.koga.org/letters.htm>, without a charset parameter, so the validator assumes that the character encoding of the document is ISO-8859-1, according to the HTTP/1.1 spec. Section 3.7.1 of the HTTP/1.1 spec [1] says:

    The "charset" parameter is used with some media types to define the
    character set (section 3.4) of the data. When no explicit charset
    parameter is provided by the sender, media subtypes of the "text"
    type are defined to have a default charset value of "ISO-8859-1" when
    received via HTTP. Data in character sets other than "ISO-8859-1" or
    its subsets MUST be labeled with an appropriate charset value. See
    section 3.4.1 for compatibility problems.

The document is actually encoded in Shift_JIS, so the validator generated some strange error messages. If the server sends

    Content-Type: text/html; charset=Shift_JIS

then the validator works fine.

Though you should specify charset information via the HTTP Content-Type header as described above, the validator also recognizes equivalent information inside the document, namely a meta element. But in this case, that is also wrong. The document includes the following line:

    <meta http-equiv="Content-Type" content="text/html; charset=SJIS-JP" />

but "SJIS-JP" is not a registered charset name. "Shift_JIS" is the correct name. Check the charset registry [2] for more detail.

Also, since an XHTML document is an XML document, you MUST include the following XML declaration at the beginning of your document:

    <?xml version="1.0" encoding="Shift_JIS" ?>

Hope this helps.

[1] http://www.ietf.org/rfc/rfc2616.txt
[2] ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
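
The fallback described in this message can be illustrated with a minimal Python sketch. It is not the validator's actual code; the function name charset_from_content_type and the sample text are hypothetical, chosen only for illustration. It assumes the only input is the Content-Type header value and applies the ISO-8859-1 default from section 3.7.1 of the HTTP/1.1 spec when no charset parameter is present. The sample also suggests where the "130" in the error message may come from: in Shift_JIS, hiragana characters begin with the byte 0x82, which is decimal 130.

```python
import re

# Default for text/* media types received via HTTP (HTTP/1.1 spec, section 3.7.1).
HTTP_DEFAULT_CHARSET = "ISO-8859-1"


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper: falls back to the HTTP/1.1 default when the
    header carries no charset parameter.
    """
    match = re.search(r';\s*charset\s*=\s*"?([^\s;"]+)"?', header_value, re.IGNORECASE)
    return match.group(1) if match else HTTP_DEFAULT_CHARSET


# Hiragana "a" (U+3042) as it would be stored in a Shift_JIS document.
text = "\u3042"
sample = text.encode("shift_jis")  # two bytes, the first one is 0x82 (130)

# Header without a charset parameter: the bytes are read as ISO-8859-1.
unlabeled = charset_from_content_type("text/html")
misread = sample.decode(unlabeled)

# Header with the registered charset name: the text round-trips correctly.
labeled = charset_from_content_type("text/html; charset=Shift_JIS")
decoded = sample.decode(labeled)
```

With the unlabeled header the first byte is interpreted as the single character number 130, which is not a valid document character, while the labeled header recovers the original text.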
Received on Wednesday, 18 August 1999 11:53:04 UTC