- From: Andrew Daviel <andrew@andrew.triumf.ca>
- Date: Fri, 8 Sep 2000 10:09:33 -0700 (PDT)
- To: Robots list <robots@MCCMEDIA.COM>
- cc: WWW-HTML List <www-html@w3.org>
Richard Chuang recently mentioned that he is working on I18N, at least with Big5. I often get asked questions about META tags in HTML (e.g. "keywords") and sometimes about I18N aspects. I was wondering what the current and future realities are of "how do I get listed in search engines" for non-English users.

AFAIK, the following applies. Please correct me if I'm wrong.

- The default HTML charset is ISO-8859-1 (Western European).
- The Internet and most computers are 8-bit safe.
- One can expect to put "château" in a document, and <meta name="keywords" content="château"> (or maybe <meta name="keywords" lang="fr" content="château">), and a search engine will find it under "château". (If anyone doesn't see the accent, I wrote "ch<a-circumflex>teau".)
- Many search engines will lose the accent, so that a search for "chateau" will also find it.
- Escaped characters such as &eacute; and &#233; should be translated to the 8-bit value é and should also match, so that "renée" entered in a search engine should find "ren&eacute;e" in a document.

Things I am not so sure about:

- What happens when the document charset is not ISO-8859-1 but ISO-8859-5, KOI8-R, or Windows-1251? Does the search engine just try to match the 8-bit value from the user's keyboard, or does it try to be clever with HTTP_ACCEPT_CHARSET and map across to the alternate charset(s) used in the documents? As I understand it, at least with Netscape, the browser will automatically switch charsets when a page is loaded if a charset modifier is used with content-type, provided that the font is available, so that pages in Russian, Chinese, etc. composed on Unix, Windows, etc. may all be viewed correctly. But presumably the user's keyboard is mapped to a single charset.
- What happens when a user enters data in a form on a page written in Windows-1251, but their keyboard is set to ISO-8859-5?
- Are there any special rules for 16-bit charsets?
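To make the matching questions above concrete, here is a minimal Python sketch of one way an indexer could reduce both documents and queries to a common search key. It is an illustration under stated assumptions, not a description of what any search engine actually does: the function name `search_key` is hypothetical, and it presumes the charset of each document or query is already known (for example from the charset given with content-type).

```python
import html
import unicodedata


def search_key(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Hypothetical helper: reduce raw bytes to an accent-free, case-folded key.

    Assumes `charset` correctly names the encoding of `raw`.
    """
    # 1. Map the 8-bit (or multi-byte) values to Unicode using the declared charset.
    text = raw.decode(charset)
    # 2. Translate escaped characters such as &eacute; and &#233; to the real character.
    text = html.unescape(text)
    # 3. Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Ignore case differences.
    return stripped.casefold()


if __name__ == "__main__":
    # Accented, escaped, and unaccented spellings collapse to the same key.
    print(search_key(b"ch\xe2teau"))      # raw ISO-8859-1 byte for a-circumflex
    print(search_key(b"ren&eacute;e"))
    print(search_key(b"ren&#233;e"))

    # The same Russian word stored in three different Cyrillic charsets.
    word = "поиск"
    for cs in ("windows-1251", "koi8-r", "iso-8859-5"):
        print(cs, search_key(word.encode(cs), cs))
```

Note that dropping every combining mark is lossy outside Western European text (for instance, it would fold the Cyrillic letter "й" into "и"), so a real system would probably make that step language-dependent.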
Back to the meta tag question: if a user wants to be found as "renée", should they put

<meta name="keywords" content="ren&eacute;e">
<meta name="keywords" content="ren&#233;e">
<meta name="keywords" content="renée">
<meta name="keywords" content="renee">

or all 4?

Andrew Daviel, TRIUMF, Canada
Received on Friday, 8 September 2000 13:10:06 UTC