an odd question from cj@mb-soft.com on 2015-03-20 (www-validator@w3.org from March 2015)

From: <cj@mb-soft.com>
Date: Fri, 20 Mar 2015 11:00:30 -0500
To: <www-validator@w3.org>
Message-ID: <B165F4F883D7434DA2AF34A192CD2CE0@D9CDNW91>

I have an odd question. The Validator seems to have what might be a flaw. I operate a large web-site with thousands of web-pages. Nearly all are in either UTF-8 or Windows format. The Validator seems to get confused if it encounters even one byte in the wrong format, at which point it stops entirely. It seemed to me that a simple solution would be for the Validator to "skip" that byte, or even try to translate it into the other of those two formats, rather than abandoning the effort altogether.
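The "skip or translate" idea can be sketched in Python. This only illustrates the suggestion; it is not a description of how the W3C Validator works. It assumes that "Windows format" means the Windows-1252 code page, and the handler name `cp1252_fallback` and the helper `decode_mixed` are hypothetical. Bytes that form valid UTF-8 are kept as UTF-8; any byte the UTF-8 decoder rejects is re-read as Windows-1252 instead of stopping the whole decode.

```python
import codecs


def cp1252_fallback(err: UnicodeError) -> tuple[str, int]:
    """Hypothetical decode error handler: re-read only the rejected bytes as Windows-1252."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; those become U+FFFD.
    return bad.decode("cp1252", errors="replace"), err.end


# Register under a (hypothetical) name so any bytes.decode() call can opt in to it.
codecs.register_error("cp1252_fallback", cp1252_fallback)


def decode_mixed(raw: bytes) -> str:
    """Decode mostly-UTF-8 bytes, repairing stray Windows-1252 bytes instead of failing."""
    return raw.decode("utf-8", errors="cp1252_fallback")


# A page that is UTF-8 except for one Windows-1252 n-tilde byte (0xF1).
mixed = "café and ".encode("utf-8") + "mañana".encode("cp1252")
try:
    mixed.decode("utf-8")  # strict decoding gives up at the first bad byte
except UnicodeDecodeError as exc:
    print("strict UTF-8 failed at byte", exc.start)
print(decode_mixed(mixed))
```

One caveat with this design: it is a guess. A run of Windows-1252 bytes that happens to also be valid UTF-8 is silently decoded as UTF-8, which is one reason a checker might prefer to stop and report the problem rather than repair it.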

I thought I had a solution for my situation, since my web-pages are virtually identical in UTF-8, Windows and Western, differing only in an occasional byte (such as a Spanish tilde character). I thought I had solved the problem by using &#176; style coding, which should be compatible with either UTF-8 or Windows.
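A short Python sketch of why numeric character references sidestep the encoding question, again assuming that "Windows" means Windows-1252. The literal degree sign encodes to the bytes C2 B0 in UTF-8 but B0 in Windows-1252, and ñ to C3 B1 versus F1, whereas a reference such as &#176; consists only of ASCII characters, so its bytes are identical in both encodings. The helper `numeric_reference` is a hypothetical name. References are conventionally closed with a semicolon; the HTML specification treats a missing semicolon as a parse error, although browsers still decode it.

```python
import html


def numeric_reference(ch: str) -> str:
    """Hypothetical helper: decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


for ch in ["\u00b0", "\u00f1"]:  # degree sign, Spanish n with tilde
    ref = numeric_reference(ch)
    # The literal character encodes to different bytes in the two encodings...
    print(repr(ch), ch.encode("utf-8"), ch.encode("cp1252"))
    # ...but the reference is plain ASCII, so its bytes are the same in both.
    print(ref, ref.encode("utf-8") == ref.encode("cp1252"))
    # An HTML parser turns the reference back into the original character.
    assert html.unescape(ref) == ch
```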

But I now get the impression that Search Engines get all fouled up. For a word like Deja vu, I now have four different available spellings (English, UTF-8, Windows and &# format), and Search Engines seem to (sometimes) treat the four "spellings" as different. I get different traffic reports for the identical page with those different spellings.
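To illustrate the search-engine point, here is a Python sketch under stated assumptions: each variant is decoded with its declared charset and its character references are expanded, as an HTML parser would do. How any particular search engine indexes pages is not known here; the sketch only shows that the UTF-8, Windows-1252 and &# versions carry the same text "Déjà vu" once decoded, while the unaccented "English" spelling is a genuinely different string unless a tool folds accents. The names `page_text`, `fold_accents` and `variants` are hypothetical.

```python
import html
import unicodedata


def page_text(raw: bytes, charset: str) -> str:
    """Decode page bytes with their declared charset, then expand character references."""
    return html.unescape(raw.decode(charset))


def fold_accents(text: str) -> str:
    """Hypothetical normalization: drop combining marks so 'Déjà' compares equal to 'Deja'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# The four "spellings" from the message, as (raw bytes, declared charset).
variants = {
    "English": (b"Deja vu", "ascii"),
    "UTF-8": ("Déjà vu".encode("utf-8"), "utf-8"),
    "Windows": ("Déjà vu".encode("cp1252"), "cp1252"),
    "&# format": (b"D&#233;j&#224; vu", "ascii"),
}

texts = {name: page_text(raw, cs) for name, (raw, cs) in variants.items()}
for name, text in texts.items():
    print(f"{name:10} {text!r:12} folded: {fold_accents(text)!r}")
```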

What is the best solution to this?

Also, I realize that Search Engines don't like "duplicate web-pages", so my "solution" probably needs to choose the "best" of the four pages to have on the Internet. Which one?

Pastor Carl
A Christ Walk Church
BELIEVE Religious Information Source web-site
http://mb-soft.com/believe/indexaz.html

Received on Friday, 20 March 2015 16:17:44 UTC