- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 09 Feb 2005 14:59:19 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-i18n-comments@w3.org, w3c-i18n-ig@w3.org (I18N IG, for archiving only), member-i18n-core@w3.org, Chris Lilley <chris@w3.org>
Hello Bjoern,

At 09:48 05/02/09, Bjoern Hoehrmann wrote:
>* Martin Duerst wrote:
>>It is just the mention of iso-8859-1 that is crucial in this context,
>>as it was most often misused. People put up a page in an arbitrary
>>8-bit encoding, labeled it as iso-8859-1, and constructed a font that
>>made things look right. So using iso-8859-1 was explicitly part of
>>the misuse, and trying to avoid mentioning it just obscures the issue.
>
>Maybe you can cite an example web page and a freely available font that
>demonstrates the misuse you have in mind?

These days, these examples are fortunately getting rarer, and aren't
advertised as much anymore, so it's difficult to find them, but here
is an old page that explains this:
http://www.fedu.uec.ac.jp/ThaiMac/thaibrowser.html

Pages that explain things correctly, and do things correctly, now
abound, so it's very difficult to find pages that don't. But believe
me, there was a lot of this stuff around. I even once had to diagnose
a file I got from a colleague; it turned out to be Thai text
interpreted as iso-8859-1 and from there converted to UTF-8.

I have cc'ed Chris, maybe he can point us to more examples.

>Do you mean that it matters
>that the web page is encoded using ISO-8859-1?

In theory, it doesn't. In practice, that was the encoding available
on all browsers, so that's what everybody misused.

>That would be weird as
>HTML/XHTML require that text processing happens essentially independent
>of the character encoding.

The whole thing is weird. That's why we are prohibiting it :-).

>So, as far as I understand the comment in
>the current document, it refers to a font that is defined in terms of
>ISO-8859-1; maybe you can cite font technology that enables such
>misuse?

It's very easy. You take a font editor, take a font made to cover
the repertoire encoded by iso-8859-1, and change the accented Latin
characters and so on to something else, e.g. Thai.

Any font technology enables such misuse. Font technology has no way
to check whether the glyph for, e.g., codepoint U+00F6 is what people
would expect in that font for an o-Umlaut or not.

>What I do not understand so far is why a character encoding is of
>any significance in this context.

As I said, theoretically, any character encoding would do, but in
practice, it was iso-8859-1 that got misused.

>>If you have any ideas of how to express things with mentioning
>>iso-8859-1 (and again, not being overly complicated), that would
>>be appreciated.
>
>Well, to me the current text does not make any sense, so I can't really
>make a suggestion that involves ISO-8859-1. The conformance requirement
>now only discusses code points and coded character sets, not character
>encodings, so the requirement and the mention of ISO-8859-1 seem quite
>orthogonal to each other.

In theory, yes, they are orthogonal. But in practice (mostly past
practice, fortunately), iso-8859-1 is very relevant, and it's much
easier for somebody who knew these kinds of misuses to recognise
what we are talking about in the way it's described now. You seem
to be fortunate to have come to Web internationalization at a time
when such misuses were already less frequent.

Maybe we could change

"This prohibits the construction of fonts that misuse e.g. iso-8859-1
to represent different scripts, characters, or symbols than what is
actually encoded in iso-8859-1."

to something like

"This prohibits the formerly frequent construction of fonts that
misused e.g. iso-8859-1 to represent different scripts, characters,
or symbols than what was actually encoded in iso-8859-1."

Would that help?

Regards, Martin.
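For readers who have not run into this kind of misuse, here is a minimal sketch in Python of the Thai case described above (Thai text interpreted as iso-8859-1 and from there converted to UTF-8). It assumes the original 8-bit encoding was TIS-620, which the message does not state; the function names and the sample string are made up for illustration.

```python
# Sketch only: assumes the "arbitrary 8-bit encoding" was TIS-620.
# Function names and sample text are hypothetical.

def mislabel_as_latin1(raw: bytes) -> str:
    """Interpret 8-bit bytes as iso-8859-1, as a browser would if the
    page were labeled that way. Every byte maps to some code point, so
    this never fails; it just yields the wrong characters."""
    return raw.decode("iso-8859-1")

def repair(garbled_utf8: bytes, real_encoding: str = "tis_620") -> str:
    """Undo the damage: recover the original bytes from the mislabeled
    text, then decode them with the encoding that was really used."""
    wrong_text = garbled_utf8.decode("utf-8")
    original_bytes = wrong_text.encode("iso-8859-1")
    return original_bytes.decode(real_encoding)

thai = "ภาษาไทย"                      # "Thai language"
raw = thai.encode("tis_620")           # bytes as stored on the old page
wrong = mislabel_as_latin1(raw)        # accented Latin letters, not Thai
garbled = wrong.encode("utf-8")        # the kind of file to be diagnosed

print(wrong)
print(repair(garbled))

# Byte 0xF6 is U+00F6 (o-Umlaut) under iso-8859-1 but a Thai digit
# (U+0E56) under TIS-620; a hacked font simply draws the Thai glyph
# at the o-Umlaut code point.
print(hex(ord(b"\xf6".decode("iso-8859-1"))), hex(ord(b"\xf6".decode("tis_620"))))
```

The point of the sketch is that the stored code points are those of Latin letters; only the hacked font makes them look like Thai, which is why searching, sorting, and conversion to other encodings all break.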
Received on Wednesday, 9 February 2005 05:59:56 UTC