- From: Chris Lilley <chris@w3.org>
- Date: Mon, 23 Feb 2004 00:21:51 +0100
- To: www-international@w3.org
Hello www-international,
>> Files here are either HTML or extended (8-bit) ASCII. Where
>> possible, text files are tab delimited. Some files have been
>> converted into standard HTML encoding (ISO-8859-1) from Unicode.
Gasp.
>> The closest equivalent character in ISO-8859-1 was selected, and
>> any diacritics simulated using <SUB> and <SUP> and the closest
>> equivalent punctuation mark. In the case of Cyrillic, Greek and
>> Hebrew, a consistent transliteration scheme was used. The source
>> for each file contains hidden tags which specify the Unicode value
>> for each character which has no ISO-8859-1 equivalent.
There is a standard way to do that: numeric character references of
the form &#xXXXX;.
>> To obtain these values, you can download the file or view its
>> source in your browser. The tags have the form <!u
>> XXXX>character</!u>, where XXXX is the four digit hexadecimal value
>> of the Unicode character.
Shudder. Although a Perl script could probably go and reverse this
damage.
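As a rough illustration of what such a reversal might look like, here is a minimal sketch in Python rather than Perl. It assumes the tags appear exactly in the form quoted above, `<!u XXXX>character</!u>` with four hexadecimal digits; the function names `restore_unicode` and `to_ncr` are made up for this example, and the real files have not been checked for variations.

```python
import re

# Assumption: tags follow the quoted description exactly,
# i.e. <!u XXXX>fallback</!u> with XXXX being four hex digits.
# The fallback may itself contain markup such as <SUB> or <SUP>.
TAG = re.compile(r"<!u\s+([0-9A-Fa-f]{4})>(.*?)</!u>", re.DOTALL)


def restore_unicode(text: str) -> str:
    """Replace each tagged fallback with the Unicode character it encodes."""
    return TAG.sub(lambda m: chr(int(m.group(1), 16)), text)


def to_ncr(text: str) -> str:
    """Replace each tagged fallback with an HTML numeric character reference."""
    return TAG.sub(lambda m: "&#x" + m.group(1).upper() + ";", text)


if __name__ == "__main__":
    sample = "<!u 0161>s</!u>koda"
    print(to_ncr(sample))           # numeric character reference form
    print(restore_unicode(sample))  # literal Unicode character
```

Since the pages are described as ISO-8859-1, a file would presumably be read with `open(path, encoding="iso-8859-1")` and the restored text written back out as UTF-8.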
http://www.wordgumbo.com/index.htm
Interesting site, but (shakes head) why oh why!!
Although
http://www.wordgumbo.com/ie/cmp/iedata.txt
COMPARATIVE INDOEUROPEAN DATABASE COLLECTED BY ISIDORE DYEN
FILE IE-DATA1
Copyright (C) 1997 by Isidore Dyen, Joseph Kruskal, and Paul Black
This file was last modified on Feb 5, 1997
maybe that is why.
--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
Received on Sunday, 22 February 2004 18:21:51 UTC