- From: Chris Lilley <chris@w3.org>
- Date: Mon, 23 Feb 2004 00:21:51 +0100
- To: "www-international@w3.org"@homer.w3.org
Hello www-international,

>> Files here are either HTML or extended (8-bit) ASCII. Where
>> possible, text files are tab delimited. Some files have been
>> converted into standard HTML encoding (ISO-8859-1) from Unicode.

Gasp.

>> The closest equivalent character in ISO-8859-1 was selected, and
>> any diacritics simulated using <SUB> and <SUP> and the closest
>> equivalent punctuation mark. In the case of Cyrillic, Greek and
>> Hebrew, a consistent transliteration scheme was used. The source
>> for each file contains hidden tags which specify the Unicode value
>> for each character which has no ISO-8859-1 equivalent.

There is a standard way to do that.

>> To obtain these values, you can download the file or view its
>> source in your browser. The tags have the form <!u
>> XXXX>character</!u>, where XXXX is the four digit hexadecimal value
>> of the Unicode character.

Shudder. Although a perl script could probably go and reverse this damage.

http://www.wordgumbo.com/index.htm

Interesting site, but (shakes head) why oh why!!

Although

http://www.wordgumbo.com/ie/cmp/iedata.txt

  COMPARATIVE INDOEUROPEAN DATABASE COLLECTED BY ISIDORE DYEN
  FILE IE-DATA1
  Copyright (C) 1997 by Isidore Dyen, Joseph Kruskal, and Paul Black
  This file was last modified on Feb 5, 1997

maybe that is why.

--
Chris Lilley                    mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
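The reversal script mentioned above could look roughly like the following. This is a minimal sketch in Python rather than Perl. It assumes the hidden tags look exactly like `<!u XXXX>character</!u>` as the site describes, with four hex digits and possibly some whitespace after `<!u`. It rewrites each tagged stand-in either as the real Unicode character or as an HTML hexadecimal numeric character reference such as `&#x0436;`, which is one standard way to write characters outside ISO-8859-1 in HTML. The function names `to_unicode` and `to_ncr` are hypothetical.

```python
import re

# Assumed tag shape, per the site's description: <!u XXXX>stand-in</!u>
# XXXX = four hex digits giving the original Unicode code point.
# The non-greedy (.*?) stops at the first </!u>; DOTALL lets the
# stand-in text span line breaks.
TAG = re.compile(r"<!u\s*([0-9A-Fa-f]{4})>(.*?)</!u>", re.DOTALL)


def to_unicode(html: str) -> str:
    """Replace each tagged ISO-8859-1 stand-in with the real character."""
    return TAG.sub(lambda m: chr(int(m.group(1), 16)), html)


def to_ncr(html: str) -> str:
    """Replace each tagged stand-in with a hex numeric character reference."""
    return TAG.sub(lambda m: "&#x%s;" % m.group(1).upper(), html)
```

For example, `to_unicode("<!u 0436>zh</!u>")` yields the Cyrillic letter "ж" (U+0436), and `to_ncr` turns the same input into `&#x0436;`. Because the stand-in transliteration is simply discarded, the output is only as good as the code points the site recorded.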
Received on Sunday, 22 February 2004 18:21:51 UTC