Read it and weep

Hello www-international,

>> Files here are either HTML or extended (8-bit) ASCII. Where
>> possible, text files are tab delimited. Some files have been
>> converted into standard HTML encoding (ISO-8859-1) from Unicode.


>> The closest equivalent character in ISO-8859-1 was selected, and
>> any diacritics simulated using <SUB> and <SUP> and the closest
>> equivalent punctuation mark. In the case of Cyrillic, Greek and
>> Hebrew, a consistent transliteration scheme was used. The source
>> for each file contains hidden tags which specify the Unicode value
>> for each character which has no ISO-8859-1 equivalent.

There is a standard way to do that: HTML numeric character references of the form &#xXXXX;.

>> To obtain these values, you can download the file or view its
>> source in your browser. The tags have the form <!u
>> XXXX>character</!u>, where XXXX is the four digit hexadecimal value
>> of the Unicode character.

Shudder. Although a Perl script could probably go and reverse this.
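
A minimal sketch of such a reversal, in Python rather than Perl. It assumes the hidden tags look exactly as the quoted description says: <!u XXXX>fallback</!u>, with four hexadecimal digits, no nesting, and possibly a line break after the "u". The function names restore_unicode and to_ncr are made up for illustration; nothing here has been checked against the actual files on that site.

```python
import re

# Assumed tag shape, taken from the quoted description only:
#   <!u XXXX>fallback text</!u>
# XXXX is four hex digits; the fallback may itself contain <SUB>/<SUP> markup.
TAG = re.compile(r"<!u\s+([0-9A-Fa-f]{4})>(.*?)</!u>", re.DOTALL)


def restore_unicode(html: str) -> str:
    """Replace each hidden tag (and its ISO-8859-1 fallback) with the
    Unicode character the tag names."""
    return TAG.sub(lambda m: chr(int(m.group(1), 16)), html)


def to_ncr(html: str) -> str:
    """Replace each hidden tag with an HTML numeric character reference,
    so the result can still be served as ISO-8859-1."""
    return TAG.sub(lambda m: "&#x%s;" % m.group(1).upper(), html)
```

Calling restore_unicode on the downloaded source would give text that can be saved as UTF-8, while to_ncr keeps the file in its current encoding and only swaps the private tags for standard references.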

Interesting site, but (shakes head) why oh why!!



                            FILE IE-DATA1

  Copyright (C) 1997 by Isidore Dyen, Joseph Kruskal, and Paul Black
              This file was last modified on Feb 5, 1997

Maybe that is why.

 Chris Lilley          
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Sunday, 22 February 2004 18:21:51 UTC