Character entities and glyph mapping

I am puzzled by how Amaya handles character entities (e.g. &trade; or &euro;).  These strings are properly converted to their Unicode codepoints during import, but then PutNonISOlatin1Char() is called to coerce many of them to other values.  This step attempts to map the Unicode values to other values with "similar" glyphs.  The problem I have is that it appears to destroy information in the process.  For example, &euro; is initially translated to 8364, which is then translated to 206.  On output the character is written as Unicode 206, not as 8364.
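
The information loss described above can be sketched in a few lines.  This is only an illustration in Python, not Amaya's actual C code: the function names and the mapping table are hypothetical, and the table holds only the one pair quoted above (8364 to 206).  It contrasts overwriting the codepoint with its glyph value against keeping the codepoint and carrying the glyph value alongside it.

```python
# Hypothetical illustration; none of these names exist in Amaya.
EURO = 8364  # U+20AC, the codepoint that &euro; resolves to

# Assumed glyph table: only the 8364 -> 206 pair comes from the observation above.
GLYPH_MAP = {EURO: 206}

def lossy_import(codepoint):
    """Replace the codepoint with its glyph value, discarding the original."""
    return GLYPH_MAP.get(codepoint, codepoint)

def preserving_import(codepoint):
    """Keep the original codepoint and carry the glyph value alongside it."""
    return (codepoint, GLYPH_MAP.get(codepoint, codepoint))

def export(stored):
    """Write a numeric character reference for the stored character value."""
    value = stored[0] if isinstance(stored, tuple) else stored
    return "&#%d;" % value

print(export(lossy_import(EURO)))       # &#206;
print(export(preserving_import(EURO)))  # &#8364;
```

In the lossy case the saved file contains character 206, which in Unicode is a Latin capital I with circumflex rather than the euro sign, so a reader of the saved file cannot recover the original character.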

I am using Windows NT 4.0 plus some service packs.  I created a page containing a table that shows all the character entities understood by Amaya.  I can display this page in Internet Explorer and it looks good; most of the glyphs are what I expect.  I then used Amaya to import and save this page, and viewed the resulting page.  It does not compare well with the original.  I commented out the calls to PutNonISOlatin1Char() and reran the test.  This time the two pages are nearly identical.  (The only difference was that &diams; came out oddly.)
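
A comparison like this could also be done mechanically rather than by eye.  The following is a rough sketch in Python, assuming the original and saved markup differ only in individual characters (so both decode to strings of the same length); the helper name and the two sample strings are made up for illustration and are not taken from my actual test page.

```python
import html

def changed_characters(original, saved):
    """Decode character references in both strings and list the
    (original, saved) codepoint pairs that differ, position by position."""
    a = html.unescape(original)
    b = html.unescape(saved)
    # zip() stops at the shorter string, so this assumes equal lengths.
    return [(ord(x), ord(y)) for x, y in zip(a, b) if x != y]

# Made-up sample rows; a real check would read the two files from disk.
original = "<td>&euro;</td><td>&trade;</td>"
saved = "<td>&#206;</td><td>&trade;</td>"

print(changed_characters(original, saved))  # [(8364, 206)]
```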

I have two questions:

1.	My understanding of the HTML specification is that it is fine to map characters internally to other glyphs, but you are not allowed to destroy the original information.  Do I understand this correctly, and if so, how does Amaya's behavior match the spec?
2.	Am I going to encounter other problems if I simply remove the calls to PutNonISOlatin1Char()?

Thanks,
Rich

Received on Tuesday, 27 February 2001 11:05:38 UTC