Re: Character entities and glyph mapping

> I am puzzled by how Amaya handles character entities (e.g. &trade; or &euro;).  These strings are properly converted to their Unicode codepoints during import, but then PutNonISOlatin1Char() is called to coerce many of them to other values.  This step attempts to map the Unicode values to other values with "similar" glyphs.  The problem I have is that it appears to destroy information in the process.  For example, &euro; is initially translated to 8364, which is then translated to 206.  This text value is written out as Unicode 206, not as 8364.

There were some bugs concerning Unicode values in the current release.
We fixed them in the CVS version, but I'm not sure whether the bug you
mention is one of them.
Could you test it? We plan to publish this version tomorrow evening
(French time).

> I am using Windows NT 4.0 plus some service packs.  I created a page that has a table which shows all the character entities understood by Amaya.  I can display this page in Internet Explorer and it looks good.  Most of the glyphs are what I expect.  I then use Amaya to import and save this page, and I view the resulting page.  It doesn't compare very well to the original.  I commented out the calls to PutNonISOlatin1Char() and reran the test.  This time the pages compare nearly identically.  (&diams; came out oddly, that was the only difference between the two pages)

As you know, Amaya does not yet provide full Unicode support. The role
of the function PutNonISOlatin1Char() is to preserve Unicode values that
Amaya cannot display.
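
To make "preserve" concrete, here is a rough sketch in Python (not Amaya's
code, which is written in C). It assumes a hypothetical fallback table and
a hypothetical StoredChar record: the substitute glyph is kept next to the
original code point instead of replacing it, so saving the document always
writes the original value. The table entry 8364 -> 206 is only taken from
the example in the question and is not meant to describe Amaya's real
mapping.

```python
from dataclasses import dataclass

# Hypothetical fallback table: Unicode code point -> substitute glyph value
# used only for display. The single entry mirrors the example in the thread.
DISPLAY_FALLBACK = {0x20AC: 206}


@dataclass(frozen=True)
class StoredChar:
    code_point: int  # original Unicode value, never overwritten
    glyph: int       # value the renderer draws (may be a substitute)


def store_char(code_point: int) -> StoredChar:
    """Keep the original code point and pick a display glyph separately."""
    glyph = DISPLAY_FALLBACK.get(code_point, code_point)
    return StoredChar(code_point, glyph)


def serialize(chars) -> str:
    """Write the document text back out from the original code points."""
    return "".join(chr(c.code_point) for c in chars)
```

With this layout, store_char(8364) is drawn with glyph 206 but is still
saved as code point 8364.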

> I have two questions:
> 
> 1.	My understanding of the HTML specification is that it is fine to internally map characters to other glyphs, but you are not allowed to destroy the original information.  Do I correctly understand, and if so, then how does Amaya's behavior match the spec?

Correct, except for entities that can be encoded directly in the current
charset (for example, if the encoding is ISO-latin-1, &eacute; can be
replaced by é).
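
The rule above can be sketched in a few lines of Python (an illustration
only, not what Amaya does internally). The function name serialize_latin1
is made up for this example. It relies on the standard "xmlcharrefreplace"
error handler of str.encode(), which writes a character directly when the
target charset can encode it and falls back to a numeric character
reference otherwise, so no Unicode value is lost on output.

```python
def serialize_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1 without losing any Unicode value.

    Characters that exist in ISO-8859-1 are written as single bytes;
    anything else becomes a numeric character reference such as &#8364;.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# é is in ISO-8859-1, the euro sign (code point 8364) is not.
print(serialize_latin1("\u00e9 \u20ac"))  # b'\xe9 &#8364;'
```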

> 2.	Am I going to encounter other problems if I simply remove the calls to PutNonISOlatin1Char()?

Sure.

> 
> Thanks,
> Rich
> 

-- 
     Irene.

Received on Tuesday, 27 February 2001 12:06:31 UTC