Re: charset issues

>  Say, I want to have a text which will contain  the following fragment
> on one line:
> ----------------------------------------------------------
>       .. The German word ko"se becomes froma`ge  in French,  but  sy'r
> in Czech yet ... in  ....
> ---------------------------------------------------------------------
> 
> 
> 
>     With current system of static character sets I would need a charset
> which combines all
>  Latin-1 and Latin-2 and ...
>  
>   But, if you reserve ONE special character or tag or even just an
> attribute for this, I can write this one line  like this:

You can do something functionally equivalent with HTML as specified
by HTML 2.0 plus the HTML internationalization spec.

If you want to send the whole thing in US-ASCII, use numeric character
references (which refer to ISO-10646 code positions); if you don't like
that, use the UTF-8 encoding of ISO-10646. To give software a hint for
font selection, use the LANG attribute.
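
As a rough illustration of the two options above, here is a small Python
sketch. The helper names (to_ascii_html, lang_span) and the sample words
are my own assumptions for illustration, not anything from the specs or
the quoted message, and the sketch does not escape markup characters such
as "&" or "<".

```python
def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric
    character reference (the number is the ISO-10646 code position)."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def lang_span(lang: str, text: str) -> str:
    """Hypothetical helper: wrap text in a SPAN carrying a LANG attribute."""
    return '<SPAN LANG="%s">%s</SPAN>' % (lang, to_ascii_html(text))


# Illustrative words only; one needs Latin-1, the other Latin-2.
german = "Käse"
czech = "český"

# Option 1: a pure US-ASCII line using numeric character references.
line = "German %s, Czech %s" % (lang_span("de", german), lang_span("cs", czech))
print(line)

# Option 2: send the characters directly, encoded as UTF-8.
utf8_bytes = "Käse, český".encode("utf-8")
print(utf8_bytes)
```

Either form carries both words on one line without needing a combined
Latin-1 + Latin-2 charset.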

ISO-10646 _is_ a character set which includes all the characters of
Latin-1, Latin-2, JIS, etc. But it's my reading of the HTML specs
that you can follow the internationalization spec without providing
a graphic representation of every character in ISO-10646: just keep
a mapping to/from ISO-10646 for the glyphs that you have in the
fonts available. I quote:

"With the document character set being the full ISO 10646, the
 possibility that a character cannot be displayed due to lack of
 appropriate resources (fonts) cannot be avoided. Because there are
 many different things that can be done in such a case, this document
 does not prescribe any specific behaviour" ... (it offers suggestions)
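
To make the "keep a mapping for the glyphs you have" idea concrete, here
is a minimal Python sketch. It assumes a display that only has Latin-1
and Latin-2 fonts (the AVAILABLE_FONT_CHARSETS list), and the "?"
placeholder is just one possible fallback of my choosing; the spec does
not prescribe it.

```python
# Assumption: the only fonts installed are encoded in these two charsets.
AVAILABLE_FONT_CHARSETS = ["iso-8859-1", "iso-8859-2"]


def pick_font_charset(ch):
    """Return (charset, byte) for the first available font charset that
    contains the character, or None if no available font has it."""
    for charset in AVAILABLE_FONT_CHARSETS:
        try:
            return charset, ch.encode(charset)
        except UnicodeEncodeError:
            continue
    return None


def render_plan(text, placeholder="?"):
    """Map each character to a (charset, byte) pair, substituting a
    placeholder (with charset None) for characters that cannot be shown."""
    plan = []
    for ch in text:
        hit = pick_font_charset(ch)
        if hit is None:
            plan.append((None, placeholder.encode("ascii")))
        else:
            plan.append(hit)
    return plan


# "ä" is found in Latin-1, "č" only in Latin-2, and the CJK character
# U+65E5 in neither, so it falls back to the placeholder.
for entry in render_plan("ä\u010d\u65e5"):
    print(entry)
```

The document itself stays in ISO-10646 terms throughout; only the last
step, choosing a font, consults the per-charset mapping.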

Operating in this mode will do the things you want without reinventing
the wheel, and will scale upward better. (Unfortunately, it's not widely
supported yet, which is what started this thread.)

-- 
    Albert Lunde                      Albert-Lunde@nwu.edu

Received on Saturday, 7 December 1996 01:11:37 UTC