
Re: Internationalized CLASS attributes

From: Martin J Duerst <mduerst@ifi.unizh.ch>
Date: Thu, 24 Oct 1996 17:07:19 +0100 (MET)
To: keld@dkuug.dk (Keld Jørn Simonsen)
Cc: rosenne@NetVision.net.il, www-international@w3.org
Message-ID: <"josef.ifi..390:"@ifi.unizh.ch>
Keld Simonsen wrote:

>Martin J Duerst writes:
>> Keld Simonsen wrote:
>> >Martin J Duerst writes:
>Again, the user does not care
>how the information is encoded, as long as what (s)he sees 
>is understandable and what is expected. One or two characters
>does not matter to the user. So again it is up to the system designer
>to code the information in an unambiguous and well-defined way.
>In the case of accented Latin characters 10646 then specifies
>normatively only one way of encoding.

Like Jonathan Rosenne, I don't really agree on this point.
Assume I have something like A-with-dot-below, which does
not exist as precomposed in ISO 10646. For this thing, what
does ISO 10646 (normatively or otherwise) specify?

(1) No way to encode it.
(2) Encode it as A followed by combining-dot-below.
(3) Anything else.
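Option (2) also shows why "one or two characters" is not a neutral detail for software: the user sees one letter, but the text holds two code points. A minimal sketch in Python (our choice of language), assuming a simplified rule in which any code point of general category Mn, Mc or Me is attached to the preceding base; the helper `user_perceived_chars` is hypothetical, and this is not the full grapheme cluster algorithm of Unicode UAX #29.

```python
import unicodedata

def user_perceived_chars(text: str) -> list[str]:
    """Group code points into a base character plus any following marks.

    Simplified sketch: a code point whose general category starts with "M"
    (Mn, Mc, Me) is appended to the previous cluster; everything else
    starts a new cluster. Real grapheme segmentation (UAX #29) has more rules.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach combining mark to the preceding base
        else:
            clusters.append(ch)  # start a new user-perceived character
    return clusters

# "A" followed by U+0323 COMBINING DOT BELOW, then a plain "b"
s = "A\u0323b"
print(len(s), len(user_perceived_chars(s)))
```

Software that counts, truncates or positions a cursor by code point would split the A-with-dot-below in two; handling such sequences correctly is exactly the composition support the message argues vendors have little incentive to build.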

>> >I agree that for some scripts, you need combining characters.
>> >But for almost all of Latin based languages, you have all you
>> >need in form of whole characters in 10646. There are a few
>> >examples of Latin letters that are not encoded in 10646, and for that
>> >the only way to represent that information is with
>> >the use of combining characters, agreed. But the occurrences of those
>> >combinations would be very minimal compared to what can be coded
>> >directly in 10646.
>> The important words here are "almost all" and "minimal". Some
>> people believe that this can be changed to "all" and "none",
>> just by adding more precombined characters. The fact is that
>> it cannot be done, there are several thousand languages
>> written with the Latin script, and linguists invent new
>> combinations according to their needs. The addition of
>> new combinations, however, has the undesired effect of
>> further marginalizing those languages that need combining
>> characters, leading to a very bad vicious cycle.
>Why do you think this is so bad? The rare languages will get support
>at some stage, and that in an international standard that we
>will hopefully have implemented widely. And there is already some
>support now for them. That is better than for languages that
>use scripts not available in 10646 yet. The combining semantics
>will need to be available in 10646 products anyway, to support
>scripts like the Thai and Indic scripts.

The keyword here is "at some stage". And one also has to realize
that combining semantics in particular for Indic scripts can be
handled quite differently from Latin, because it is much less a
general combination, and much more a complicated arrangement
of special cases.
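The "almost all" versus "all" question in this exchange can be probed mechanically: for a base letter and one combining mark, ask whether canonical composition (NFC) yields a single code point. A sketch under stated assumptions: `has_precomposed` is a hypothetical helper, the answer reflects the Unicode version bundled with the running Python rather than ISO 10646 as it stood in 1996, and because of composition exclusions a "no" only means NFC does not produce a single precomposed code point.

```python
import unicodedata

def has_precomposed(base: str, mark: str) -> bool:
    """True if base + combining mark composes to one code point under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT

for base in "Aeq":
    status = "precomposed" if has_precomposed(base, ACUTE) else "combining sequence only"
    print(f"U+{ord(base):04X} + U+0301 -> {status}")
```

A and e with acute have precomposed code points (as in Latin-1), while q with acute can only be written as a combining sequence, so text using it depends on software that actually implements composition.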

Assume, for a little while, that not even the precombinations
in Latin-1 would be available in ISO 10646. This would mean
obviously that because of large and wealthy markets such as
Germany and France, everybody would immediately start to
work on combining characters. And these implementations would
be completed rather soon, and would be very straightforward.
All rare languages, and maybe even Indic languages and Thai,
would benefit from this.

What happens in practice is that every country/region/language
that seriously starts to look at information processing tries
to assess whether their needs are covered in ISO 10646. Of course,
if Latin letters with diacritic extensions are used, then
*in theory* their needs are covered. But because the availability
of precomposed codepoints for the major markets has not forced
the software makers (except for very few specialists selling
their products with higher prices) to care for composition,
this remains theory. Such countries/regions/languages then
decide that they have to work on getting their combinations
into ISO 10646 as precomposed codepoints. Besides these
economic reasons, there is also an aspect of pride, namely
that big and wealthy countries/languages/regions have their
own precomposed codepoints already in there, and a smaller
and/or less wealthy country does not want to feel excluded.

So more and more countries finally manage to get their
combinations into the standard, making it less and less
attractive for software makers to seriously care about
composition. Unfortunately, it's just always those who
could afford decomposition easily that have their
precompositions, and those that cannot afford don't
get anything.

Regards,	Martin.
Received on Thursday, 24 October 1996 11:09:04 UTC
