- From: Martin J Duerst <mduerst@ifi.unizh.ch>
- Date: Thu, 24 Oct 1996 17:07:19 +0100 (MET)
- To: keld@dkuug.dk (Keld Jørn Simonsen)
- Cc: rosenne@NetVision.net.il, www-international@w3.org
Keld Simonsen wrote:

>Martin J Duerst writes:
>
>> Keld Simonsen wrote:
>>
>> >Martin J Duerst writes:
>
>Again, the user does not care
>how the information is encoded, as long as what (s)he sees
>is understandable and what is expected. One or two characters
>does not matter to the user. So again it is up to the system designer
>to code the information in an unambiguous and well-defined way.
>In the case of accented Latin characters 10646 then specifies
>normatively only one way of encoding.

Like Jonathan Rosenne, I don't really agree on this point.
Assume I have something like A-with-dot-below, which does not
exist as precomposed in ISO 10646. For this thing, what does
ISO 10646 (normatively or otherwise) specify?

(1) No way to encode it.
(2) Encode it as A followed by combining-dot-below.
(3) Anything else.

>> >I agree that for some scripts, you need combining characters.
>> >But for almost all of Latin based languages, you have all you
>> >need in form of whole characters in 10646. There are a few
>> >examples of Latin letters that are not encoded in 10646, and for that
>> >the only way to represent that information is with
>> >the use of combining characters, agreed. But the occurrences of those
>> >combinations would be very minimal compared to what can be coded
>> >directly in 10646.
>>
>> The important words here are "almost all" and "minimal". Some
>> people believe that this can be changed to "all" and "none",
>> just by adding more precombined characters. The fact is that
>> it cannot be done, there are several thousand languages
>> written with the Latin script, and linguists invent new
>> combinations according to their needs. The addition of
>> new combinations, however, has the undesired effect of
>> further marginalizing those languages that need combining
>> characters, leading to a very bad vicious cycle.
>
>Why do you think this is so bad?
>The rare languages will get support
>at some stage, and that in an international standard that we
>will hopefully have implemented widely. And there is already some
>support now for them. That is better than for languages that
>use scripts not available in 10646 yet. The combining semantics
>will need to be available in 10646 products anyway, to support
>scripts like the Thai and Indic scripts.

The keyword here is "at some stage". And one also has to realize
that combining semantics, in particular for Indic scripts, can be
handled quite differently from Latin, because it is much less a
general combination, and much more a complicated arrangement of
special cases.

Assume, for a little while, that not even the precombinations in
Latin-1 were available in ISO 10646. This would obviously mean
that, because of large and wealthy markets such as Germany and
France, everybody would immediately start to work on combining
characters. And these implementations would be completed rather
soon, and would be very straightforward. All rare languages, and
maybe even Indic languages and Thai, would benefit from this.

What happens in practice is that every country/region/language
that seriously starts to look at information processing tries to
assess whether its needs are covered in ISO 10646. Of course, if
Latin letters with diacritic extensions are used, then *in theory*
their needs are covered. But because the availability of
precomposed codepoints for the major markets has not forced the
software makers (except for very few specialists selling their
products at higher prices) to care about composition, this remains
theory. Such countries/regions/languages then decide that they
have to work on getting their combinations into ISO 10646 as
precomposed codepoints.
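
The difference between a combination that has a precomposed codepoint and one that exists only as a combining sequence can be sketched as follows. This is a minimal illustration using Python's standard `unicodedata` module, not anything from the original discussion. The helper name `nfc_length` is hypothetical, and q with dot below is chosen on the assumption that it has no precomposed codepoint.

```python
import unicodedata


def nfc_length(base: str, mark: str) -> int:
    """Count the code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", base + mark))


# e + COMBINING ACUTE ACCENT (U+0301) has a precomposed form, U+00E9,
# so composition collapses the pair into a single code point.
with_precomposed = nfc_length("e", "\u0301")

# q + COMBINING DOT BELOW (U+0323): assumed to have no precomposed form,
# so the text stays a two-code-point combining sequence.
without_precomposed = nfc_length("q", "\u0323")

print(with_precomposed, without_precomposed)
```

Software that handles only single codepoints renders the first case correctly and the second only if it implements composition.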

Besides these economic reasons, there is also an aspect of pride,
namely that big and wealthy countries/languages/regions have their
own precomposed codepoints already in there, and a smaller and/or
less wealthy country does not want to feel excluded.

So more and more countries finally manage to get their
combinations into the standard, making it less and less attractive
for software makers to seriously care about composition.
Unfortunately, it's just always those who could afford
decomposition easily that have their precompositions, and those
that cannot afford it don't get anything.

Regards,    Martin.
Received on Thursday, 24 October 1996 11:09:04 UTC