Re: Internationalized CLASS attributes

Martin J Duerst writes:

> Keld Simonsen wrote:
> 
> >Martin J Duerst writes:

> >What you describe here is the way
> >characters are typed in, and that is quite different from how it
> >is represented internally. For example the A followed with GRAVE is
> >normally typed in on Latin keyboards with *first* entering a
> >dead key "GRAVE" and then the A.
> 
> Yes, this is popular, because it was that way on typewriters,
> for mechanical reasons. But people not used to typewriters
> in many cases prefer to type the accent after the base letter,
> as I guess most of them write it that way, and think of the
> accent as an addition to the base letter.
> 
> >The input system needs to combine
> >this into an A-GRAVE, or as you suggest, as *first* an A and then
> >a combining grave, that is, it intelligently has to reverse the order
> >of the base letter and the accent.
> 
> Not much intelligence needed. That all can be done by simple tables.

Agreed. But my point was that what is encoded is not a reflection
of how the user perceives a character, but rather of what the
system designers decide. And so it should be, as users don't care
about the encoding of characters, as long as everything that is
needed is available.
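
To illustrate the "simple tables" point, here is a minimal sketch in
Python of a dead-key input step. The table contents, the choice of
"`" and "'" as dead keys, and the function name are all hypothetical
and only for illustration; a real keyboard interface would have a far
larger table. The dead key arrives first, the base letter second, and
the table decides what single coded character is stored.

```python
# Hypothetical dead-key table: (dead key, base letter) -> coded character.
DEAD_KEY_TABLE = {
    ("`", "A"): "\u00C0",  # LATIN CAPITAL LETTER A WITH GRAVE
    ("`", "a"): "\u00E0",  # LATIN SMALL LETTER A WITH GRAVE
    ("'", "E"): "\u00C9",  # LATIN CAPITAL LETTER E WITH ACUTE
    ("'", "e"): "\u00E9",  # LATIN SMALL LETTER E WITH ACUTE
}
DEAD_KEYS = {"`", "'"}  # assumed set of dead keys


def compose_keystrokes(keys):
    """Turn a sequence of keystrokes into the text that gets stored."""
    out = []
    pending = None  # dead key waiting for its base letter
    for key in keys:
        if pending is not None:
            # Look up the pair; if the table has no entry, keep both keys as typed.
            out.append(DEAD_KEY_TABLE.get((pending, key), pending + key))
            pending = None
        elif key in DEAD_KEYS:
            pending = key
        else:
            out.append(key)
    if pending is not None:
        out.append(pending)  # dead key at end of input, nothing to combine with
    return "".join(out)
```

The user never sees which coded character the table produces; that
choice belongs entirely to whoever fills in the table.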

> >> A user does not have or see ISO 10646 characters. A user
> >> sees and deals with things on the screen and on paper.
> >> ISO 10646 characters are abstract entities, and we have
> >> to make sure, where possible, that the application takes
> >> provisions to reconcile these abstract entities with the
> >> expectations of the user if the expectations of the user
> >> are different.
> >
> >I think that is very hard to do. How can you find out what
> >a user perceives a character to be? On the Latin keyboards I know
> >of, you often have dead keys to enter accented characters,
> >so whether the user perceives this as two characters or perceives
> >it as one character, it needs to be keyed in the same way.
> >
> >I find that it is much more relevant that the system codes the
> >information in one unambiguous way, and this is the responsibility of the
> >system designer of the keyboard interface, in conjunction with the
> >designers of the rest of the system.
> 
> Of course. What I wanted to say above is that although different
> users may have different ideas about A-grave, thinking about
> it as one or two characters, or as something in between, or probably
> not having a very explicit idea about it anyway, every single
> user thinks that all A-grave that (s)he sees on any form of
> paper or on any computer screen are the same.

That is a good way to explain it. Again, the user does not care
how the information is encoded, as long as what (s)he sees
is understandable and what is expected. Whether it is one or two
characters does not matter to the user. So again it is up to the
system designer to code the information in an unambiguous and
well-defined way. In the case of accented Latin characters, 10646
then specifies normatively only one way of encoding.
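
As a rough sketch of what "one unambiguous way" can mean in practice,
the following Python fragment folds the two possible spellings of
A-grave into a single coded form. It uses Python's standard
`unicodedata` module and the Unicode normalization form NFC as a
stand-in; neither is mentioned in this thread, so treat both as my
assumption rather than as what 10646 itself prescribes. The function
name is hypothetical.

```python
import unicodedata


def to_single_form(text):
    """Fold base letter + combining accent into the precomposed character
    where one exists (assumption: NFC stands in for the 'one way' of coding)."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00C0"   # A-grave as one coded character
decomposed = "A\u0300"   # A followed by COMBINING GRAVE ACCENT

# The two spellings look the same to the user but differ as coded data.
same_before = precomposed == decomposed
# After folding, they compare equal.
same_after = to_single_form(precomposed) == to_single_form(decomposed)
```

Here `same_before` is false and `same_after` is true, which is the
property the rest of the system can rely on once the input side has
settled on one form.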

> >I agree that for some scripts, you need combining characters.
> >But for almost all of Latin based languages, you have all you
> >need in form of whole characters in 10646. There are a few
> >examples of Latin letters that are not encoded in 10646, and for that
> >the only way to represent that information is with
> >the use of combining characters, agreed. But the occurrences of those
> >combinations would be very minimal compared to what can be coded
> >directly in 10646.
> 
> The important words here are "almost all" and "minimal". Some
> people believe that this can be changed to "all" and "none",
> just by adding more precombined characters. The fact is that
> it cannot be done, there are several thousand languages
> written with the Latin script, and linguists invent new
> combinations according to their needs. The addition of
> new combinations, however, has the undesired effect to
> further marginalize those languages that need combining
> characters, leading to a very bad vicious cycle.

Why do you think this is so bad? The rare languages will get support
at some stage, and that in an international standard that we
will hopefully have implemented widely. And there is already some
support now for them. That is better than for languages that
use scripts not available in 10646 yet. The combining semantics
will need to be available in 10646 products anyway, to support
scripts like the Thai and Indic scripts.

Keld

Received on Thursday, 24 October 1996 09:57:48 UTC