- From: Martin J Duerst <mduerst@ifi.unizh.ch>
- Date: Thu, 17 Oct 1996 11:31:42 +0100 (MET)
- To: rosenne@NetVision.net.il (Jonathan Rosenne)
- Cc: www-international@w3.org
Jonathan Rosenne wrote:

>Bert Bos wrote:
>>
>> The next version of HTML will have a CLASS attribute on (nearly) all
>> elements, as described in several documents ([1], [2], [3], [4]). The
>> intention is to allow authors to attach semantic information to
>> elements, in the form of keywords:
>>
>>     <p class=abstract>...
>>     <em class=surname>...
>>
>> The keywords can also be picked up by a style sheet to display the
>> element in a special way.
>>
>> However, there is a problem: a conflict between case-insensitivity and
>> allowing non-ASCII characters. We'd like to be able to say that the
>> above example is exactly the same as
>>
>>     <P CLASS=ABSTRACT>...
>>     <EM CLASS=SURNAME>...
>
>I used to write COBOL, but then I began to C...
>
>I don't believe there is added value in case-insensitivity this day and
>age. Are there any of those terminals that always display upper case
>still around? Those with the a->A switch?
>
>I suggest that the class names should be defined as case sensitive.

I completely agree with Jonathan. There is no reason for such
behaviour anymore. Declaring things NAME and relying on SGML does
not work very well, and starting to define your own case equivalence
for CDATA is too much effort wasted for too little benefit.

>A friendly browser could, of course, do a case insensitive search if the
>case sensitive search fails.

NO, PLEASE! Users will have no big problems if browsers clearly
refuse to display things that don't match. Users have no problem
distinguishing upper case and lower case, if they are told to do so.
But they won't learn it if browsers don't tell them, and will get
confused if different browsers show different behaviour. Let's try
not to make the same mistakes as with other HTML syntax. Let's try
to avoid bugwards compatibility.

>ASCII only names are too limiting. People should be able to name things
>in their own language.

When we designed the i18n extensions for HTML, we decided not to
extend the character set for tags beyond ASCII. I think this was
okay because it affected only the limited set of existing tags.
For class names, which can be anything, this restriction is
definitely less justified.

>But there is another problem with internationalized names: UCS defines a
>non-unique coding. Some composite characters have at least two valid
>representations, the composed character and the base character followed
>by diacritics. If there is more than one diacritics, their order is not
>defined. The user often has no control over the coding. So before using
>a name, it must be brought to a canonical representation.

This is definitely a problem that should be addressed. In Java,
it was "solved" the easy way, saying that different encodings
are different identifiers. Maybe this was okay for real programmers.
For HTML users, it's definitely not okay.

Still, in this case, it's rather easy because only equivalence
has to be specified. I am currently working on something else,
the internationalization of URLs, where equivalence is probably
not enough, and where a normalized encoding is desired.

There are other, related cases of equivalence. One is the
full-width/half-width issue for East Asian character sets.
From a user point of view, these can easily be distinguished,
so it is not necessary to unify them. On the other hand, one
variant is clearly a compatibility variant, so unification
to get rid of compatibility is also not a bad idea.

Even less of a concern are compatibility ligatures. They should
rarely if ever be used when creating HTML.

Regards, Martin.
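
The equivalence discussed above (case-sensitive matching, but treating composed and decomposed spellings of the same name as equal) could be sketched roughly as follows. This is only an illustration, assuming Unicode normalization form NFC as the notion of canonical equivalence and NFKC for the optional full-width/half-width and ligature unification; the function names are hypothetical and not taken from any HTML or CSS specification.

```python
import unicodedata


def same_class_name(a: str, b: str) -> bool:
    """Hypothetical helper: case-sensitive match under canonical equivalence.

    Both names are brought to NFC only for the comparison; nothing is
    rewritten, since only equivalence has to be decided here.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_class_name_compat(a: str, b: str) -> bool:
    """Hypothetical variant that also unifies compatibility characters
    (full-width/half-width forms, compatibility ligatures) via NFKC."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# Composed vs. base character followed by a combining diacritic.
composed = "r\u00e9sum\u00e9"
decomposed = "re\u0301sume\u0301"

# Two diacritics typed in different orders (dot below, acute).
order_1 = "a\u0323\u0301"
order_2 = "a\u0301\u0323"

# Full-width Latin letters vs. their ordinary counterparts.
full_width = "\uff21\uff22\uff23"
half_width = "ABC"
```

With these definitions, `same_class_name(composed, decomposed)` and `same_class_name(order_1, order_2)` hold although the raw strings differ, `same_class_name("abstract", "ABSTRACT")` does not, and the full-width pair matches only under `same_class_name_compat`.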
Received on Thursday, 17 October 1996 05:31:56 UTC