- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Sat, 28 Jun 1997 04:00:16 +1000
- To: <w3c-sgml-wg@w3.org>
> From: Bert Bos <bert@w3.org>
>
> A proposal:
>
> * Make the values of ID attributes case-sensitive

It is a fair proposal, and a well-trodden path. The basic reason against it is that it introduces a whole new class of frustrating errors for users, especially users coming from the PC world, who assume that identifiers are case-insensitive as a matter of course.

> The HTML WG recently studied the issue. In summary:
>
> - current browsers don't consider <A NAME="xxx"> to be a
>   target for <A HREF="#XXX">

In SGML, in most DTDs, entity references are case-sensitive (to distinguish &Aumlaut; from &aumlaut;), but other names (e.g. the GIs of elements) aren't. Of course, SGML lets you pick the case behaviour, and after 10 years people still predominantly choose case-insensitive names.

> - there is no way to determine the language of an ID,
>   therefore the case-mapping rules aren't known either.
>   Any mapping rule will surprise some people.

Software that is predictable will surprise some people :-)

The Unicode 2.0 book (p.4-2) notes "in general, the vast majority of case mappings are uniform across languages".

We discussed the issue (at WG8 in 1995) of whether SGML should allow more sophisticated kinds of case mapping (e.g. many to single), and we found that no national body wanted to request it. The general consensus was that handling a few aberrant case mappings was not worth complicating the SGML declaration any further, especially if each problem occurred in only a single nation's script and each needed a different syntax.

> - case-sensitivity is easy to explain and avoids
>   surprises (e.g., people find it easy to see a
>   difference between A and a, much easier than between
>   full-width/half-width letters in Japanese, or
>   precomposed letters and floating accents, e.g....)

In the particular case of the full-width and half-width alphabets and katakana, only one set is allowed in the SGML declaration I have sent in. (All compatibility zone characters are dropped.)

> - case-insensitive mapping is hard to implement;
>   it needs a few dozen Kb of tables in Java.

We are using the standard Unicode case-mapping, I believe. JDK 1.1 comes with Unicode case functions in java.lang.Character.

> - the repertoire of Unicode/ISO-10646 is open-ended: more
>   letters will be added later, but with case-insensitive
>   mapping, the implementations won't have to change.

I have always argued that XML naming rules should stick to just the characters that appear in common words and lie outside the surrogate and compatibility zones of Unicode. Native Language Markup does not demand that all words be available, just a minimum set.

> The well-known problem cases are the dotless-i of Turkish, the sharp-s
> of German, the uncertainty over dropping accents from uppercase letters
> in French.

But Germans know they have a problem with computers and the sharp-s, and Turks know their eyes are dotty. I don't know what the French think about their accents, but I'd imagine it is the same.

As far as the Turkish I goes, I'd much prefer to say that all four are the same character. But we should defer to the Unicode people: it is their game.

In English, we don't really complain that we cannot have spaces in our names, even though it produces somewhat artificial strings; the Germans may smile at us pityingly and pat themselves on their backs for having an agglutinating language. In English we care that we can use alphabetics and that we can at least make sense of our identifiers.

I think people know that universality gives more immediate results, even if there are still a few bumps for most nations.

Rick Jelliffe
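To make the trade-off above concrete, here is a minimal sketch of language-independent, one-to-one case matching for IDs, written against the per-character functions in java.lang.Character. The class `IdCaseDemo` and its methods `fold` and `sameId` are hypothetical names chosen for illustration; they are not part of any SGML or XML proposal, and the upper-then-lower folding is just one plausible rule, not necessarily the one the WG had in mind.

```java
import java.util.Locale;

// Hypothetical illustration only: not taken from any SGML/XML specification.
final class IdCaseDemo {

    // Fold an ID one char at a time with the simple (one-to-one),
    // locale-independent mappings of java.lang.Character.
    // Mapping to upper case first and then to lower case makes
    // characters that share an upper-case form fold together.
    static String fold(String id) {
        StringBuilder sb = new StringBuilder(id.length());
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            sb.append(Character.toLowerCase(Character.toUpperCase(c)));
        }
        return sb.toString();
    }

    // Two IDs match when their folded forms are identical.
    static boolean sameId(String a, String b) {
        return fold(a).equals(fold(b));
    }

    public static void main(String[] args) {
        // Plain ASCII: NAME="xxx" matches HREF="#XXX".
        System.out.println(sameId("xxx", "XXX"));

        // Sharp-s: a one-to-one mapping cannot turn U+00DF into "SS",
        // so these two spellings do not match.
        System.out.println(sameId("stra\u00dfe", "STRASSE"));

        // Turkish: i, I, dotless i (U+0131) and dotted I (U+0130)
        // all fold to the same character under this rule.
        System.out.println(fold("iI\u0131\u0130"));

        // By contrast, the locale-sensitive String mapping depends on
        // knowing the language, which an ID does not carry.
        System.out.println("TITLE".toLowerCase(Locale.ROOT));
        System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr")));
    }
}
```

Under this rule the sharp-s stays a "bump" (the two spellings are simply different IDs), while the four Turkish i's collapse into one, which is the behaviour argued for above.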
Received on Friday, 27 June 1997 14:24:09 UTC