RE: A note on case sensitivity from Gavin Nicol on 1996-10-28 (w3c-sgml-wg@w3.org from October 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Mon, 28 Oct 1996 11:08:01 -0500
To: mtbryan@sgml.u-net.com
CC: gkholman@microstar.com, w3c-sgml-wg@w3.org
Message-Id: <199610281608.LAA07089@nathaniel.ebt>

>Even then they have not looked beyond the existing SGML case
>of 1:1 mapping. You only have to consider the mapping for &szlig; to
>understand that 1:1 is not sufficient for 10646. In fact a general m:m
>solution is needed to cope with all the quirks of all languages. (But this
>must wait for SGML97++ I suspect:-(...)

The ability to specify lexical equivalence of strings is an important
capability missing from SGML. Rick and I have talked about this many
times.
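
To make the &szlig; point concrete, here is a minimal sketch in Python
(my choice for illustration only; nothing in SGML or the XML drafts
specifies this). The helper name lexically_equal is hypothetical. It
leans on Unicode full case folding, which allows one character to map
to several:

```python
def lexically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings using Unicode full case folding."""
    return a.casefold() == b.casefold()


# Uppercasing the single character U+00DF yields two characters,
# so a 1:1 case-mapping table cannot express it.
upper = "ß".upper()
print(upper, len(upper))

# Full case folding treats the two spellings as equivalent...
print(lexically_equal("Straße", "STRASSE"))

# ...whereas a plain 1:1 lowercase comparison does not.
print("Straße".lower() == "STRASSE".lower())
```

This covers only case folding. A general m:m equivalence would also
have to deal with things like composed versus decomposed characters,
which the sketch does not attempt.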

>The point is that we need to be able to build composite documents from
>entities that have their own language-sets. At present SGML does not allow
>for this because of the rules about shared character sets. HTML forbids it
>due to character set restrictions and its inability to reference
>entities. I would like those developing XML to consider the language
>question from day one, rather than as an add-on, and to consider it
>with respect to whether we need a better way to integrate data
>entities so that we can prepare compound multilingual documents
>logically.

This is why I suggested ISO 10646. Using it as the document character
set, we should be able to do something like:

  <XML>
  <DIV LANG="en.uk">
  &english;
  </DIV>
  <DIV LANG="ja">
  &japanese;
  </DIV>
  <DIV LANG="zh">
  &chinese;
  </DIV>
  </XML>

and, even if the entities are stored in different encodings, still
parse and process the document.
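
A rough sketch of the idea, again in Python and purely illustrative:
each entity is held as bytes in its own encoding and decoded to a
common character repertoire before the document is assembled. The
ENTITIES table, the expand function, and the particular encodings
(ASCII, Shift_JIS, GB2312) are assumptions made for the example, not
anything proposed for XML.

```python
# Hypothetical entity store: raw bytes plus the encoding declared for each entity.
ENTITIES = {
    "english": ("Hello".encode("ascii"), "ascii"),
    "japanese": ("こんにちは".encode("shift_jis"), "shift_jis"),
    "chinese": ("你好".encode("gb2312"), "gb2312"),
}


def expand(name: str) -> str:
    """Decode one entity from its own encoding into Unicode text."""
    raw, encoding = ENTITIES[name]
    return raw.decode(encoding)


# Once decoded, all three entities share one character set and can be combined.
document = "".join(expand(name) for name in ("english", "japanese", "chinese"))
print(document)
```

The point is only that the storage encoding of each entity becomes a
detail of reading it in; after decoding, the parser sees a single
character set.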

Received on Monday, 28 October 1996 11:11:15 UTC