RE: A note on case sensitivity



>Even then they have not looked beyond the existing SGML case
>of 1:1 mapping. You only have to consider the mapping for ß to
>understand that 1:1 is not sufficient for 10646. In fact a general m:m
>solution is needed to cope with all the quirks of all languages. (But this
>must wait for SGML97++ I suspect:-(...)

The need to specify lexical equivalence of strings is an important
capability missing from SGML. Rick and I have talked about this many
times.
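
To make the ß point concrete, here is a minimal sketch in Python of what
a lexical-equivalence test has to cope with. The function name
lexically_equal is hypothetical, and the approach (Unicode normalization
plus full case folding) is just one plausible definition of
equivalence, not anything SGML or XML specifies:

```python
import unicodedata


def lexically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalization and full case folding.

    Illustrative only: the folding rules a markup standard would
    actually want are an open question.
    """
    def key(s: str) -> str:
        # Decompose, fold case (which may change the length), then
        # decompose again so both sides end up in the same form.
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(a) == key(b)


# One character in, two characters out: the mapping is not 1:1.
upper_sharp_s = "ß".upper()

same_word = lexically_equal("Straße", "STRASSE")
same_accent = lexically_equal("é", "e\u0301")
```

Here upper_sharp_s is the two-character string "SS", so any scheme that
assumes one output character per input character cannot express it.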

>The point is that we need to be able to build composite documents from
>entities that have their own language-sets. At present SGML does not allow
>for this because of the rules about shared character sets. HTML forbids it
>due to character set restrictions and its inability to reference
>entities. I would like those developing XML to consider the language
>question from day one, rather than as an add-on, and to consider it
>with respect to whether we need a better way to integrate data
>entities so that we can prepare compound multilingual documents
>logically.

This is why I suggested ISO 10646. Using this we should be able to
do something like:

  <XML>
  <DIV LANG="en.uk">
  &english;
  </DIV>
  <DIV LANG="ja">
  &japanese;
  </DIV>
  <DIV LANG="zh">
  &chinese;
  </DIV>
  </XML>

and, even if the entities are in different encodings, still parse and
process the document.
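
A rough sketch of the idea, in Python: each entity is stored in its own
encoding and decoded to a common character repertoire before it is
substituted into the document. The entity contents, the choice of
encodings, and the expand_entities helper are all assumptions made up
for illustration, and the regular expression stands in for a real
parser:

```python
import re

# Hypothetical entity store: name -> (raw bytes, declared encoding).
ENTITIES = {
    "english": ("Colour and flavour".encode("iso-8859-1"), "iso-8859-1"),
    "japanese": ("日本語".encode("shift_jis"), "shift_jis"),
    "chinese": ("中文".encode("gb2312"), "gb2312"),
}

DOCUMENT = """<XML>
<DIV LANG="en.uk">
&english;
</DIV>
<DIV LANG="ja">
&japanese;
</DIV>
<DIV LANG="zh">
&chinese;
</DIV>
</XML>"""


def expand_entities(text: str, entities: dict) -> str:
    """Replace each &name; with the entity's text, decoded to Unicode."""
    def replace(match: re.Match) -> str:
        raw, encoding = entities[match.group(1)]
        # Decoding is the step that brings every entity into one
        # shared character set, whatever it was stored in.
        return raw.decode(encoding)

    return re.sub(r"&(\w+);", replace, text)


expanded = expand_entities(DOCUMENT, ENTITIES)
```

Once everything has been decoded, the processor only ever sees one
character set, so the per-entity encodings stop mattering.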