Re: more xhtml 2.0 comments from Michael Day on 2003-04-18 (www-html@w3.org from April 2003)

From: Michael Day <mikeday@yeslogic.com>
Date: Fri, 18 Apr 2003 12:00:38 +1000 (EST)
To: Ernest Cline <ernestcline@mindspring.com>
Cc: www-html@w3.org, <Donna.Worby@dardni.gov.uk>
Message-ID: <Pine.LNX.4.44.0304181155270.25201-100000@lorien.yeslogic.com>

Hi Ernest,

> I strongly doubt that an 'Mc' character will ever be part of Unicode.  
> The Unicode view is that 'Mc' is what the standard refers to as a 
> grapheme, and as such it should be encoded as two characters 'M' and 
> 'c'.  Existing multi-letter characters, such as 'Dz', were included in 
> Unicode only because they existed in pre-Unicode character sets and 
> were therefore included in Unicode to facilitate conversion between 
> those character sets and Unicode on a character for character basis.

That's interesting. So, given that "Mc" is rendered differently and
collated differently from the sequence of two characters "M" and "c", how
should this be handled?

Is it in fact an issue of script/language, in the same way that Spanish 
collates the character combinations "ll" and "ch" differently?

Presumably then if the sequence "Mc" is encountered in text with language
en-UK (or some other code?) it should be collated differently and rendered
using a superscript c or other method.
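
To make the question concrete, here is a minimal Python sketch of
language-dependent collation, using the traditional Spanish treatment of
"ch" and "ll" as single letters. The table name TRADITIONAL_SPANISH and
the helper collation_key are hypothetical names made up for this
illustration, not part of any standard or library, and the rule encoding
is only an assumption about how such a tailoring could be expressed. A
rule for "Mc" would presumably be one more entry in a table for the
relevant language.

```python
# Hypothetical tailoring table (illustrative assumption, not a standard):
# each digraph maps to (base letter, rank), so it sorts as one letter
# placed immediately after its base letter.
TRADITIONAL_SPANISH = {"ch": ("c", 1), "ll": ("l", 1)}


def collation_key(word, digraphs):
    """Return a sort key in which listed digraphs count as single letters."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in digraphs:
            # Consume both characters as one collation element.
            key.append(digraphs[pair])
            i += 2
        else:
            # Ordinary letter: rank 0 keeps it ahead of its digraph.
            key.append((text[i], 0))
            i += 1
    return key


words = ["cuna", "chico", "luz", "llama", "cosa"]

# Plain code point order: "ch" and "ll" are just two letters each.
plain = sorted(words)

# Tailored order: "ch" follows every other "c" word, "ll" every other "l" word.
spanish = sorted(words, key=lambda w: collation_key(w, TRADITIONAL_SPANISH))

print(plain)    # ['chico', 'cosa', 'cuna', 'llama', 'luz']
print(spanish)  # ['cosa', 'cuna', 'chico', 'luz', 'llama']
```

The text is stored as ordinary characters and only the comparison
changes with the language, which is the behaviour I am asking about for
"Mc".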

Surely there must be some existing standard for this?

Michael Day

YesLogic Pty. Ltd.

Received on Thursday, 17 April 2003 20:44:26 UTC