
RE: Last Call review of Character Model for the WWW

From: Karlsson Kent - keka <keka@im.se>
Date: Wed, 14 Feb 2001 18:34:15 +0100
Message-ID: <C110A2268F8DD111AA1A00805F85E58D0115A98C@ntgbg1>
To: "'www-i18n-comments@w3.org'" <www-i18n-comments@w3.org>

Further comments relating to LCC 57 on W3C-normalisation:

1. Differentiate between *numeric character escapes*
   (like &#...; and &#x...; in XML) and *string include escapes*
   (which may be character escapes, but may be more general;
   ENTITY _references_ in XML).  It may be that the original
   intent in the document was to let 'character escapes' mean
   just 'numeric character escapes'; but entity references must
   be dealt with, whatever each of these things is called.

2. Recommend requiring that a 'string include escape' is *defined*
   such that the string it expands to, where used, does NOT begin
   with a combining character, nor with a numeric character escape
   for a combining character.

3. Recommend requiring that a 'string include escape' (at its point
   of application, where the string is inserted) cannot be followed
   by a combining character or a numeric character escape
   for a combining character.

That way, expanding a 'string include escape' would not cause any
NFC-disruption.  Note that there may be
a combining character before a 'string include escape', and that
the string a 'string include escape' is defined to include may end
with a combining character without any problem.
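
As a rough illustration of points 2 and 3, here is a minimal Python
sketch.  The function names are hypothetical, and "combining character"
is approximated as "non-zero canonical combining class", which is an
assumption; it does not cover every character that can compose with a
preceding one (conjoining Hangul jamo, for example).

```python
import re
import unicodedata

# Matches a numeric character escape such as &#769; or &#x301; (XML syntax).
_NUMERIC_ESCAPE = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def first_char(text):
    """Return the first character of text, decoding a leading numeric escape."""
    if not text:
        return ""
    match = _NUMERIC_ESCAPE.match(text)
    if match:
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return text[0]


def starts_with_combining(text):
    """True if text begins with a combining character or an escape for one.

    Assumption: 'combining' means a non-zero canonical combining class.
    """
    ch = first_char(text)
    return bool(ch) and unicodedata.combining(ch) != 0


def is_safe_include(replacement, following):
    """Check points 2 and 3 for one use of a 'string include escape'.

    replacement: the string the escape is defined to expand to
    following:   the text immediately after the escape where it is used
    """
    return (not starts_with_combining(replacement)
            and not starts_with_combining(following))
```

Under these assumptions, is_safe_include("abc\u0301", "def") holds,
since a trailing combining character in the replacement is harmless,
while is_safe_include("\u0301abc", "def") and
is_safe_include("abc", "&#x301;") do not.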

XML 1.0 does not enforce this in any way; but I think that the
next version of XML should do so, in order to:
	a. Avoid potential NFC-disruption problems when doing
	   W3C-normalisation followed by expansion of entity references,
	   when the definition of the entity is W3C-normalised.

	b. Make W3C-normalisation possible without it expanding 'string
	   include escapes'; indeed the definition of a 'string include
	   escape' may be unavailable to the W3C-normalisation process,
	   only to be supplied later; further, it is often the (human)
	   author's choice to use a 'string include escape', and those
	   should not be expanded away by a character normalisation
	   process, but only much later.
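
To illustrate point b, here is a minimal Python sketch of normalising
the text around entity references while leaving the references
themselves unexpanded.  The function name is hypothetical, the name
pattern is an ASCII-only simplification of XML names, and plain NFC
stands in for full W3C-normalisation (which also has to consider
numeric character escapes); all of these are assumptions.

```python
import re
import unicodedata

# Simplified entity reference pattern (&name;).  Numeric character
# escapes (&#...;) do not match, because '#' cannot start a name.
_ENTITY_REF = re.compile(r"(&[A-Za-z_:][A-Za-z0-9_.:-]*;)")


def normalise_around_references(text):
    """Apply NFC to the text between entity references.

    The references are kept verbatim, so their definitions need not be
    available to this step.
    """
    parts = _ENTITY_REF.split(text)
    # With a capturing group, re.split puts the references at odd indices.
    return "".join(
        part if index % 2 else unicodedata.normalize("NFC", part)
        for index, part in enumerate(parts)
    )
```

This piecewise approach is only sound if no text segment following a
reference begins with a combining character, which is what point 3
above would guarantee.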

An alternative could be to say that expansion of 'string include escapes'
turns a W3C-normal (or NFC) original into a result that is not normalised.
But that would go against the idea of getting NFC early and keeping it.
Expanding a 'string include escape' is an 'editing' operation, but one that
is normally done long after any other kind of editing of the document.
Received on Wednesday, 14 February 2001 12:38:01 UTC
