Re: [Encoding] false statement [I18N-ACTION-328][I18N-ISSUE-374] from John Cowan on 2014-08-28 (www-international@w3.org from July to September 2014)

From: John Cowan <cowan@mercury.ccil.org>
Date: Thu, 28 Aug 2014 18:23:16 -0400
To: John C Klensin <john+w3c@jck.com>
Cc: Andrew Cunningham <lang.support@gmail.com>, wwwintl <www-international@w3.org>, Larry Masinter <masinter@adobe.com>, "Phillips, Addison" <addison@lab126.com>, Richard Ishida <ishida@w3.org>
Message-ID: <20140828222316.GC19452@mercury.ccil.org>

John C Klensin scripsit:

>  -- the Standard UTF-8 encoding, i.e., what a
> 	standard-conforming UTF-8 encoder would produce given a
> 	list of code points (assigned or not),  or

AFAIK this is not a big problem today.
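
For anyone who wants to check this case mechanically, here is a minimal sketch. It assumes that "standard UTF-8" means whatever Python's strict built-in decoder accepts; the helper name is_standard_utf8 is made up for illustration and is not part of any specification.

```python
def is_standard_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder.

    The strict decoder rejects overlong forms and encoded surrogates,
    which are the usual signs of a non-conforming encoder.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_standard_utf8("é".encode("utf-8")))  # conforming bytes
    print(is_standard_utf8(b"\xc0\x80"))          # overlong encoding of U+0000
    print(is_standard_utf8(b"\xed\xa0\x80"))      # encoded surrogate U+D800
```

This only tests the byte-level encoding; it says nothing about whether the code points themselves are used as Unicode assigns them.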

>  -- the Unicode code point assignments, i.e., it uses
> 	private code space and/or "squats" on unassigned code
> 	points, perhaps in so-far completely unused or sparsely
> 	populated planes, or

This one is common, and may even involve repurposing existing code point
assignments.  This usually happens because the font makes assumptions
that a given code point will remain unassigned, or will be assigned in
the way the font author expects -- assumptions which wind up being wrong.
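
A rough way to spot the squatting case in a piece of text is to look for code points whose general category is private use (Co) or unassigned (Cn). The sketch below assumes Python's unicodedata module; the function name suspicious_code_points is hypothetical. The result depends on the Unicode version bundled with the interpreter, and it cannot detect the repurposing of already-assigned code points, which takes knowledge of the font.

```python
import unicodedata


def suspicious_code_points(text):
    """List (index, 'U+XXXX', category) for private-use or unassigned code points.

    'Co' is private use, 'Cn' is unassigned (as of the Unicode version
    that this Python build ships with).
    """
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat in ("Co", "Cn"):
            hits.append((i, f"U+{ord(ch):04X}", cat))
    return hits


if __name__ == "__main__":
    # U+E000 is the first code point of the BMP Private Use Area.
    print(suspicious_code_points("a\ue000b"))
```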

>  -- established Unicode conventions by combining existing
> 	and standardized Unicode points using conventions (and
> 	perhaps special font support) about how those sequences
> 	are interpreted that are not part of the Unicode
> 	Standard.

This is also a problem; generally it's about using visual rather than
logical order of combining characters.
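
One crude heuristic for the visual-order case: a mark typed in visual order often ends up at the start of a word, with no base character before it. The sketch below flags marks (general category M*) that begin the text or follow whitespace. The function name marks_without_base and the Devanagari example are my own illustration, not taken from any particular font, and the check will miss visual-order sequences that occur in the middle of a word.

```python
import unicodedata


def marks_without_base(text):
    """List (index, 'U+XXXX') for marks that start the text or follow whitespace."""
    hits = []
    prev = None
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):
            if prev is None or prev.isspace():
                hits.append((i, f"U+{ord(ch):04X}"))
        prev = ch
    return hits


if __name__ == "__main__":
    # Logical order: consonant KA (U+0915), then vowel sign I (U+093F).
    print(marks_without_base("\u0915\u093f"))
    # Visual order: the vowel sign is stored first because it is drawn
    # to the left of the consonant.
    print(marks_without_base("\u093f\u0915"))
```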

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
Adam [...] did not want the apple for the apple's sake, he wanted it only
because it was forbidden. The mistake was not forbidding the serpent;
then he would have eaten the serpent. --Mark Twain

Received on Thursday, 28 August 2014 22:23:43 UTC