Re: Event Updated: Internationalization (I18N) WG Teleconference

Hi Addison,

The first two sentences are almost exactly what I was looking for,
and better text than I could easily have come up with.  I would think
about making one small change, but am not sure it is important, so up
to you.   That would be to make "such as those specified by ISO/IEC
8859" include either ASCII, ISO/IEC 646, or both so you don't have
just the one set of examples and so the 7-bit codes are represented.

I'm less sure about the last (third) sentence.  "[Encoding]" does not
point anywhere.  Maybe that would explain if it existed, but I'm not
at all sure what "we mean the specific modern mappings of bytes to
Unicode code points" means.   If something were keyed to 8859-1 per
early HTML (much less non-LatinX parts of 8859), it would stand on
its own as a one-octet-per-character encoding, not be somehow
magically mapped to Unicode code points.
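
As a rough illustration of the distinction, here is a minimal sketch
that assumes Python's built-in codecs ("latin-1", "cp1252", "ascii")
as stand-ins for ISO/IEC 8859-1, windows-1252, and ASCII. The byte
values and the helper name code_points are chosen only for the
example. It shows the same three octets read first as one-octet
characters under each coded character set and then expressed as
Unicode code points.

```python
# Illustrative only: Python codec names stand in for the standards discussed.
raw = bytes([0x41, 0xE9, 0x80])  # arbitrary sample octets

latin1 = raw.decode("latin-1")  # ISO/IEC 8859-1: octet N maps to U+00NN
cp1252 = raw.decode("cp1252")   # windows-1252: 0x80 maps to U+20AC


def code_points(text):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


latin1_points = code_points(latin1)
cp1252_points = code_points(cp1252)

# A 7-bit code such as ASCII has no assignment at all for 0xE9 or 0x80.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The two decodings agree on 0x41 and 0xE9 and differ only on 0x80,
while the ASCII decode fails outright on the octets above 0x7F.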

Also, in the Note that follows, your point would be clearer and seem
less contradictory if you inserted "historically" into the second
sentence, yielding "these strings have historically typically been
represented using UTF-16" or some approximation.  You then go on to
say that UTF-16 is a poor choice and that UTF-8 is preferred, which,
IMO, is just right, but then maybe the note should get a bit closer
to "we used to do it that way, but should now move more fully to
UTF-8 rather than continuing with UTF-16."
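
To make the UTF-16 versus UTF-8 point concrete, a small sketch in
Python (the sample string is arbitrary and chosen only for
illustration) counts the same text three ways: as code points, as
UTF-16 code units, and as UTF-8 bytes.

```python
# Illustrative only: an ASCII letter, a Latin-1 letter, and a
# supplementary-plane character (U+1F600).
text = "a\u00e9\U0001F600"

code_point_count = len(text)                           # one per character
utf16_unit_count = len(text.encode("utf-16-le")) // 2  # 16-bit code units
utf8_byte_count = len(text.encode("utf-8"))            # 8-bit code units
```

The supplementary-plane character takes two UTF-16 code units (a
surrogate pair), so a length counted in UTF-16 code units does not
match the number of code points.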

best,
   john

--On Thursday, July 3, 2025 08:32 -0700 Addison Phillips
<addisoni18n@gmail.com> wrote:

> Hi John,
> 
> Thanks for the note and the discussion in the teleconference.
> 
> I made changes, notably a lengthy addition about CCS history. Check
> it out?
> 
> https://github.com/w3c/bp-i18n-specdev/pull/162
> 
> https://deploy-preview-162--bp-i18n-specdev.netlify.app/#char_choosing
> 
> One more review?
> 
> Thanks!
> 
> Addison
> 
> On 7/3/2025 6:11 AM, John C Klensin wrote:
>> Addison,
>> 
>> I've looked through the latest specdev update.  Much better, but
>> I'm still finding the "encoding" and "character encoding"
>> terminology problematic, especially since, in another context, I
>> recently got pulled into a discussion about the legitimacy of HTTP
>> 1.0 and 1.1.
>> 
>> Suggestion:
>> 
>> Put in a few sentences, possibly under "Useful background and
>> overviews for this section" or immediately following it, or at
>> least before the first "Note" in  Section 4.6 that says something
>> like:
>> 
>> 	"The term 'encoding' or 'character encoding' has
>> 	historically been used in a variety of different ways when
>> 	character representation and processing are concerned.  In
>> 	this document (section?) it refers both to the different
>> 	encoding methods specified in conjunction with the Unicode
>> 	Standard and to the historically large collection of
>> 	non-Unicode Coded Character Sets.  Unless otherwise
>> 	specified, only Unicode is under discussion in this section."
>> 
>> That may eliminate the need to go into the details implied by
>> "Explain the relationship between windows-1252, Latin1, and ASCII"
>> of ticket #2000.   If it does not, it would lay the foundation for
>> that explanation.  And, btw, if we are going to pursue that,
>> ISO/IEC 8859 and its various components should be part of the
>> explanation, not just 8859-1, which I believe is the most common
>> specific/standard definition of "Latin1" or "Latin-1".
>> 
>>      john
>> 
>> 
>> 

Received on Friday, 4 July 2025 03:27:27 UTC