- From: John C Klensin <john+w3c@jck.com>
- Date: Thu, 03 Jul 2025 23:27:14 -0400
- To: Addison Phillips <addisoni18n@gmail.com>, public-i18n-core@w3.org
Hi Addison,

The first two sentences are almost exactly what I was looking for, and better text than I could easily have come up with. I would think about making one small change, but am not sure it is important, so it's up to you. That change would be to make "such as those specified by ISO/IEC 8859" include ASCII, ISO/IEC 646, or both, so that you don't have just the one set of examples and so that the 7-bit codes are represented.

I'm less sure about the last (third) sentence. "[Encoding]" does not point anywhere. Maybe that reference would explain it if it existed, but I'm not at all sure what "we mean the specific modern mappings of bytes to Unicode code points" means. If something were keyed to 8859-1 per early HTML (much less to the non-LatinX parts of 8859), it would stand on its own as a one-octet-per-character encoding, not be somehow magically mapped to Unicode code points.

Also, in the Note that follows, your point would be clearer and seem less contradictory if you inserted "historically" into the second sentence, yielding "these strings have historically typically been represented using UTF-16" or some approximation. You then go on to say that UTF-16 is a poor choice and that UTF-8 is preferred, which, IMO, is just right, but then maybe the note should get a bit closer to "we used to do it that way, but should now move more fully to UTF-8 rather than continuing with UTF-16."

best,
   john

--On Thursday, July 3, 2025 08:32 -0700 Addison Phillips <addisoni18n@gmail.com> wrote:

> Hi John,
>
> Thanks for the note and the discussion in the teleconference.
>
> I made changes, notably a lengthy addition about CCS history. Check
> it out?
>
> https://github.com/w3c/bp-i18n-specdev/pull/162
>
> https://deploy-preview-162--bp-i18n-specdev.netlify.app/#char_choosing
>
> One more review?
>
> Thanks!
>
> Addison
>
> On 7/3/2025 6:11 AM, John C Klensin wrote:
>> Addison,
>>
>> I've looked through the latest specdev update. Much better, but
>> I'm still finding the "encoding" and "character encoding"
>> terminology problematic, especially since, in another context, I
>> recently got pulled into a discussion about the legitimacy of HTTP
>> 1.0 and 1.1.
>>
>> Suggestion:
>>
>> Put in a few sentences, possibly under "Useful background and
>> overviews for this section" or immediately following it, or at
>> least before the first "Note" in Section 4.6, that say something
>> like:
>>
>> "The term 'encoding' or 'character encoding' has
>> historically been used in a variety of different ways when
>> character representation and processing are concerned. In
>> this document (section?) it refers both to the different
>> encoding methods specified in conjunction with the Unicode
>> Standard and to the historically large collection of
>> non-Unicode Coded Character Sets. Unless otherwise
>> specified, only Unicode is under discussion in this section."
>>
>> That may eliminate the need to go into the details implied by
>> "Explain the relationship between windows-1252, Latin1, and ASCII"
>> of ticket #2000. If it does not, it would lay the foundation for
>> that explanation. And, btw, if we are going to pursue that,
>> ISO/IEC 8859 and its various components should be part of the
>> explanation, not just 8859-1, which I believe is the most common
>> specific/standard definition of "Latin1" or "Latin-1".
>>
>> john
Received on Friday, 4 July 2025 03:27:27 UTC