[Last-Call]: <draft-bormann-dispatch-modern-network-unicode-05> (Modern Network Unicode): W3C I18N Review

All,

The W3C Internationalization Working Group (of which I am chair) was 
requested to review several IETF documents nearing or in IETF Last Call.

This email represents the issues our working group noticed in our review of:

https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

---

The specific issues our group identified are tracked on GitHub here:

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3As%3Amodern-network-unicode

---

Here are the comments:

#1972: Exclude other non-characters

Section 2, point number 4:
https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

     4. The code points U+FFFE and U+FFFF MUST NOT be used. Also, Byte
        Order Marks (leading U+FEFF characters) MUST NOT be used.

This should probably also exclude the non-character code points at the 
end of each supplementary plane (e.g. U+1FFFE, U+2FFFF, U+10FFFE, etc.)
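
To make the suggestion concrete, here is a minimal Python sketch of such 
a check. The function names are mine (hypothetical, not from the draft), 
and the sketch covers only the plane-final code points U+nFFFE and 
U+nFFFF discussed above.

```python
def is_plane_final_noncharacter(cp: int) -> bool:
    """True for U+nFFFE and U+nFFFF in any of the 17 planes (n = 0x0..0x10)."""
    # The low 16 bits of these code points are FFFE or FFFF, so masking
    # out the lowest bit leaves FFFE in both cases.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def find_plane_final_noncharacters(text: str) -> list:
    """Return the string indexes at which such code points occur."""
    return [i for i, ch in enumerate(text)
            if is_plane_final_noncharacter(ord(ch))]
```

A receiver could reject any text for which 
find_plane_final_noncharacters() returns a non-empty list.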

---

#1973: Relationship to CRLF line endings

https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

Section 3 disallows CR in "2D MNU" (line-based Unicode text). Section 5 
allows specs to define variants that include CR and CRLF line endings. 
Disallowing CRLF rather than supporting it adaptively seems like it 
would create a lot of uncertainty.
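
As one possible reading of "adaptively" (my assumption, not something 
the draft specifies), a receiver could accept CRLF on input and fold it 
to LF, while still treating a bare CR as an error. A rough sketch, with 
a hypothetical function name:

```python
def accept_line_endings(text: str) -> str:
    """Fold CRLF to LF; reject a CR that is not part of a CRLF pair."""
    folded = text.replace("\r\n", "\n")
    if "\r" in folded:
        # Assumption for this sketch: a bare CR is not a line ending.
        raise ValueError("bare CR found in line-based text")
    return folded
```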

---

#1974: "With NFKC" variant considered harmful

Section 5.7 defines a "With NFKC" variant.

This is probably a Bad Idea.

NFKC is destructive and might also fall short of accomplishing anything 
useful. Mentioning the K forms is probably fine, but not defining this 
variant would avoid the problems it produces. Note that W3C has this 
note in charmod-norm <https://www.w3.org/TR/charmod-norm>:

    Unicode compatibility decomposition removes meaning from the text
    that it is applied to. That means that this normalization step
    produces the most promiscuous matches. Some developers and
    specification authors find this level of normalization attractive
    because it appears to bring together many strings that are logically
    similar, but this level of normalization has limited utility in
    actual practice and has side effects that confuse users. This
    normalization step is presented for completeness, but it is not
    generally appropriate for use on the Web.
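
For readers who want to see the loss of meaning, here is a small Python 
illustration using the standard unicodedata module. The sample strings 
and the helper name are my own choices.

```python
import unicodedata


def nfc_and_nfkc(text: str) -> tuple:
    """Return the NFC and NFKC forms of text, for side-by-side comparison."""
    return (unicodedata.normalize("NFC", text),
            unicodedata.normalize("NFKC", text))


# Superscript two, the "fi" ligature, circled digit one, fullwidth A.
samples = ["x\u00b2", "\ufb01le", "\u2460", "\uff21"]

for s in samples:
    nfc, nfkc = nfc_and_nfkc(s)
    # NFC leaves each sample unchanged; NFKC replaces the compatibility
    # character with a plain one, so the original distinction is lost.
    print(repr(s), "NFC:", repr(nfc), "NFKC:", repr(nfkc))
```

With NFKC, "x²" becomes "x2" and the circled digit becomes a plain "1", 
and there is no way to recover the original from the result.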

---

#1975: Link and create harmony between this doc and W3C document 
"charmod-norm"

W3C has a document whose short name (for historical reasons) is 
"charmod-norm" and whose title is "Character Model for the World Wide 
Web: String Matching". See: https://www.w3.org/TR/charmod-norm/. These 
documents have some similarity of content (there is also a similarity to 
PRECIS). It might be a good idea to cross-link this document and 
charmod-norm and ensure consistency when there is overlap.

---

#1976: Missing 'character encoding form'?

The Appendix A definition of terminology is pretty good, but doesn't 
mention character encoding [form], which is the mapping from code 
points in a character set to code units. This is actually the more 
commonly needed term.
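
As a quick illustration of the term (a sketch of mine, with a 
hypothetical helper name), the same code points map to different code 
unit sequences under each encoding form:

```python
def code_units(text: str, form: str) -> list:
    """Return the code units of text under the named encoding form."""
    if form == "utf-8":
        # UTF-8 code units are 8 bits wide.
        return list(text.encode("utf-8"))
    if form == "utf-16":
        # UTF-16 code units are 16 bits wide; read them two bytes at a time.
        data = text.encode("utf-16-be")
        return [int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2)]
    if form == "utf-32":
        # UTF-32 code units are the code points themselves.
        return [ord(ch) for ch in text]
    raise ValueError("unknown encoding form: " + form)


for form in ("utf-8", "utf-16", "utf-32"):
    units = code_units("\u00e9\U0001F600", form)
    print(form, [hex(u) for u in units])
```

For U+00E9 followed by U+1F600 this gives six UTF-8 code units, three 
UTF-16 code units (one plus a surrogate pair), and two UTF-32 code units.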

Note too the opportunity to harmonize with I18N Glossary 
<https://w3c.github.io/i18n-glossary/#c>

---

#1977: Missing discussion of surrogates?

There is some discussion of surrogates in the appendices, but no 
mention of them in the body of the document, especially near the ABNF. 
It's probably a good idea to at least mention their exclusion somewhere 
in Section 6.
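
For reference, the excluded range is U+D800 through U+DFFF. A short 
Python sketch (helper names are hypothetical) of what checking for them 
looks like, along with the fact that a strict UTF-8 encoder already 
refuses them:

```python
def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def contains_surrogate(text: str) -> bool:
    return any(is_surrogate(ord(ch)) for ch in text)


def encodes_as_utf8(text: str) -> bool:
    """True if Python's strict UTF-8 encoder accepts the text."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Lone surrogates are not Unicode scalar values, so they have
        # no well-formed UTF-8 representation.
        return False
```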

---

#1978: Quirks in the history?

There are a variety of places where one could take issue with the 
"history of Unicode" in Appendix B. I don't see any technical issues and 
don't really want to suggest any alterations, since this version of 
history conveys all of the important technical details and leaves out or 
alters some things that probably only matter to historians. I'm filing 
this issue only to note that we didn't ignore it.

---

#1979: NFC and specifications

Appendix C discusses Unicode normalization and the NFC form. The focus 
is on implementations, but there probably should be a mention of 
specifications (that is, I-Ds and other IETF technical documents) here 
(as with charmod-norm). It is primarily name/value matching that is 
affected by potential non-normalization. Specifications need to require 
(or forbid!) it in matching/uniqueness algorithms without requiring 
implementations to do Early Uniform Normalization on the wire.
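
A minimal sketch of what that distinction looks like in practice: the 
comparison normalizes its operands, while the stored or transmitted 
strings are left as they arrived. The function name is mine, and NFC is 
used here only as the example form.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings for equality under NFC, without modifying them."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"      # e-acute as one precomposed code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# A code point comparison sees two different names...
print(composed == decomposed)
# ...while a matching algorithm that specifies NFC treats them as one.
print(nfc_equal(composed, decomposed))
```

A specification that forbids normalization in matching would instead 
compare the code point sequences directly, and these two names would be 
distinct.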

---

Thanks!

Best regards (for W3C I18N),

Addison

-- 
Addison Phillips
Chair (W3C Internationalization WG)

Internationalization is not a feature.
It is an architecture.

Received on Monday, 10 February 2025 22:27:37 UTC