[i18n-activity] String Handling section nits (#989)

aphillips has just created a new issue for https://github.com/w3c/i18n-activity:

== String Handling section nits ==
6.4. String Handling
https://w3c.github.io/webauthn/#sctn-strings

> (too long to quote)

This section describes byte-count truncation, including considerations for both code point and grapheme cluster based truncation. This is nicely written and the illustration is very helpful.

There are some potential infelicities in this chunk:

> Conforming User Agents are responsible for ensuring that the authenticator behaviour observed by Relying Parties conforms to this specification with respect to string handling. For example, if an authenticator is known to behave incorrectly when asked to store large strings, the user agent SHOULD perform the truncation for it in order to maintain the model from the point of view of the Relying Party. User-agents that do this SHOULD truncate at grapheme clusters.

* Consider changing "truncate at grapheme clusters" to "truncate at grapheme cluster boundaries" or "truncate on..."

> Truncation based on UTF-8 sequences alone may cause a grapheme cluster to be truncated, but still valid [UTR29]. This could make the grapheme cluster render as a different valid glyph instead of removing the glyph entirely.

* The first sentence is a little unclear, since the term "valid" (or is it "valid [UTR29]"?) isn't defined here and so doesn't really say anything. A few things that are worth noting here are:

  * Some sequences, such as those that use ZWJ, might end up with a dangling joiner which interacts strangely with surrounding text.
  * While the example is nicely done, the visible effect is more pronounced in some languages, such as Indic scripts (where truncating a conjunct can change the appearance and meaning much more profoundly).
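
To make the two points above concrete, here is a rough sketch (not taken from the WebAuthn spec; the helper name `truncate_utf8` and the byte limits are made up for illustration) of byte-count truncation that respects code point boundaries but not grapheme cluster boundaries. The result is well-formed UTF-8 in both cases, yet one ends in a dangling ZWJ and the other ends in a bare virama:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: keep at most max_bytes of UTF-8,
    cutting only at code point boundaries (not grapheme clusters)."""
    data = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the incomplete trailing sequence, if there is one.
    return data.decode("utf-8", errors="ignore")


ZWJ = "\u200d"

# MAN + ZWJ + WOMAN + ZWJ + GIRL: one grapheme cluster, 18 bytes in UTF-8.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"
cut_family = truncate_utf8(family, 8)
# cut_family is MAN followed by a dangling ZWJ (7 bytes).

# Devanagari KA + VIRAMA + SSA: a conjunct, 9 bytes in UTF-8.
conjunct = "\u0915\u094d\u0937"
cut_conjunct = truncate_utf8(conjunct, 6)
# cut_conjunct is KA + VIRAMA: a different, still renderable, sequence.
```

Truncating on grapheme cluster boundaries instead would drop the whole cluster in both cases.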

I18N should maybe consider some revisions to our text about truncation in [SPECDEV](https://www.w3.org/TR/international-specs/#char_truncation), including providing more details not germane to WebAuthn, so that the material can be referenced in this section (and in other specs with similar issues in the future).

> In addition to that, truncating on byte boundaries alone causes a known issue that user agents should be aware of: if the authenticator is using [FIDO-CTAP] then future messages from the authenticator may contain invalid CBOR since the value is typed as a CBOR string and thus is required to be valid UTF-8. User agents are tasked with handling this to avoid burdening authenticators with understanding character encodings and Unicode character properties. Thus, when dealing with authenticators, user agents SHOULD:
> 1.    Ensure that any strings sent to authenticators are validly encoded.
> 2.    Handle the case where strings have been truncated resulting in an invalid encoding. For example, any partial code point at the end may be dropped or replaced with U+FFFD.

* It's a little thing, but replacing a "partial code point" with U+FFFD means replacing a byte sequence that is 1, 2, or 3 bytes long with a 3-byte sequence in UTF-8 (`0xEF.BF.BD`). That is, this operation may result in a DOMString whose UTF-8 representation is longer than the limit originally being imposed. As long as this isn't a problem, that's fine, but it is maybe worth calling out.
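
A small sketch of the size effect, assuming a user agent repairs a byte-truncated string by substituting U+FFFD (the helper name `truncate_and_repair` and the 3-byte limit are invented for this example and do not come from the spec):

```python
def truncate_and_repair(text: str, max_bytes: int) -> str:
    """Hypothetical repair step: cut the UTF-8 bytes at max_bytes,
    then replace any invalid trailing bytes with U+FFFD."""
    data = text.encode("utf-8")[:max_bytes]
    return data.decode("utf-8", errors="replace")


name = "ab\u00e9"  # "ab" + e-acute: 1 + 1 + 2 = 4 bytes in UTF-8
limit = 3          # the cut falls in the middle of the 2-byte e-acute

repaired = truncate_and_repair(name, limit)
repaired_len = len(repaired.encode("utf-8"))
# repaired is "ab" + U+FFFD, which takes 2 + 3 = 5 bytes: more than the limit.
```

Dropping the partial code point instead of replacing it keeps the result within the limit.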

---
Instructions: 

Follow the process at https://w3c.github.io/i18n-activity/guidelines/review-instructions.html

1. **CREATE A PROPOSED REVIEW COMMENT BY REPLACING THE PROMPTS ABOVE THIS PARAGRAPH, BUT LEAVE THIS PARAGRAPH INTACT AS WELL AS THE TEXT BELOW IT** Then ask the i18n WG to review your comment.

2. After discussion with the i18n WG, raise this issue to the WG that owns the spec. Use the text above this para as the basis for that comment.

3. Replace the text 'link_to_issue_raised' below with a link to the place you raised the issue. Do NOT remove the initial '§ '.

4.  Edit this issue to remove this paragraph and ALL THE TEXT ABOVE IT. 



**This is a tracker issue.** Only discuss things here if they are i18n WG internal meta-discussions about the issue. **Contribute to the actual discussion at the following link:**


§ link_to_issue_raised


Please view or discuss this issue at https://github.com/w3c/i18n-activity/issues/989 using your GitHub account



Received on Monday, 9 November 2020 19:34:23 UTC