Re: [webauthn] truncation to 64-byte upper limit doesn't mention character boundaries from =JeffH via GitHub on 2018-06-27 (public-webauthn@w3.org from June 2018)

From: =JeffH via GitHub <sysbot+gh@w3.org>
Date: Wed, 27 Jun 2018 23:55:57 +0000
To: public-webauthn@w3.org
Message-ID: <issue_comment.created-400866771-1530143756-sysbot+gh@w3.org>
[this issue is related to issue #593 and PR #951]

@aphillips wrote:
> Note that the specification does not require truncation on a Unicode character boundary

I was wondering whether/when you'd bring this up. 

I've done some modest research on "Unicode string truncation" (prompted by the text you cite above), and it is apparently more complex than simply truncating on a Unicode character boundary: it apparently ought to be done on [extended grapheme cluster](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) boundaries. 

I found detailed analysis here: https://hoytech.github.io/truncate-presentation/
..and a library: https://github.com/hoytech/Unicode-Truncate, but nothing regarding "unicode string truncation" in IETF, W3C, or Unicode specs :-/

In a brief discussion a little while back, @stpeter suggested that specifying how to do proper "Unicode string truncation" perhaps ought to be addressed by the Unicode Consortium. I'd imagine as an addition to [TR29](http://www.unicode.org/reports/tr29/) "Unicode Text Segmentation" (but who knows).  [Charmod](http://w3c.github.io/charmod-norm/) may want to say something about it.  

Rather than properly & thoroughly spec how to do "unicode string truncation" in webauthn, perhaps we should simply state something like (in addition to the above-quoted spec text):
      "**Such truncation SHOULD be performed on [extended grapheme cluster](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) boundaries [[!UAX29]].**"

..though, _**"who" does such truncation is a question**_. Presently the webauthn spec says it is the [authenticator](https://w3c.github.io/webauthn/#authenticator) that may perform such truncation, but requiring authenticators to be able to perform "Unicode string truncation" on [extended grapheme cluster](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) (EGC) boundaries will be controversial, I suspect. Rather, perhaps the [RP](https://w3c.github.io/webauthn/#relying-party) and/or [Client](https://w3c.github.io/webauthn/#webauthn-client) ought to do that?  Though, they do not know the capabilities of the authenticator, i.e., what string length it can accommodate, and it seems the present truncation language in the spec is attempting to allow for authenticators that are able to handle strings longer than 64 bytes? (further discussion at end, below)
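To make the boundary problem concrete, here is a minimal Python sketch of truncating to a UTF-8 byte budget without splitting a code point or stranding a combining mark. It is only an approximation of EGC-boundary truncation: it treats "base code point plus following Unicode marks" as a cluster and ignores the other UAX #29 rules (ZWJ emoji sequences, Hangul jamo, regional indicators), for which a real implementation would need a proper segmentation library. The function name `truncate_utf8` and its `max_bytes` parameter are hypothetical, not taken from the spec.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for code points in general categories Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")


def truncate_utf8(text: str, max_bytes: int = 64) -> str:
    """Truncate text so its UTF-8 encoding fits in max_bytes.

    Approximation only: keeps code points whole and avoids separating a
    base code point from its combining marks. Not a full UAX #29
    extended grapheme cluster implementation.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text

    # Cut at the byte budget; errors="ignore" drops a trailing partial code point.
    kept = encoded[:max_bytes].decode("utf-8", errors="ignore")

    # If the first dropped code point is a mark, the last kept cluster is
    # incomplete: remove its trailing marks, then its base code point.
    if is_mark(text[len(kept)]):
        while kept and is_mark(kept[-1]):
            kept = kept[:-1]
        kept = kept[:-1]
    return kept
```

For example, `truncate_utf8("a" * 63 + "e\u0301")` drops the whole "e + combining acute" cluster rather than keeping a bare "e", so the result can be shorter than 64 bytes.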

> is there a reason you didn't use character count instead of byte count in the first place?)

Yes, because the impetus for this length restriction is in the context of a narrow-bandwidth channel (e.g., BLE or NFC) between the [webauthn client](https://w3c.github.io/webauthn/#webauthn-client) and an [authenticator](https://w3c.github.io/webauthn/#authenticator) ([illustration here](https://mdn.mozillademos.org/files/15801/MDN%20Webauthn%20Registration.png)), and also that these strings may be stored by the authenticator, which may have limited resources.  At that level of abstraction, we're dealing in byte counts, not char counts (a single Unicode character encoded in UTF-8 might be several bytes long -- e.g., apparently there's a Tibetan character with 8 combining marks (I dunno offhand how many bytes in UTF-8 encoding that'd end up being)).
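The gap between character counts and byte counts is easy to check. The sketch below compares code point counts with UTF-8 byte counts for a few strings. The Tibetan sample is synthetic (TIBETAN LETTER KA, U+0F40, followed by eight copies of TIBETAN VOWEL SIGN AA, U+0F71) and is not claimed to be the character mentioned above; it only relies on the fact that every code point in the Tibetan block (U+0F00 to U+0FFF) takes 3 bytes in UTF-8. The helper name `sizes` is hypothetical.

```python
def sizes(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(s), len(s.encode("utf-8"))


samples = {
    "ASCII letter": "a",
    "e + combining acute": "e\u0301",
    # Synthetic illustration: one base letter plus eight combining marks.
    "Tibetan KA + 8 marks (synthetic)": "\u0f40" + "\u0f71" * 8,
}

for label, s in samples.items():
    code_points, utf8_bytes = sizes(s)
    print(f"{label}: {code_points} code point(s), {utf8_bytes} UTF-8 byte(s)")
```

Under that assumption, a single user-perceived character built from one Tibetan base and eight Tibetan combining marks occupies 27 bytes, so a 64-byte budget holds only two such clusters.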

> also I think it doesn't mean to say 64-byte *minimum* length. I suspect it means to say "maximum" there

the text you're referring to is:
"Authenticators MUST accept and store a 64-byte minimum length for a name member’s value."

Yeah, I _think_ the perspective that was written from is: one MUST accommodate _**at least**_ a 64-byte length for this value.  I.e., some authenticators may, if presented with an 80-byte string, simply accommodate it. Alternatively, they may truncate it to no shorter than 64 bytes (if doing truncation on arbitrary byte boundaries, which as we note, is not i18n-kosher). I.e., if the authenticator supports 70-byte name strings, it would ostensibly truncate an 80-byte string to 70 bytes. 
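That reading can be modeled in a few lines. The sketch below is an assumption about the intended behavior, not spec text: an authenticator has some storage capacity of at least 64 bytes and cuts anything longer down to that capacity, on an arbitrary byte boundary (the non-i18n-friendly case described above). The names `MIN_NAME_CAPACITY` and `store_name` are hypothetical.

```python
MIN_NAME_CAPACITY = 64  # bytes every authenticator must accept and store


def store_name(name: str, capacity: int = MIN_NAME_CAPACITY) -> bytes:
    """Return the UTF-8 bytes a hypothetical authenticator would store.

    capacity is the authenticator's own limit and must be at least
    64 bytes. Truncation is on an arbitrary byte boundary, so the stored
    bytes may end in the middle of a multi-byte character.
    """
    if capacity < MIN_NAME_CAPACITY:
        raise ValueError("an authenticator must accommodate at least 64 bytes")
    return name.encode("utf-8")[:capacity]


name = "x" * 80
print(len(store_name(name, capacity=80)))  # stored whole
print(len(store_name(name, capacity=70)))  # cut to 70 bytes
print(len(store_name(name)))               # cut to the 64-byte floor
```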

I'm thinking we need to add a Note: to the spec explaining this rationale (if I have it correct, or whatever the rationale is if I do not).







-- 
GitHub Notification of comment by equalsJeffH
Please view or discuss this issue at https://github.com/w3c/webauthn/issues/973#issuecomment-400866771 using your GitHub account
Received on Wednesday, 27 June 2018 23:56:00 UTC