Re: [webauthn] truncation to 64-byte upper limit doesn't mention character boundaries

I chatted with @agl about this recently.

Given that we merged PR #951, we already have appropriate entities enforcing PRECIS on the name-ish strings.   

A thing to note about the "authnrs MAY truncate strings to at least 64 bytes" statement in the spec is that authnrs CAN support, handle, and return UTF-8 strings longer than 64 bytes; the actual limit is authnr-specific. Thus the webauthn client cannot really be given the responsibility for truncating these strings; it needs to be left up to the authnrs.

Given that strictly byte-level string truncation can mangle UTF-8 strings (see https://github.com/w3c/webauthn/issues/973#issuecomment-404382376), any truncation really SHOULD be done on _at least_ the code point level and, if possible, on the EGC (extended grapheme cluster) level. The latter is what @asmusf related; the former is what @aphillips related in https://github.com/w3c/webauthn/issues/973#issuecomment-401099955.
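To make the byte-level versus code-point-level distinction concrete, here is a minimal Python sketch. The function name `truncate_utf8_on_code_point` and the 64-byte default are assumptions for illustration only; the actual limit is authenticator-specific, and this is not taken from any authenticator implementation.

```python
def truncate_utf8_on_code_point(text: str, max_bytes: int = 64) -> bytes:
    """Return the UTF-8 encoding of `text`, cut to at most `max_bytes` bytes
    without splitting a multi-byte code point.

    Hypothetical helper; the 64-byte default is only an example limit.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded  # nothing to truncate

    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # dropped byte is a continuation byte, the cut falls inside a code point,
    # so back up until the first dropped byte starts a new code point.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]


name = "a" + "é" * 40                # 81 bytes in UTF-8
naive = name.encode("utf-8")[:64]    # cut lands in the middle of an "é"
safe = truncate_utf8_on_code_point(name, 64)
```

Here `naive` is no longer valid UTF-8 (decoding it strictly raises `UnicodeDecodeError`), while `safe` is 63 bytes long and decodes cleanly. Note that this only guarantees code point boundaries; it can still split an extended grapheme cluster.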

As @agl implies in https://github.com/w3c/webauthn/issues/973#issuecomment-404607021, and clarified in our chat, Chrome will reject CBOR-encoded objects containing "text" strings that are not valid UTF-8 (which is detectable). So if an 



Below are two alternative proposals for what to add to the spec to address this issue. If both are nominally acceptable, then perhaps we select one based on whether or not we are willing to add normative language:

OLD:

Authenticators MUST accept and store a 64-byte minimum length for a `name` member’s value. Authenticators MAY truncate a `name` member’s value to a length equal to or greater than 64 bytes.

1. NEW, if we are willing to add additional normative language: 

Authenticators MUST accept and store at least a 64-byte length for a `name` member’s value. Authenticators MAY truncate a `name` member’s value to a length equal to or greater than 64 bytes. Authenticators SHOULD perform any UTF-8 encoded string truncation on a code point boundary, and MAY perform such a truncation on an extended grapheme cluster (EGC) boundary [[!UAX29]]. Truncated strings SHOULD include an indication of truncation, such as appending an ellipsis.

Note: Truncation of a UTF-8 encoded string at an arbitrary byte boundary, or even in some cases on an arbitrary code point boundary, may result in a string that cannot be properly rendered, or may look like a different character string if rendered. Truncation on code point boundaries is preferred over arbitrary byte boundaries. Truncation on EGC boundaries is the safest approach.


2. NEW, no new normative language:

Authenticators MUST accept and store at least a 64-byte length for a `name` member’s value. Authenticators MAY truncate a `name` member’s value to a length equal to or greater than 64 bytes.  

Note: Authenticators should perform any UTF-8 encoded string truncation on a code point boundary, and may perform such a truncation on an extended grapheme cluster (EGC) boundary [[!UAX29]]. Truncated strings should include an indication of truncation, such as appending an ellipsis. Truncation of a UTF-8 encoded string at an arbitrary byte boundary, or even in some cases on an arbitrary code point boundary, may result in a string that cannot be properly rendered, or may look like a different character string if rendered. Truncation on code point boundaries is preferred over arbitrary byte boundaries. Truncation on EGC boundaries is the safest approach.
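
As a rough illustration of why a code point boundary is not always enough, and of appending an ellipsis, here is a hedged Python sketch. The name `truncate_name`, the 64-byte default, and reserving room for the ellipsis inside the limit are all assumptions. Backing up over combining marks is only an approximation of EGC boundaries: it does not implement the full [[!UAX29]] rules (ZWJ emoji sequences, regional indicator pairs, Hangul jamo, and so on), which need a proper segmentation library.

```python
import unicodedata

ELLIPSIS = "\u2026"  # horizontal ellipsis, 3 bytes in UTF-8


def truncate_name(text: str, max_bytes: int = 64) -> str:
    """Truncate `text` so that its UTF-8 encoding, including a trailing
    ellipsis, fits in `max_bytes`. Hypothetical helper, not spec text."""
    if len(text.encode("utf-8")) <= max_bytes:
        return text  # fits as-is, no truncation indicator needed

    budget = max_bytes - len(ELLIPSIS.encode("utf-8"))
    if budget < 0:
        raise ValueError("max_bytes is too small to hold the ellipsis")

    # Count how many leading code points fit in the byte budget.
    n, used = 0, 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > budget:
            break
        n += 1
        used += size

    # If the first dropped code point is a combining mark (category M*),
    # cutting here would strip it from its base character and change what
    # is rendered. Back up to just before that base character instead.
    while n > 0 and unicodedata.category(text[n]).startswith("M"):
        n -= 1

    return text[:n] + ELLIPSIS
```

For example, with 60 ASCII letters followed by "e" plus U+0301 (combining acute accent) and a few more characters, a plain code point cut would keep the bare "e" and drop its accent; this sketch drops the whole "e" plus accent and appends the ellipsis instead.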


-- 
GitHub Notification of comment by equalsJeffH
Please view or discuss this issue at https://github.com/w3c/webauthn/issues/973#issuecomment-406038881 using your GitHub account

Received on Wednesday, 18 July 2018 18:59:09 UTC