Re: [webauthn] Add considerations for string truncation. (#1205)

@agl said:

> It might be the case that eliminating surrogate code points in UTF-8 is so obvious that it's redundant to specify it here, but it wasn't completely obvious to me. On the other hand, if one writes UTF-8 with surrogate code points to an authenticator, Chrome will not interoperate with it because it considered that to be invalid UTF-8. Thus the desire to highlight that aspect of the UTF-8 spec.

I think the disconnect here is this: a UTF-16 => UTF-8 transcoder will convert a surrogate pair into a single 4-byte UTF-8 sequence. Anything that encodes the surrogates separately is non-compliant (and in practice such transcoders don't exist). The only source of isolated surrogates in UTF-8 would be unpaired surrogates in the source data (there are of course things that can cause this in UTF-16 encoded strings, such as splitting a string arbitrarily...). What I'd want to avoid is people writing lots of machinery to check for this state when, in practice, their transcoder and the [[Encoding]] spec have already handled it.
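
To make the transcoder behaviour concrete, here is a minimal sketch using Python's built-in codecs as a stand-in for "a compliant transcoder". The choice of Python and of U+1F600 as the sample character are assumptions for illustration only; nothing here is taken from the WebAuthn spec or from Chrome.

```python
# U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00.
utf16 = "\U0001F600".encode("utf-16-be")
assert utf16 == b"\xd8\x3d\xde\x00"

# Transcoding UTF-16 => UTF-8 combines the pair into one code point,
# which is then written as a single 4-byte UTF-8 sequence.
utf8 = utf16.decode("utf-16-be").encode("utf-8")
assert utf8 == b"\xf0\x9f\x98\x80"

# An unpaired surrogate cannot be written as UTF-8 by a strict encoder...
try:
    "\ud83d".encode("utf-8")
    lone_surrogate_encodes = True
except UnicodeEncodeError:
    lone_surrogate_encodes = False

# ...and bytes that encode a surrogate separately (ED A0 BD would be
# U+D83D) are rejected as invalid UTF-8 by a strict decoder.
try:
    b"\xed\xa0\xbd".decode("utf-8")
    surrogate_bytes_decode = True
except UnicodeDecodeError:
    surrogate_bytes_decode = False
```

Both flags end up `False` here: the codec itself refuses surrogates in UTF-8, so no separate checking machinery is needed on top of it.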

> >    I don't know how to square replacing "any partial code point at the end with U+FFFD" with the fact that U+FFFD is a 3-byte sequence.
>
> This is platform behaviour in the face of a truncated encoding. Some decoding libraries will replace invalid code point encodings with U+FFFD.

No, I understood that to be the case: when reading a UTF-8 (or UTF-16, fwiw) string that has dangling bytes (or other malformed sequences), the transcoder replaces the dangling bytes with U+FFFD. However, my comment was basically: I read this as telling the *generator* (the one doing the truncating) to replace the character with U+FFFD. If you are trying to fit a byte length limit and only have 2 bytes left, you can't put a 3-byte sequence there :-)
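
The arithmetic can be sketched as follows. This is an illustrative example only: `truncate_utf8` is a hypothetical helper, not something defined by the spec or the PR, and it shows one possible generator-side approach (dropping the partial character rather than substituting U+FFFD).

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Hypothetical helper: truncate to at most max_bytes of UTF-8,
    cutting only on a code point boundary."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    cut = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first excluded byte is
    # one, the cut is mid-character, so back up to that character's lead byte.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]

# U+FFFD itself needs 3 bytes in UTF-8 (EF BF BD).
replacement_len = len("\ufffd".encode("utf-8"))

# "ab" + U+1F600 is 6 bytes; with a 4-byte limit only 2 bytes remain after "ab".
limit = 4
safe = truncate_utf8("ab\U0001F600", limit)

# A naive byte cut leaves a dangling partial sequence. A decoder that
# substitutes U+FFFD produces text that no longer fits the limit when re-encoded.
naive = "ab\U0001F600".encode("utf-8")[:limit]
reencoded = naive.decode("utf-8", errors="replace").encode("utf-8")
```

Here `safe` is `b"ab"`, which fits the limit, while `reencoded` is longer than `limit` because the replacement character alone takes `replacement_len` bytes.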

-- 
GitHub Notification of comment by aphillips
Please view or discuss this issue at https://github.com/w3c/webauthn/pull/1205#issuecomment-490997484 using your GitHub account

Received on Thursday, 9 May 2019 17:37:20 UTC