- From: Addison Phillips via GitHub <sysbot+gh@w3.org>
- Date: Thu, 09 May 2019 17:37:18 +0000
- To: public-webauthn@w3.org
@agl said:

> It might be the case that eliminating surrogate code points in UTF-8 is so obvious that it's redundant to specify it here, but it wasn't completely obvious to me. On the other hand, if one writes UTF-8 with surrogate code points to an authenticator, Chrome will not interoperate with it because it considered that to be invalid UTF-8. Thus the desire to highlight that aspect of the UTF-8 spec.

I think the disconnect here is: a UTF-16 => UTF-8 transcoder will convert a surrogate pair into a single 4-byte UTF-8 sequence. Anything that encodes the two surrogates separately is non-compliant (and in practice such transcoders don't exist). The only source of isolated surrogates in UTF-8 would be unpaired surrogates in the data source. (There are of course things that can cause this in UTF-16 encoded strings, such as splitting a string at an arbitrary position.)

What I'd like to avoid is people writing lots of machinery to check for this state when, in practice, their transcoder and the [[Encoding]] spec have already handled it.

> > I don't know how to square replacing "any partial code point at the end with U+FFFD" with the fact that U+FFFD is a 3-byte sequence.
>
> This is platform behaviour in the face of a truncated encoding. Some decoding libraries will replace invalid code point encodings with U+FFFD.

No, I understood that to be the case: when reading a UTF-8 (or UTF-16, fwiw) string that has dangling bytes (or other malformed sequences), the transcoder replaces the dangling bytes with U+FFFD.

However, my comment was basically this: I read the text as telling the *generator* (the one doing the truncating) to replace the character with U+FFFD. If one is trying to fit a byte length limit and only has 2 bytes left, you can't put a 3-byte sequence there :-)

-- 
GitHub Notification of comment by aphillips
Please view or discuss this issue at https://github.com/w3c/webauthn/pull/1205#issuecomment-490997484 using your GitHub account
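To make the two points above concrete, here is a minimal sketch in Python. It assumes Python's built-in codecs as a stand-in for "the transcoder"; the function names `transcode_utf16le_to_utf8` and `truncate_utf8` are hypothetical and do not come from the WebAuthn spec or the pull request. It illustrates that a compliant transcoder turns a surrogate pair into one 4-byte sequence and rejects unpaired surrogates, and that a generator can truncate on a code point boundary rather than trying to write U+FFFD.

```python
def transcode_utf16le_to_utf8(utf16_bytes: bytes) -> bytes:
    """Transcode UTF-16LE bytes to UTF-8 using the built-in codecs.

    A surrogate pair becomes a single 4-byte UTF-8 sequence. An unpaired
    surrogate makes the strict decoder raise UnicodeDecodeError.
    """
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and cut it to at most max_bytes bytes.

    The cut is moved back to a code point boundary, so the result never
    ends in a partial sequence. Nothing is appended: U+FFFD is itself
    3 bytes in UTF-8 and might not fit in the space that is left.
    Assumes max_bytes >= 0.
    """
    data = text.encode("utf-8")  # strict: raises on lone surrogates
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (0b10xxxxxx), the cut falls inside a multi-byte
    # sequence, so back up to that sequence's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# U+1F600 is the surrogate pair D83D DE00 in UTF-16.
pair_utf16le = b"\x3d\xd8\x00\xde"
emoji_utf8 = transcode_utf16le_to_utf8(pair_utf16le)

# "a" plus U+1F600 is 5 bytes in UTF-8; with a 3-byte limit only "a" fits.
truncated = truncate_utf8("a\U0001F600", 3)
```

The design choice in `truncate_utf8` is to drop the whole character that does not fit. Substituting U+FFFD for malformed input is then left to whichever decoder later reads a string that was truncated carelessly by someone else.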
Received on Thursday, 9 May 2019 17:37:20 UTC