Re: [whatwg/encoding] Consider adding TextEncoder.containsLoneSurrogates() static (#174)

> otherwise you're risking silent data loss that is not trivial to debug

I think the notion of "data loss" is overblown. Unpaired surrogates don't have any semantics, so in that sense replacing an unpaired surrogate with U+FFFD doesn't lose any meaning from the string.

However, it is conceivable that it could be harmful for the equality relation of (erroneous) strings to differ on the two sides of a boundary, so that two strings compare unequal in JS but equal in Wasm. See a [similarly themed but different Chrome bug](https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13), which was caused by two UTF-8 decoders generating a different number of U+FFFDs for erroneous byte sequences.
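To make the equality concern concrete, here is a minimal sketch assuming an environment with the standard `TextEncoder` global (a browser or a recent Node.js). The variable names and the `sameBytes` helper are made up for illustration. Two strings that differ only in which lone surrogate they contain are unequal in JS, yet they encode to the same UTF-8 bytes because each lone surrogate becomes U+FFFD.

```typescript
// Two distinct, ill-formed JS strings: each contains one lone surrogate.
const withLoneHigh = "x\uD800y";
const withLoneLow = "x\uDC00y";

// Illustrative helper: byte-wise comparison of two Uint8Arrays.
function sameBytes(p: Uint8Array, q: Uint8Array): boolean {
  if (p.length !== q.length) return false;
  for (let i = 0; i < p.length; i++) {
    if (p[i] !== q[i]) return false;
  }
  return true;
}

const encoder = new TextEncoder();

// TextEncoder encodes each lone surrogate as U+FFFD (bytes EF BF BD).
const bytesHigh = encoder.encode(withLoneHigh);
const bytesLow = encoder.encode(withLoneLow);

const equalInJs = withLoneHigh === withLoneLow;            // compared as UTF-16 code units
const equalAfterEncoding = sameBytes(bytesHigh, bytesLow); // compared as UTF-8 bytes
const decoded = new TextDecoder().decode(bytesHigh);       // what the other side sees
```

Here `equalInJs` is `false` while `equalAfterEncoding` is `true`, which is the kind of divergence described above; whether it matters depends on whether anything on either side relies on the comparison.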

Preserving equality across a round trip is the main concern behind being able to round-trip file paths on Windows in Rust. The Web Platform is somewhat different in that there aren't similar places where the platform gives you unpaired surrogates in a non-bug way (either on its own or by handing you unpaired surrogates persisted by someone else) and expects their equality to be preserved. Specification-wise, the Web Platform tries hard not to hand you unpaired surrogates. However, if you create unpaired surrogates yourself, the Web Platform most of the time preserves them for the duration of the execution of a JS program.
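As an illustration of creating unpaired surrogates yourself, the sketch below slices a string in the middle of a surrogate pair. The `hasLoneSurrogate` function is a hypothetical hand-written check, not a platform API and not the method proposed in this issue; it only shows what such a check would have to look at.

```typescript
// Hypothetical helper (not a platform API): returns true if `s` contains a
// surrogate code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return true;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

const emoji = "\uD83D\uDE00";                // U+1F600 as two UTF-16 code units
const firstHalf = emoji.slice(0, 1);         // slicing by code unit splits the pair
const rejoined = firstHalf + emoji.slice(1); // JS preserved both halves unchanged
```

Within the JS program, `firstHalf` stays a lone high surrogate and `rejoined` compares equal to `emoji`; the substitution with U+FFFD only happens once such a string crosses a boundary that requires well-formed text.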

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/174#issuecomment-481123361

Received on Tuesday, 9 April 2019 06:42:50 UTC