- From: CSS Meeting Bot via GitHub <sysbot+gh@w3.org>
- Date: Wed, 19 Apr 2017 03:08:27 +0000
- To: public-css-archive@w3.org
The CSS Working Group just discussed Consider using USVString instead of DOMString, and agreed to the following resolutions:

```
RESOLVED: CSSOM can use either USVString or DOMString
```

<details><summary>The full IRC log of that discussion</summary>

```
<TabAtkins> Topic: Consider using USVString instead of DOMString
<fantasai> ScribeNick: fantasai
<fantasai> SimonSapin: In JS, strings are made of a sequence of 16-bit integers
<fantasai> SimonSapin: Can be arbitrary sequence
<fantasai> SimonSapin: Usually interpreted as UTF-16
<fantasai> SimonSapin: But don't have to be well-formed UTF-16
<fantasai> SimonSapin: In particular, range of values that are called surrogates
<fantasai> SimonSapin: If you have a leading surrogate plus trailing surrogate, i.e. 2 UTF-16 ints, that forms a single Unicode codepoint
<fantasai> SimonSapin: But in JS, nothing stops surrogates from appearing in the wrong order, or a single one by itself
<fantasai> SimonSapin: This is invalid Unicode
<fantasai> SimonSapin: But you can do it in JS
<fantasai> SimonSapin: If you want to convert that string to UTF-8, UTF-8 is designed to exclude surrogate codepoints to align with set of valid UTF-16 strings
<fantasai> SimonSapin: So not all JS strings can be represented in UTF-8 without losing data or using escaping mechanism
<fantasai> SimonSapin: Wasn't an issue, because every browser internally uses same type of string as JS
<dbaron> Github topic: https://github.com/w3c/csswg-drafts/issues/1217
<fantasai> SimonSapin: So if you have CSSOM string that has unpaired surrogate, e.g. in an ident or content property string
<fantasai> SimonSapin: it's ok
<fantasai> SimonSapin: What's changing now is that in Firefox, we have a project called Stylo, which is to import Servo style system into Gecko
<fantasai> SimonSapin: That style system is using Rust str type for all strings
<fantasai> SimonSapin: which is based on UTF-8, so it cannot represent unpaired surrogates
<fantasai> SimonSapin: what that means is in practice, whenever a string comes from CSSOM and goes into the style system in Servo and in the future in Firefox, we convert to UTF-8 and in that process, any unpaired surrogate is replaced with U+FFFD REPLACEMENT CHARACTER
<fantasai> SimonSapin: So there is some data loss
<fantasai> SimonSapin: However, I think this kind of situation only happens accidentally
<fantasai> SimonSapin: Fact that JS strings are this way is not a feature, it's a historical accident
<fantasai> SimonSapin: I don't think there is a real compat risk with shipping Firefox this way
<fantasai> SimonSapin: Still, it's a deviation from current interoperable behavior, so wanted to bring it up
<fantasai> Florian: Proposal?
<fantasai> SimonSapin: In WebIDL which we use to define interfaces for JS, there is two string types. DOMString corresponds to JS strings with arbitrary 16-bit
<fantasai> SimonSapin: There is USVString, Unicode Scalar Value String, which has no unpaired surrogates, only well-formed unicode
<fantasai> SimonSapin: When you convert DOMString to that, you get the same behavior as in Servo, replacing lone surrogates with U+FFFD
<fantasai> SimonSapin: If we want to keep this interoperable, then I propose to use USVString for all of CSSOM
<fantasai> ChrisL_: Seems like a good idea, since unpaired surrogates are only an error
<fantasai> ChrisL_: Only used for binary data, and can't imagine that in CSSOM
<fantasai> TabAtkins: USVString is supposed to be avoided in WebIDL
<fantasai> TabAtkins: Currently only used in networking protocols that use scalar values
<fantasai> TabAtkins: Requires extra processing compared to UTF-16 strings
<fremy> ChrisL_: maybe in custom properties though, people would want to store binary data; they should encode it to avoid syntax issues though so no big deal
<fantasai> dbaron: Anne disagrees with advice in WebIDL spec, btw
<tantek> s/Anne/Annevk
<fantasai> dbaron: There's a github issue against WebIDL spec to give coherent advice, but ppl disagree on what that should be
<fantasai> dino: Appreciate that you want to use rust string type, but we all have to use our own string types
<fantasai> dino: Maybe resolution is all DOM strings should be that way
<fantasai> TabAtkins: No, that would break a lot of things
<fantasai> TabAtkins: ppl smuggle binary data in JS strings
<fantasai> TabAtkins: But for things that talk text, could do it
<fantasai> dino: Everything, not just CSSOM
<fantasai> Florian: Would it be reasonable for implementations that don't do rust strings internally
<fantasai> myles_: If we don't know perf impact, can't agree to do this
<dbaron> myles_: so somebody somewhere has to try it first before we agree to it
<fantasai> SimonSapin: Tab, ?
<iank_> q+
<fantasai> TabAtkins: Some DOM Apis have to be 16-bit, e.g. Fetch ...
<fantasai> rbyers: It's not in Chrome
<fantasai> TabAtkins muses
<SimonSapin> s/?/did you mean changing JS or DOM would break things?/
<fantasai> till: It's not entirely out of the question that it would be Web-compatible enough that we could change it in JS itself
<fantasai> fantasai: For JS itself, Tab was saying it's not doable, but for DOM Apis more likely to be possible
<fantasai> iank_: Need to check with architecture folks about this
<fantasai> iank_: our architecture folks in charge of bindings and string types and stuff
<fantasai> iank_: Looked for code where we switch to USVStrings, and that's very expensive for us it looks like
<fantasai> iank_: Might be perf problems
<fantasai> fantasai: My take is that this is a very weird edge case with no real use... lone surrogates in the CSSOM.
<fantasai> fantasai: So I would say, let's spec you can use either, and we don't care.
<fantasai> myles_: Every string would have to get transcoded, that's crazy
<fantasai> TabAtkins: ....
<fantasai> iank_: Would have to guarantee that that block internally is clean
<fantasai> TabAtkins: Move to UTF-8 clean internally
<fantasai> iank_: Sounds non-trivial
<fantasai> Florian: Spec that either is Okay then it's not any work
<fantasai> Florian: If we can't spec that, then it means web depends on it, so Servo will have to bite the bullet
<fantasai> TabAtkins: I'm okay with doing that, put a note that we'd like to move to USVString
<fantasai> shane: If there's a web compat problem, then it's a problem
<fantasai> TabAtkins: That means someone is injecting lone surrogates into the CSSOM. Can't come out of the parser
<fantasai> TabAtkins: In that case probably buggy anyway
<fantasai> shane: If ppl notice a problem, they'll file bugs
<fantasai> eae: It's very hard to get into the situation except intentionally
<fantasai> myles_: Does Servo have to translate between JS string and USVString all the time?
<fantasai> SimonSapin: Yes
<fantasai> SimonSapin: We have optimizations, e.g. if ascii then stored in one byte per unit, skip UTF-8 conversion
<fantasai> s/USVString/UTF-8 String/
<fantasai> Florian: Can we just resolve on both and if it's a problem, come back and we'll change the spec?
<fantasai> fantasai: I think interop in this very very weird case is not worth any effort, so it should allow both
<fantasai> till: It's not servo-specific, others might want UTF-8 codepaths
<fantasai> myles_: I believe that you believe that.
<fantasai> Rossen: Any objections?
<fantasai> rbyers: We should rediscuss if we find web compat issues
<fantasai> RESOLVED: CSSOM can use either USVString or DOMString
<fantasai> fantasai: We can always raise issues if they're found later.
<fantasai> SimonSapin: This also affects other specs with WebIDL interfaces, e.g. CSS Fonts defines @font-face interfaces
<fantasai> Florian: Should we define a CSSString?
<fantasai> ...
<fantasai> iank_: But if we do this later..
<fantasai> fantasai: We are literally deciding that you can do either, forever. Unless someone comes back and says "lone surrogates in CSSOM are an important use case and I need them"
<fantasai> Rossen discusses agenda items
```
</details>

-- 
GitHub Notification of comment by css-meeting-bot
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/1217#issuecomment-295053842 using your GitHub account
Received on Wednesday, 19 April 2017 03:08:35 UTC
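
Editorial illustration, not part of the minutes: the conversion SimonSapin describes, where any unpaired surrogate in a 16-bit JS string is replaced with U+FFFD while well-formed surrogate pairs pass through, might be sketched roughly as follows. The function name `toUSVString` is a hypothetical helper chosen for this example, not a platform API, and the sketch makes no claim about how Servo, Gecko, or any other engine actually implements the step.

```typescript
// Hypothetical helper (illustrative name): returns a copy of `input` in which
// every unpaired surrogate code unit is replaced by U+FFFD.
function toUSVString(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;  // leading (high) surrogate
    const isTrail = unit >= 0xdc00 && unit <= 0xdfff; // trailing (low) surrogate
    if (isLead && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: keep both code units and skip past the trail unit.
        out += String.fromCharCode(unit, next);
        i++;
        continue;
      }
    }
    // Any surrogate reaching this point is unpaired or out of order.
    out += isLead || isTrail ? "\ufffd" : String.fromCharCode(unit);
  }
  return out;
}
```

Under this sketch, a string such as `"a\ud800b"` loses its lone surrogate, which is the data loss mentioned in the discussion, while a string with no unpaired surrogates is returned unchanged.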