- From: François Daoust via GitHub <sysbot+gh@w3.org>
- Date: Fri, 06 Nov 2015 15:24:13 +0000
- To: public-secondscreen@w3.org
tidoust has just created a new issue for https://github.com/w3c/presentation-api:

== Possibility for a character to be interpreted differently depending on locale ==

Hi all,

[I'm raising this as an issue for tracking purposes, but I do not think that there is any issue in the end, so I suggest closing it once people have reviewed it, unless someone points out that I missed something, of course!]

The group discussed potential internationalization issues during its F2F last week [1]. One particular issue that was raised was the possibility for a given character in a JS string to be interpreted differently by the two browsing contexts, meaning it would be represented by different glyphs with different meanings in a Japanese, Chinese and/or Korean environment (e.g. a glyph meaning "Yen" in Japanese and a different currency in Chinese). I said that I believed this was impossible in practice, and took an action "to investigate the possibility for a JS string to be rendered differently by different glyphs and locales".

I had a quick chat with @r12a (Richard Ishida), internationalization activity lead at W3C, and can confirm that, unless I missed something in the scenario presented below, such a problem should never happen. The Presentation API operates on JavaScript types, which are not affected by the character encoding used to retrieve the content.

The hypothetical scenario where a problem could have happened was something like:

1. the app running on the controlling browsing context is served encoded in Shift_JIS (usual encoding for Japanese characters);
2. the app running on the receiving browsing context is served encoded in Big5 (usual encoding for Chinese characters);
3. the app on the controlling browsing context extracts a string from a DOM element;
4. the app on the controlling browsing context sends that string over to the receiving browsing context using the Presentation API's "send" method;
5. the app on the receiving browsing context sets the received value to a DOM element;
6. the characters rendered on the receiving browsing context for that DOM element mean something different.

This does not happen in practice: regardless of the encoding used to serve a page/app, extracting a string from a DOM element returns a DOMString [2], which is a UTF-16 encoded serialization of the underlying sequence of Unicode characters (in an ideal world, this would return the sequence of Unicode code points directly, but JavaScript strings are made of 16-bit units, so characters outside the Basic Multilingual Plane are represented as a surrogate pair of two 16-bit code units). For instance, the Unicode code point of the Japanese "katakana letter small A" is 0x30A1, so if a DOM element contains such a letter, extracting it will yield a sequence with one integer 0x30A1, even if the document that was used to produce this element was encoded in Shift_JIS, where this character is represented as the two bytes 0x83 0x40.

From the perspective of the Presentation API, the communication channel sends a DOMString to the other end point. The actual bytes sent over the channel depend on the transmission protocol: WebSocket will typically turn the DOMString into Unicode characters (thus creating what WebIDL calls a USVString) and encode the result using UTF-8 for transmission, while other protocols could do differently, e.g. send Unicode code points as 32-bit values. What is important is that the receiving end point will eventually see a DOMString, again to be interpreted as a UTF-16 encoded serialization of a sequence of Unicode characters, independently of the character encoding that was used to load the HTML content.

What may of course happen in the katakana example is that the Chinese font used on the receiving browsing context does not contain the right glyph to represent a katakana letter small A. The character would be rendered as an unknown one in that case (perhaps as a question mark or a square). This should never produce another character with a different meaning though!

[1] http://www.w3.org/2015/10/29-webscreens-minutes.html#item07
[2] http://heycam.github.io/webidl/#idl-DOMString

See https://github.com/w3c/presentation-api/issues/218
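
As a rough illustration of the round trip described above, here is a minimal Python sketch. It is not Presentation API code: it only models the browser decoding a Shift_JIS page into Unicode, the UTF-16 code units a JS string would hold, and a UTF-8 transport as in the WebSocket case. The helper names (utf16_code_units, send_over_channel, receive_from_channel) are made up for this example, and the choice of UTF-8 on the wire is an assumption.

```python
# Hypothetical model of the scenario; none of these helpers belong to a real API.

def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units a JavaScript string would hold for `text`."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def send_over_channel(text: str) -> bytes:
    """Model a WebSocket-like transport (assumption): serialize as UTF-8."""
    return text.encode("utf-8")


def receive_from_channel(payload: bytes) -> str:
    """Model the receiving end: decode the UTF-8 payload back into a string."""
    return payload.decode("utf-8")


# Steps 1 and 3: the controlling page is served as Shift_JIS. The browser
# decodes the page bytes into Unicode when it builds the DOM, so the script
# only ever sees code points, never the original bytes.
page_bytes = b"\x83\x40"                 # katakana letter small A in Shift_JIS
dom_text = page_bytes.decode("shift_jis")
sent_units = utf16_code_units(dom_text)

# Step 4: the string travels over the channel in a well-defined encoding.
wire = send_over_channel(dom_text)

# Step 5: the receiving side rebuilds the same string. The Big5 encoding of
# the receiving page is never involved in decoding the message.
received = receive_from_channel(wire)
received_units = utf16_code_units(received)

print([hex(u) for u in sent_units])      # ['0x30a1']
print(wire.hex())                        # e382a1
print(received_units == sent_units)      # True

# A character outside the Basic Multilingual Plane needs a surrogate pair.
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

The only places where Shift_JIS or Big5 matter in this model are where each page's own bytes are decoded, which is why the two pages' encodings cannot change the meaning of the transmitted string.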
Received on Friday, 6 November 2015 15:24:18 UTC