- From: Giovanni Campagna <scampa.giovanni@gmail.com>
- Date: Sat, 31 Oct 2009 14:54:01 +0100
- To: Doug Schepers <schepers@w3.org>
- Cc: www-dom@w3.org
2009/10/30 Doug Schepers <schepers@w3.org>:
> Hi, Laurens-
>
> Laurens Holst wrote (on 10/30/09 7:02 AM):
>>
>> On 30-10-2009 10:32, Maciej Stachowiak wrote:
>>>
>>> "\uxxxx" is not a syntax, it is a Unicode string of the actual
>>> character. \u introduces the escape sequence for a unicode code point.
>>> So you can compare it directly to a character.
>>
>> Now I'm confused. The way Doug phrased it, \uxxxx *will* be syntax, i.e.
>> the string "U+xxxx" will be replaced by "\\uxxxx" (a 6-character string
>> containing an identifier). Not "\uxxxx" (a 1-character string containing
>> the actual character) which could be compared directly to a character.
>>
>> Otherwise, I would suggest not to talk in terms of "\uxxxx" strings at
>> all, after all the DOM specification does not need to concern itself
>> with serialisation, but instead to just talk about characters and code
>> points.
>
>
> Just to clarify, are you objecting to the loose way I phrased it in my
> email, or did you review the spec and find problems there? I may have used
> the wrong terminology in the email, but the spec is the definitive source
> that needs to get it right.
>
> So, please clarify if you object to the change described in the spec.

Section 6.2.6, point 1.1.1, says to use the "character value", which is defined as the Unicode character escape: the string consisting of U+005C REVERSE SOLIDUS, U+0075 LATIN SMALL LETTER U and four characters in the ranges U+0061-0066 LATIN SMALL LETTER A to LATIN SMALL LETTER F and U+0030-0039 DIGIT ZERO to DIGIT NINE, which, interpreted as hexadecimal, can be transformed into a Unicode scalar value / Unicode code point (in the sense of the Unicode standard, not of the DOM3Events spec). Point 1.1.2 instead says to use the "Unicode code point" and encode it as "\u" (again U+005C REVERSE SOLIDUS, U+0075 LATIN SMALL LETTER U) followed by 4 or more hexadecimal digits.
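A minimal JavaScript sketch of the two escaped forms, assuming the reading of points 1.1.1 and 1.1.2 described above: the "character value" is written here as one four-digit escape per UTF-16 code unit, and the "Unicode code point" form as one escape with four or more digits. The helpers `formatCharacterValue`, `formatCodePointEscape` and `decodeKeyValue` are hypothetical, not part of DOM3 Events, and emitting uppercase hex digits is an assumption taken from the examples in this message. For any BMP character the two forms coincide.

```javascript
// Hypothetical helpers (not a DOM3 Events API) for the two escaped forms.

// Point 1.1.1 reading (assumed): "\u" plus exactly four hex digits per
// UTF-16 code unit, so a non-BMP character becomes a surrogate-pair escape.
function formatCharacterValue(ch) {
  let out = "";
  for (let i = 0; i < ch.length; i++) {
    const hex = ch.charCodeAt(i).toString(16).toUpperCase();
    out += "\\u" + hex.padStart(4, "0");
  }
  return out;
}

// Point 1.1.2 reading (assumed): "\u" plus four or more hex digits of the
// code point of the first character.
function formatCodePointEscape(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase();
  return "\\u" + hex.padStart(4, "0");
}

// Turn either escaped form back into the character it names. Anything that
// is not made up entirely of escapes (a key name such as "Plus", or the
// character itself) is returned unchanged. String.fromCodePoint throws a
// RangeError for values above 0x10FFFF.
function decodeKeyValue(value) {
  if (!/^(?:\\u[0-9A-Fa-f]{4,})+$/.test(value)) return value;
  return value
    .split("\\u")
    .slice(1)
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join("");
}

console.log(formatCharacterValue("A"));          // \u0041
console.log(formatCharacterValue("\u{10000}"));  // \uD800\uDC00
console.log(formatCodePointEscape("\u{10000}")); // \u10000
console.log(decodeKeyValue("\\u002B"));          // +
```

Both escaped forms decode to the same string, so the difference between them only shows up outside the BMP, where the four-digit form needs two escapes.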
Section 2.6.7 finally says to use the more author-friendly of the "character value" and the "key name". So:

1) What is the difference between 6.2.6, point 1.1.1 and the following one? Both result in a key value like "\u0041" for "A" (U+0041 LATIN CAPITAL LETTER A), since that is the "character value" for that key in 6.2.7, and it is also the encoded Unicode code point in the sense of the Unicode standard.

2) How do you choose the most author-friendly one? I think that "A" is a better name for U+0041 LATIN CAPITAL LETTER A, but the spec says to always prefer the character value, that is "\u0041".

3) Why did both the examples and an earlier poster state that the key value is instead a single-character string containing the character itself? That is, given event, an instance of KeyboardEvent, event.key in JavaScript returns one of the following.

For a key labelled "+ =", in the US layout, with Shift pressed:
- "\u002B", the Unicode code point / character value, represented as an escaped string: this means that key.length === 6 and key.charAt(0) === "\\"; also String(key).length === 6 and JSON.parse(key) throws, so there is no good method to get a "+" from it (Unicode character escapes are resolved at the parser layer; they don't leak into strings)
- "Plus", the key name from the database: this means that you need convertKeyValue to get the plus, but it is probably more recognizable or easier to remember than "\u002B"
- "+", the Unicode character: this means that key.length === 1 and key.charAt(0) === "+"; this is probably the better way for an author, although it is not what the spec currently says

For a key labelled "% 5", in the US-International layout, with AltGr pressed:
- "\u20AC" (key.length === 6 and key.charAt(0) === "\\")
- "Euro"
- "€"

And for an imaginary key corresponding to U+10000 LINEAR B SYLLABLE B008 A:
- "\uD800\uDC00" (key.length === 12 and key.charAt(0) === "\\")
- "\u10000" (key.length === 7 and key.charAt(0) === "\\")
- "LinearBSyllableB008A"
- "𐀀" (key.length === 2 and key.charAt(0) === String.fromCharCode(55296))
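The lengths in the list above can be checked with a short JavaScript sketch. The candidate values are copied from the list; `describe` is a hypothetical helper that reports the length in UTF-16 code units (what `String.prototype.length` counts) and the first code unit (what `charCodeAt(0)` returns).

```javascript
// Candidate event.key values from the list, as JavaScript literals:
// "\\u002B" is the six-character escape text, "\u002B" (or "+") the character.
const candidates = {
  plusKey: ["\\u002B", "Plus", "+"],
  euroKey: ["\\u20AC", "Euro", "\u20AC"],
  linearBKey: ["\\uD800\\uDC00", "\\u10000", "LinearBSyllableB008A", "\u{10000}"],
};

// length counts UTF-16 code units; charCodeAt(0) is the first code unit.
function describe(value) {
  return { length: value.length, firstUnit: value.charCodeAt(0) };
}

for (const [key, values] of Object.entries(candidates)) {
  for (const value of values) {
    const { length, firstUnit } = describe(value);
    console.log(key, JSON.stringify(value), length, firstUnit);
  }
}

// The parser resolves escapes in source literals, never inside runtime strings.
console.log("\u002B" === "+");  // true
console.log("\\u002B" === "+"); // false
```

A first code unit of 92 is the backslash, which is what every escaped candidate starts with; only the single-character forms compare equal to the character an author would type.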
Moreover, given the same events in C++, assuming that WebIDL for C++ said "use char* for DOMString", what do you get?
- "\u002B", "\u20AC" and "\uD800\uDC00", meaning that strlen(key) == 6 (== 12 for U+10000) and *key == 92
- "\u002B", "\u20AC" and "\u10000", meaning that strlen(key) is either 6 or 7 and *key == 92
- "Plus", "Euro", "LinearBSyllableB008A"
- in UTF-8, "+" (strlen(key) == 1 and *key == 43), "€" (strlen(key) == 3 and (unsigned char)*key == 226) and "𐀀" (strlen(key) == 4 and (unsigned char)*key == 240)

> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
>
>

Giovanni
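The C++ byte counts can be reproduced with a JavaScript sketch, assuming the char* holds UTF-8 (the strlen values listed for "€" and "𐀀" imply this). `TextEncoder` always encodes to UTF-8, so the encoded length plays the role of `strlen(key)` and the first element plays the role of `*key` read as an `unsigned char`; `utf8Length` and `firstByte` are hypothetical helper names.

```javascript
// TextEncoder always encodes to UTF-8, matching the assumed char* storage.
const encoder = new TextEncoder();

// strlen(key) equivalent: number of UTF-8 bytes (none of these values
// contain a NUL, so the byte count equals what strlen would report).
function utf8Length(value) {
  return encoder.encode(value).length;
}

// *key read as unsigned char: the first UTF-8 byte, in the range 0..255.
function firstByte(value) {
  return encoder.encode(value)[0];
}

for (const value of ["\\u002B", "\\uD800\\uDC00", "\\u10000", "+", "\u20AC", "\u{10000}"]) {
  console.log(JSON.stringify(value), utf8Length(value), firstByte(value));
}
```

In C++ itself, plain `char` may be signed, so the lead byte of "€" (0xE2) only compares equal to 226 after a cast to `unsigned char`; the `Uint8Array` returned by `encode` sidesteps that because its elements are always unsigned.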
Received on Saturday, 31 October 2009 13:54:46 UTC