[Bug 28661] U+2212 in shift_jis encoder

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28661

--- Comment #19 from Jungshik Shin <jshin@chromium.org> ---
(In reply to Jungshik Shin from comment #14)
> (In reply to Masatoshi Kimura from comment #13)
> > Created attachment 1607 [details]
> > encoding-only mappings found on IE 11
> > 
> > Most entries are bogus, but IE has one encoding-only (U+00A5 to 0x5c) for
> > Japanese encodings and fullwidth-to-halfwidth mappings for ISO-2022-JP.
> 
> The current encoding spec (and Chrome's Shift_JIS) has two one-way mappings
> (fromUnicode):
> 
> If code point is U+00A5, return byte 0x5C.
> 
> If code point is U+203E, return byte 0x7E.
> 
> ICU's default Shift_JIS (ibm-943) has 47 encoding-only mappings. Most of
> them are Kanjis, but several of them are various symbols/punctuations like
> wave dash (two of them are U+00A5 and U+203E)
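
A minimal sketch of why the two spec mappings quoted above are "one-way". It covers only the single-byte range (0x00 to 0x80) plus the two special cases; the function names are hypothetical and are not Chrome's or ICU's code.

```python
# Encode-only special cases quoted from the encoding spec's Shift_JIS encoder.
ENCODE_ONLY = {0x00A5: 0x5C, 0x203E: 0x7E}


def encode_single_byte(code_point):
    """Return the single byte for code_point, or None if this sketch does not cover it."""
    if code_point <= 0x80:
        return code_point
    return ENCODE_ONLY.get(code_point)


def decode_single_byte(byte):
    """Return the code point for a byte in 0x00..0x80, or None otherwise."""
    if byte <= 0x80:
        return byte
    return None


# U+00A5 encodes to 0x5C, but 0x5C decodes to U+005C, so the round trip is lossy.
yen_byte = encode_single_byte(0x00A5)
yen_back = decode_single_byte(yen_byte)
```

Here yen_byte is 0x5C and yen_back is U+005C rather than U+00A5, which is what makes the mapping encode-only.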


In addition to the above two one-way mappings in the current encoding spec,
ICU's default Shift_JIS has the following one-way mappings in the fromUnicode
direction (the entries marked '|1'; each '|0' line is the round-trip mapping
for the same byte sequence). I'm excluding all the Kanji entries (about 40 of
them).

<UFF5E> \x81\x60 |0
<U301C> \x81\x60 |1
<U2225> \x81\x61 |0
<U2016> \x81\x61 |1
<UFF0D> \x81\x7C |0
<U2212> \x81\x7C |1
<U2116> \x87\x82 |0
<UF86F> \x87\x82 |1
<UFFE4> \xFA\x55 |0
<U00A6> \xFA\x55 |1
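
A rough sketch of how the listing above can be read mechanically: it parses the ten lines and separates round-trip entries ('|0') from fromUnicode-only entries ('|1'). The names parse_ucm, UCM_LINES and LINE_RE are made up for illustration, and other precision flags that ICU's .ucm format allows are simply ignored here.

```python
import re

# The non-Kanji entries listed above, copied verbatim.
UCM_LINES = r"""
<UFF5E> \x81\x60 |0
<U301C> \x81\x60 |1
<U2225> \x81\x61 |0
<U2016> \x81\x61 |1
<UFF0D> \x81\x7C |0
<U2212> \x81\x7C |1
<U2116> \x87\x82 |0
<UF86F> \x87\x82 |1
<UFFE4> \xFA\x55 |0
<U00A6> \xFA\x55 |1
"""

LINE_RE = re.compile(
    r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([01])"
)


def parse_ucm(text):
    """Build (encode, decode) tables from '|0' and '|1' lines only."""
    encode = {}  # code point -> bytes (both '|0' and '|1' entries)
    decode = {}  # bytes -> code point ('|0' entries only)
    for match in LINE_RE.finditer(text):
        code_point = int(match.group(1), 16)
        raw = bytes.fromhex(match.group(2).replace("\\x", ""))
        encode[code_point] = raw
        if match.group(3) == "0":
            decode[raw] = code_point
    return encode, decode


encode, decode = parse_ucm(UCM_LINES)

# Code points that encode to bytes which decode to a different code point.
one_way = sorted(cp for cp, raw in encode.items() if decode.get(raw) != cp)
```

With these tables, U+2212 (the code point in this bug's title) encodes to 0x81 0x7C, and those bytes decode to U+FF0D.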

The above list is a subset of what's listed in comment 17 for Safari's
Shift_JIS. I don't know what WebKit is doing. (It uses ICU's default converter
on Mac OS X/iOS, but hard-codes some additional mappings in WebKit where they
find it necessary.)


Received on Friday, 29 May 2015 18:00:59 UTC