- From: Philip Jägenstedt <philipj@opera.com>
- Date: Sun, 22 Apr 2012 11:05:33 +0200
- To: "Yuan Chao" <yuanchao@gmail.com>
- Cc: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, "Chinese HTML Interest Group" <public-html-ig-zh@w3.org>
On Sun, 22 Apr 2012 02:23:12 +0200, Yuan Chao <yuanchao@gmail.com> wrote:

> On Sun, Apr 22, 2012 at 1:07 AM, Philip Jägenstedt <philipj@opera.com>
> wrote:
>
>>> Unlike ISO-2022-JP which has a very clear states definition, Big5 has
>>> no error handling at all. (Just recall that Kenny was asking about
>>> this about a year ago on this ML.) A visible character is very useful
>>> instead of a fullwidth space, which just hides things away.
>
>> <http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#big5>
>> defines the error handling. However, it can probably be improved, see
>> <https://www.w3.org/Bugs/Public/show_bug.cgi?id=16771>.
> Wondering how this definition comes?

Anne specified something similar to other multi-byte encodings, I think.
One main goal is to not consume following ASCII characters after an error,
but as you can see the current solution can sometimes instead break a
Chinese character following an error.

> http://lists.w3.org/Archives/Public/public-html-ig-zh/2011Aug/0052.html
> I didn't see any reply to Kenny's request since.

I didn't see that at the time, but it seems like the new spec should
address this. I suggest discussing the error handling of Big5 in the bug I
filed; input from someone with more experience would be helpful.

> For people starts using big5 since the DOS era, one should be used to
> the garbled characters due to conflicts with (ext.) ASCII control
> codes and tables. This is the "feature" of big5. hahaha... Also a good
> "error message".
>
>> How U+FFFD is rendered appears to be a font issue, I presume you don't
>> mean that random incorrect characters are preferable.
> The current solution seems to take all PUA as error. I don't prefer it.

The mapping in the spec doesn't use any PUA code points. Are you
suggesting that it should?

>>>>>> On Wed, 18 Apr 2012 22:05:22 +0200, Kang-Hao (Kenny) Lu
>>>>>>> A pointer for the archaeology: some of the encodings look like
>>>>>>> big5-2003[1]..... 囧
>>>>>>> 6. http://domestic.mytour.com.tw/list.asp?id=721
>>>>>>> hkscs: 不捨結束此行精采假期、踏上歸途<U+FFFD �>視情況休息<br>18:30~
>>>>>>> uao: 不捨結束此行精采假期、踏上歸途<U+8FF3 迳>視情況休息<br>18:30~
>>>>>>>
>>>>>>> 84B3 is U+F0E0 (PUA) in big5-2003, and on Windows it looks like
>>>>>>> U+2192 (→ RIGHTWARDS ARROW), but the two glyphs are not the same.
>>>>>>
>>>>>> Possibly, but <U+3001 IDEOGRAPHIC COMMA 、> or <U+FF0C FULLWIDTH
>>>>>> COMMA ,> seems better.
>>>>>
>>>>> I would tend to "→" here. (as supply info, we don't use comma as
>>>>> parentheses)
>
>>>> It's mostly <http://www.wintan.com.tw/service_06_08.htm> that made me
>>> Oh. For this example, it's even more obvious that "→" makes sense.
>>> It tells you to look in to the menu bar for [證券帳務] menu item and
>>> *then* click on [庫存查詢] sub-menu. A "、" makes no sense at all!
>
>> In
>> <http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/0044.html>
>> you said that "、" was very likely, but if you're sure it should be "→"
>> then it looks like all 84B3 might be the same, which seems a lot saner.
> That's before Kenny's "interpretation". Don't you agree "→" makes more
> sense here? As I said, I'm neutral and support for the best.

Yes, you are probably right; I didn't actually read the content when
guessing.

>>>>>>> I don't know what to say any more. It feels like adding some of
>>>>>>> the big5-2003 material back into Big5-UAO would solve a large
>>>>>>> part of this. Also, none of the characters above are Japanese
>>>>>>> kanji, so this doesn't affect my requirements for Big5-
>>> I tend to agree with Kenny's view here.
>> One of you will have to explain exactly what should be done, how should
>> Firefox's mappings be modified to make better sense?
> I think you understand how community does things. We can try to bring
> up this and call for people's help.

Do you know of other places than this list where it would be helpful to
ask about these issues?

>>>>>>> UAO :p. Does anyone know whether this part of the encoding
>>>>>>> mapping is within the range that can still be operated on, or not?
>>>>>> Judging from the above, using Big5-2003 is not perfect. The MozTW
>>>>>> mapping does not seem entirely reliable, so I don't know what to
>>>>>> base a definition of Big5-UAO on.
>
>>>>>> The scope of the problem is, after all, a few characters on 0.043%
>>>>>> of Taiwanese web pages. Among modern browsers only Firefox can
>>>>>> display them, and their mapping also causes other problems...
>>> Unfortunately it cause some problem for non-native Chinese readers. :)
>> Certainly it's a problem for all readers of Chinese that random
>> characters show up where they don't belong?
> Emm... Here you think the current firefox solution is not perfect and
> the needs in Taiwan is negligible so it's better to use big5-hkscs to
> replace the big5 (seems to be CP950?)? I'm an experimental high energy
> physicist. The best way to resolve a debating and validate a theory is
> to do experiment and measure it. :) Maybe you can just implement it in
> Opera and make a survey to see how both HK and Taiwan users appreciate
> it?

It will definitely be an improvement for Opera since HKSCS will start
working and UAO has never worked, but if there's something even better we
could do I'd really prefer that. A better test would be to see the
reactions if Firefox changed, but that's not an experiment I can run :)

>>>>>> In this situation, I think trying to contact the affected sites
>>>>>> still has a chance. In any case, it is the only way for Hong Kong
>>>>>> and international users to be able to see the text too.
> Still as mentioned, HK users overwrite "big5-hkscs" as "big5". It's
> their government's choice to create the inconvenience to "encourage"
> people to move to unicode.
>
> http://my.opera.com/community/forums/topic.dml?id=191245
>
> It took quite long time for Yahoo! Taiwan to move to unicode. Pushing
> big5-hkscs to replace big5 in w3c would have profound effect. I only
> ask for not breaking my current usage. Though I'd be happy to help to
> put the major variants of big5 to w3c. (it's very little info here
> http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#big5)

Which variants do you think should be specified and what should trigger
them? Am I correct to assume that Firefox is the only current browser that
*doesn't* break your current usage?

-- 
Philip Jägenstedt
Core Developer
Opera Software
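
As an illustration of the error-handling trade-off mentioned above (not
consuming a following ASCII character after an error, at the risk of
breaking a following Chinese character), here is a minimal sketch of a toy
two-byte decoder. It is not the algorithm from the Encoding spec or from
any browser. TOY_TABLE, decode() and the three policy names are
hypothetical, and the "keep_any" policy is only one conceivable
alternative, not something proposed in the bug.

```python
REPLACEMENT = "\ufffd"

# Tiny illustrative table (assumption: two entries stand in for the real
# Big5 index, which has thousands of entries).
TOY_TABLE = {
    (0xA4, 0x40): "\u4e00",  # 一
    (0xA4, 0xA4): "\u4e2d",  # 中
}


def decode(data, reprocess_trail):
    """Decode bytes with the toy table.

    reprocess_trail(trail) decides, after a failed two-byte lookup,
    whether the second byte is left in the stream (True) or consumed
    together with the lead byte (False).
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # ASCII bytes pass through unchanged.
            out.append(chr(lead))
            i += 1
            continue
        if not 0x81 <= lead <= 0xFE or i + 1 >= len(data):
            # Byte outside the lead range, or a lead byte at end of input.
            out.append(REPLACEMENT)
            i += 1
            continue
        trail = data[i + 1]
        char = TOY_TABLE.get((lead, trail))
        if char is not None:
            out.append(char)
            i += 2
            continue
        # Lookup failed: emit U+FFFD, then let the policy decide whether
        # the second byte is reprocessed or swallowed.
        out.append(REPLACEMENT)
        i += 1 if reprocess_trail(trail) else 2
    return "".join(out)


# Hypothetical error policies.
consume_both = lambda trail: False        # always swallow both bytes
keep_ascii = lambda trail: trail < 0x80   # only protect ASCII
keep_any = lambda trail: True             # always reprocess the second byte

if __name__ == "__main__":
    for name, policy in [("consume_both", consume_both),
                         ("keep_ascii", keep_ascii),
                         ("keep_any", keep_any)]:
        print(name,
              repr(decode(b"\xa4<br>", policy)),
              repr(decode(b"\x84\xa4\xa4", policy)))
```

With the input A4 3C 62 72 3E, "consume_both" swallows the "<" of "<br>",
while "keep_ascii" preserves the whole tag. With the input 84 A4 A4,
"keep_ascii" still loses the 中 that follows the bad byte, and only
"keep_any" recovers it in this toy setup.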
Received on Sunday, 22 April 2012 09:06:15 UTC