Re: About big5 on PTT

Hi all,

I am afraid I cannot answer all of these questions, but I will try my
best.

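On 13 April 2012 at 4:44 PM, Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu> wrote:
> Hello 挺宇,
>
>
> Anne van Kesteren of Opera has recently been writing a specification
> for how browsers should handle encodings [1]. Philip Jägenstedt of
> Opera (on the Cc list) and Anne have raised the possibility of having
> browsers decode content labelled <meta charset=big5> as 'big5-hkscs'.
> However, because 'big5-hkscs' is incompatible with "Unicode 補完計畫"
> (Unicode-At-On, hereafter 'big5-uao'), several friends in Taiwan said
> on the W3C HTML5 Chinese Interest Group mailing list [2] that they do
> not support this idea. PTT is one of the largest users of 'big5-uao',
> so I have some questions about what you think, and I also hope you can
> provide some data: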

PTT uses big5-uao mainly for historical reasons.
The main protocol for accessing PTT is telnet, not the web, and the
most widely used client is PCMan (http://pcman.openfoundry.org/), which
has built-in big5-uao support.

>
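> 1. Philip would like to ask whether you could provide "all articles
> for which 'big5-hkscs' and 'big5-uao' decoding would differ" [3]?
>
> There is a table of the differences between 'big5-hkscs' and
> 'big5-uao' at [4]. If producing "all" of them is too difficult, here
> are some examples that differ. Could you help check, with a full-text
> search, which boards these byte sequences appear on: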
>

I am not able to do this because the system is heavily loaded; running
such a full-text search and decoding comparison would cause performance
problems.
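
For reference, here is a minimal sketch in Python of the kind of
offline scan being asked about. It is not something we run on PTT. The
name find_pairs is hypothetical, and the sketch assumes each article is
available as raw bytes in which every byte of 0x81 or above starts a
two-byte character. The target byte pairs are taken from the examples
quoted below.

```python
# Sketch only: scan raw article bytes for selected two-byte sequences,
# stepping character by character so that a match never straddles two
# neighbouring double-byte characters.

# Byte pairs taken from the examples quoted in this thread.
UAO_PAIRS = {b"\x92\xa8", b"\x94\x46", b"\x92\xd4", b"\x95\x4d"}
HKSCS_PAIRS = {b"\x93\x7a", b"\x83\x5a", b"\x89\x63", b"\xfe\xd3"}


def find_pairs(data, targets):
    """Return (offset, pair) for each aligned occurrence of a target pair."""
    hits = []
    i = 0
    while i < len(data):
        if data[i] < 0x81:
            # ASCII (or a stray 0x80): treat as a single-byte unit.
            i += 1
            continue
        pair = data[i:i + 2]
        if pair in targets:
            hits.append((i, pair))
        i += 2  # skip the lead byte and the trail byte together
    return hits
```

A hit only says that the bytes occur at a character boundary; it does
not say which encoding the author intended. Terminal control sequences
inside articles are not handled specially here.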

> == Japanese ==
>
> U+6075 恵 (as in the name 釘宮理恵)
>
> big5-uao: \x92\xa8
> big5-hkscs: \x93\x7a
>
> U+54b2 咲 (as in 天才麻將少女, the Chinese title of the manga "Saki")
>
> big5-uao: \x94\x46
> big5-hkscs: \x83\x5a
>
> U+5b9f 実 (as in 真実, "truth")
>
> big5-uao: \x92\xd4
> big5-hkscs: \x89\x63
>
> == Hong Kong Chinese ==
>
> U+5605 嘅
>
> big5-uao: \xa0\x41
> big5-hkscs: \x9d\xef
>
> U+7740 着
>
> big5-uao: \x95\x4d
> big5-hkscs: \xfe\xd3
>
>
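> The main point here is also to find out whether anyone on PTT uses
> 'big5-hkscs', especially on Hong Kong-related boards.
>
>
> 2. Philip would like to ask what you think about the fact that
> "Japanese characters on http://www.ptt.cc/ cannot be displayed in any
> browser except Firefox"?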

I am not sure what Firefox is doing; it seems to support part of UAO.
We plan to switch the web interface to UTF-8, after which this issue
should no longer exist.

>
> ggm told me that "ssh bbsu@ptt.cc" offers a UTF-8 version, so isn't
> it quite feasible to switch http://www.ptt.cc to UTF-8? Why hasn't
> this been done?

As mentioned above, there are plans to switch to UTF-8, but some work
remains to be done.
First, the "bbsu" mode is simply a translation table (big5-uao to
UTF-8) placed in front of the program's I/O, and it is the work of a
single person. The web interface is a separate project, so the two
efforts have to be merged.
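
To illustrate what "a translation table in front of the I/O" means,
here is a minimal sketch in Python. It is not the actual bbsu code. The
names UAO_EXTRA and uao_to_utf8 are hypothetical, the table holds only
a few byte pairs quoted earlier in this thread, and Python's built-in
"big5" codec stands in for the rest of the table (an assumption).

```python
# Sketch only: convert big5-uao output bytes to UTF-8 before they
# reach the client. A real table would cover the whole code space.

# Hypothetical excerpt: a few big5-uao byte pairs quoted in this thread.
UAO_EXTRA = {
    b"\x92\xa8": "\u6075",  # 恵
    b"\x94\x46": "\u54b2",  # 咲
    b"\x92\xd4": "\u5b9f",  # 実
    b"\x95\x4d": "\u7740",  # 着
}


def uao_to_utf8(data):
    """Translate a complete big5-uao byte string to UTF-8 bytes."""
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # ASCII passes through unchanged.
            out.append(chr(data[i]))
            i += 1
            continue
        pair = data[i:i + 2]
        if pair in UAO_EXTRA:
            out.append(UAO_EXTRA[pair])
        else:
            # Fall back to the standard table; unknown pairs become U+FFFD.
            out.append(pair.decode("big5", errors="replace"))
        i += 2
    return "".join(out).encode("utf-8")
```

This version assumes it is given a complete buffer. A filter sitting on
a live connection would also have to hold back a lead byte that arrives
at the end of one read until its trail byte arrives in the next.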

>
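> (I noticed that these pages all show up with boxes in place of
> characters in Google Search, for example [5]. So however this question
> turns out in the end, wouldn't it be better to fix this?)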

I agree with you, but it will take time to fix.

>
>
> 3. I would like to ask what you think about the question of "how a
> browser should handle <meta charset=big5>"?

The browser should use the most standard big5, CNS 11643, and may
notify the user if the document is suspected of using a different
character set (upon detecting a decoding error).
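
A minimal sketch of that behaviour in Python. The function name
decode_with_warning is hypothetical, and Python's built-in "big5" codec
is used as a stand-in for a standard table; it is not necessarily
identical to CNS 11643 or to what any browser ships.

```python
# Sketch only: decode strictly first, and flag the document as suspect
# if any byte sequence falls outside the standard table.

def decode_with_warning(data):
    """Return (text, suspect). suspect is True if strict decoding failed."""
    try:
        return data.decode("big5"), False
    except UnicodeDecodeError:
        # Show what can be shown, and let the caller notify the user.
        return data.decode("big5", errors="replace"), True
```

With this approach a page containing, say, the big5-uao bytes
\x92\xa8 would still be displayed, but the caller would know to warn
the user that another character set may be in use.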

>
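> 4. Do you have any idea how many people in Taiwan have Unicode 補完
> installed? (Not counting the conversion tables built into BBS
> clients.) Could you also run a full-text search on PTT for the term
> "Unicode 補完" to get a sense of the current trend :p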

I think very few people have installed it. Most websites use UTF-8
now, and users want things to work out of the box.
I searched Google for "Unicode 補完 site:ptt.cc" and found many old
posts. So I would say that almost nobody installs it manually.

>
>
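> p.s. Thanks to ggm for the introduction
>
> p.s.2 This email will be archived at [6]. If you don't mind, you are
> also welcome to join the Web standards discussions from time to time,
> or to help translate specifications [7]. :)
>
> p.s.3 Philip can read and write Chinese, but you may also write in
> English.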
>
> [1]
> http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#legacy-multi-byte-chinese-%28traditional%29-encodings
> [2]
> http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/thread#msg1
> [3] See the chat log http://krijnhoetmer.nl/irc-logs/whatwg/20120412#l-569
> [4] http://moztw.org/docs/big5/
> [5]
> https://www.google.com/webhp?hl=zh-tw#hl=zh-TW&site=webhp&q=ptt.cc+C_Chat+%E5%8F%B0%E6%B9%BE%E4%BA%BA&oq=ptt.cc+C_Chat+%E5%8F%B0%E6%B9%BE%E4%BA%BA&aq=f&aqi=&aql=&gs_l=serp.3...10350l12390l4l12884l12l12l0l0l0l0l441l1276l5j6j4-1l12l0.frgbld.&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=125eff699114053d&biw=1258&bih=661
> [6] http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/thread
> [7] http://www.w3.org/html/ig/zh/wiki/Translation
>
>
> Regards,
>
> Kenny


Robert Wang

Received on Friday, 27 April 2012 10:32:54 UTC