Re: [whatwg/encoding] 0xA3 0xA0 in GB 18030 (Issue #338)

Some preliminary analysis:

## Usage statistics of GB 18030, GBK, and GB 2312 for websites

See:

* https://w3techs.com/technologies/details/en-gb18030
* https://w3techs.com/technologies/details/en-gbk
* https://w3techs.com/technologies/details/en-gb2312

## Bing search engine results

Searching [] (U+E5E5) in Bing China, most of the web pages were in 200x. Some websites may quote text from other websites, resulting in some recent results:


| Website          | Results (Bing China) | Latest year on the first page | Results (Bing International) |
|------------------|----------------------|-------------------------------|------------------------------|
| sina.com.cn      | 1,730,000            | 2014                          |                              |
| Sohu.com         | 778,000              | 2024                          |                              |
| qq.com           | 78,700               | 2021                          |                              |
| Sogou.com        | 67,900               | 2021                          |                              |
| Zol.com.cn       | 42,300               | 2024                          |                              |
| Jjwxc.net        | 39,000               | 2010                          |                              |
| people.com.cn    | 27                   | 2022                          |                              |
| chinanews.com.cn | 14                   |                               |                              |
| Pconline.com.cn  | 11                   | 2024                          |                              |
| china.com        | 10                   | 2024                          |                              |
| jd.com           | 7                    | 2024                          |                              |
| Hexun.com        | 3                    | 2003                          |                              |
| alipay.com       | 0                    |                               | 75,000                       |

## Examples

https://www.chinanews.com.cn/n/2004-01-13/26/391173.html (GB 2312)

<img width="454" alt="image" src="https://github.com/user-attachments/assets/f817b346-1c98-445a-89ca-80f3c0d096d4">

This website uses both U+3000 and U+E5E5.

https://blog.sina.com.cn/s/blog_44c67f2c0102v4at.html (UTF-8)

![image](https://github.com/user-attachments/assets/b56577f9-a76e-4072-b751-c9b9760c1db6)

This website uses both U+3000 and U+E5E5.

https://news.sina.com.cn/c/2004-01-21/09272687351.shtml (GB 2312)

<img width="435" alt="image" src="https://github.com/user-attachments/assets/1336497b-cdf5-4c4b-8594-957b3266a68f">

This website only uses U+3000.

https://edu.sina.com.cn/focus/wq3/index.html (GB 2312, dated 2008)

<img width="483" alt="image" src="https://github.com/user-attachments/assets/1890479c-c0ad-404a-b9eb-ef5134d4fc7f">

Although this page appears in the U+E5E5 search results, its source code contains only U+3000.
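
The per-page observations above (U+3000 only, or both U+3000 and U+E5E5) can be checked mechanically once a page has been decoded to text. Below is a minimal Python sketch of such a check. It assumes the text is already decoded; which of the two code points a given decoder produces for 0xA3 0xA0 is exactly the open question here, so no decoding step is shown. The function name `count_space_variants` is hypothetical and used only for illustration.

```python
IDEOGRAPHIC_SPACE = "\u3000"  # U+3000 IDEOGRAPHIC SPACE
PUA_E5E5 = "\ue5e5"           # U+E5E5, a Private Use Area code point


def count_space_variants(text: str) -> dict:
    """Count occurrences of U+3000 and U+E5E5 in already-decoded text."""
    return {
        "U+3000": text.count(IDEOGRAPHIC_SPACE),
        "U+E5E5": text.count(PUA_E5E5),
    }


# Hypothetical sample: two ideographic spaces and one U+E5E5.
sample = "\u3000\u3000Title\ue5e5text"
print(count_space_variants(sample))  # → {'U+3000': 2, 'U+E5E5': 1}
```

A page reporting a nonzero count for both keys would match the "uses both" cases above, while a zero count for `U+E5E5` would match the pages that only use U+3000.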

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/338#issuecomment-2472292747

Message ID: <whatwg/encoding/issues/338/2472292747@github.com>

Received on Wednesday, 13 November 2024 03:25:23 UTC