Re: Dangers of non-UTF-8 Re: Details on internal encoding declarations

On May 23, 2008, at 3:02 PM, Henri Sivonen wrote:

> I am aware of this. The server cannot know if the user typed a  
> character or a string that looks like an NCR, so I think that is  
> dataloss in the strict sense.


That's true, but this data loss happens with UTF-8 documents, too -  
entering "Ô" and "т" in the Google search field results in identical  
requests, despite the Google start page being UTF-8.
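
To make the ambiguity in the quoted text concrete, here is a minimal Python sketch. It assumes a form submitted in ISO-8859-1 (an arbitrary choice for illustration) and uses Python's "xmlcharrefreplace" error handler as a stand-in for the browser substituting an NCR for a character the form encoding cannot represent. The function and variable names are made up for this example, and it does not attempt to reproduce what Google does with UTF-8 input.

```python
# Sketch only: shows why a server cannot tell a typed character from a
# typed string that looks like an NCR once the browser has encoded the form.
# Assumption: the form charset is ISO-8859-1, which cannot represent U+0442.


def encode_form_value(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode a form value, replacing unencodable characters with NCRs.

    "xmlcharrefreplace" emits &#NNNN; for each character the target
    charset cannot represent, which approximates browser behavior.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


# The user typed the single Cyrillic character U+0442.
typed_character = encode_form_value("\u0442")

# The user typed the seven ASCII characters of the NCR literally.
typed_ncr_text = encode_form_value("&#1090;")

# Both submissions produce the same bytes, so the distinction is lost.
print(typed_character == typed_ncr_text)
```

Both values come out as the same byte string, so nothing on the receiving side can recover which of the two the user actually entered.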

As such, I'm not sure if it's a problem worth highlighting. While  
UTF-8 is a nice general-purpose solution, it has its downsides, and  
switching a Russian page from windows-1251 to UTF-8 often makes  
roughly as much sense as switching an English one to UTF-16.
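
One way to read that comparison is in terms of encoded size, although that is an interpretation rather than something stated above. The sketch below uses two arbitrary six-character sample strings and counts bytes in each encoding; "utf-16-le" is used so that no byte order mark is added to the count.

```python
# Sketch only: compare byte counts for sample text in the encodings
# mentioned above. The sample strings are arbitrary illustrations.

russian = "привет"  # six Cyrillic letters
english = "hello!"  # six ASCII characters

russian_cp1251 = len(russian.encode("cp1251"))    # one byte per letter
russian_utf8 = len(russian.encode("utf-8"))       # two bytes per letter
english_ascii = len(english.encode("ascii"))      # one byte per character
english_utf16 = len(english.encode("utf-16-le"))  # two bytes per character

print(russian_cp1251, russian_utf8)
print(english_ascii, english_utf16)
```

For these samples the text doubles in size in both cases. Real pages also contain ASCII markup, which stays one byte per character in UTF-8, so the overall growth of a Russian page would be smaller than a factor of two.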

- WBR, Alexey Proskuryakov

Received on Friday, 23 May 2008 11:23:22 UTC