- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 19 Nov 2004 15:11:03 +0900
- To: dc <deepak.rathore@gmail.com>, www-international@w3.org
At 14:10 04/11/19, dc wrote:

> hi all,
>
> In windows, non ascii data is treated as unicode( wide char) ucs-2/utf-16. for eg filename

That's true for Windows NT/2000/XP, internally and for the 'wide-character' APIs. For the Windows 95/98/ME line, as far as I understand, it's not true internally, although I guess Windows 98 exposes filenames as UCS-2/UTF-16 through those 'wide-character' APIs available on that system.

All MS Windows systems still expose file names in the local (often, not really correctly, called "ANSI") encoding for the old (bytestring-oriented) APIs. This shows up when running software written for both Unix and Windows using these APIs; a typical example would be a cvs client on Windows.

> how does unix systems treat ???????

Unix treats them as bytes. It has no idea what the encoding is; each user/process can choose an encoding by setting (the encoding component of) a locale. In the old days, that was fine: everybody in Japan on Unix machines was using EUC-JP, and nobody else was seeing these file names. In a networked world, that's no longer the case at all, so this model doesn't really work anymore, but it's still in wide use.

The tendency today (although slow) is to move towards using UTF-8 for encoding file names. This works quite well in many cases, but it needs conscious decisions, setup, and a bit of user education.

On Macs, I'm not sure what's used internally, but on the Unix side of Mac OS X, filenames are exposed as UTF-8. The problem on the Mac is with normalization: in most cases (as far as I understand, except Korean), filenames are decomposed. Depending on the interface used, that may or may not show up, similar to how differences in case handling between Windows and Unix systems may or may not show up.

Regards,   Martin.
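P.S. To make the two Windows API families concrete, here is a minimal C sketch (the file name is just an example; it assumes an NT-based system and, for the 'A' call, a Windows-1252 local code page):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* The 'wide-character' API takes UTF-16 directly, so any
           Unicode character can appear in the file name; U+00E9 is
           LATIN SMALL LETTER E WITH ACUTE. */
        HANDLE hw = CreateFileW(L"caf\u00e9.txt", GENERIC_WRITE, 0, NULL,
                                CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hw != INVALID_HANDLE_VALUE)
            CloseHandle(hw);

        /* The old bytestring API interprets the same name in the local
           ("ANSI") code page; the byte 0xE9 is e-acute only because we
           assumed Windows-1252.  Characters outside the local code page
           cannot be named through this API at all. */
        HANDLE ha = CreateFileA("caf\xe9.txt", GENERIC_READ, 0, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        printf("bytestring API %s the file\n",
               ha == INVALID_HANDLE_VALUE ? "did not find" : "found");
        if (ha != INVALID_HANDLE_VALUE)
            CloseHandle(ha);
        return 0;
    }

A cvs client built against the bytestring API is limited to the second form, which is exactly where the interoperability trouble starts.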
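The Unix side needs nothing more than readdir(); this sketch dumps each file name byte by byte, making no assumption at all about its encoding:

    #include <stdio.h>
    #include <dirent.h>

    int main(void)
    {
        /* The kernel stores and returns file names as opaque byte
           strings; whether they are EUC-JP, UTF-8 or something else is
           purely a convention among the processes that use them. */
        DIR *d = opendir(".");
        struct dirent *e;
        if (d == NULL)
            return 1;
        while ((e = readdir(d)) != NULL) {
            const unsigned char *p = (const unsigned char *)e->d_name;
            for (; *p != 0; p++) {
                if (*p < 0x80)
                    putchar(*p);           /* plain ASCII */
                else
                    printf("<%02X>", *p);  /* raw byte, encoding unknown */
            }
            putchar('\n');
        }
        closedir(d);
        return 0;
    }

In a directory where an EUC-JP user and a UTF-8 user have each created files, this prints two incompatible byte sequences side by side, which is the networked-world problem in a single listing.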
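The Mac normalization point can be seen the same way; this sketch assumes Mac OS X on HFS+, and the file name is again just an example:

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>

    int main(void)
    {
        /* Create a file whose name contains precomposed U+00E9,
           i.e. the UTF-8 bytes 0xC3 0xA9. */
        FILE *f = fopen("caf\xC3\xA9.txt", "w");
        if (f != NULL)
            fclose(f);

        /* Read the name back.  On HFS+ it is expected to come back
           decomposed: 'e' (0x65) followed by U+0301 COMBINING ACUTE
           ACCENT (0xCC 0x81), not the 0xC3 0xA9 that was written. */
        DIR *d = opendir(".");
        struct dirent *e;
        while (d != NULL && (e = readdir(d)) != NULL) {
            if (strncmp(e->d_name, "caf", 3) == 0) {
                const unsigned char *p = (const unsigned char *)e->d_name;
                for (; *p != 0; p++)
                    printf("%02X ", *p);
                putchar('\n');
            }
        }
        if (d != NULL)
            closedir(d);
        return 0;
    }

Software that compares file names byte for byte will treat the precomposed and decomposed spellings as two different names unless it normalizes first.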
Received on Friday, 19 November 2004 13:39:52 UTC