- From: dc <deepak.rathore@gmail.com>
- Date: Mon, 22 Nov 2004 10:48:28 +0530
- To: Martin Duerst <duerst@w3.org>
- Cc: www-international@w3.org
Thanks, Martin.

Yes, only Windows versions built on NT 4.0 technology support Unicode
internally; Windows 95 and 98 do not.

With regard to Unix: going by what you said and by what I have
experienced, non-ASCII data is treated byte by byte. But I found the
following on the HP site, which left me really confused about Unix:

  TRU64 UNIX
  Characters are processed internally using a 32-bit wchar_t data type
  http://h30097.www3.hp.com/unix/i18n.htm#single

Any ideas on this? Or is Tru64 the only Unix flavour that treats data
as wide characters?

Thanks,
DC

On Fri, 19 Nov 2004 15:11:03 +0900, Martin Duerst <duerst@w3.org> wrote:
> At 14:10 04/11/19, dc wrote:
> >
> >hi all,
> >
> >In windows, non ascii data is treated as unicode( wide char) ucs-2/
> >utf-16. for eg filename
>
> That's true for Windows NT/2000/XP, internally and for the
> 'wide-character' APIs. As far as I understand, it's not true
> internally, although I guess Windows 98 exposes
> filenames as UCS-2/UTF-16 for those 'wide-character' APIs
> available on that system.
>
> All MS Windows systems still expose file names in the local
> (often not really correctly called "ANSI") encoding for the
> old (bytestring-oriented) APIs. This shows up when running
> software written for both unix and windows using these APIs.
> A typical example would be a cvs client on Windows.
>
> >how does unix systems treat ???????
>
> Unix treats them as bytes. It has no idea about what the
> encoding is. Each user/process can choose an encoding by
> setting (the encoding component) of a locale. In the old
> days, that was fine; everybody in Japan on Unix machines
> was using EUC-JP, and nobody else was seeing these file
> names. In a networked world, that's no longer the case at
> all, so this model doesn't really work anymore, but it's
> still in wide use. The tendency (although slow) today is
> to move towards using UTF-8 for encoding file names. This
> works quite well in many cases. But it needs conscious
> decisions, setup, and a bit of user education.
>
> On Macs, I'm not sure what's used internally, but on the
> Unix side of Mac OS X, filenames are exposed as UTF-8.
> The problem on the Mac is with normalization; in most
> cases (as far as I understand except Korean), filenames
> are decomposed. Depending on the interface used, that
> may or may not show up, similar to how differences in
> case handling between Windows and Unix systems may or
> may not show up.
>
> Regards, Martin.
>
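
The two filename issues raised in this thread (bytes whose encoding is unknown, and composed versus decomposed forms) can be illustrated with a small Python sketch. It is not from the original messages: the sample names and the helper try_decode are made up for illustration, and NFD is used only as a stand-in for the decomposition mentioned above, since the thread does not say exactly which form Mac OS X applies.

```python
import unicodedata

# Hypothetical filename: three Japanese characters (U+65E5 U+672C U+8A9E) + ".txt".
name = "\u65e5\u672c\u8a9e.txt"

# On disk a Unix filename is just bytes; here they are written as EUC-JP.
raw = name.encode("euc_jp")


def try_decode(name_bytes, encoding):
    """Return the decoded name, or None if the bytes are invalid in that encoding."""
    try:
        return name_bytes.decode(encoding)
    except UnicodeDecodeError:
        return None


# A reader who assumes EUC-JP recovers the name; one who assumes UTF-8 does not.
as_eucjp = try_decode(raw, "euc_jp")
as_utf8 = try_decode(raw, "utf-8")

# Normalization: "e with acute" as one code point (NFC) or as "e" plus a
# combining accent (NFD). The two strings look the same but compare unequal.
composed = unicodedata.normalize("NFC", "\u00e9.txt")
decomposed = unicodedata.normalize("NFD", "\u00e9.txt")

# Normalizing both sides to the same form before comparing removes the mismatch.
same_text = unicodedata.normalize("NFC", decomposed) == composed
```

In this sketch the bytes only make sense to a reader who knows (or guesses) the writer's encoding, which is the situation described for Unix above; the normalization half shows why a name created on one system may fail a plain string comparison on another.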
Received on Monday, 22 November 2004 05:18:29 UTC