- From: Tex Texin <tex@i18nguy.com>
- Date: Fri, 19 Nov 2004 13:56:14 -0800
- To: Russ Rolfe <rrolfe@windows.microsoft.com>
- CC: www-international@w3.org
Russ,

For non-Unicode encodings in general, I use "legacy". If I wanted to refer
specifically to the Windows encodings, I use "native Windows encodings"
instead of "ANSI". It's been a long time since I needed to also cover the OEM
code pages, so there isn't confusion as to whether those are included or not
by "native Windows".

Since Unicode is native to NT/XP, one could argue that it is included by the
term, but usually the discussion is Unicode vs. native Windows encodings, so
the meaning is clear. (At least to me!)

Despite their origins elsewhere, since Microsoft has customized most of its
encodings for its own needs (as well as giving them their own names), you
might describe them as Microsoft-defined or Microsoft-originated encodings.
That would clearly not include Unicode, and would be a set of just the ones
your company uses.

hth
tex

Russ Rolfe wrote:
>
> At 10:11 pm 04/11/18, Martin wrote:
> >
> > All MS Windows systems still expose file names in the local (often
> > not really correctly called "ANSI") encoding ...
>
> Just curious, what the rest of you use for a generic term for the "ANSI"
> encodings.
>
> Russ
>
> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Martin Duerst
> Sent: Thursday, November 18, 2004 10:11 PM
> To: dc; www-international@w3.org
> Subject: Re: how does unix/linux treats non ascii data internally
>
> At 14:10 04/11/19, dc wrote:
> >
> >hi all,
> >
> >In windows, non ascii data is treated as unicode( wide char) ucs-2/
> >utf-16. for eg filename
>
> That's true for Windows NT/2000/XP, internally and for the
> 'wide-character' APIs. As far as I understand, it's not true internally,
> although I guess Windows 98 exposes filenames as UCS-2/UTF-16 for those
> 'wide-character' APIs available on that system.
>
> All MS Windows systems still expose file names in the local (often not
> really correctly called "ANSI") encoding for the old
> (bytestring-oriented) APIs. This shows up when running software written
> for both unix and windows using these APIs.
> A typical example would be a cvs client on Windows.
>
> >how does unix systems treat ???????
>
> Unix treats them as bytes. It has no idea about what the encoding is.
> Each user/process can choose an encoding by setting (the encoding
> component) of a locale. In the old days, that was fine; everybody in
> Japan on Unix machines was using EUC-JP, and nobody else was seeing
> these file names. In a networked world, that's no longer the case at
> all, so this model doesn't really work anymore, but it's still in wide
> use. The tendency (although slow) today is to move towards using UTF-8
> for encoding file names. This works quite well in many cases. But it
> needs conscious decisions, setup, and a bit of user education.
>
> On Macs, I'm not sure what's used internally, but on the Unix side of
> Mac OS X, filenames are exposed as UTF-8.
> The problem on the Mac is with normalization; in most cases (as far as I
> understand except Korean), filenames are decomposed. Depending on the
> interface used, that may or may not show up, similar to how differences
> in case handling between Windows and Unix systems may or may not show
> up.
>
> Regards, Martin.

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Friday, 19 November 2004 21:56:20 UTC
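
A small illustration of two points made in the thread: that a byte-oriented file system stores whatever bytes the writer's encoding produced, and that precomposed and decomposed spellings of a name are different strings. This is only a sketch in Python. The file names are made up for the example, cp932 is used as one instance of a Windows "ANSI" code page, and plain NFD is assumed as a stand-in for the decomposition described for the Mac, which may differ in detail.

```python
import unicodedata

# Hypothetical file name, chosen only for illustration.
NAME = "日本語.txt"

# The same name becomes a different byte string under each encoding.
ENCODED = {
    "euc_jp": NAME.encode("euc_jp"),  # long common on Japanese Unix systems
    "cp932": NAME.encode("cp932"),    # a Japanese Windows "ANSI" code page
    "utf-8": NAME.encode("utf-8"),
}


def reads_back(raw: bytes, assumed_encoding: str) -> bool:
    """Return True if decoding raw with assumed_encoding recovers NAME."""
    try:
        return raw.decode(assumed_encoding) == NAME
    except UnicodeDecodeError:
        return False


# Precomposed (NFC) and decomposed (NFD) spellings of one visible name.
# Plain NFD is an assumption standing in for the Mac behaviour described.
COMPOSED = unicodedata.normalize("NFC", "caf\u00e9.txt")
DECOMPOSED = unicodedata.normalize("NFD", COMPOSED)

if __name__ == "__main__":
    for label, raw in ENCODED.items():
        print(label, len(raw), raw.hex())
    # Bytes written under one locale do not read back under another.
    print(reads_back(ENCODED["euc_jp"], "utf-8"))
    # The two normalization forms look alike but compare unequal.
    print(COMPOSED == DECOMPOSED, len(COMPOSED), len(DECOMPOSED))
```

Software that compares or looks up file names across systems would probably need to agree on both an encoding and a normalization form before comparing; the sketch only shows why the raw values differ.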