- From: Nico Williams <nico@cryptonector.com>
- Date: Tue, 9 Dec 2014 20:23:13 -0600
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: "Phillips, Addison" <addison@lab126.com>, Matthew Kerwin <matthew@kerwin.net.au>, IETF Apps Discuss <apps-discuss@ietf.org>, "uri@w3.org" <uri@w3.org>
On Tue, Dec 09, 2014 at 09:07:59PM -0500, John Cowan wrote:
> Phillips, Addison scripsit:
>
> > These are both in UTF-8, are visually indistinguishable, and are
> > identical under NFC, but fopen() cares which bag of bytes you grab.
>
> The same is true on Windows, where filenames are 16-bit code units rather
> than 8-bit code units.  In general, we simply cannot normalize file names,
> because both Windows and Unix filesystems distinguish between names that
> are equivalent under canonical equivalence.

The preferable thing to do is to have form-preserve-on-create and
form-insensitive lookups in the filesystem, which creates some unlikely
aliases, but mostly avoids real problems.

In practice few filesystems do this, so applications layered above them
have to do something.  "Nothing" works most of the time.  Sometimes it
doesn't work, and when it doesn't, it hurts.

The example that comes to mind is git on HFS+.  git has a configuration
option to normalize:

    core.precomposeunicode::
        This option is only used by Mac OS implementation of Git.
        When core.precomposeunicode=true, Git reverts the unicode
        decomposition of filenames done by Mac OS.  This is useful when
        sharing a repository between Mac OS and Linux or Windows.
        (Git for Windows 1.7.10 or higher is needed, or Git under
        cygwin 1.7).  When false, file names are handled fully
        transparent by Git, which is backward compatible with older
        versions of Git.

Also, the ideal is that the filesystem stores Unicode filenames, and
consumers in any non-Unicode locales convert.

I think I have an I-D lying about discussing this.  This keeps coming
up.  Maybe we should publish it?  Though the file: URI scheme makes a
poor trigger for publishing it: it'll be easier to ignore normalization
in the file: scheme.

Nico
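
A minimal sketch of the form-preserving create and form-insensitive lookup described above, assuming NFC as the comparison key. The `FormInsensitiveDir` class and its methods are hypothetical illustrations, not any real filesystem API; only `unicodedata.normalize` is a real library call.

```python
import unicodedata


def nfc_key(name: str) -> str:
    """Comparison key: the NFC form of the name (an assumed choice)."""
    return unicodedata.normalize("NFC", name)


class FormInsensitiveDir:
    """Toy directory (hypothetical): stores each name exactly as created,
    but compares names by their NFC form on create and lookup."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}  # NFC key -> name as created

    def create(self, name: str) -> None:
        key = nfc_key(name)
        if key in self._entries:
            # A canonically equivalent name already exists.
            raise FileExistsError(name)
        self._entries[key] = name  # preserve the caller's form

    def lookup(self, name: str) -> str:
        # Returns the stored (as-created) form; raises KeyError if absent.
        return self._entries[nfc_key(name)]


# Two spellings that render identically but differ as bags of bytes.
composed = "caf\u00e9"      # precomposed U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute U+0301

d = FormInsensitiveDir()
d.create(decomposed)                 # e.g. what HFS+ would hand back
found = d.lookup(composed)           # lookup by the other form succeeds
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")
```

Here `found` is the decomposed spelling that was originally created, while `same_bytes` is false, which is the situation a byte-oriented fopen() trips over.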
Received on Wednesday, 10 December 2014 02:23:36 UTC