Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

On Tue, Dec 09, 2014 at 09:07:59PM -0500, John Cowan wrote:
> Phillips, Addison scripsit:
> > These are both in UTF-8, are visually indistinguishable, and are
> > identical under NFC, but fopen() cares which bag of bytes you grab.
> 
> The same is true on Windows, where filenames are 16-bit code units rather
> than 8-bit code units.  In general, we simply cannot normalize file names,
> because both Windows and Unix filesystems distinguish between names that
> are equivalent under canonical equivalence.
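A small Python sketch makes the point concrete. The "café" strings are only an example and do not come from the thread. It shows two canonically equivalent names that differ as UTF-8 bytes (what a Unix open() sees) and as UTF-16 code units (what Windows stores), yet compare equal after NFC normalization.

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9, and "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
nfc_name = "caf\u00e9"
nfd_name = "cafe\u0301"

# On Unix a filename is a bag of bytes; here, UTF-8 bytes.
print(nfc_name.encode("utf-8"))  # b'caf\xc3\xa9'
print(nfd_name.encode("utf-8"))  # b'cafe\xcc\x81'

# On Windows a filename is a sequence of 16-bit code units.
print(len(nfc_name.encode("utf-16-le")) // 2)  # 4
print(len(nfd_name.encode("utf-16-le")) // 2)  # 5

# Comparing the raw code point sequences says they differ...
print(nfc_name == nfd_name)  # False
# ...but they are canonically equivalent: equal after NFC normalization.
print(unicodedata.normalize("NFC", nfd_name) == nfc_name)  # True
```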

The preferable thing to do is for the filesystem to preserve the form
on create and do form-insensitive lookups.  That creates some unlikely
aliases, but avoids most real problems.
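Here is a minimal in-memory sketch of form-preserving, form-insensitive behaviour. The class and its methods are hypothetical and do not model any real filesystem's API. It assumes NFC as the lookup key. Names are stored exactly as created, and lookups match any canonically equivalent spelling. The "unlikely aliases" appear as the refusal to create a second, equivalent name.

```python
import unicodedata


class FormInsensitiveDirectory:
    """Hypothetical in-memory directory, not a real filesystem API.

    Names are stored exactly as given on create (form-preserving), but
    lookups compare NFC forms (form-insensitive).
    """

    def __init__(self):
        # NFC key -> (name as originally created, file contents)
        self._entries = {}

    @staticmethod
    def _key(name):
        return unicodedata.normalize("NFC", name)

    def create(self, name, data):
        key = self._key(name)
        if key in self._entries:
            # A canonically equivalent name already exists; both spellings
            # alias the same entry, so refuse to create a second one.
            raise FileExistsError(name)
        self._entries[key] = (name, data)

    def lookup(self, name):
        # Any canonically equivalent spelling finds the same entry.
        try:
            return self._entries[self._key(name)]
        except KeyError:
            raise FileNotFoundError(name) from None

    def listdir(self):
        # Listings return names in the form they were created with.
        return [stored for stored, _ in self._entries.values()]


d = FormInsensitiveDirectory()
d.create("cafe\u0301", b"hello")          # created in decomposed form
stored, data = d.lookup("caf\u00e9")      # looked up in precomposed form
print(stored == "cafe\u0301", data)       # True b'hello'
```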

In practice, few filesystems do this, so applications layered above them
have to do something.  "Nothing" works most of the time.  Sometimes it
doesn't work, and when it doesn't, it hurts.  The example that comes to
mind is git on HFS+.  git has a configuration option to normalize:

core.precomposeunicode::
        This option is only used by Mac OS implementation of Git.
        When core.precomposeunicode=true, Git reverts the unicode decomposition
        of filenames done by Mac OS. This is useful when sharing a repository
        between Mac OS and Linux or Windows.
        (Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7).
        When false, file names are handled fully transparent by Git,
        which is backward compatible with older versions of Git.
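The idea behind that option can be sketched in a few lines of Python. The option itself is set with `git config core.precomposeunicode true`. The sketch is not Git's implementation, and the `precompose` helper is hypothetical. It assumes that names listed on HFS+ come back decomposed. HFS+ actually uses a variant of NFD, but plain NFD is close enough for this example. The sketch compares those names with NFC names committed from Linux.

```python
import unicodedata


def precompose(names):
    """Hypothetical helper illustrating the idea behind core.precomposeunicode:
    convert names read from the filesystem to NFC before comparing them with
    names recorded in the repository. This is not Git's actual code."""
    return [unicodedata.normalize("NFC", n) for n in names]


# A name committed on Linux in precomposed (NFC) form.
committed = {"caf\u00e9.txt", "README"}
# The same file as listed on HFS+, which stores names decomposed.
listed_on_mac = ["cafe\u0301.txt", "README"]

# Compared as-is, the Mac listing looks like a new, untracked file...
print(len(set(listed_on_mac) - committed))              # 1
# ...while after precomposition the names match.
print(len(set(precompose(listed_on_mac)) - committed))  # 0
```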

Also, the ideal is that the filesystem stores filenames as Unicode, and
consumers running in non-Unicode locales convert them.
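For the conversion step, here is a rough sketch. The helper is hypothetical, and Latin-1 stands in for "some non-Unicode locale" purely as an example. It also shows why normalization matters even here. Latin-1, like many legacy charsets, has only precomposed letters, so a decomposed name fails to convert unless it is first normalized to NFC.

```python
import unicodedata


def to_locale_bytes(name, encoding="latin-1"):
    """Hypothetical helper: convert a Unicode filename for a consumer whose
    locale uses a legacy charset (Latin-1 here, purely as an example)."""
    # Latin-1 encodes U+00E9 but has no combining marks, so normalize to
    # NFC first; the decomposed form would raise UnicodeEncodeError.
    return unicodedata.normalize("NFC", name).encode(encoding)


print(to_locale_bytes("cafe\u0301"))  # b'caf\xe9'

# Characters the locale cannot represent still raise UnicodeEncodeError;
# a real consumer must pick a policy (escape, substitute, or refuse).
try:
    to_locale_bytes("\u65e5\u672c")
except UnicodeEncodeError as exc:
    print("not representable:", exc.reason)
```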

I think I have an I-D lying about discussing this.  This keeps coming
up.  Maybe we should publish it?  The file: URI scheme makes a poor
trigger for publishing it, though, since it'll be easier to ignore
normalization in the file: scheme.

Nico
-- 

Received on Wednesday, 10 December 2014 02:23:36 UTC