RE: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt from Phillips, Addison on 2014-12-09 (uri@w3.org from December 2014)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 9 Dec 2014 23:05:28 +0000
To: Matthew Kerwin <matthew@kerwin.net.au>
CC: Nico Williams <nico@cryptonector.com>, IETF Apps Discuss <apps-discuss@ietf.org>, "uri@w3.org" <uri@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB52EAAC3CC@ex10-mbx-9007.ant.amazon.com>

Yeah, that’s exactly the example filesystem I had in mind.

Actually, my thought was that U+00E4 and U+0061.0308 would be:

{ 0xC3.A4 } vs. { 0x61.CC.88 }

These are both in UTF-8, are visually indistinguishable, and are identical under NFC, but fopen() cares which bag of bytes you grab.
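
A minimal Python sketch of that point, assuming a filesystem that compares filenames as raw octets (the variable names are illustrative only):

```python
import unicodedata

# The two spellings of "ä" discussed above.
precomposed = "\u00e4"   # U+00E4
decomposed = "a\u0308"   # U+0061 followed by U+0308

# Their UTF-8 encodings are different byte sequences.
pre_bytes = precomposed.encode("utf-8")   # C3 A4
dec_bytes = decomposed.encode("utf-8")    # 61 CC 88

# After NFC normalization the two strings compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(pre_bytes.hex(), dec_bytes.hex(), same_after_nfc)
```

If a URI were normalized to NFC before lookup, only the first byte sequence could ever be requested, so a file stored under the second one would be unreachable on an octet-comparing filesystem.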

Addison

From: phluid61@gmail.com [mailto:phluid61@gmail.com] On Behalf Of Matthew Kerwin
Sent: Tuesday, December 09, 2014 3:00 PM
To: Phillips, Addison
Cc: Nico Williams; IETF Apps Discuss; uri@w3.org
Subject: Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

On 10 December 2014 at 08:37, Phillips, Addison <addison@lab126.com> wrote:
Although normalization is often a good idea... normalization might be a problem if the local filesystem allows normalized and non-normalized representations both to appear. You wouldn't be able to specify a non-normalized representation.

Do you have an example? I'm trying to think it through, but I keep going in circles. The one I think of is ext[2-4], where the filesystem stores octet sequences, and shell/applications/etc. use things like the user's locale environment when representing those octets as text strings. Are you saying that if we mandate NFC normalisation of URIs, you can't distinguish between files whose filename octets are {0xE4} vs {0xC3, 0xA4} (i.e. U+00E4 "ä" in Windows-1252 / UTF-8)?

Wouldn't "file://%E4" cover that?
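
As a rough sketch of how percent-encoding keeps the octet sequences apart (an illustration using Python's standard urllib.parse, with made-up variable names, not anything the draft specifies):

```python
from urllib.parse import quote, unquote_to_bytes

# "ä" as a single Windows-1252 octet and as a UTF-8 octet pair.
cp1252_octets = "\u00e4".encode("cp1252")   # E4
utf8_octets = "\u00e4".encode("utf-8")      # C3 A4

# Percent-encoding the raw octets yields distinct path segments.
cp1252_segment = quote(cp1252_octets)
utf8_segment = quote(utf8_octets)

# The decomposed form (U+0061 U+0308) is distinct again;
# the plain "a" is unreserved and stays unescaped.
decomposed_segment = quote("a\u0308".encode("utf-8"))

print(cp1252_segment, utf8_segment, decomposed_segment)

# Decoding returns the exact octets that were stored.
round_trip = unquote_to_bytes(cp1252_segment)
```

The three segments stay distinguishable only as long as nothing normalizes the decoded characters between the URI and the filesystem call.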


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Received on Tuesday, 9 December 2014 23:06:25 UTC