Yeah, that’s exactly the example filesystem I had in mind.
Actually, my thought was that U+00E4 and U+0061.0308 would be:
{ 0xC3.A4 } vs. { 0x61.CC.88 }
These are both in UTF-8, are visually indistinguishable, and are identical under NFC, but fopen() cares which bag of bytes you grab.
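For anyone who wants to check the octets, here is a minimal Python sketch. The variable names are mine and purely illustrative; it assumes only the standard unicodedata module.

```python
import unicodedata

# The two spellings of "ä" discussed above.
precomposed = "\u00e4"    # U+00E4
decomposed = "a\u0308"    # U+0061 followed by U+0308

# Their UTF-8 encodings are different octet sequences.
pre_bytes = precomposed.encode("utf-8")
dec_bytes = decomposed.encode("utf-8")

# NFC maps both spellings to the same string...
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
# ...but the raw octets a byte-oriented filesystem sees still differ.
same_bytes = pre_bytes == dec_bytes

print(pre_bytes.hex(), dec_bytes.hex(), same_after_nfc, same_bytes)
```

On a filesystem that stores names as raw octets, the two byte strings would name two different files, which is why a normalized URI could reach only one of them.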
Addison
From: phluid61@gmail.com [mailto:phluid61@gmail.com] On Behalf Of Matthew Kerwin
Sent: Tuesday, December 09, 2014 3:00 PM
To: Phillips, Addison
Cc: Nico Williams; IETF Apps Discuss; uri@w3.org
Subject: Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt
On 10 December 2014 at 08:37, Phillips, Addison <addison@lab126.com> wrote:
Although normalization is often a good idea... normalization might be a problem if the local filesystem allows normalized and non-normalized representations both to appear. You wouldn't be able to specify a non-normalized representation.
Do you have an example? I'm trying to think it through, but I keep going in circles. The one I think of is ext[2-4] where the filesystem stores octet sequences, and shell/applications/etc. use things like the user's locale environment when representing those octets as text strings. Are you saying that if we mandate NFC normalisation of URIs, you can't distinguish between files whose filename octets are {0xE4} vs {0xC3, 0xA4} (i.e. U+00E4 "ä" in Windows-1252 / UTF-8)?
Wouldn't "file://%E4" cover that?
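A rough Python sketch of what I mean, assuming the URI's percent-encoded octets are handed to the filesystem unchanged (the variable names are only illustrative):

```python
from urllib.parse import quote, unquote_to_bytes

# Percent-decoding gives back raw octets, with no character encoding assumed.
latin1_octets = unquote_to_bytes("%E4")      # the single octet 0xE4
utf8_octets = unquote_to_bytes("%C3%A4")     # the two octets 0xC3 0xA4

# The two filenames stay distinguishable at the octet level.
distinct = latin1_octets != utf8_octets

# Percent-encoding the raw octets round-trips to the original escapes.
roundtrip_latin1 = quote(latin1_octets)
roundtrip_utf8 = quote(utf8_octets)

print(latin1_octets, utf8_octets, distinct, roundtrip_latin1, roundtrip_utf8)
```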
--
Matthew Kerwin
http://matthew.kerwin.net.au/