
[Bug 14526] WF2: When adding filenames to the data set, should there be normalization of decomposed forms?

From: <bugzilla@jessica.w3.org>
Date: Fri, 28 Oct 2011 19:39:42 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RJsHG-0006C8-09@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14526

Ian 'Hixie' Hickson <ian@hixie.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW

--- Comment #11 from Ian 'Hixie' Hickson <ian@hixie.ch> 2011-10-28 19:39:40 UTC ---
I created a test that would distinguish normalisation forms:
http://damowmow.com/playground/demos/filename-upload/002.html

When I create the given file name on Mac, I get a file whose name's bytes are
displayed to the console by ls(1) piped through hexdump as:

   c5 bf cc a3 cc 87 e2 84 ab

This isn't what I expected. As far as I can tell, it means that the file system
is not normalising singletons (U+212B ANGSTROM SIGN is left alone), but is
applying NFD to composed characters.
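To see what the Mac file system did, here is a minimal Python sketch using only the standard-library `unicodedata` module. It decodes the hexdump bytes and compares them with full NFD of the test string. It assumes the test string is U+1E9B U+0323 U+212B. That value is inferred from the "original bytes" that IE9 and Firefox uploaded on Windows further down; it is not read from the test page itself. The helper `describe` is illustrative only.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in s (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

# Bytes reported by ls(1) piped through hexdump for the file created on the Mac.
mac = bytes.fromhex("c5 bf cc a3 cc 87 e2 84 ab").decode("utf-8")
# Assumption: the test string is what IE9/Firefox uploaded unchanged on Windows.
original = bytes.fromhex("e1 ba 9b cc a3 e2 84 ab").decode("utf-8")

for line in describe(mac):
    print(line)

nfd = unicodedata.normalize("NFD", original)
# Full NFD decomposes U+1E9B into U+017F + U+0307 and puts the combining marks
# in canonical order (dot below before dot above); the Mac name matches this.
print(nfd[:3] == mac[:3])  # True
# Full NFD would also expand the singleton U+212B ANGSTROM SIGN into
# A + COMBINING RING ABOVE, but the Mac name kept U+212B unchanged.
print(describe(nfd[3:]))
print(describe(mac[3:]))
```

This behaviour appears consistent with the HFS Plus variant of decomposition that Apple documents, which is understood to leave code points in the range U+2000 to U+2FFF undecomposed. That explanation is an assumption; the test above does not establish it.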

Uploading this file gives the following results (recent builds or the latest
shipping version in all cases; only POST was tested):

Mac Firefox: same as file system (c5 bf cc a3 cc 87 e2 84 ab)
Mac Opera: same as file system (c5 bf cc a3 cc 87 e2 84 ab)
Mac Safari: NFC (e1 ba 9b cc a3 c3 85)
Mac Chrome: NFC (e1 ba 9b cc a3 c3 85)
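The two Mac outcomes can be reproduced with a short Python check. Encoding the file-system name as-is gives the Firefox/Opera bytes, and NFC-normalising it first gives the Safari/Chrome bytes. The `hexbytes` helper is a hypothetical formatting aid, not part of any browser.

```python
import unicodedata

# File name as stored by the Mac file system (from the hexdump above).
mac_name = bytes.fromhex("c5 bf cc a3 cc 87 e2 84 ab").decode("utf-8")

def hexbytes(s: str) -> str:
    """UTF-8 encode s and format it like the hexdump output (Python 3.8+)."""
    return s.encode("utf-8").hex(" ")

print(hexbytes(mac_name))                                # c5 bf cc a3 cc 87 e2 84 ab
print(hexbytes(unicodedata.normalize("NFC", mac_name)))  # e1 ba 9b cc a3 c3 85
```

NFC recomposes U+017F + U+0307 into U+1E9B. It also replaces the singleton U+212B with U+00C5 (c3 85), which is why the NFC result is shorter.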

On Windows I had more trouble creating the file. I copied the string from the
page in IE and pasted it into a command shell to create the file. According to
dir, the name had three characters, which it displayed as "??Å.txt".
Unfortunately I have no idea what kind of "Å" that is. Then I tried uploading
it (sorry about the old software versions):

IE9: original bytes (e1 ba 9b cc a3 e2 84 ab)
Win Firefox 5: original bytes (e1 ba 9b cc a3 e2 84 ab)
Win Safari 5: NFC (e1 ba 9b cc a3 c3 85)
Win Chrome: NFC (e1 ba 9b cc a3 c3 85)
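The "Å" question can be partly answered from the upload results. This assumes that IE9 and Firefox 5 sent the name exactly as it was stored on disk. Under that assumption, the third character's bytes e2 84 ab are U+212B ANGSTROM SIGN, not U+00C5, as this Python check shows:

```python
import unicodedata

# Bytes that IE9 and Firefox 5 submitted unchanged on Windows.
win_name = bytes.fromhex("e1 ba 9b cc a3 e2 84 ab").decode("utf-8")

print(len(win_name))  # 3, matching the three characters dir reported
for ch in win_name:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC replaces the singleton ANGSTROM SIGN with its canonical equivalent,
# which is the c3 85 seen in the Safari and Chrome uploads.
last_nfc = unicodedata.normalize("NFC", win_name[-1])
print(f"U+{ord(last_nfc):04X} {unicodedata.name(last_nfc)}")
```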

So basically, as far as I can tell, all browsers except the WebKit-based ones
do no normalisation; they just trust the file system. On Mac this is slightly
problematic, but only because the Mac's file system does its own normalisation.
WebKit always applies NFC normalisation to the file name before submission.
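The practical consequence can be modelled with a hypothetical `submitted_bytes` helper. It is not any browser's real code. The helper either passes the file system's name through, as Firefox, Opera and IE did in these tests, or NFC-normalises it first, as the WebKit browsers did. Using the stored names observed above, only the NFC policy gives a server identical bytes from both platforms.

```python
import unicodedata

# File names as stored by each file system, taken from the results above.
FS_NAMES = {
    "Mac": bytes.fromhex("c5 bf cc a3 cc 87 e2 84 ab").decode("utf-8"),
    "Windows": bytes.fromhex("e1 ba 9b cc a3 e2 84 ab").decode("utf-8"),
}

def submitted_bytes(fs_name: str, normalize: bool) -> bytes:
    """Hypothetical model: send the stored name as-is, or NFC-normalise it first."""
    name = unicodedata.normalize("NFC", fs_name) if normalize else fs_name
    return name.encode("utf-8")

for normalize in (False, True):
    sent = {system: submitted_bytes(n, normalize) for system, n in FS_NAMES.items()}
    same = len(set(sent.values())) == 1
    print("NFC" if normalize else "pass-through", "-> identical on both platforms:", same)
```

The two stored names are canonically equivalent, so any policy that normalises to a single form (NFC here) makes them compare equal. Pass-through preserves whatever each file system happened to store.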

Received on Friday, 28 October 2011 19:39:46 UTC
