[Bug 14526] WF2: When adding filenames to the data set, should there be normalization of decomposed forms?

From: <bugzilla@jessica.w3.org>
Date: Fri, 04 Nov 2011 06:25:31 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RMDDX-0002EU-El@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14526

--- Comment #18 from NARUSE, Yui <naruse@airemix.jp> 2011-11-04 06:25:31 UTC ---
(In reply to comment #16)
> So how exactly should it be defined? "File names must be exposed in a
> normalized form, whether in the DOM (e.g. in File objects) or in form
> submission, regardless of the conventions of the user agent's platform's file
> system. The normalization form used must be Unicode normalization Form C (NFC),
> except that input characters in the range U+2000 to U+2FFF, U+F900 to U+FA6A,
> and U+2F800 to U+2FA1D must be left unchanged in the output."?

I think so.
But whether such behavior should be portable (that is, applied on platforms
other than Mac OS X) is debatable.
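
For illustration, the rule quoted from comment #16 could be sketched roughly as
below. This is only a sketch, not what any browser implements: the names
PROTECTED_RANGES, is_protected and nfc_except_protected are made up here, and
it assumes that normalizing each run of unprotected characters separately is
an acceptable reading of "must be left unchanged in the output".

```python
import unicodedata
from itertools import groupby

# Ranges that the quoted proposal says must pass through unchanged.
PROTECTED_RANGES = (
    (0x2000, 0x2FFF),
    (0xF900, 0xFA6A),
    (0x2F800, 0x2FA1D),
)


def is_protected(ch):
    """Return True if ch falls in one of the excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PROTECTED_RANGES)


def nfc_except_protected(name):
    """Apply NFC to runs of unprotected characters only (assumption:
    normalizing run by run is close enough to the proposed wording)."""
    parts = []
    for protected, run in groupby(name, key=is_protected):
        text = "".join(run)
        parts.append(text if protected else unicodedata.normalize("NFC", text))
    return "".join(parts)
```

With this sketch a decomposed "e" + U+0301 is still composed, while U+FA19 is
passed through untouched, unlike with plain NFC.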

Imagine the following situation: a directory contains two files, U+795E.txt
and U+FA19.txt, and the user wants to upload both. As you can see, once the
names are normalized neither the DOM nor the receiving server can distinguish
them. Normalization considered harmful.
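
The collision can be reproduced with Python's unicodedata module. This is just
a sketch of the scenario; the helper name exposed_names is made up for the
example.

```python
import unicodedata


def exposed_names(names):
    """Group on-disk names by the NFC form a server would receive."""
    seen = {}
    for n in names:
        seen.setdefault(unicodedata.normalize("NFC", n), []).append(n)
    return seen


# The two files from the scenario above.
directory = ["\u795E.txt", "\uFA19.txt"]

# Keep only NFC forms that more than one on-disk name maps to.
collisions = {k: v for k, v in exposed_names(directory).items() if len(v) > 1}
```

Here both names end up under the single key U+795E.txt.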

Normalization is harmless only where the file's filesystem itself normalizes
names, and the filesystem and the browser use exactly the same algorithm.

Ideally, filenames should be normalized only for files on normalizing
filesystems such as HFS Plus. (An assumption that filenames on Mac OS X are
normalized may be acceptable, though.)
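
As a rough sketch of that idea, normalization could be keyed on the filesystem
the file came from. Everything here is hypothetical: the function
filename_for_upload, the "hfs+" and "ext4" labels, and the normalizers table
are invented for the example, and plain NFC is only a placeholder for
whatever algorithm actually matches the filesystem.

```python
import unicodedata


def filename_for_upload(name, filesystem, normalizers):
    """Normalize name only if its filesystem has a registered normalizer;
    otherwise expose the name exactly as stored."""
    normalize = normalizers.get(filesystem)
    return normalize(name) if normalize else name


# Placeholder table: a real one would have to use the same algorithm as the
# filesystem itself, which plain NFC does not guarantee.
normalizers = {"hfs+": lambda s: unicodedata.normalize("NFC", s)}
```

A file from a non-normalizing filesystem then keeps U+FA19 in its name, so the
two files in the example above stay distinguishable.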

> This isn't what any browser does as far as I can tell. Are we sure that what
> WebKit does is broken for CJK?

Yes, current WebKit normalizes those kanji, and that is considered breakage.
You can see the breakage by uploading U+FA19.txt.
After uploading, the name becomes U+795E.txt, and you can see that the left
part of the kanji has changed.
These kanji have the same meaning, "god", and one was specified as a
compatibility character for political reasons, but people don't want them
normalized except where normalization is truly required.
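
The reason even NFC changes this character is that U+FA19 has a canonical
(singleton) decomposition to U+795E in the Unicode Character Database. A small
check, using only Python's standard unicodedata module:

```python
import unicodedata

compat = "\uFA19"   # CJK compatibility ideograph
unified = "\u795E"  # the unified ideograph it decomposes to

# A bare code point with no <tag> prefix means a canonical decomposition.
decomp = unicodedata.decomposition(compat)

# Every normalization form replaces the compatibility ideograph.
forms = {f: unicodedata.normalize(f, compat)
         for f in ("NFC", "NFD", "NFKC", "NFKD")}
```

All four forms yield U+795E, so picking a different standard normalization
form would not avoid the breakage.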

Received on Friday, 4 November 2011 06:38:27 GMT