[w3c/FileAPI] Specify how filenames from the OS map to File's `name` property (#161)

I've been testing how files coming from the OS get exposed as `File` objects, and in particular how filenames that aren't in the OS's default encoding get mapped to the `name` property.

On Windows systems, filenames are sequences of UTF-16 code units (not UTF-16 encoded text, as is sometimes claimed, because the system APIs don't check for lone surrogates), and as expected, they map directly to a `DOMString`. An initial BOM doesn't get removed. There don't seem to be any browser differences here.
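To illustrate the difference between "a sequence of UTF-16 code units" and "UTF-16 encoded text", here is a minimal Python sketch. It is not what any browser runs; the helper names `code_units_to_name` and `is_well_formed_utf16` are made up for this example, and Python strings hold code points rather than code units, so this is only an approximation of a `DOMString`.

```python
import struct

def code_units_to_name(units):
    """Map 16-bit code units to a string without validating them."""
    raw = struct.pack(f"<{len(units)}H", *units)
    # 'surrogatepass' keeps lone surrogates instead of raising an error.
    return raw.decode("utf-16-le", errors="surrogatepass")

def is_well_formed_utf16(units):
    """Return True only if the code units are valid UTF-16 text."""
    raw = struct.pack(f"<{len(units)}H", *units)
    try:
        raw.decode("utf-16-le")  # strict decoding rejects lone surrogates
        return True
    except UnicodeDecodeError:
        return False

lone_surrogate_name = [0xD800, 0x0061]  # lone high surrogate, then "a"
bom_name = [0xFEFF, 0x0061]             # initial BOM, then "a"

print(is_well_formed_utf16(lone_surrogate_name))  # not valid UTF-16 text
print(ascii(code_units_to_name(lone_surrogate_name)))
print(ascii(code_units_to_name(bom_name)))        # the BOM is kept
```

The first name is a legal sequence of code units but fails strict UTF-16 decoding, which is the case the parenthetical above describes.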

On Unix systems (tested on Fedora Linux; my understanding is that all other modern Unix variants/distros work the same way), filenames are byte sequences, which are usually taken to be UTF-8. Here's how the various browsers handle them:

- Firefox does the equivalent of [UTF-8 decode without BOM](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom), decoding any bytes that aren't valid UTF-8 as U+FFFD replacement characters.
- WebKit does the equivalent of [UTF-8 decode without BOM or fail](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail), and handles failures by returning a `File` object with the empty string as its filename, empty contents, and MIME type `application/octet-stream` instead. The language around `name` in the spec might allow an empty string to substitute for a filename that cannot be decoded, but it doesn't allow the contents to be dropped. Note that the resulting `File` object is identical to the `File` object that HTML's ["construct the entry list"](https://html.spec.whatwg.org/#constructing-form-data-set) creates when a file input has no selected files.
- Chrome also does the equivalent of [UTF-8 decode without BOM or fail](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail), except that for file inputs, any file whose filename isn't valid UTF-8 gets dropped from the selection. For drag and drop, Chrome behaves the same as WebKit.
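The two decoding strategies in the list above can be approximated with Python's built-in UTF-8 codec. This is only a sketch: the function names `decode_lossy` and `decode_or_fail` are invented for this example, the sample filenames are made up, and Python's codec stands in for the Encoding Standard's algorithms rather than reproducing any browser's code.

```python
def decode_lossy(raw: bytes) -> str:
    """Roughly 'UTF-8 decode without BOM': invalid bytes become U+FFFD."""
    # The plain 'utf-8' codec does not strip a leading BOM.
    return raw.decode("utf-8", errors="replace")

def decode_or_fail(raw: bytes):
    """Roughly 'UTF-8 decode without BOM or fail': None signals failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

latin1_name = b"caf\xe9.txt"          # "café.txt" encoded as Latin-1
bom_name = b"\xef\xbb\xbfnotes.txt"   # valid UTF-8 with a leading BOM

print(ascii(decode_lossy(latin1_name)))    # filename survives, with U+FFFD
print(decode_or_fail(latin1_name))         # failure: caller must pick a fallback
print(ascii(decode_or_fail(bom_name)))     # BOM is kept in both strategies
```

With the lossy strategy the caller always gets a usable name, so there is no failure path in which the file or its contents could be dropped. With the failing strategy, everything depends on what the caller does with the failure result.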

Since it doesn't seem good to drop files or replace them with empty files, even when their filenames don't match the OS's conventions, it seems best to agree on Firefox's behavior.

https://github.com/w3c/FileAPI/issues/161

Received on Thursday, 4 March 2021 19:23:49 UTC