Re: New filesystem/directory API proposal from Eric Uhrhane on 2010-02-03 (public-device-apis@w3.org from February 2010)

From: Eric Uhrhane <ericu@google.com>
Date: Wed, 3 Feb 2010 14:25:16 -0800
To: Ian Hickson <ian@hixie.ch>
Cc: Michael Nordman <michaeln@google.com>, public-device-apis@w3.org
Message-ID: <44b058fe1002031425g35153385v1b63b94dc5814dbc@mail.gmail.com>

On Wed, Feb 3, 2010 at 3:52 AM, Ian Hickson <ian@hixie.ch> wrote:
> On Mon, 1 Feb 2010, Eric Uhrhane wrote:
>> On Mon, Feb 1, 2010 at 2:53 PM, Ian Hickson <ian@hixie.ch> wrote:
>> > On Mon, 1 Feb 2010, Michael Nordman wrote:
>> >>
>> >> What happens when some external program generates an invalidly named
>> >> file in the directory being employed by the WebFS (assuming an impl
>> >> is doing the obvious thing and directly mapping a native directory to
>> >> the root of the WebFS)?
>> >
>> > There's an easy way around this one -- define the mapping such that
>> > there aren't any invalid names.
>>
>> Are you thinking of something like the MS "foo~1" convention?
>
> That's not a mapping, that's just hiding the real name and using a
> placeholder in the filesystem.
>
> No, I mean a mapping. For example, suppose the restriction was that the
> file system couldn't have "/" characters in the name, and the API couldn't
> have ":" characters in the name. You could define a trivial mapping where
> / in the API and : in the filesystem were equivalent.
>
> Or suppose you had a filesystem that couldn't have A-E in the name, and an
> API that couldn't have 0-3 in the name. You could define a mapping by
> artificially restricting the API further so that 4 was also not allowed,
> and then simply mapping A-E to 0-4.
>
> Or suppose you had a filesystem that couldn't have any of 0-3, and you had
> an API that allowed anything at all except A and B. You could define a
> mapping wherein any sequence of the characters 0-3 was converted to a
> sequence of A and B characters wherein each run of A characters represents
> a character 1-3 based on whether there's 1, 2, or 3 As, and each B
> separates runs of characters; two Bs in a row, as well as a B at the start
> or end, implies a 0. Three AAAs implicitly have a B after them. The
> opposite mapping goes the other way.
>
> So:
>
>   As seen in the filesystem...       As seen in the API...
>   testAtest                          test1test
>   testBtest                          test0test
>   testAAAABBBAtest                   test31001test
>   testAAABtest                       test30test
>   __ABAB_BBAA__                      __110_002__
>
> (Obviously the characters wouldn't actually be 0-3 or A and B.)
>
> Generally speaking you can convert any set of strings with one alphabet
> into a set of strings in another alphabet in a 1:1 manner. If both
> alphabets have a large common set of characters, and both have at least 2
> characters that the other alphabet does not, you can always reduce the
> problem to the 0/1/2/3/A/B example above, for instance. The trick is
> getting the mapping to be something that preserves most of the filename;
> it's of course pretty trivial to come up with a mapping that is 1:1
> between two specific alphabets but where the two filenames have no
> apparent relationship.
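
For reference, the first mapping described above ("/" in the API
equivalent to ":" in the filesystem) can be sketched in a few lines.
This is only an illustration of that toy restriction; the function
names are hypothetical and neither restriction describes a real
platform.

```python
# Toy restriction from the quoted example (assumed, not a real platform):
# the filesystem cannot store "/" and the API cannot accept ":".

def api_to_fs(api_name: str) -> str:
    """Translate an API name to the name stored in the filesystem."""
    if ":" in api_name:
        raise ValueError("':' is not allowed in API names")
    return api_name.replace("/", ":")


def fs_to_api(fs_name: str) -> str:
    """Translate a stored filesystem name back to the API name."""
    if "/" in fs_name:
        raise ValueError("'/' is not allowed in filesystem names")
    return fs_name.replace(":", "/")
```

Because each side forbids exactly the character the other side uses,
the two functions are inverses of each other and every legal name on
one side has exactly one counterpart on the other.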

I see your point, but there are some complexities to be dealt with.
One is that this is harder where the restricted character list in
alphabet A is a strict subset of the one in alphabet B.  At that
point you don't have trivial escapes/replacements, since anything you
come up with as a translation in A of a problematic filename in
alphabet B is also a legal file name in B.  So then you need to
translate even the legal filenames, which likely either makes your
translations really ugly or requires new restrictions on what A can
represent, which was what we were trying to avoid in the first place.
Or both, of course.
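
A minimal sketch of why legal names end up translated too, assuming a
filesystem that rejects only ":" and an API that accepts any
character.  The escape character "%", the four-hex-digit format, and
the names to_native/to_api are hypothetical choices for illustration,
not part of any proposal.

```python
FORBIDDEN = {":"}   # assumed: characters the native filesystem rejects
ESCAPE = "%"        # assumed escape character, legal on both sides
HEX = "0123456789ABCDEF"


def to_native(api_name: str) -> str:
    """Escape forbidden characters, and the escape character itself."""
    out = []
    for ch in api_name:
        if ch in FORBIDDEN or ch == ESCAPE:
            out.append(f"{ESCAPE}{ord(ch):04X}")
        else:
            out.append(ch)
    return "".join(out)


def to_api(native_name: str) -> str:
    """Reverse to_native; reject names that are not well-formed escapes."""
    out = []
    i = 0
    while i < len(native_name):
        ch = native_name[i]
        if ch == ESCAPE:
            code = native_name[i + 1:i + 5]
            if len(code) != 4 or any(c not in HEX for c in code):
                raise ValueError(f"not produced by to_native: {native_name!r}")
            out.append(chr(int(code, 16)))
            i += 5
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Here to_native("100%") gives "100%0025": a name that was legal on both
sides still has to change so that the escape stays unambiguous.  In
the other direction, a native file that an external program names
"100%" has no API counterpart under this scheme, so to_api rejects it.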

That's just to solve the problem of filenames that are illegal in
isolation.  When you have filenames that are individually legal, but
illegal together [two files in the same directory that differ only in
case on a standard OS X install, for instance], simple translations
aren't enough.  That's when you run into the problems with files
changing representation when they change directory [as in old Windows
apps trying to deal with long file names].
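
As a rough illustration of the "illegal together" case, the sketch
below groups the names in one directory by a case-folded key and
reports the groups that would clash.  It assumes Python's
str.casefold() is a good enough stand-in for the filesystem's own
comparison; real case-insensitive filesystems use their own folding
tables and may also normalize Unicode, so treat this as an
approximation.  The function name is hypothetical.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return {folded_name: [original names]} for names that clash
    when compared case-insensitively (approximated with casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

A per-name translation cannot see this problem, because whether
"README" needs renaming depends on whether "readme" happens to be in
the same directory at the time.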

Then there are the filenames that don't use illegal content, but are
just too long.  Ext[234] support up to 255 /bytes/ in a path segment,
but FAT32 is fine with 255 /UTF-16 code units/ with LFN.  How do we
emulate long names on a short filesystem?  It can be done, of course,
but then you get further and further away from being able to share
files nicely with client apps outside the browser.
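
The two limits measure different things, which a short sketch makes
visible.  It assumes names on ext[234] are stored as UTF-8 (the
filesystem itself only sees bytes, so the encoding is an assumption),
and the helper names are hypothetical.

```python
EXT_MAX_BYTES = 255          # bytes per path segment, as stated above
FAT32_LFN_MAX_UNITS = 255    # UTF-16 code units, as stated above


def utf8_bytes(name: str) -> int:
    """Length of the name in bytes when encoded as UTF-8 (assumed)."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Length of the name in UTF-16 code units (2 bytes each)."""
    return len(name.encode("utf-16-le")) // 2


def fits_ext(name: str) -> bool:
    return utf8_bytes(name) <= EXT_MAX_BYTES


def fits_fat32_lfn(name: str) -> bool:
    return utf16_units(name) <= FAT32_LFN_MAX_UNITS
```

For example, a name made of 100 copies of U+65E5 is 300 bytes in
UTF-8 but only 100 UTF-16 code units, so it passes fits_fat32_lfn and
fails fits_ext.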

>> It's quite doable, obviously, but it can have unintuitive effects when
>> you move multiple "difficult" files into and out of a directory, when
>> names clash with files /actually/ called "foo~1", when you want to copy
>> a file to another directory without changing its name, when you want to
>> move "illegal:filename.txt" to "illegal:filename.txt.bak", and when
>> you're managing the files both in the browser and via the external
>> program that's created the awkward name.  And I'm sure I'm missing some
>> cases.
>
> I don't think any of those are really problems in the example case I gave
> except the "managing the files both in the browser and via the external
> program that's created the awkward name" case.

I think that that's an important use case.  We want to make sure that
the browser isn't a silo whose data is awkward to access from outside.
If we can come up with a mapping that 1) works for all possible
filenames; 2) only shows up for the problem files; 3) doesn't
obfuscate external filenames too much, then I think we've got a
winner.  However, in the absence of such a solution, I think we should
stick with the least-common-denominator approach I've outlined.

If you've got a mapping in mind, I'm all ears.

     Eric
Received on Wednesday, 3 February 2010 22:26:05 UTC