Re: Request for Comments: Last Call WD of Widgets 1.0: Packaging & Configuration spec; deadline 31 Jan 2009

Marcos Caceres wrote:
> Ok, as I know little of SVG, I've asked Doug Schepers to help me

That sounds like an excellent plan.  Thank you!

> It is, but this affects more than just Zip. See also [3] with the
> problems Limewire had with respect to normalization of Unicode on Mac OS
> X.

Note that this is a pretty old article.  I agree that Unicode
normalization of file names doesn't work as well as it ideally would,
of course.
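
For readers unfamiliar with the normalization problem being referred to, here is a minimal sketch in Python. The file name is hypothetical, and the snippet only illustrates the general effect: the same visible name can be stored as two different code point sequences, so a byte-for-byte comparison of names fails unless both sides are normalized to the same form first.

```python
import unicodedata

name = "caf\u00e9.txt"  # hypothetical file name in precomposed form

nfc = unicodedata.normalize("NFC", name)  # 'é' as a single code point U+00E9
nfd = unicodedata.normalize("NFD", name)  # 'e' followed by combining U+0301

# Same visible text, but different code points and different UTF-8 bytes.
names_match = nfc == nfd
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# Normalizing both sides to one form before comparing restores equality.
equal_after_normalizing = unicodedata.normalize("NFC", nfd) == nfc
```

A tool that compares entry names inside a package against names typed by an author would need a step like the last line to avoid this kind of mismatch.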

>> Sort of.  We use JAR, not ZIP.  Any JAR file is a ZIP file, but not vice
>> versa.  In particular, the JAR spec [1] defines that all non-ASCII bytes are
>> UTF-8.
> 
> AFAIK, JAR uses Java's Modified UTF-8 so it's quite proprietary.

The only difference between standard UTF-8 and Modified UTF-8 is how the 
character U+0000 is encoded.  If someone is putting that particular 
character in their filenames, I have no problem saying that behavior is 
undefined as long as it's secure.
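
As a rough illustration of the U+0000 case, here is a minimal Python sketch. The function name is made up for this example, and it deliberately covers only the Basic Multilingual Plane: it shows the two-byte form used for U+0000 and refuses anything above U+FFFF rather than guess at it.

```python
def encode_modified_utf8_bmp(text: str) -> bytes:
    """Hypothetical helper: Modified UTF-8 for BMP-only text.

    U+0000 becomes the two-byte sequence C0 80 instead of a single
    zero byte. Code points above U+FFFF are outside this sketch.
    """
    out = bytearray()
    for ch in text:
        if ch == "\u0000":
            out += b"\xc0\x80"
        elif ord(ch) > 0xFFFF:
            raise ValueError("code points above U+FFFF are outside this sketch")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


standard = "a\u0000b".encode("utf-8")          # NUL stored as one zero byte
modified = encode_modified_utf8_bmp("a\u0000b")  # NUL stored as C0 80
```

Note that a strict UTF-8 decoder treats C0 80 as an overlong sequence and rejects it, which is one concrete way a file name containing U+0000 could behave differently across tools.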

> The use of modified UTF-8 in Java wrt Zip has led to significant problems
> [2] (this bug appeared in 1999 (!))

Looks like that bug is more about the fact that using Java's 
ZIP-manipulation functionality on JARs fails because the 
ZIP-manipulation stuff uses the OS-default encoding...

Which does bring us back to the issue of ZIP tools sucking in this 
regard, of course.
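
The kind of mismatch described here can be sketched in a few lines of Python. The file name is hypothetical, and cp437 is used only as a stand-in for "some default encoding that is not UTF-8"; this is not a reproduction of the bug in [2], just the general failure mode.

```python
name = "r\u00e9sum\u00e9.txt"  # hypothetical file name with non-ASCII characters

# The archive writer stores the name as UTF-8 bytes.
stored = name.encode("utf-8")

# A reader that also assumes UTF-8 recovers the original name.
as_utf8 = stored.decode("utf-8")

# A reader that assumes a different default encoding (cp437 as a stand-in)
# decodes without error but produces a different, garbled name.
as_cp437 = stored.decode("cp437")
```

Since the second decode succeeds silently, the tool has no way to notice that it is looking for, or creating, an entry under the wrong name.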

> My gut feeling is that we run with this known issue; We have a warning
> in the spec that authors should avoid using file names outside the
> ASCII range.

I can live with that, as long as the issue has been considered.  In 
practice, I'll just hope that everyone involved migrates to UTF-8 and is 
done with it.

-Boris

Received on Wednesday, 28 January 2009 19:23:49 UTC