- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Wed, 28 Jan 2009 14:22:58 -0500
- To: Marcos Caceres <marcosscaceres@gmail.com>
- CC: public-webapps <public-webapps@w3.org>
Marcos Caceres wrote:
> Ok, as I know little of SVG, I've asked Doug Scheppers to help me

That sounds like an excellent plan. Thank you!

> It is, but this affects more than just Zip. See also [3] with the
> problems Limewire had in respect to normalization of Unicode on MacOs
> X.

Note that this is a pretty old article. I agree that this stuff doesn't work as well as would be ideal, of course.

>> Sort of. We use JAR, not ZIP. Any JAR file is a ZIP file, but not vice
>> versa. In particular, the JAR spec [1] defines that all non-ASCII bytes are
>> UTF-8.
>
> AFAIK, JAR uses Java's Modified UTF-8 so it's quite proprietary.

The only difference between standard UTF-8 and Modified UTF-8 is how the character U+0000 is encoded. If someone is putting that particular character in their filenames, I have no problem saying that behavior is undefined as long as it's secure.

> The use of modified UTF-8 in Java wrt Zip has led to significant problems
> [2] (this bug appeared in 1999 (!)

Looks like that bug is more about the fact that using Java's ZIP-manipulation functionality on JARs fails because the ZIP-manipulation stuff uses the OS-default encoding... Which does bring us back to the issue of ZIP tools sucking in this regard, of course.

> My gut feeling is that we run with this known issue; We have a warning
> in the spec that authors should avoid using file names outside the
> ASCII range.

I can live with that, as long as the issue has been considered. In practice, I'll just hope that everyone involved migrates to UTF-8 and is done with it.

-Boris
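
For readers who want to see the U+0000 point concretely, here is a minimal sketch, not taken from the thread, of how a Modified UTF-8 encoder differs from a standard one. The function name `encode_modified_utf8` is hypothetical; Python has no built-in codec for this, so the bytes are assembled by hand. Note that Java's Modified UTF-8 also writes characters above U+FFFF as a surrogate pair of two three-byte sequences; the sketch covers that case for completeness, though it is about as unlikely in a filename as U+0000.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Hypothetical helper: encode text the way Java's Modified UTF-8 does."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 becomes the two-byte form C0 80 instead of a single 00 byte.
            out += b"\xc0\x80"
        elif cp < 0x10000:
            # Everything else in the BMP is byte-for-byte standard UTF-8.
            # "surrogatepass" lets a lone surrogate through instead of raising.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary characters are split into a UTF-16 surrogate pair,
            # and each surrogate is written as its own three-byte sequence.
            cp -= 0x10000
            for unit in (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)):
                out += bytes([
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ])
    return bytes(out)


# A non-ASCII filename without U+0000 or supplementary characters
# encodes identically under both schemes.
name = "r\u00e9sum\u00e9.svg"
same = encode_modified_utf8(name) == name.encode("utf-8")

# A name containing U+0000 is where the two encodings diverge.
standard = "a\x00b".encode("utf-8")
modified = encode_modified_utf8("a\x00b")
```

For ordinary non-ASCII filenames the two encodings agree, which is why treating JAR entry names as UTF-8 works in practice.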
Received on Wednesday, 28 January 2009 19:23:49 UTC