Re: Request for Comments: Last Call WD of Widgets 1.0: Packaging & Configuration spec; deadline 31 Jan 2009 from Marcos Caceres on 2009-01-28 (public-webapps@w3.org from January to March 2009)

From: Marcos Caceres <marcosscaceres@gmail.com>
Date: Wed, 28 Jan 2009 17:57:39 +0000
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: public-webapps <public-webapps@w3.org>
Message-ID: <b21a10670901280957u4252e3e6o101a86dccdb186bb@mail.gmail.com>

On Wed, Jan 28, 2009 at 3:09 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> Marcos Caceres wrote:
>>
>> Ok, that sounds like a completely reasonable proposal. And you are right,
>> I had thought about this in totally the wrong way. I did as you suggested:
>>  * widget engines may now support SVG 1.1.
>>  * authors, however, should try to conform to SVG Tiny 1.2.
>>  * conformance checkers should warn authors when their icons don't conform
>>    to SVG Tiny 1.2.
>
> Note that SVG Tiny 1.2 is not a subset of SVG 1.1, by the way...  I'm not
> sure whether that should affect this section; just pointing it out.

Bah :(

> I think it makes more sense to just allow widget engines to implement
> whatever SVG version they want (as in, place no restrictions on it, past the
> fact that .svg files should be processed per the image/svg+xml MIME type
> registration).

Ok, as I know little of SVG, I've asked Doug Schepers to help me
specify this properly. Once we have a concrete solution, I will run it
by you again for your approval.

>> Correct. So what is wrong with limiting sniffing to the table in the spec?
>
> Nothing.  In fact it's highly desirable.

Ok. Cool.

>> Or to the content-sniffing internet draft I pointed you to earlier?... I'm
>> not sure I'm understanding what you want me to specify here.
>
> I was just pointing out that current implementations of something like
> widgets which don't use a MIME manifest or some such use an alternate system
> (aggressive extension sniffing) that we don't want to use here.

Ok. The working group is aware of the potential security implications
and will try to work with the editors of [1] to make sure our use
cases are covered.

>> Understood. However, wouldn't you have to deal with the fact that
>> non-conforming zip implementations are used to create the widgets in the
>> first place.
>
> That's a good question, actually.  I'm not sure I have enough of a grasp of
> the issue to tell you what this would mean for a widget UA in practice....
>
>>> Do we have any data to support this supposition?  That's certainly how
>>> things work with web pages, and in small market segments like Western
>>> Europe there are multiple encodings in common use (ISO-8859-1 and
>>> UTF-8).
>>
>> No, not directly. I only have anecdotal evidence: a podcast from the
>> Harvard Business Review about globalization and the internet, but I don't
>> have a pointer. In that podcast, some research was presented that
>> indicated that only 15% of internet traffic actually leaves the boundaries
>> of a country, and that share is decreasing. That means that 85% or more of
>> all communication would, in theory, be done using the same language and,
>> by extension, the same character encoding.
>
> Unfortunately, the language to character encoding mapping is not
> one-to-one...  See above about Western Europe.

Ok, point taken.
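
To make the Western Europe case concrete, here is a minimal Python
sketch. The file name is made up for illustration; the two encodings
are the ones Boris mentions. It only shows that one language can map
to more than one byte sequence, and what happens when a reader guesses
the wrong encoding.

```python
# Hypothetical file name containing one non-ASCII character.
name = "café.png"

# The same text under the two encodings commonly used in Western Europe.
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

# Reading UTF-8 bytes as ISO-8859-1 never fails; it silently garbles.
misread = utf8_bytes.decode("iso-8859-1")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

The asymmetry matters in practice: ISO-8859-1 bytes read as UTF-8 tend
to fail loudly, while UTF-8 bytes read as ISO-8859-1 are accepted and
come out wrong.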

>> I reached similar conclusions through my own testing/research [1]. Note
>> that on Mac it is apparently some proprietary variant of UTF-8 in fully
>> decomposed canonical form. I'm not sure what different flavors of Linux
>> use
>
> Nowadays UTF-8 for the most part, at least for new data being created.
>
>> but again: things seem bad on the file name encoding front. In essence,
>> you can't share Zip files across OSes if they contain characters outside
>> the ASCII range.
>
> This seems like a problem to me...

It is, but this affects more than just Zip. See also [3] for the
problems Limewire had with normalization of Unicode on Mac OS X. This
probably gets worse from OS to OS.... I'm impressed computers
work at all! :)
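
As a rough illustration of the normalization problem, the Python
sketch below compares the composed (NFC) and decomposed (NFD) forms of
one made-up file name. Standard NFD is used here only as a stand-in:
the Mac variant mentioned above is reportedly not identical to it, so
treat this as an approximation.

```python
import unicodedata


def utf8_forms(name: str) -> tuple[bytes, bytes]:
    """Return the UTF-8 bytes of name in NFC and in NFD."""
    nfc = unicodedata.normalize("NFC", name)
    nfd = unicodedata.normalize("NFD", name)
    return nfc.encode("utf-8"), nfd.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical icon name with a single accented character.
nfc_bytes, nfd_bytes = utf8_forms("\u00e9.svg")
```

A byte-for-byte comparison of the two forms fails even though both
display identically, which is why a lookup by file name can miss an
entry that was packaged on another OS.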

>> By "reality" I meant the reality about zip implementations - i.e., no
>> respect for encodings.
>
> OK.
>
>> MHTML *may* be technically superior and architecturally better, but there
>> is more tool support for Zip than MHTML. AFAIK, MHTML packaging tools do
>> not ship with any operating system. Zipping tools do.
>
> Quite true.  At the same time, we're discussing the fact that once you want
> non-ASCII filenames the zip tools hinder more than help, right?

Correct.

>> Also, Mozilla uses it to ship add-ons, right? What, if any,
>> problems have you guys experienced wrt zip in internationalized
>> contexts?
>
> Sort of.  We use JAR, not ZIP.  Any JAR file is a ZIP file, but not vice
> versa.  In particular, the JAR spec [1] defines that all non-ASCII bytes are
> UTF-8.

AFAIK, JAR uses Java's Modified UTF-8, so it's quite proprietary. The
use of Modified UTF-8 in Java wrt Zip has led to significant problems
[2] (this bug appeared in 1999 (!), and it is now the second most
voted-on bug in Sun's bug database), and has even prompted the
creation of custom Zip libraries, like TrueZip, to work around the
encoding problems in Java's Zip implementation.

>> Again, I'm not sure how to proceed.
>
> That really depends on how much you care about allowing any ZIP
> implementation to be used for creating widgets vs how much you care about
> internationalization issues that might arise as a result...

My gut feeling is that we run with this known issue: we have a warning
in the spec that authors should avoid using file names outside the
ASCII range.
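
A conformance checker could surface that warning with something like
the Python sketch below. This is only an illustration under stated
assumptions: the function name and the sample entries are hypothetical,
and the spec does not prescribe this check.

```python
import io
import zipfile


def non_ascii_names(zip_bytes: bytes) -> list[str]:
    """Return the entry names in a Zip archive that fall outside ASCII."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [n for n in archive.namelist() if not n.isascii()]


def build_sample_widget() -> bytes:
    """Build a small in-memory archive with one non-ASCII entry name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("config.xml", "<widget/>")
        archive.writestr("ic\u00f4ne.svg", "<svg/>")
    return buffer.getvalue()


flagged = non_ascii_names(build_sample_widget())
```

Python's zipfile module decodes names that lack the UTF-8 flag as
cp437, so bytes outside ASCII still surface as non-ASCII characters
and are still flagged, even when the decoded name itself is wrong.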

Like I said previously, none of the companies in this working group
that have widget engines already in the wild have reported any
pressing need for us to resolve the file name encoding issue. In fact,
some initially did not want any i18n stuff in the spec at all. I
pushed to resolve this issue because I strongly believe in the W3C's
mission of making stuff "available to all people" [4]. However, I've
come to accept that we might just have to live with the limitations of
Zip implementations. Even if file name encoding is broken, at least we
provide a rich set of i18n tools within the spec to have content
localized properly.

Kind regards,
Marcos

[1] http://webblaze.cs.berkeley.edu/2009/mime-sniff/mime-sniff.txt
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244499
[3] http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html
[4] http://www.w3.org/Consortium/mission

-- 
Marcos Caceres
http://datadriven.com.au
Received on Wednesday, 28 January 2009 17:58:21 UTC