Re: Request for Comments: Last Call WD of Widgets 1.0: Packaging & Configuration spec; deadline 31 Jan 2009

Marcos Caceres wrote:
> Ok, that sounds like a completely reasonable proposal. And you are right, I
> had thought about this in totally the wrong way. I did as you suggested:
>   * widget engines may now support SVG 1.1.
>   * authors, however, should try to conform to SVG Tiny 1.2.
>   * conformance checkers should warn authors when their icons don't conform
> to SVG Tiny 1.2.

Note that SVG Tiny 1.2 is not a subset of SVG 1.1, by the way...  I'm 
not sure whether that should affect this section; just pointing it out.

I think it makes more sense to just allow widget engines to implement 
whatever SVG version they want (as in, place no restrictions on it, past 
the fact that .svg files should be processed per the image/svg+xml MIME 
type registration).

> Correct. So what is wrong with limiting sniffing to the table in the spec?

Nothing.  In fact it's highly desirable.

> Or to the content-sniffing internet draft I pointed you to earlier?... I'm
> not sure I'm understanding what you want me to specify here.

I was just pointing out that current implementations of something like 
widgets which don't use a MIME manifest or some such use an alternate 
system (aggressive extension sniffing) that we don't want to use here.

> Understood. However, wouldn't you have to deal with the fact that
> non-conforming zip implementations are used to create the widgets in the
> first place?

That's a good question, actually.  I'm not sure I have enough of a grasp 
of the issue to tell you what this would mean for a widget UA in 
practice....

>> Do we have any data to support this supposition?  That's certainly how
>> things work with web pages, and in small market segments like Western
>> Europe there are multiple encodings in common use (ISO-8859-1 and
>> UTF-8).  
> 
> No, not directly. I only have anecdotal evidence: a podcast from the Harvard
> Business Review about globalization and the internet, but I don't have a
> pointer. In that podcast, some research was presented that indicated that
> only 15% of internet traffic actually leaves the boundaries of a country,
> and that this share is decreasing. That means that 85% or more of all
> communication would, in theory, be done using the same language and, by
> extension, the same character encoding.

Unfortunately, the language to character encoding mapping is not 
one-to-one...  See above about Western Europe.
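
To make that concrete, here is a small Python sketch. The sample string 
is just an illustration, not data from this thread: the same text yields 
different bytes under ISO-8859-1 and UTF-8, and guessing the wrong 
encoding garbles it without raising any error.

```python
# Illustrative sample text; any string with a non-ASCII Latin letter works.
text = "caf\u00e9"  # "café"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # "é" takes two bytes

# Same language, same text, different byte sequences.
print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Decoding UTF-8 bytes as ISO-8859-1 "succeeds" but produces mojibake.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # cafÃ©
```

So knowing the language of the content does not tell a consumer which 
of the two byte sequences it is looking at.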

> I reached similar conclusions through my own testing/research [1]. Note that
> on Mac it is apparently some proprietary variant of UTF-8 in fully
> decomposed canonical form. I'm not sure what different flavors of Linux use

Nowadays UTF-8 for the most part, at least for new data being created.

> but again: things seem bad on the file name encoding front. In essence, you
> can't share Zip files across OS if they contain characters outside the ASCII
> range.

This seems like a problem to me...
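
A rough Python sketch of why the decomposed form matters, assuming 
standard Unicode NFD as a stand-in for whatever variant the Mac file 
system actually uses, and a made-up file name:

```python
import unicodedata

# Hypothetical file name with a precomposed "é" (U+00E9).
name = "caf\u00e9.png"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # "e" followed by U+0301

# The two forms render the same but differ as code points and as bytes.
print(nfc == nfd)           # False
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9.png'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81.png'


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(nfc, nfd))  # True
```

A consumer that compares file names byte for byte would treat these as 
two different files even when both sides use UTF-8.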

> By "reality" I meant the reality about zip implementations - i.e., no
> respect for encodings.

OK.

> MHTML *may* be technically superior and architecturally better, but
> there is more tool support for Zip than MHTML. AFAIK, MHTML packaging tools
> do not ship with any operating system. Zipping tools do.

Quite true.  At the same time, we're discussing the fact that once you 
want non-ASCII filenames the zip tools hinder more than help, right?

> I don't have any statistics, but I assume Zip is used around the world - I
> mean the fact that it is a standard tool on all OS has to mean something
> significant.

True.

> Also, Mozilla uses it to ship add-ons, right? What, if any, problems
> have you guys experienced with respect to zip in internationalized contexts?

Sort of.  We use JAR, not ZIP.  Any JAR file is a ZIP file, but not vice 
versa.  In particular, the JAR spec [1] defines that all non-ASCII bytes 
are UTF-8.
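
For comparison, plain ZIP has a general purpose flag (bit 11, 0x800) 
that a writer can set to declare that an entry's name is UTF-8. The 
sketch below uses Python's zipfile module to show one writer setting it 
only for non-ASCII names. This is the behaviour of one implementation, 
not something the widget spec or the JAR spec mandates, and the entry 
names are made up.

```python
import io
import zipfile

# Build a small archive in memory with one ASCII and one non-ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("icon.png", b"")        # ASCII-only name
    zf.writestr("ic\u00f4ne.png", b"")  # hypothetical non-ASCII name

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is UTF-8

# Read the archive back and report which entries declare UTF-8 names.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    flags = {
        info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
        for info in zf.infolist()
    }

print(flags)
```

An archive written by a tool that never sets the flag gives the reader 
no such hint, which is the interoperability problem discussed above.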

> Again, I'm not sure how to proceed.

That really depends on how much you care about allowing any ZIP 
implementation to be used for creating widgets vs how much you care 
about internationalization issues that might arise as a result...

> "In result, excluding any U+0020 SPACE characters, convert any sequence of
> one or more characters marked with the [Unicode] property "White_Space" into
> a single U+0020 SPACE."
> 
> The next step collapses sequences of two or more U+0020 SPACE into a single
> U+0020 SPACE.

Sounds great.
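
For what it's worth, the two steps could be sketched in Python roughly 
as follows. The function name is made up, and the White_Space code 
points are listed by hand (per recent Unicode versions) because 
str.isspace() and the regex \s class do not match that property 
exactly.

```python
import re

# Code points with the Unicode White_Space property, minus U+0020 SPACE.
_WS_NOT_SPACE = (
    "\u0009-\u000D\u0085\u00A0\u1680\u2000-\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)
_WS_RUN = re.compile("[" + _WS_NOT_SPACE + "]+")
_SPACE_RUN = re.compile(" {2,}")


def normalize_whitespace(result: str) -> str:
    """Hypothetical helper applying the two quoted steps in order."""
    # Step 1: each run of non-SPACE White_Space characters becomes one SPACE.
    result = _WS_RUN.sub(" ", result)
    # Step 2: runs of two or more SPACE characters become one SPACE.
    return _SPACE_RUN.sub(" ", result)


print(normalize_whitespace("a\t\n b\u00a0\u2003c  d"))  # a b c d
```

Note that step 1 alone can leave two adjacent spaces behind (a tab next 
to a space, say), which is why the second step is needed.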

-Boris

Received on Wednesday, 28 January 2009 15:10:05 UTC