Re: FYI: Mozilla's Resource Packages from Martin J. Dürst on 2009-11-18 (ietf-http-wg@w3.org from October to December 2009)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Wed, 18 Nov 2009 18:35:58 +0900
To: Alexander Limi <limi@mozilla.com>
CC: Julian Reschke <julian.reschke@gmx.de>, Anthony Bryan <anthonybryan@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4B03BFFE.5040207@it.aoyama.ac.jp>

On 2009/11/18 7:20, Alexander Limi wrote:
> 2009/11/17 Julian Reschke <julian.reschke@gmx.de>
>
>> Questions that come to mind:

>> (4) I have trouble understanding...:
>>
>> "You can specify a charset in the resource package definition. If
>> unspecified, it is assumed that any non-binary files inside are UTF-8."
>>
>> Is this about the manifest? This seems to be problematic, as charset
>> handling would be different from local file resources

But different clients will have different ways to handle local file 
resources, so the server shouldn't try to second-guess and adapt to 
local conventions. Using something that works easily over the network, 
such as UTF-8, seems to be the best choice.

>> (I do agree that
>> encouraging UTF-8 is good, though)
>>
>
> The manifest probably has to be ASCII (using quoted values like %20 for
> spaces etc), sorry about not specifying that.

I'm not clear why you would need ASCII. The manifest file is something 
new, so declaring it to be UTF-8 should not cause any problems, or would 
it? In any case, you should look at how non-ASCII filenames inside a zip 
file can be handled. The W3C Web Applications WG and the W3C I18N Core 
WG had a look at this issue in the context of 
http://www.w3.org/TR/widgets/. You should be able to find that 
discussion if you search the mailing list archives.
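
As a starting point for that investigation, here is a minimal sketch in 
Python. It relies on one fact about the ZIP format: general purpose bit 11 
(0x800) marks an entry whose filename is stored as UTF-8. The helper name 
utf8_flags and the sample filenames are made up for illustration; this is 
not a proposal for how resource packages should be built.

```python
import io
import zipfile

# General purpose bit 11 in the ZIP format: the filename is UTF-8.
UTF8_NAME_FLAG = 0x800


def utf8_flags(names):
    """Write empty entries with the given names into an in-memory ZIP,
    read the archive back, and report per entry whether the UTF-8
    filename flag is set. (Hypothetical helper, for illustration only.)"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in zf.infolist()
        }


if __name__ == "__main__":
    for name, flagged in utf8_flags(["css/main.css", "images/dürst.png"]).items():
        print(name, "UTF-8 flag set" if flagged else "UTF-8 flag not set")
```

Archives produced by other tools may store non-ASCII names without this 
flag, in some local encoding, which is exactly the case that needs a 
defined behavior.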

> The UTF-8 default is for any other file in the zip, like JS or CSS, or even
> HTML files, should that be useful.

So this is a replacement for a charset=foo parameter on the media type?
How do the MIME types of the various files get determined in the first 
place? It's easy for humans to guess media types in an obvious example 
such as
    javascript/jquery.js
    css/reset.css
    css/grid.css
    css/main.css
    images/save.png
    images/info.png
but we are speaking about computers, and on average about much less 
obvious directory/file names.
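
To make the concern concrete, here is a rough sketch in Python of guessing 
by file extension. The table EXTENSION_TYPES and the function 
guess_media_type are hypothetical; nothing in the proposal says this is 
how types would be determined. It handles the obvious names and simply 
gives up on anything else.

```python
import posixpath

# Hypothetical extension table; a real client would need a far larger
# one, and would still meet names it cannot classify.
EXTENSION_TYPES = {
    ".js": "text/javascript",
    ".css": "text/css",
    ".png": "image/png",
}


def guess_media_type(path):
    """Return a media type guessed from the file extension of a path
    inside the package, or None when the extension is unknown or absent."""
    _, ext = posixpath.splitext(path)
    return EXTENSION_TYPES.get(ext.lower())


if __name__ == "__main__":
    for path in ["css/reset.css", "images/save.png", "data/report"]:
        print(path, "->", guess_media_type(path))
```

The last path has no extension, so the guess is None, and the client 
would need some other source for the media type.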

>> (5) How do non-URL characters in filenames in the ZIP map to URLs in
>> content? It appears that a default encoding needs to be defined (such as
>> ->UTF-8->percent-escaped).
>>
>
> Percent-escaping would be my initial suggestion, but I don't know enough
> about any potential issues here if we choose to go that route. I agree that
> it needs to be defined, though.

You should do whatever makes things work well with the IRI spec (RFC 
3987, now being updated as draft-duerst-iri-bis). This means that 
non-ASCII characters in Web addresses in content should be transcoded to 
UTF-8, and then either matched directly (if the manifest file is UTF-8), 
or via %-encoding. I think that's what Julian is proposing, I'm just 
trying to give a slightly longer explanation.
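
A small sketch in Python of the matching described above, using only 
urllib.parse from the standard library. The function names and the sample 
path are made up for illustration. Both sides are reduced to the same 
%-encoded UTF-8 form before comparison. Note the simplification: decoding 
first also turns an escaped reserved character such as %2F into a plain 
"/", which a careful implementation would want to avoid.

```python
from urllib.parse import quote, unquote


def to_percent_encoded(path):
    """Encode a path as UTF-8 and %-escape every character that quote()
    does not treat as safe; "/" is kept as the path separator."""
    return quote(path, safe="/", encoding="utf-8")


def matches(manifest_entry, reference):
    """Compare a manifest entry with a reference found in content.
    Either side may be raw UTF-8 text or already %-encoded; both are
    decoded and then re-encoded to one common form. (Simplified sketch.)"""
    return (to_percent_encoded(unquote(manifest_entry))
            == to_percent_encoded(unquote(reference)))


if __name__ == "__main__":
    print(to_percent_encoded("images/dürst.png"))
    print(matches("images/d%C3%BCrst.png", "images/dürst.png"))
```

If the manifest is UTF-8, the entry can be written with the non-ASCII 
character directly; if it is ASCII, the %-encoded form is used, and the 
comparison gives the same answer either way.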

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Wednesday, 18 November 2009 09:37:00 UTC