- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Wed, 18 Nov 2009 18:35:58 +0900
- To: Alexander Limi <limi@mozilla.com>
- CC: Julian Reschke <julian.reschke@gmx.de>, Anthony Bryan <anthonybryan@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
On 2009/11/18 7:20, Alexander Limi wrote:
> 2009/11/17 Julian Reschke <julian.reschke@gmx.de>
>
>> Questions that come to mind:

>> (4) I have trouble understanding...:
>>
>> "You can specify a charset in the resource package definition. If
>> unspecified, it is assumed that any non-binary files inside are UTF-8."
>>
>> Is this about the manifest? This seems to be problematic, as charset
>> handling would be different from local file resources

But different clients will have different ways to handle local file
resources, so the server shouldn't try to second-guess and adapt to
local conventions. Using something that works easily over the network,
such as UTF-8, seems to be the best choice.

>> (I do agree that
>> encouraging UTF-8 is good, though)
>>
>
> The manifest probably has to be ASCII (using quoted values like %20 for
> spaces etc), sorry about not specifying that.

I'm not clear why you would need ASCII. The manifest file is something
new, and thus saying it's UTF-8 would not have any problems, or would
it?

In any case, you should look at how non-ASCII filenames inside a zip
file can be handled. The W3C Web Applications WG and the W3C I18N Core
WG had a look at this issue in the context of
http://www.w3.org/TR/widgets/. You should be able to find that
discussion if you search the mailing list archives.

> The UTF-8 default is for any other file in the zip, like JS or CSS, or even
> HTML files, should that be useful.

So this is a replacement for a charset=foo parameter on the media type?

How do the mime types of the various files get determined in the first
place? It's easy for humans to guess media types in an obvious example
such as

javascript/jquery.js
css/reset.css
css/grid.css
css/main.css
images/save.png
images/info.png

but we are speaking about computers, and on the average much less
obvious directory/file names.

>> (5) How do non-URL characters in filenames in the ZIP map to URLs in
>> content? It appears that a default encoding needs to be defined (such as
>> ->UTF-8->percent-escaped).
>>
>
> Percent-escaping would be my initial suggestion, but I don't know enough
> about any potential issues here if we choose to go that route. I agree that
> it needs to be defined, though.

You should do whatever makes things work well with the IRI spec (RFC
3987, now being updated as draft-duerst-iri-bis). This means that
non-ASCII characters in Web addresses in content should be transcoded to
UTF-8, and then either matched directly (if the manifest file is UTF-8),
or via %-encoding. I think that's what Julian is proposing, I'm just
trying to give a slightly longer explanation.

Regards, Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
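
As a rough illustration of the matching described under (5), here is a
minimal Python sketch. It assumes manifest entries and references in
content are both relative paths; the function names
iri_path_to_uri_path and same_entry are hypothetical and not part of
any resource package proposal. It only shows the general idea of
"transcode to UTF-8, then compare directly or via %-encoding", and it
ignores details such as Unicode normalization.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path: str) -> str:
    """Percent-encode a path the way an IRI is mapped to a URI:
    characters outside the unreserved set are encoded as UTF-8 octets
    and each octet is written as %XX. '/' and existing '%' escapes are
    left alone (hypothetical helper, for illustration only)."""
    return quote(path, safe="/%", encoding="utf-8")


def same_entry(manifest_name: str, content_ref: str) -> bool:
    """Compare a manifest entry with a reference found in content.
    Both sides are decoded from %-encoding (as UTF-8) first, so a raw
    UTF-8 name and its percent-escaped form compare equal."""
    return unquote(manifest_name, encoding="utf-8") == unquote(
        content_ref, encoding="utf-8"
    )


if __name__ == "__main__":
    print(iri_path_to_uri_path("images/Dürst.png"))
    print(same_entry("images/Dürst.png", "images/D%C3%BCrst.png"))
```

With a UTF-8 manifest, the first argument to same_entry could be the
raw filename; with an ASCII-only manifest, it would already be in
percent-escaped form, and the comparison works the same way.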
Received on Wednesday, 18 November 2009 09:37:00 UTC