Re: [WIDGETS] Zip Support (request for comments)

> The ideal approach from a standards perspective would be to separate out
> the ZIP writeup into a separate standalone spec (i.e., don't reference the
> OCF spec, just "repurpose" its technical approaches) so that it can be
> reused by other initiatives (W3C or otherwise). When I was involved in the
> OCF spec, the IDPF folks were amenable to updating their eBook specs to
> point to an official packaging standard from other standards
> bodies, where W3C and OASIS were the presumed likely choices, and W3C was
> the top preference. Maybe the next version of ODF would reference such a W3C
> standard. I am pretty sure they would conclude it's the right thing to do.
>
What you propose is a good idea; I'll remove the references to OCF and
continue to repurpose the technical details. However, creating an
independent Zip-based spec might be beyond the scope of WAF (although it
would be nice if one day PKWARE contributed their spec to the W3C). In any
case, you will have to ask our working group chair whether defining a
distinct packaging spec can be part of the WAF charter (would someone at IBM
be willing to edit it? ;-)). Also, the W3C tried to standardize (XML)
packaging in the past [1], but from what I gather, the working group was
disbanded because of a lack of industry interest and support.

> I haven't had time to think through the UTF-8 issues. A minor red flag is
> raised when I see the word "MAY". Are you saying it is OK to use
> platform-native encodings, Shift-JIS encoding or (showing my age) EBCDIC
> encodings? Maybe there is an encoding field in the ZIP spec. (If I ever knew
> about this field, I have forgotten it by now.) Remember, the goal of
> standards is to promote interoperability, and if file name encodings are a
> free-for-all, then interoperability might suffer.
>
My understanding of [2] (Appendix D) is that Zip file names and comments are
encoded either in IBM Code Page 437 by default (general purpose bit 11 off)
or in UTF-8 (general purpose bit 11 on). However, it then says:

"Applications may choose to supplement this file name storage through the
use of the 0x0008 Extra Field....Examples of the intended usage for this
field is to store whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC.
Similarly, other commonly used character encoding (code page) designations
can be indicated through this field. Formalized values for use of the 0x0008
record remain undefined at this time. The definition for the layout of the
0x0008 field will be published when available."
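To make the bit 11 rule concrete, here is a minimal Python sketch (an
illustration, not code from [2] or any spec) that reads the first local file
header of an archive and picks the file name decoding from general purpose
bit 11: UTF-8 when the bit is set, Code Page 437 otherwise. It deliberately
ignores the 0x0008 extra field, since its layout is still undefined. The
helper names are hypothetical.

```python
import io
import struct
import zipfile

LOCAL_SIG = 0x04034B50                        # "PK\x03\x04"
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte local file header
UTF8_FLAG = 1 << 11                           # general purpose bit 11

def first_entry_name(data: bytes) -> tuple[str, bool]:
    """Return (file name, bit 11 set?) for the first entry in a ZIP archive."""
    (sig, _version, flags, _method, _time, _date,
     _crc, _csize, _usize, name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != LOCAL_SIG:
        raise ValueError("data does not start with a local file header")
    raw = data[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len]
    is_utf8 = bool(flags & UTF8_FLAG)
    # Bit 11 on: UTF-8. Bit 11 off: IBM Code Page 437 (the APPNOTE default).
    return raw.decode("utf-8" if is_utf8 else "cp437"), is_utf8

def zip_with(name: str) -> bytes:
    """Build a one-entry archive in memory (helper for the usage example)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"hello")
    return buf.getvalue()

print(first_entry_name(zip_with("config.xml")))  # ('config.xml', False)
print(first_entry_name(zip_with("café.html")))   # ('café.html', True)
```

A real consumer would walk the central directory rather than just the first
local header. Python's own zipfile module applies the same rule when reading
and exposes the bit through ZipInfo.flag_bits.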

> Regarding the issue of proprietary extensions, how about just staying silent
> on the issue? Basically, the above OCF-like approach is a whitelisting
> approach which identifies the fields that producers and consumers must
> support. Other fields, whether defined in the ZIP spec or extensions defined
> by vendors, can be ignored by the consumer. For example, there isn't a
> problem with MS extending ZIP to make the format do special magic on
> Windows so long as the resulting ZIP file will still open with non-MS
> software (e.g., WinZip) and continue to work on Mac and Linux systems.
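A consumer following the whitelisting approach quoted above might look
roughly like the Python sketch below. It checks only the fields a
hypothetical profile names (here the compression method, assumed for
illustration to be limited to Stored (0) and Deflate (8)) and never
inspects anything else, such as vendor extra fields. The profile, the helper
names and the 0xF00D header ID are all made up for the example.

```python
import io
import struct
import zipfile

# Hypothetical profile: the only compression methods a consumer must support.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}  # 0 and 8

def profile_problems(data: bytes) -> list[str]:
    """Check whitelisted fields only; everything else is ignored."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # info.extra (vendor extensions etc.) is deliberately not examined.
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(
                    f"{info.filename}: compression method {info.compress_type} not allowed")
    return problems

def build(method: int, vendor_extra: bool = False) -> bytes:
    """Build a one-entry archive; optionally attach a made-up vendor extra field."""
    info = zipfile.ZipInfo("index.html")
    info.compress_type = method
    if vendor_extra:
        # Header ID 0xF00D is invented for this example; 2 bytes of payload.
        info.extra = struct.pack("<HH", 0xF00D, 2) + b"\x01\x02"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"<p>hi</p>")
    return buf.getvalue()
```

A Deflate entry carrying the unknown extra field passes, because the extra
field is simply never looked at; only a method outside the whitelist (for
example bzip2, method 12) is reported.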
>
Agreed... but something about the way OCF specifies the ZIP subset doesn't
sit right with me (e.g., it doesn't say which bits need to be turned on or
off); that's why I went looking for alternatives like the one proposed by
OOXML (OPC). Unless MS/ECMA are pulling a fast one on me (which may be
likely, from what I've been reading, e.g., [3, 4]), I'm not sure that OOXML
packaging does any special Windows magic. (If anyone has evidence to the
contrary, please let me know.) And although there is plenty of justified
criticism of the XML formats defined by OOXML, I haven't yet encountered
evidence to suggest that OPC's usage/definition of Zip is broken.
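For comparison, here is roughly what "saying which bits need to be turned on
and off" could look like if a spec did pin them down. This Python sketch is
purely illustrative: the two rules (bit 0, encryption, must be off; bit 11
must be on whenever a name is non-ASCII) are hypothetical examples, not
requirements taken from OCF, OPC or [2].

```python
import zipfile
from collections.abc import Iterable

# Hypothetical rules a packaging profile could state explicitly.
ENCRYPTED = 1 << 0    # general purpose bit 0: entry is encrypted
UTF8_NAMES = 1 << 11  # general purpose bit 11: name is UTF-8

def flag_violations(infos: Iterable[zipfile.ZipInfo]) -> list[str]:
    """Report entries whose general purpose bits break the rules above."""
    out = []
    for info in infos:
        if info.flag_bits & ENCRYPTED:
            out.append(f"{info.filename}: bit 0 must be off (no encryption)")
        if not info.filename.isascii() and not info.flag_bits & UTF8_NAMES:
            out.append(f"{info.filename}: bit 11 must be on for non-ASCII names")
    return out
```

On a real package it would be called as flag_violations(zf.infolist()) on an
open zipfile.ZipFile. The names there are already decoded, so a Code Page 437
name containing high bytes shows up as non-ASCII and is flagged.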

OOXML's competitor, ODF (ISO/IEC 26300), also defines a zip-based packaging
format. However, having read section 17 of ODF, which defines packaging, I
found it to be underspecified (I imagine the IDPF folks did too, given that
OCF seems to be heavily based on ODF). On page 697, for example, it talks
about a "standard zip file" yet gives no reference for or definition of
what that is. Also, where the Zip specification is referenced, it points to
a version of the Zip APPNote that "has been unofficially corrected and
extended by Info-ZIP without explicit permission by PKWARE." Worse still,
the Zip specification they reference is almost 11 years old and may be
incompatible with current OS implementations (I'm still waiting to hear
from Microsoft about which version of the Zip APPNote they actually
implemented; does anyone know which version Apple implemented in OS X?).
Anyway, IMO, that pretty much rules out ODF as a potential reference for
Widgets. </rant>

Marcos

[1] http://www.w3.org/XML/2000/07/xml-packaging-charter
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://en.wikipedia.org/wiki/Office_Open_XML
[4] http://www.noooxml.org/

-- 
Marcos Caceres
http://datadriven.com.au

Received on Sunday, 7 October 2007 06:02:12 UTC