Re: CBOR Tutorial from Ivan Herman on 2018-01-31 (public-publ-wg@w3.org from January 2018)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 31 Jan 2018 08:50:41 +0100
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Baldur Bjarnason <baldur@rebus.foundation>, Romain <rdeltour@gmail.com>, "Dr. Wolfgang Schindler" <w.schindler@pons.de>, "Davis, Greg" <greg.davis@pearson.com>, Richard Wright <rkwright@geofx.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
Message-Id: <D07363BB-1071-452A-A1EC-2CF746C746F2@w3.org>

I have not yet had the time to follow up on Baldur's links (thanks Baldur), but this message seems to be important for us: if we use Web Packaging also for packing up files that originate from some sort of local file system (something that we discussed last Monday) _and_ we also want to compress the content, then we have to define our own compression steps as part of the workflow. It may be as simple as using gzip, but this is something we would have to specify ourselves.

Leonard, did I get this right?

Cheers

Ivan
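
To make the compression step discussed above concrete, here is a minimal sketch in Python of gzip-compressing a single resource before it is placed into a package. It assumes the step is nothing more than plain gzip. The helper names (compress_resource, decompress_resource) and the sample HTML are hypothetical and are not taken from any Web Packaging or PWP draft.

```python
import gzip


def compress_resource(data: bytes) -> bytes:
    """Gzip-compress the bytes of one resource (hypothetical packaging step)."""
    return gzip.compress(data)


def decompress_resource(blob: bytes) -> bytes:
    """Reverse the step above when the resource is read back from the package."""
    return gzip.decompress(blob)


# Sample text resource as it might be read from a local file system.
html = ("<p>Hello, publication!</p>\n" * 200).encode("utf-8")

packed = compress_resource(html)

# The round trip must be lossless, whatever the compression ratio is.
assert decompress_resource(packed) == html

print(len(html), "bytes before,", len(packed), "bytes after gzip")
```

Anything beyond this (which resources to compress, how to signal the encoding inside the package) is exactly what would still have to be specified.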

> On 31 Jan 2018, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> 
>> Because of the ubiquity of compressed/gzipped HTTP responses and how the package stores responses, many text entries in a package will be stored compressed as binaries and not text.
>> 
> That's the key piece here for Web Packages vs PWP: Web Packages are expected (in the vast majority of use cases) to be delivered over an HTTP connection, which is itself compressed. Also, there isn't concern about storing these on devices or quota-based storage. However, for PWP, we expect that delivery may take place via other means, and that packages will certainly be stored by a user somewhere with limited storage (a device, a cloud storage system, etc.).
> 
> Leonard
> 
> 
> On 1/31/18, 1:42 AM, "Baldur Bjarnason" <baldur@rebus.foundation> wrote:
> 
>    There are a few informative discussions on this in the Web Packaging repository:
> 
>    * "Switch to binary format and more." https://github.com/WICG/webpackage/issues/38
>    * "Inclusion of binary data into a text-based format" https://github.com/WICG/webpackage/issues/10
> 
> 
>    Cited reasons (as far as I can tell):
> 
>    * The TAG proposal proved to be more complex to implement than anticipated. Formats like CBOR or DER have pre-existing implementations and are used in other standards so browsers have to support them anyway.
>    * A good portion of the packaged resources are going to be binaries, so a binary format would lead to considerable space savings over a text format.
> 
>    Because of the ubiquity of compressed/gzipped HTTP responses and how the package stores responses, many text entries in a package will be stored compressed as binaries and not text.
> 
> 
>    There’s also a discussion of whether to switch away from CBOR to DER for more secure parsing and better error handling:
> 
>    * "Consider switching to DER-encoded ASN.1" https://github.com/WICG/webpackage/issues/47
> 
>    But based on that discussion it seems likely that they’ll stick to CBOR as that’s a simpler format.
> 
> 
>    Also relevant:
> 
>    “Explain why we're not using ZIP” https://github.com/WICG/webpackage/issues/45
> 
>    - best
>    - Baldur Bjarnason
>      baldur@rebus.foundation
> 
> 
> 
>> On 30 Jan 2018, at 14:33, Ivan Herman <ivan@w3.org> wrote:
>> 
>> Romain,
>> 
>> that is true. But the question is: what is the advantage of using CBOR over simply transferring the original resource data (just as the original TAG document proposed)?
>> 
>> Ivan
>> 
>> ---
>> Ivan Herman
>> Tel:+31 641044153
>> http://www.ivan-herman.net
>> 
>> (Written on mobile, sorry for brevity and misspellings...)
>> 
>> 
>> 
>> On 30 Jan 2018, at 20:04, Romain <rdeltour@gmail.com> wrote:
>> 
>>> On 30 Jan 2018, at 19:11, Schindler Wolfgang Dr. <w.schindler@pons.de> wrote:
>>>> 
>>>> Am I right then that, for a content document in HTML, CBOR only means a 1:1 translation of UTF-8 codes into a binary format that would have exactly the same file size? If this is true, I’m afraid I don’t see (yet?) the connection to Web Packaging and the rationale for exchanging a human-readable format for a binary format. Or am I perhaps missing decisive goodies?
>>> 
>>> With CBOR and Jeffrey’s spec, you can *bundle* resources together and exchange them as one cohesive resource. Since a publication is a *collection* of multiple resources, we need a format to package them.
>>> 
>>> Romain.
>>> 
> 
> 
> 


----
Ivan Herman, W3C
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 31 January 2018 07:51:33 UTC