Re: CBOR Tutorial from Brady Duga on 2018-02-02 (public-publ-wg@w3.org from February 2018)

From: Brady Duga <duga@google.com>
Date: Fri, 2 Feb 2018 08:11:14 -0800
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Romain <rdeltour@gmail.com>, Laurent Le Meur <laurent.lemeur@edrlab.org>, Baldur Bjarnason <baldur@rebus.foundation>, Ivan Herman <ivan@w3.org>, "Schindler Wolfgang Dr." <w.schindler@pons.de>, "Davis, Greg" <greg.davis@pearson.com>, Ric Wright <rkwright@geofx.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
Message-ID: <CAH_p_eUZt-9TLYXvOoaENmgCfQhNkbgwkStiFPa8QXZkqK1rSg@mail.gmail.com>
Sorry, I may well be wrong, happy to understand what I am missing! As I
said, I am not a CBOR expert, I have just scanned the RFC. Looking at that,
this seems entirely possible. Perhaps there is a further encoding step I
overlooked? In any case, I don't consider this the weeds, rather it is
foundational to evaluating CBOR. You had said: ' It is not a good format
for “off the web” exchange (IMO)'. The reason you gave was 'CBOR, by
itself, does not support random access.' So my question is, does CBOR
really not support random access, or can random access be enabled with a
single pass over the data to generate an index? It seems it could be done,
and wouldn't be much more work than, say, verifying a checksum.
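
To make the question concrete, here is a rough sketch of the kind of
single-pass index I have in mind. It assumes well-formed input and relies
only on the item head layout described in the RFC (major type in the top
3 bits, additional information in the low 5 bits, 0xFF as the break code).
The function names are made up for illustration and this has not been
checked against any real CBOR library, so treat it as an assumption-laden
sketch rather than a reference implementation.

```python
BREAK = 0xFF  # "break" stop code that ends an indefinite-length item


def read_head(buf, pos):
    """Decode the head at pos; return (major, argument, next_pos).

    argument is None for additional information 31 (indefinite length).
    """
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:                      # argument stored in the initial byte
        return major, info, pos
    if info <= 27:                     # argument in the next 1, 2, 4 or 8 bytes
        size = 1 << (info - 24)
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    if info == 31:
        return major, None, pos
    raise ValueError(f"reserved additional information {info}")


def skip_item(buf, pos):
    """Return the offset just past the (well-formed) data item starting at pos."""
    major, arg, pos = read_head(buf, pos)
    if major in (0, 1, 7):             # integers, simple values, floats: head only
        return pos
    if major == 6:                     # tag: skip the single tagged item
        return skip_item(buf, pos)
    if arg is None:                    # indefinite length: nested items until break
        while buf[pos] != BREAK:
            pos = skip_item(buf, pos)
        return pos + 1
    if major in (2, 3):                # byte/text string: arg is a byte count
        return pos + arg
    count = arg if major == 4 else 2 * arg   # map entries are key/value pairs
    for _ in range(count):
        pos = skip_item(buf, pos)
    return pos


def index_array(buf, pos=0):
    """One pass over an array; return the byte offset of each element."""
    major, arg, pos = read_head(buf, pos)
    if major != 4:
        raise ValueError("expected an array")
    offsets = []
    while (arg is None and buf[pos] != BREAK) or (arg is not None and len(offsets) < arg):
        offsets.append(pos)
        pos = skip_item(buf, pos)
    return offsets


def read_text(buf, pos):
    """Read a definite-length text string directly at a known offset."""
    major, arg, pos = read_head(buf, pos)
    if major != 3 or arg is None:
        raise ValueError("expected a definite-length text string")
    return bytes(buf[pos:pos + arg]).decode("utf-8")


# ["a", [1, 2], "bc"] encoded with definite lengths
blob = bytes.fromhex("836161820102626263")
index = index_array(blob)          # -> [1, 3, 6]
print(read_text(blob, index[2]))   # reads "bc" without decoding the other items
```

The index is just a list of offsets kept outside the blob. Building it
touches every byte once (indefinite-length items force that), but after
that a reader can seek straight to an element. The same walk could descend
into maps to record, say, the first string of each nested array.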

On Fri, Feb 2, 2018 at 12:27 AM, Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> If I understand you correctly, Brady, that won’t work given how CBOR
> encoding works.  (but now we’re in the weeds...)
>
>
>
> *From: *"Brady Duga" <duga@google.com>
> *Date: *Thursday, February 1, 2018 at 8:00 PM
>
> *To: *Leonard Rosenthol <lrosenth@adobe.com>
> *Cc: *Romain <rdeltour@gmail.com>, Laurent Le Meur <
> laurent.lemeur@edrlab.org>, Baldur Bjarnason <baldur@rebus.foundation>,
> Ivan Herman <ivan@w3.org>, "Schindler Wolfgang Dr." <w.schindler@pons.de>,
> "Davis, Greg" <greg.davis@pearson.com>, Ric Wright <rkwright@geofx.com>,
> W3C Publishing Working Group <public-publ-wg@w3.org>
> *Subject: *Re: CBOR Tutorial
>
>
>
> It doesn't provide random access, but it still allows it. Specifically, if
> you have an array of maps of strings to arrays of strings, you can generate
> an index that will contain the first string in each array and read them
> directly without loading any of the other structures. Yes, you have to
> build the index, but once done, with no modifications or other
> infrastructure, you can read the values directly from the blob.
>
>
>
> On Thu, Feb 1, 2018 at 11:45 AM, Leonard Rosenthol <lrosenth@adobe.com>
> wrote:
>
> > it just doesn't supply an index
>
> >
>
> Which implies that it, by itself, does not support random access.
>
>
>
> As I mentioned, you can build all sorts of infrastructure around it to
> allow it – but as an encoding format, it is not designed for that.
>
>
>
> Leonard
>
>
>
> *From: *"Brady Duga" <duga@google.com>
> *Date: *Thursday, February 1, 2018 at 8:20 PM
> *To: *Leonard Rosenthol <lrosenth@adobe.com>
> *Cc: *Romain <rdeltour@gmail.com>, Laurent Le Meur <
> laurent.lemeur@edrlab.org>, Baldur Bjarnason <baldur@rebus.foundation>,
> Ivan Herman <ivan@w3.org>, "Schindler Wolfgang Dr." <w.schindler@pons.de>,
> "Davis, Greg" <greg.davis@pearson.com>, Ric Wright <rkwright@geofx.com>,
> W3C Publishing Working Group <public-publ-wg@w3.org>
> *Subject: *Re: CBOR Tutorial
>
>
>
> Sorry, not an expert in CBOR, but it does seem like it allows random
> access, it just doesn't supply an index. But a client that wanted random
> access to the items in the package could generate an index easily
> enough, either as a one-time pass over the complete package or during
> download. Granted, you may have to sequentially run through every byte in
> the data to generate that index due to indefinite-length strings and arrays,
> so it might not be the fastest operation in the world, but again it is a
> one-time, fairly easy parse.
>
>
>
> On Thu, Feb 1, 2018 at 2:22 AM, Leonard Rosenthol <lrosenth@adobe.com>
> wrote:
>
> So be careful to not confuse the data model (aka manifests, bundles etc.)
> and the encoding (CBOR in this case).
>
>
>
> CBOR, by itself, does not support random access.
>
>
>
> However, a data model (such as the WebPack work) that is then encoded into
> CBOR can provide support for random access (or at least a limited set of
> it, based on the model).
>
>
>
> So yes, if we were to adopt WebPack, we would achieve a level of random
> access.  (NOTE: if we do “object” compression, we may have to do some work
> on the manifest).
>
>
>
> Leonard
>
>
>
> *From: *Romain <rdeltour@gmail.com>
> *Date: *Thursday, February 1, 2018 at 2:18 PM
> *To: *Leonard Rosenthol <lrosenth@adobe.com>
> *Cc: *Laurent Le Meur <laurent.lemeur@edrlab.org>, Baldur Bjarnason
> <baldur@rebus.foundation>, Ivan Herman <ivan@w3.org>, "Schindler Wolfgang
> Dr." <w.schindler@pons.de>, "Davis, Greg" <greg.davis@pearson.com>, Ric
> Wright <rkwright@geofx.com>, W3C Publishing Working Group <
> public-publ-wg@w3.org>
> *Subject: *Re: CBOR Tutorial
>
>
>
>
>
> On 31 Jan 2018, at 17:58, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
>
>
> CBOR is a great exchange format for “over the wire” data exchange.   It is
> not a good format for “off the web” exchange (IMO)
>
>
>
>
>
> Primarily because it's optimized for streaming and not random access.
>
>
>
> Random access is always a better model for data processing but assumes
> that you have all the data already present (as you would “off the web”).
> However, when streaming across a network/the web, you don’t always have the
> option (yes, there are byte range requests but they aren’t supported in all
> modern network configs, e.g. load balancers).
>
>
>
> My understanding is that random access depends on the actual data model
> being encoded in CBOR?
>
>
>
> If I understand correctly, in the bundling spec, the index can be parsed
> first, and gives you pointers to each individual request/response pair,
> which effectively enables random access.
>
> Random access is btw stated as an essential requirement for the packaging
> spec:
> https://tools.ietf.org/html/draft-yasskin-webpackage-use-cases-00#section-3.1.5
>
>
>
> Am I missing something?
>
>
>
> Romain.
>
>
>
>
>
Received on Friday, 2 February 2018 16:11:40 UTC