Re: VC API: handling large documents client to server from Julien Fraichot on 2022-03-29 (public-credentials@w3.org from March 2022)

From: Julien Fraichot <Julien.Fraichot@hyland.com>
Date: Tue, 29 Mar 2022 15:17:53 +0000
To: Leonard Rosenthol <lrosenth@adobe.com>, Manu Sporny <msporny@digitalbazaar.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Message-ID: <SA1PR13MB56363D9532FD23F423B8CE34901E9@SA1PR13MB5636.namprd13.prod.outlook.com>
I have spent some time implementing hashlinks and, while the reduction in document size is great, I share Leonard’s concerns.
I also find that having to host files for as long as certificates need to live (a lifetime or more – side question, can you inherit a VC?) undermines decentralization of the data, which may or may not be the implementer’s goal.

With the hashlink approach there is also a small latency when updating the URL (since there is first a decode to retrieve the initial URL and then an update to the browser img src as we found it). Would a regular URL in a format such as https://[domain]/[path-to-assets]/[multihash-scheme]/[multibase-scheme]/[encoded-filename].[extension] work? The reader could know how to verify the content integrity of the image, but it would be seamless for the browser and the end user.
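
As a rough illustration of the URL format suggested above, here is a minimal Python sketch. The segment values "sha2-256" and "base16", the example.org base URL, and the function names build_asset_url and verify_asset are all assumptions made for illustration; nothing here is defined by the hashlink draft or by any VC specification.

```python
import hashlib
import hmac
from urllib.parse import urlparse

# Hypothetical labels for the [multihash-scheme] and [multibase-scheme] segments.
HASH_SCHEME = "sha2-256"
BASE_SCHEME = "base16"


def build_asset_url(base_url: str, content: bytes, extension: str) -> str:
    """Build a plain URL whose file name is the hex SHA-256 digest of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base_url.rstrip('/')}/{HASH_SCHEME}/{BASE_SCHEME}/{digest}.{extension}"


def verify_asset(url: str, content: bytes) -> bool:
    """Check fetched content against the digest carried in the URL path."""
    segments = urlparse(url).path.strip("/").split("/")
    if len(segments) < 3:
        raise ValueError("URL path is too short to carry an integrity digest")
    hash_scheme, base_scheme, filename = segments[-3:]
    if (hash_scheme, base_scheme) != (HASH_SCHEME, BASE_SCHEME):
        raise ValueError("unsupported hash or encoding scheme in URL")
    expected = filename.rsplit(".", 1)[0]
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(expected, actual)
```

A browser can load such a URL directly as an img src, while a verifier that understands the convention can additionally call verify_asset on the fetched bytes.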

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Sunday, 13 February 2022 at 10:09
To: Manu Sporny <msporny@digitalbazaar.com>, public-credentials@w3.org <public-credentials@w3.org>
Subject: [EXTERNAL] [jfraichot@learningmachine.com] Re: VC API: handling large documents client to server

The one downside of keeping content external is that it opens up a “phone home” attack (à la “tracking pixels”), enabling the issuer of the VC to obtain information about anyone accessing it, which many consider a security/privacy problem. It’s one reason we have focused on embedding as much information as possible into our C2PA manifests, favoring self-containment over size considerations.

Leonard

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Thursday, February 10, 2022 at 4:41 PM
To: public-credentials@w3.org <public-credentials@w3.org>
Subject: Re: VC API: handling large documents client to server
On 2/10/22 3:01 AM, Julien Fraichot wrote:
> Basically when calling a verification API from a (browser) client, there
> might be times where the documents could be quite large (few MBs). I am
> wondering if there are some strategies to reduce the payload that would
> also be standard when dealing with a VC API complying service?

If you have a VC that is several MB in size, I would expect it to struggle in
the ecosystem. Yes, they are legal, in the same way that attaching a 250MB
slide deck to an email is legal -- while the SMTP protocol allows for it, most
mail servers will reject the message as too large.

Typically, these large VCs happen because people are embedding base-encoded
images directly into a VC. Instead, VC creators should consider modelling
their data differently -- e.g., use a hashlink, or some other way of creating
a cryptographic hyperlink:

https://datatracker.ietf.org/doc/html/draft-sporny-hashlink
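
To make the idea of a cryptographic hyperlink concrete, here is a simplified Python sketch. It assumes the basic "hl:" form with a SHA-256 multihash (prefix bytes 0x12 0x20) encoded as base58btc multibase (prefix "z"), and it leaves out the optional metadata part of a hashlink entirely. The function names are hypothetical, and this is not a conformant implementation of the draft.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Leading zero bytes are represented by leading "1" characters.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def make_hashlink(content: bytes) -> str:
    """Return a simplified hl: identifier for the given content."""
    # Multihash: 0x12 = sha2-256, 0x20 = digest length of 32 bytes.
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(content).digest()
    # Multibase: "z" marks base58btc.
    return "hl:z" + base58btc_encode(multihash)


def matches_hashlink(hashlink: str, content: bytes) -> bool:
    """Recompute the identifier from fetched content and compare."""
    return hashlink == make_hashlink(content)
```

The VC would then carry only the short identifier (plus a location for the image), and a verifier would run matches_hashlink on whatever bytes it retrieves.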

> I am researching gzipping on the client and tried more exotic approaches to
> no avail, so I’d be willing to hear what the people have thought on the
> matter.

You get gzip compression during transmission (more or less) for free these
days, but that's not really going to save you given that you're probably
base-encoding the raw binary data. Quite counter-intuitively, doing that makes
using gzip expand the file size instead of reducing it:

https://stackoverflow.com/questions/38124361/why-does-base64-encoded-data-compress-so-poorly

This is another reason the JOSE/JWT stack, when used with VCs, harms wire-level
protocols -- everything is base64 encoded, and thus it effectively destroys
any ability to compress data on the wire.
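
The size effect described above can be checked with a small Python experiment. It is only a sketch: seeded pseudo-random bytes stand in for already-compressed image data (an assumption, since a real JPEG or PNG will differ somewhat), and size_report is a hypothetical helper name.

```python
import base64
import gzip
import random


def size_report(raw: bytes) -> dict:
    """Compare payload sizes with and without base64 and gzip."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "gzip(raw)": len(gzip.compress(raw)),
        "base64": len(encoded),
        "gzip(base64)": len(gzip.compress(encoded)),
    }


# Seeded pseudo-random bytes approximate high-entropy image data.
rng = random.Random(0)
image_like = bytes(rng.getrandbits(8) for _ in range(200_000))

report = size_report(image_like)
for label, size in report.items():
    print(f"{label:>13}: {size} bytes")
```

In this sketch gzip recovers part of the base64 overhead, but the gzipped base64 text should still come out no smaller than the raw bytes, so embedding the encoded image costs more on the wire than referencing the binary.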

Typical solutions to this problem require that you put the binary data outside
of the VC, if at all possible. This works well for common static images such
as logos. It is also possible to split the VC into two VCs... one with the
machine-readable data from the issuer (with a digital signature) and one with
the image data from any source (without a digital signature, since, if
hashlinked, the signature will verify the validity of the image data). That
latter approach can be more privacy preserving AND more complex than many
might feel is necessary.

Selective disclosure schemes (such as BBS+) are another way to deliver a
subset of the information to a verifier without having to send the image
payload data.

I expect this to be an active area of innovation for the next few years with a
few proposals on standard design patterns that all industries could use. This
problem appears most often with identification cards that have biometric
images embedded in them.

-- manu

--
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
News: Digital Bazaar Announces New Case Studies (2021)
https://www.digitalbazaar.com/

Received on Tuesday, 29 March 2022 15:18:22 UTC