Re: Requirements for PDF as container for VC's from Leonard Rosenthol on 2020-11-10 (public-credentials@w3.org from November 2020)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 10 Nov 2020 12:48:57 +0000
To: "Raymond YEH (GOVTECH)" <Raymond_YEH@tech.gov.sg>, Bill Claxton <williamc@itr8.com>, Steve Capell <steve.capell@gmail.com>
CC: Adrian Gropper <agropper@healthurl.com>, "public-credentials@w3.org" <public-credentials@w3.org>, "Sin Yong LOH (IMDA)" <LOH_Sin_Yong@imda.gov.sg>, "Maillet LAURENT (GOVTECH)" <Maillet_LAURENT@tech.gov.sg>
Message-ID: <F413E885-61C0-4153-8D73-B399B9FF20F9@adobe.com>
I just want to correct one thing here…

> When Steve made that statement it is likely a comparison to PDF with metadata where it is impossible for the metadata to reflect the data represented on the PDF even if the issuer of the PDF wants to.
>It lacks the functionality to merge data and view into a rendered view like in OA.
>
Given a 100% compliant PDF processor, this is *NOT TRUE*.   PDF supports the same ECMAScript (aka JavaScript) language, with an extended DOM, as used by the Open Web.  It also supports interactive form fields. The combination of those two technologies can be used to “merge data into a rendered view”.

Leonard

From: "Raymond YEH (GOVTECH)" <Raymond_YEH@tech.gov.sg>
Date: Monday, November 9, 2020 at 1:29 AM
To: Bill Claxton <williamc@itr8.com>, Steve Capell <steve.capell@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Adrian Gropper <agropper@healthurl.com>, "public-credentials@w3.org" <public-credentials@w3.org>, "Sin Yong LOH (IMDA)" <LOH_Sin_Yong@imda.gov.sg>, "Maillet LAURENT (GOVTECH)" <Maillet_LAURENT@tech.gov.sg>
Subject: RE: Requirements for PDF as container for VC's

Bill Claxton,

From your question there seem to be two ongoing concerns:

  1.  Misrepresentation of information on the document
  2.  Inability for parties to negotiate on the verifiable credentials

Misrepresentation of information on the document
The framework allows the issuer to render a human-friendly web view directly from the underlying data. It certainly does not prevent an issuer from creating a view that is a poor representation of the underlying data if they are incompetent or fraudulent.

The very nature of allowing a renderer to present a different view to the human is an abstraction of the underlying data to the end user. This is so that the end user, who may not be comfortable with JSON, does not need to interact with the raw data directly.

However, as you have pointed out correctly, there is room for incompetent or fraudulent parties to misrepresent that data.

Taking your example further, one incompetent or fraudulent party could even issue a certificate in such manner:

  *   The renderer is hard-coded to display 4.0 GPA
  *   The document has a `gpa` field showing 3.9
  *   The document has a PDF attached to it showing someone else’s certificate with 3.8 GPA
  *   The transcript’s test results sum up to 3.7 GPA
Here you see a document that cannot even seem to agree with itself on what GPA the student has. I implore you to ask whether this is a problem of the framework or simply the result of a few problems unrelated to the choice of document framework:

  *   Bad inputs
  *   Bad controls
  *   Data replication
If the additional level of abstraction is still a concern and the viewing party does not need a rendered version of the document, one could use a “headless” client to process and interact with the document without ever passing it through the renderer.
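
A headless check of this kind could look like the following minimal Python sketch. It is only an illustration: the `gpa` field comes from the example above, while the `transcript` list, its `gradePoint` key, and the use of an unweighted average are assumptions made here, not part of the OpenAttestation format.

```python
import json
from statistics import mean


def check_gpa_consistency(raw_document: str, tolerance: float = 0.005) -> list[str]:
    """Return the discrepancies found in the raw credential data.

    The document layout is hypothetical: a top-level "gpa" number and a
    "transcript" list whose entries carry a "gradePoint" number.
    """
    doc = json.loads(raw_document)
    declared = doc["gpa"]
    grade_points = [course["gradePoint"] for course in doc["transcript"]]
    # Assumption: the GPA is the unweighted average of the grade points.
    computed = round(mean(grade_points), 2)

    problems = []
    if abs(declared - computed) > tolerance:
        problems.append(
            f"declared gpa {declared} does not match transcript average {computed}"
        )
    return problems
```

Because the check reads only the raw data, a hard-coded or misleading renderer has no influence on its result.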

Here I want to establish that:

  1.  The end user has a choice of using a human readable view or a headless program to perform the processing
  2.  The OA document format does not forbid raw document processing as you would with other types of VC
  3.  The problem with bad documents likely lies with incompetent or fraudulent issuers

When Steve made that statement it is likely a comparison to PDF with metadata where it is impossible for the metadata to reflect the data represented on the PDF even if the issuer of the PDF wants to. It lacks the functionality to merge data and view into a rendered view like in OA.

Inability for parties to negotiate on the verifiable credentials
The document framework is a way to structure data and is not concerned if the content in it is factually correct.
Parties are free to negotiate through other channels on the content of the document.

As the document does not become publicly available and searchable, the issuing party cannot force the receiving party to submit an incorrect document if the receiving party does not wish to. The receiving party can also approach the issuing party to reissue the document should they dispute its content.

I think your example cites the SkillsFuture implementation of the facilities to store and route documents to citizens directly. Document routing and negotiation are not part of the document framework; please do not confuse the document with the business processes around the document.



From: Bill Claxton <williamc@itr8.com>
Sent: Monday, 9 November 2020 11:56 AM
To: Steve Capell <steve.capell@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>; Adrian Gropper <agropper@healthurl.com>; public-credentials@w3.org; Raymond YEH (GOVTECH) <Raymond_YEH@tech.gov.sg>; Sin Yong LOH (IMDA) <LOH_Sin_Yong@imda.gov.sg>
Subject: Re: Requirements for PDF as container for VC's

Steve,

I think we should take this outside of the w3 group rather than inviting SG govt team to defend their work in this forum.  And if I'm not wrong, they would need to accept the public disclosure rules, before commenting.

As we are both implementers of OpenAttestation, there are better forums.

PS - I am not attacking the OpenAttestation framework, simply pointing out that relying parties need to be cautious in what they trust.  Also, I need to register my williamc@nextid.com email address with this forum, rather than using my other company's email.
Regards, Bill Claxton (williamc@itr8.com)
Facebook, Skype, MSN, Yahoo, Twitter, Flickr or Gmail: wmclaxton
Voice, Text or Whatsapp: +65-9012-4327
On 11/9/2020 11:45 AM, Steve Capell wrote:
Hi Bill

Thanks for that comment.  Copying the Singapore tech lead in my response

I think I disagree with you because I can’t see how the renderer can show anything except what is in the credential because there is no other source of claim data

Raymond may be able to offer a more specific response
Steven Capell
Mob: 0410 437854



On 9 Nov 2020, at 2:32 pm, Bill Claxton <williamc@itr8.com> wrote:
Following...

TL;DR - if you don't know the verification process, beware of what you trust.

1.  I agree with Adrian that "the machine representation [must be] the basis of trust".

2.  I disagree with Steve's assertion: "the Singapore govt open attestation framework... ensures the human viewer and machine reader always see the same data".  This is not true and is in fact a weakness of their verification mechanism.  They do not display the claims made in the JSON document so the relying party will not know if the visual and machine representation tally.

For example: the displayed presentation of a student degree could include mention of "dean's list" which is not supported by the certificate data or, worse, it may state a 4.0 GPA when the certificate data indicates only a 3.8.  The relying party would not notice this unless they inspected the credential manually.  Furthermore, there is also a weakness in their issuance process, in which individuals receive credentials in their wallet (i.e. the Skills Passport) without confirming accuracy of the data (the recipient is never asked).  This opens the possibility of repudiation by the recipient in the case they are wrongly awarded a job or a grant on the basis of their 4.0 GPA and being on the dean's list, when the credential itself indicates otherwise.

This is all off-topic for whether PDFs make a good container for VCs, but I thought I should point out these important considerations regarding the relationship between the issued credential and its visual representation.
Regards, Bill Claxton (williamc@itr8.com)
Facebook, Skype, MSN, Yahoo, Twitter, Flickr or Gmail: wmclaxton
Voice, Text or Whatsapp: +65-9012-4327
On 11/9/2020 9:26 AM, Steve Capell wrote:
So - after all that I think I’ve talked myself out of the idea of PDF as a VC container and I’m back to the idea of PDFs contained in VCs that are referenced by a QR on the PDF

There will be some that note the circular reference here.  It’s managed on the oracle as issuer side
- for low volume issue, the user uploads their PDF and we present a UI to position a generated QR and then they download the PDF with QR and use it as normal
- for high volume issue, the calling system first asks our API for the link URL / QR, then generates their own PDF before giving it back to us together with structured metadata

I can’t think of a better way to handle this enormously complex change management process involving millions of stakeholders.
Steven Capell
Mob: 0410 437854



On 9 Nov 2020, at 12:20 pm, Steve Capell <steve.capell@gmail.com> wrote:
I should add that, at present, we don’t embed the VC in the PDF - it’s actually the other way around
- submitter provides data + PDF to oracle (issuer) who creates the VC and stores it as an encrypted file at a public (but unguessable) location : repository/uuid.json
- url plus secret key for decrypting the vc is embedded in the QR that goes on the PDF.  Theory is that if you are given the PDF with its data then you have rights to the digital version at the end of the QR link
- so we finish up with a PDF + QR that is the thing that is shared around but isn’t actually the authoritative VC. The PDF is really just the key to get the VC
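
As a rough illustration of the "URL plus secret key in the QR" idea, the Python sketch below builds and parses such a payload. Everything specific in it is an assumption made for illustration: the placeholder host `repository.example`, the 32-byte key length, and carrying the key in the URL fragment. The encryption of the VC file itself is not shown.

```python
import secrets
import uuid
from urllib.parse import urlsplit

REPOSITORY = "https://repository.example"  # placeholder host, not a real service


def make_qr_payload(repository: str = REPOSITORY) -> tuple[str, str, bytes]:
    """Create an unguessable VC location and a key, combined into one QR string."""
    location = f"{repository}/{uuid.uuid4()}.json"  # repository/uuid.json
    key = secrets.token_bytes(32)                   # assumed key length
    # Assumption: the key travels in the URL fragment, which HTTP clients
    # do not send to the server when they fetch the encrypted file.
    payload = f"{location}#{key.hex()}"
    return payload, location, key


def parse_qr_payload(payload: str) -> tuple[str, bytes]:
    """Split a scanned QR string back into the VC location and the key."""
    parts = urlsplit(payload)
    location = parts._replace(fragment="").geturl()
    return location, bytes.fromhex(parts.fragment)
```

Anyone holding the PDF can scan the QR, fetch the encrypted file from the location, and decrypt it with the key, which matches the "if you are given the PDF then you have rights to the digital version" reasoning.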

Because there are so many hops and stakeholders in the supply chain, we can’t really assume (at this stage in VC adoption maturity) that people will know what to do with a native digital VC.  So verification works for either humans (immature or low volume verifiers) or machines (mature or high volume verifiers) as follows
- humans just scan the QR and get taken to a verifier site that they trust (typically the exporting country regulator).  The site is a hosted verifier and gets the referenced VC (secret key in QR remember), validates it, and also presents the originally notarised metadata and PDF - asking the human to confirm that the one they are holding is the same as what the verifier is showing them
- the machine verifier is typically the importing regulator but could be any other high volume user along the way such as a bank.  The holder (presenter is maybe a better word) loads the PDF with QR to the verifier website such as a national trade single window.  The website extracts the QR, retrieves the VC, validates the VC, consumes VC metadata, and checks that the hash of the presented PDF is the same as the hash of the attached PDF in the VC
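
The final hash comparison in the machine flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: SHA-256 as the hash function and a hypothetical `attachment.sha256` field in the VC are choices made here for illustration. Validating the VC's own signature is a separate step and is not shown.

```python
import hashlib
import hmac


def pdf_digest(pdf_bytes: bytes) -> str:
    """Hex SHA-256 digest of the PDF file contents (assumed hash function)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def presented_pdf_matches(presented_pdf: bytes, vc: dict) -> bool:
    """Compare the presented PDF with the digest recorded in the VC.

    The "attachment" / "sha256" field names are hypothetical.
    """
    expected = vc["attachment"]["sha256"]
    # compare_digest performs the comparison in constant time
    return hmac.compare_digest(pdf_digest(presented_pdf), expected)
```

A single changed byte in the presented PDF produces a different digest, so the check fails even when the rendered page looks identical to a human.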

It may seem a bit clunky but the overriding issue is human change management.  Even a small economy will produce around 100 million cross border VCs in a year - each one going to any one of nearly 200 countries and potentially being touched by dozens of different roles.  Count the number of exporters and importers around the world and the possible combinations of specific entities across that 100m consignments is also in the 100’s of millions.  There’s no practical way to engage them all up front to change their normal business practice

So we like PDF as the carrier of the secret key and link and we like oracle issued VCs as the thing they link to - because this combination seems like the only practical way to take a first step to transform world trade

Hope that makes sense.  Happy to hear alternative suggestions

Steven Capell
Mob: 0410 437854



On 9 Nov 2020, at 11:52 am, steve capell <steve.capell@gmail.com> wrote:
thanks Leonard,

>Same way you protect the VC itself – Sign it!

Except that, in our use case, the VC issuer is the "oracle" (Exporting Government Authority) that the verifier (Importing Government Authority) trusts.  But the PDF is created by some other party in the exporting jurisdiction.  So, even if they sign it, the signing identity won't match the VC issuer identity.  Basically the verifier can check that it is signed and that the signature is valid - but it doesn't really help because the verifier doesn't know the PDF creator.

We are using the oracle as issuer VC model because to do otherwise would impose more complexity on both the regulated community in the exporting jurisdiction and on the verifier - to follow and confirm the trust chain via a set of linked / embedded VCs.  This is a better future state but seems too much of a leap to start with.

As for not putting verification data on the human consumable form, that also imposes too much of a leap for such a diverse set of stakeholders in the international supply chain.

Thinking about it, I think the best way is for the oracle as issuer (exporting government) to notarise the PDF and include the hash of the PDF attachment in the VC.  Then the verifier can just confirm that the hash of the PDF they are holding is the same as the hash of the PDF that the oracle notarised.

no?


On Mon, 9 Nov 2020 at 11:24, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> In the ideal world there are no PDFs that you need to trust, there is only data that the machine you trust can verify
>
You and I live in different ideal worlds 😊.


> The attack vector that I’m trying to figure out a simple way to mitigate is where Malik manipulates the PDF rendered information without touching the (linked or embedded) VC
>
Same way you protect the VC itself – Sign it!   Apply a standard DigSig to the PDF – most likely one that is PAdES compliant for better acceptance world-wide.

The other thing to consider is how technology/approaches such as the Content Authenticity Initiative (https://contentauthenticity.org/) can also be applied here to establish provenance of the document and its content throughout its lifecycle.


> The community “learns” that a QR on a PDF means it is verifiable and when you see a green tick on scanning the QR, you look no further
>
Yes, this is why standards such as PAdES Part 6 are very clear about *not* putting any form of validation information in the content of the page.  Too easy to forge.


Leonard

From: Steve Capell <steve.capell@gmail.com>
Date: Sunday, November 8, 2020 at 5:40 PM
To: Adrian Gropper <agropper@healthurl.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Bill Claxton <williamc@itr8.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Subject: Re: Requirements for PDF as container for VC's

Yes I think the machine version has to be the basis of trust.  In the ideal world there are no PDFs that you need to trust, there is only data that the machine you trust can verify

The attack vector that I’m trying to figure out a simple way to mitigate is where Malik manipulates the PDF rendered information without touching the (linked or embedded) VC. Let’s say the VCs are a certificate of origin and a commercial invoice.  These documents can change hands many times between issuer and verifier.  Manufacturer, exporter, forwarder, carrier, financial service provider, bank, insurer, importer, customs agent, regulator - just a sampling of parties, any one of which could be Malik

Today, there is only paper and people may verify the bit of paper by calling up the issuer.  A time consuming and expensive process

In the end state, machines that people trust algorithmically verify the VC data.  Much better.

It’s the intermediate state that worries me.  Sometimes people look at the paper (PDF) version of the data supposedly in the vc and sometimes machines verify the vc.  The community “learns” that a QR on a PDF means it is verifiable and when you see a green tick on scanning the QR, you look no further.  This is the danger window.  And a few spectacular attacks could kill the whole VC framework in an industry sector
Steven Capell
Mob: 0410 437854

On 9 Nov 2020, at 9:14 am, Adrian Gropper <agropper@healthurl.com> wrote:
By analogy with DID resolution, I can't imagine how anything other than the machine representation could be the basis of trust. (trying to avoid the master language).

It's then up to the verifier to confirm any human-consumable representation that is offered to them. They could send the PDF to a trusted resolver for confirmation or they could run a transform locally.

If the issuer wants to sign the human-consumable representation, it should be up to them to use a transform they trust and they could really be signing just the machine readable version anyway because they trust their transformer.

Can it be any other way?

Adrian

On Sun, Nov 8, 2020 at 3:52 PM Leonard Rosenthol <lrosenth@adobe.com> wrote:
I realized after writing that my comment on #2 was not accessibility friendly.  It really should be “human consumable representation” since the content may be consumed in non-visible representations for the same purpose.

Leonard

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Sunday, November 8, 2020 at 4:49 PM
To: Steve Capell <steve.capell@gmail.com>
Cc: Bill Claxton <williamc@itr8.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Subject: Re: Requirements for PDF as container for VC's
Resent-From: <public-credentials@w3.org>
Resent-Date: Sunday, November 8, 2020 at 4:48 PM

Steven – we have to be careful not to conflate issues…

1 – Can a machine determine if the data being used for presentation directly matches that used as “data”?   That problem can be solved by having a single set of data that is used in both cases, though I am not aware of any situation where that is the case; in all cases there is a TRANSFORM from one to the other.  So if you aren’t using the same data, then you need a way to “connect the dots” so that it is clear what part of the presentation matches what part of the data.  PDF supports that using semantic tagging of content (as I showed in my presentation).

2 – Can a human determine if the data is correct?  Yes, by having a human visible representation the human verifies that what they see is what they expect.

3 – Determine which representation – human or machine – is the “master”.  In the eInvoicing standards of the EU, the machine readable XML is the master copy and the PDF presentation is just that – a human readable presentation.  I would expect that in our VC cases the same would be true.  No?

Leonard

From: Steve Capell <steve.capell@gmail.com>
Date: Sunday, November 8, 2020 at 4:09 PM
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Bill Claxton <williamc@itr8.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Subject: Re: Requirements for PDF as container for VC's

As you all probably already know, the Singapore govt open attestation framework has a nice way of separating an issuer-defined renderer from the VC payload - which ensures the human viewer and machine reader always see the same data

Of course the problem is that the verifier needs a link to the rendered view and that is often a QR on a PDF. But in that case there is no guarantee that the data on the PDF page is the same as the data in the linked VC - other than a human “yep, they look the same” eyeball.

I’m not a PDF expert but I note that even the EU / German e-invoicing framework seems to have the same problem.  The XML data is attached to the PDF as metadata but there’s nothing to guarantee that the values in the XML (e.g. amounts and bank details) are the same as what’s in the PDF view.  That opens up rather obvious avenues for fraud.

Have I misunderstood? Does someone have a solution that can ensure that PDF rendered data is the same as PDF attached metadata ?
Steven Capell
Mob: 0410 437854

On 9 Nov 2020, at 2:08 am, Leonard Rosenthol <lrosenth@adobe.com> wrote:
Bill, thanks for sharing.  I read your evaluation link and there is a lot of good stuff there.   However, I am surprised that your concerns about PDF are around layout and rendering – since in that case, PDF is a better solution since the layout is defined by the issuer and the renderer simply follows the rules of ISO 32000.  So there is never a question about the rendering not matching what the issuer desires.

However, I think you have an item later on that is more relevant and we should consider the importance of:
> How do you assure that what’s in the layout matches the JSON data?

Although I would change that to refer not to the layout (or what I would call the presentation) but instead to the data presented.

Leonard

From: Bill Claxton <williamc@itr8.com>
Date: Sunday, November 8, 2020 at 8:35 AM
To: "public-credentials@w3.org" <public-credentials@w3.org>
Subject: Re: Requirements for PDF as container for VC's
Resent-From: <public-credentials@w3.org>
Resent-Date: Sunday, November 8, 2020 at 8:32 AM

Kostas, hi -

We spoke recently about this and I'm following the thread to see what the community thinks.  Happy that you're getting lots of input.

I thought you may want to review my recent article "Evaluating Decentralised Identity Projects" <https://wmclaxton.medium.com/>.  It's a sort of reviewer's guide for decentralised identity apps.  In particular, I would draw your attention to the point: "Have you separated layout from the rendering application, so that 3rd-parties (including verifiers) can render your certs?"  I intend to write a follow up about presentation layer and mention PDF encapsulation as an alternative, but one I am not much in favor of, for the reasons we discussed.

PS - For anyone else reading this thread, I was a PDF evangelist back in the early days of v4 and v5.  I am familiar with the encapsulation methods and indeed used them for a National Archives project in Singapore.
Regards, Bill Claxton (williamc@itr8.com)
Facebook, Skype, MSN, Yahoo, Twitter, Flickr or Gmail: wmclaxton
Voice, Text or Whatsapp: +65-9012-4327
On 11/8/2020 5:17 PM, Kostas Karasavvas wrote:
Hi Leonard,

Thanks for initiating this. I have been thinking of this as well.

On Sun, Nov 8, 2020 at 3:23 AM Leonard Rosenthol <lrosenth@adobe.com> wrote:
Kicking off some discussions here – I’d like to start by putting down what I think are some “primary” requirements for the use of PDF in this context.  Am I missing anything?  Do you disagree with any of these?   Feedback welcome!!

## Requirements

- Shall store the VC in native JSON-LD (w/optional compression)
- VC should be in an easily accessible location (for both reading & writing)

Agree. And JSON-LD is what we intend to use. There could also be a 'serialization/type' option to specify the serialization to accommodate other use cases (XML? CBOR-LD?), if need be.

- Should require no language changes to PDF (except "metadata"-like values)
    - implies compatibility with both PDF 1.7 and 2.0

Agree. I assume that PDF 1.7 implies compatibility with older versions as well?

- Shall be usable in conjunction with standard PDF Signing/Certification

Makes sense. I have questions here wrt the PDF spec and how signing is happening (the 'hole' in the file approach) but these are for later.

Regards,
Kostas



Leonard


--
Konstantinos A. Karasavvas
Software Architect, Blockchain Engineer, Researcher, Educator
https://twitter.com/kkarasavvas





--
Steve Capell
Received on Tuesday, 10 November 2020 12:49:17 UTC