Re: Putting pdf data into VCs from Arshad Noor on 2020-11-03 (public-credentials@w3.org from November 2020)

From: Arshad Noor <arshad.noor@strongkey.com>
Date: Mon, 2 Nov 2020 20:00:26 -0800
To: public-credentials@w3.org
Message-ID: <23f0c194-681a-4b9b-5280-daf852ef0a16@strongkey.com>
Hello everyone,


I just joined this community last week, intending to lurk for some 
months and learn more about what's going on before speaking. My 
understanding of VCs is currently fairly basic, but I am hoping to 
catch up over time.


However, this thread caught my attention, since my primary focus is the 
use of applied cryptography to protect data: encryption, digital 
signatures, key management, etc. I have some experience protecting 
PDFs - nearly 50M PDF files, as a matter of fact - and can offer some 
suggestions that may be helpful.


Taking the University use-case as an example, it might be possible for 
the University to do the following:

  * Generate the credential in PDF format;
  * Digitally sign the PDF with a private key whose digital
    certificate is bound to the University;
  * Store the signed PDF somewhere in its private cloud;
  * Put the metadata of the PDF - perhaps a unique id, filename, URL,
    anonymized recipient data, digital signature, etc. - on the
    blockchain;
  * Provide a VC with a token carrying some policy - e.g., the number
    of times the file may be downloaded, an expiry date-time, etc. -
    and a pointer to the record on the blockchain;
  * The intended recipient of the VC uses the token to locate the URL of
    the PDF in the blockchain record, accesses the URL and presents the
    token to the "gate-keeper" of the private cloud. Based on the
    validity of the VC and the token, the signed PDF is provided for
    download.
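
To make the last few steps concrete, here is a rough Python sketch of 
the metadata record, the policy-carrying token and the gate-keeper 
check. All names in it (MetadataRecord, AccessToken, Gatekeeper, 
make_record, the example.edu URL) are hypothetical and used only for 
illustration; the signature is an opaque placeholder string, and 
checking the validity of the VC itself is out of scope here.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MetadataRecord:
    """Hypothetical blockchain record describing a signed PDF (not the PDF itself)."""
    record_id: str
    filename: str
    url: str
    sha256: str      # hex digest of the signed PDF bytes
    signature: str   # placeholder for the University's digital signature


@dataclass
class AccessToken:
    """Hypothetical token carried in the VC, with a simple download policy."""
    value: str
    record_id: str
    expires_at: datetime
    downloads_left: int


def make_record(record_id, filename, url, pdf_bytes, signature):
    """Build the metadata record that would be written to the blockchain."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return MetadataRecord(record_id, filename, url, digest, signature)


class Gatekeeper:
    """Stands in for the private-cloud gate-keeper that releases signed PDFs."""

    def __init__(self):
        self._files = {}    # record_id -> signed PDF bytes
        self._tokens = {}   # token value -> AccessToken

    def store(self, record_id, pdf_bytes):
        self._files[record_id] = pdf_bytes

    def issue_token(self, record_id, max_downloads, lifetime):
        token = AccessToken(
            value=secrets.token_urlsafe(16),
            record_id=record_id,
            expires_at=datetime.now(timezone.utc) + lifetime,
            downloads_left=max_downloads,
        )
        self._tokens[token.value] = token
        return token

    def download(self, token_value, now=None):
        """Release the PDF only if the token is known, unexpired and not used up."""
        now = now or datetime.now(timezone.utc)
        token = self._tokens.get(token_value)
        if token is None or now >= token.expires_at or token.downloads_left <= 0:
            raise PermissionError("token rejected by policy")
        token.downloads_left -= 1
        return self._files[token.record_id]
```

In this sketch the University would call store() and issue_token() at 
issuance time, place the token and the record id in the VC, and the 
recipient would later present the token value to download().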

By using this flow:

  * The authenticity of the PDF is established by the digital signature
    (verified by the digital certificate bound to the University);
  * The existence of such a document - its origin and associated
    metadata - is established by the presence of the metadata record on
    the blockchain;
  * The integrity of the PDF is preserved through the digital signature
    _and_ metadata of the PDF on the blockchain.
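
A minimal sketch of how a verifier might combine these checks, assuming 
the blockchain record stores a SHA-256 digest of the signed PDF. The 
function verify_pdf and its parameters are hypothetical names. The 
signature check is passed in as a callable, because a real deployment 
would verify against the University's certificate with a proper PKI 
library; the HMAC-based demo_sign/demo_verify pair below is only a 
stand-in so the example runs with the standard library, and it is not 
a digital signature.

```python
import hashlib
import hmac
from typing import Callable


def verify_pdf(pdf_bytes: bytes,
               recorded_sha256: str,
               signature: bytes,
               verify_signature: Callable[[bytes, bytes], bool]) -> bool:
    """Check integrity against the recorded digest and authenticity via the signature."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    # Integrity: the downloaded bytes must match the digest on the blockchain.
    integrity_ok = hmac.compare_digest(digest, recorded_sha256)
    # Authenticity: delegate to whatever signature scheme the issuer uses.
    authenticity_ok = verify_signature(pdf_bytes, signature)
    return integrity_ok and authenticity_ok


# Stand-in "signature" for demonstration only (assumption): a keyed MAC
# with a shared secret, NOT a certificate-backed digital signature.
DEMO_KEY = b"demo-only-shared-key"


def demo_sign(data: bytes) -> bytes:
    return hmac.new(DEMO_KEY, data, hashlib.sha256).digest()


def demo_verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(data), signature)
```

With this shape, a modified PDF fails both checks, and a PDF swapped 
for another validly signed one still fails the comparison against the 
recorded digest.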

For the healthcare use-case, the only thing that needs to be added to 
the flow described above is to encrypt the PDF and ensure that the 
token in the VC enables the recipient to decrypt the PDF when the 
document is accessed.


Encrypting and digitally signing large volumes of files is a non-issue - 
this open-source library and the architecture paper describe how the 
50M PDFs were protected almost a decade ago: 
https://sourceforge.net/projects/skce/.


Hope this helps.


Regards,


Arshad Noor
StrongKey




On 11/2/20 5:39 PM, Kristina Yasuda wrote:
> Hi all, Kostas, Adrian,
>
> Thank you very much for the responses! It was insightful to find out 
> that pdf as a container is a use case not limited to the education 
> sector.
>
> I agree that displaying metadata makes more sense than reproducing a 
> pdf from a VC. With decentralized storage, storing files with 
> encryption protection was one of the challenges.
>
> This raises an interesting question of how to integrate VCs with 
> existing data formats (pdf being one): which to use as the container, 
> the VC or the existing data format? One option would not interoperate 
> with the other.
>
> Kindest Regards,
> Kristina
>
> ------------------------------------------------------------------------
> *From:* Adrian Gropper <agropper@healthurl.com>
> *Sent:* October 30, 2020 2:46
> *To:* Kostas Karasavvas <kkarasavvas@gmail.com>
> *CC:* Kristina Yasuda <Kristina.Yasuda@microsoft.com>; W3C Credentials 
> CG (Public List) <public-credentials@w3.org>; Zhen Chien Chia 
> <zhchia@microsoft.com>
> *Subject:* Re: Putting pdf data into VCs
> The healthcare use-case is around prescriptions: 
> https://w3c.github.io/did-use-cases/#prescriptions 
>
>
> ePrescribing is a major market with tremendous privacy and public 
> health issues and a lot of traffic as well. The current ePrescribing 
> system in the US is inferior to the paper system it replaced.
>
>   * It reduces, if not effectively eliminates, the patient's ability
>     to shop around.
>   * It has also led to a federation that excludes many innovative
>     digital health records solutions.
>   * The network that manages prescriptions (SureScripts) is opaque and
>     inaccessible to patients.
>   * SureScripts' restraint-of-trade practices are under investigation
>     by the Federal Government.
>   * SureScripts is, in effect, the identity provider for 330 Million
>     people and is now marketing itself as a data broker beyond just
>     prescriptions.
>   * ePrescribing of controlled substances (EPCS) is a major
>     application in itself and needs strong, Federal-grade credentials
>     and non-repudiable signatures.
>
> PDF-based ePrescriptions could be designed to fix all of the above. 
> The Free / libre HIE of One Trustee 
> <https://hieofone.com/> 
> project uses ePrescribing as a core demonstration. We need help 
> adopting the SSI standards including VC, non-repudiable signatures, 
> and timestamps.
>
> Adrian
>
> On Thu, Oct 29, 2020 at 5:57 AM Kostas Karasavvas 
> <kkarasavvas@gmail.com> wrote:
>
>     Hi Kristina / all,
>
>     First off, to my knowledge there is no standardized way to do this.
>
>     Indeed this is different from using the PDFs as the container.
>     This is something we have been thinking about and, although I
>     don't have any answers, here are my thoughts.
>
>     1) The institutions that provide the PDFs could also provide the
>     machine-readable metadata (this is what we optionally do with the
>     PDF container as well). There are decentralized storage solutions
>     that won't introduce a centralized point of failure (PoF) (with
>     the assumption that the storage solution is self-sustainable to
>     guarantee long-term usage), so the pointer idea could be feasible
>     without compromising decentralization. Although less practical,
>     several such solutions could be combined for an even more
>     resilient solution. We have frozen our investigations on this,
>     though; it did not seem that important for the clients.
>
>     2) Yes, this can be done but the PDF won't be identical to the PDF
>     that the institutions provide unless they are constructed in an
>     identical way. This is not practical since each institution uses
>     their own methods for that and a lot just scan the physical
>     degrees!  If they are not identical I don't see the point of
>     creating the PDF in the first place. Why not display the metadata
>     in HTML?
>
>     That is one of the reasons we opted for the PDF as a container. It
>     represented a digital version of their physical degrees; and a lot
>     of these institutions were issuing the PDF degrees anyway. The
>     physical degrees typically have several mechanisms to secure the
>     document's validity (like holograms). For the digital version we
>     anchor it into a blockchain. When validating the PDF you will see
>     the PDF (identical to the physical one) and the blockchain
>     verification details. This approach has the benefit that the
>     integrity of the exact PDF document that the institution has
>     created becomes tamperproof. The issuing institutions do what
>     they always did wrt storing/disseminating their certificates (with
>     potential improvements).
>
>     We add certain information into the PDF to achieve this. The PDF
>     itself (the visual representation) becomes a self-verifiable
>     document and, while we currently use an ad hoc JSON format, we do
>     intend to move towards VCs. The idea is that a (universal) VC
>     wallet would look for the VC in the metadata of the PDF and
>     validate it accordingly.
>
>     Finally, what one could also do is include a base64 (or equiv.) of
>     the whole PDF inside the VC. Thus, VC is the container and a
>     specific visual representation is attached. I believe this defeats
>     the purpose of having a PDF in the first place but I would be open
>     to counterarguments. Maybe it will work in your use case but from
>     my experience that would be contrary to the workflow that these
>     institutions typically have.
>
>     Hope the above are of some help,
>     Kostas
>
>
>
>     On Wed, Oct 28, 2020 at 9:03 PM Kristina Yasuda
>     <Kristina.Yasuda@microsoft.com> wrote:
>
>         Hi all!
>
>         Many educational institutions issue credentials (transcripts,
>         graduation certificates, etc.) in pdf format, and we have
>         faced a question of how to create VCs using claims in pdfs.
>         Reaching out to the community to ask whether anyone has faced
>         a similar issue and whether there are standardized ways/best
>         practices to:
>         1) put data from pdfs into VCs while keeping integrity without
>         reliance on a centralized party? One option could be storing
>         pdf files in centralized/decentralized servers and including a
>         pointer to a file in a VC, but that would introduce a certain
>         level of centralization.
>         2) reconstruct pdfs from claims in VCs?
>
>         For example Modeling Educational Verifiable Credentials report
>         (https://w3c-ccg.github.io/vc-ed-models/#biblio-obs-are-vcs)
>         in section 1.5.3.1 shows the example of including pdf format
>         in a VC, but how can the verifier reproduce a pdf record from
>         the set of values in payload.data?
>
>         This is a little different from using pdfs as a container;
>         rather, it is about including information from pdfs in VCs.
>
>         Thank you very much!
>         Kristina
>
>
>
>     -- 
>     Konstantinos A. Karasavvas
>     Software Architect, Blockchain Engineer, Researcher, Educator
>     https://twitter.com/kkarasavvas
>
Received on Tuesday, 3 November 2020 04:00:46 UTC