Re: Using VCs for source code provenance: embedding, subject identification, and claim strength.

Why not:

  1. Call it a Processing History that logs every unique review, test case (most recent execution), human code change, digital code change, fork check-in, PR, etc.

  2. It needs to work uniformly across all coding languages and document file formats... persist the VC in each language's or document format's comment mechanism.

  3. I think VC proof sets might work well here: https://hyperonomy.com/2026/03/26/sdo-verifiable-trust-circles-vtcs-using-vc-proof-sets-web-7-0/

  4. This approach ideally requires all code to have a Processing History including prompts, test cases, issues, ...

...or something like that.

Michael
Web 7.0

________________________________
From: Adrian Gropper <agropper@healthurl.com>
Sent: Tuesday, March 31, 2026 7:15:46 PM
To: Bob Wyman <bob@wyman.us>
Cc: W3C Credentials CG (Public List) <public-credentials@w3.org>
Subject: Re: Using VCs for source code provenance: embedding, subject identification, and claim strength.

Modern LLMs increasingly depend on reasoning rather than being limited to their training data. In medicine and probably most other domains, research consistently shows that more generic training and knowledge gives better results than more fine-tuning for any particular domain.

Also, verification requires time, money, and structure, especially if humans are still involved. The generic LLMs are changing much faster than any current standardization or verification process I can think of (but I am not an expert in verification).

I see a declining future for any proprietary code. It's just too easy for LLMs to create hidden issues (bias, back doors) in proprietary code.

Simply put, verified component libraries are nice-to-haves but I can't see them being economically viable for much longer.

Adrian

On Tue, Mar 31, 2026 at 9:00 PM Bob Wyman <bob@wyman.us> wrote:
Adrian,
You're right that LLMs are increasingly effective at verifying open source code, and that "trust to verify" changes the economics of review. But LLM verification still requires a standard to verify against.

The approach I'm working with builds a curated library of verified computational components — each one traceable to a specific RFC section or named design pattern, carrying compliance tests, a proof of correctness, and a documented list of known limitations. An LLM directed to draw first from this library, and to fall back on its own synthesis only when no verified component exists, works within a constrained and grounded search space for common cases rather than across the full range of its training data.

Without such discipline, LLM verification is circular: the model checks code against what it has seen before, which includes both good and bad implementations with no way to distinguish them. And if LLMs are already better than human reviewers without this discipline, they will be better still with it — the library gives them a verified knowledge base to work from rather than a statistical aggregate.

The provenance tagging in the original message is a natural extension of that library discipline rather than a separate mechanism. An LLM directed to produce a citable derivedFrom reference is constrained to derive from something real and checkable. WAG-tagged code is not just flagged for review — it signals that the generation process lacked a verified basis, which is actionable for the next iteration.

The library approach also applies beyond open source. Proprietary code, domain-specific logic, and novel protocol implementations cannot be verified by searching the web for real-world experience. A verified component library is the mechanism for making that knowledge reliably available to LLMs.

bob wyman

On Tue, Mar 31, 2026 at 8:41 PM Adrian Gropper <agropper@healthurl.com> wrote:
I think this is rapidly becoming irrelevant. When code is open source, dollar for dollar, coding LLMs are already better than human reviewers and code provenance becomes less important. At least for open source code, we are moving from trust to verify.

Coding LLMs also benefit from access to the context in which the code will be used and the ability to search the web for real-world experience with the code as well as improvements and modifications applicable to open source code to further reduce the attack surface.

Adrian

On Tue, Mar 31, 2026 at 7:44 PM Bob Wyman <bob@wyman.us> wrote:

I have been experimenting with using Verifiable Credentials to tag source code with provenance information, as a way of improving the reliability of LLM-generated code. I would welcome the group’s guidance on some open technical questions.

Background

When an LLM generates or assembles code, the result falls into one of three provenance categories:

  1.  Verified implementation — the code is drawn from a trusted library and implements a known, citable specification (e.g., “this function implements RFC 7033 §9.3”). Strong claim; verifiable.

  2.  Pattern-derived synthesis — the code is not from a library, but was synthesized by explicitly applying a named design pattern or algorithm. For example: “implements the Observer pattern [Gamma et al., Design Patterns, 1994, pp. 293-303]” or “sliding-window rate limiter following Kallmes, Towsley & Cassandras, IEEE CDC 1989.” The reasoning is traceable to a citable source even though the code is synthesized. Medium claim; checkable.

  3.  WAG (Wild-Assed Guess) — generated from training data with no specific basis. Weak claim; honest, but not auditable.

I would like to embed a VC in the source file that records which category applies and, for categories 1 and 2, which specific sources were the basis. The goals are several: six months later a reviewer can answer “where did this come from?” without reconstructing the original context; WAGs are flagged for mandatory human review before any code review is considered complete; pattern-derived code is flagged for review as well, though less urgently; and shipping code contains no WAGs — the tag makes this a checkable, enforceable policy rather than an aspiration.
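The "no WAGs in shipping code" policy is only enforceable if something checks it. A minimal sketch of such a gate follows, assuming the docstring convention described below (a Provenance section containing a category line); the function names and regex are illustrative, not part of any proposal.

```python
import re
from pathlib import Path

# Hypothetical enforcement sketch: flag any file whose Provenance
# summary declares "category: WAG". The field name follows the
# docstring convention described later in this message.
WAG_LINE = re.compile(r"^\s*category:\s*WAG\s*$", re.MULTILINE)

def contains_wag(source: str) -> bool:
    """True if any Provenance summary in `source` is tagged WAG."""
    return bool(WAG_LINE.search(source))

def wag_files(root: str) -> list[str]:
    """Paths under `root` that still carry WAG-tagged functions;
    a CI gate would fail the build if this list is non-empty."""
    return [str(p) for p in Path(root).rglob("*.py")
            if contains_wag(p.read_text())]
```

A pre-release check would simply run `wag_files(".")` and refuse to ship if it returns anything.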

Proposed embedding format (Python)

The natural location for the VC in Python is the docstring, which is the canonical metadata location for a function, survives most code transformations that strip comments, and is where a human reviewer naturally looks for “what is this and why does it exist?” The VC appears in a structured Provenance section at the end of the docstring:

def rate_limit_by_ip(parent_values, params, get_state,
                     set_state, publish, service_emit=None):
    """
    Sliding-window rate limiter per client IP address.
    RFC 7033 Section 9.3: servers should rate-limit by IP
    to prevent abuse and email harvesting.

    Provenance:
        category: verified
        implements: RFC 7033 Section 9.3
        verified-by: did:key:z6Mk...
        vc: eyJhbGciOiJFZERTQSJ9...   (compact JWT)
        subject: ni:///sha-256;a3f8c2d1...?ct=python-ast-no-vc
    """

The ct=python-ast-no-vc content type signals that the subject hash was computed over the function’s normalized AST with the Provenance section stripped. A verifier applies the same stripping before recomputing the hash. This avoids the self-referential problem of hashing content that includes the hash.
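One way the strip-then-hash step could work in practice is sketched below. The details are assumptions for illustration: normalization via ast.unparse (Python 3.9+, which drops comments and whitespace variation), and "Provenance:" as the delimiter at which the docstring is truncated. The resulting digest is what would go into the subject URI.

```python
import ast
import hashlib
import re

def subject_hash(func_source: str) -> str:
    """Sketch of a ct=python-ast-no-vc digest: strip the Provenance
    section from the docstring, then hash the normalized form of the
    function. Normalization here is ast.unparse, an assumption."""
    tree = ast.parse(func_source)
    func = tree.body[0]
    doc = ast.get_docstring(func)
    if doc is not None:
        # Drop everything from "Provenance:" onward in the docstring,
        # so the hash does not cover the VC that refers to it.
        stripped = re.split(r"\n\s*Provenance:", doc)[0].rstrip()
        func.body[0].value = ast.Constant(stripped)
    normalized = ast.unparse(tree)  # comments and formatting removed
    return hashlib.sha256(normalized.encode()).hexdigest()
```

A verifier recomputes this over the candidate function and compares it with the digest in the credentialSubject id; tagged and untagged copies of the same function hash identically.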

The VC payload (decoded) for a verified case — the human-readable summary lines in the docstring are redundant with these fields, allowing a reader to see the category at a glance without decoding the JWT:

{
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  "type": ["VerifiableCredential", "CodeProvenanceCredential"],
  "issuer": "did:key:z6Mk...",
  "validFrom": "2026-03-31T00:00:00Z",
  "credentialSubject": {
    "id": "ni:///sha-256;a3f8c2d1e4b7...?ct=python-ast-no-vc",
    "provenanceCategory": "verified",
    "implements": "https://www.rfc-editor.org/rfc/rfc7033#section-9.3",
    "verifiedBy": "did:key:z6Mk..."
  }
}

For a pattern-derived case (docstring summary would read “category: pattern-derived / derivedFrom: Observer pattern”):

{
  "credentialSubject": {
    "id": "ni:///sha-256;b7c9e1f3...?ct=python-ast-no-vc",
    "provenanceCategory": "pattern-derived",
    "derivedFrom": [{
      "type": "DesignPattern",
      "name": "Observer",
      "canonicalRef": "Gamma et al., Design Patterns, 1994, pp. 293-303"
    }],
    "synthesizedBy": "did:key:z6Mk...",
    "humanReviewed": false
  }
}

For a WAG (docstring summary would read “category: WAG” — immediately visible to any reviewer). Note the absence of any derivedFrom or implements claim; the VC is honest about what it cannot assert:

{
  "credentialSubject": {
    "id": "ni:///sha-256;c2d4f6a8...?ct=python-ast-no-vc",
    "provenanceCategory": "WAG",
    "synthesizedBy": "did:key:z6Mk...",
    "humanReviewed": false,
    "warning": "no traceable basis; review before trusting"
  }
}

Open questions

Q1: credentialSubject.id for a function-level code artifact

I am using the ni: Named Information scheme (RFC 6920) with a hash of the function’s normalized AST (Provenance section excluded). Is there established practice for content-addressed URIs as VC subject identifiers for non-person, non-document artifacts such as code? The VC Data Model is clear that subjects need not be people, but the examples are almost entirely person- or organization-centric.
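For concreteness, constructing such a subject identifier is straightforward. One point worth noting: RFC 6920 specifies that the digest value in an ni: URI is base64url-encoded with padding removed (the illustrative hex-looking values above are abbreviations). A sketch, with the ct value being this proposal's custom content type rather than anything registered:

```python
import base64
import hashlib

def ni_uri(data: bytes, content_type: str = "python-ast-no-vc") -> str:
    """Build an RFC 6920 ni: URI for a SHA-256 digest of `data`.
    Per the RFC, the digest is base64url-encoded without padding;
    the ct= value is this proposal's (unregistered) content type."""
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return f"ni:///sha-256;{b64}?ct={content_type}"
```

Here `data` would be the normalized-AST bytes of the function with the Provenance section stripped.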

Q2: Claim strength vocabulary for the verified / pattern-derived / WAG hierarchy

The three categories differ in verifiability:

  - “Verified” makes a verifiable claim: tests exist that verify correspondence with the cited specification.
  - “Pattern-derived” makes a traceable but not automatically verifiable claim: synthesis followed specific, citable sources.
  - “WAG” makes no claim beyond “an LLM generated this.”

Does existing VC vocabulary — perhaps the Confidence Method specification currently in development, or existing credential schema work — cover this kind of graduated evidence hierarchy? Or is a custom vocabulary extension the right path?

Q3: Embedding location

The docstring approach above is inline and durable but requires the stripping convention to avoid the self-reference problem. Including a human-readable summary of the key VC claims alongside the compact JWT (as shown) makes the provenance category immediately visible to code reviewers without requiring JWT decoding — the summary is for humans, the JWT is for machines, and a linting tool can verify that they agree. The alternative is a sidecar file or registry endpoint, where the code carries only a reference URI and the VC is stored and revoked externally. The supply chain security ecosystem (Sigstore, SLSA, in-toto) stores attestations externally referenced by artifact digest. Is there a reason to prefer inline embedding for source code specifically, or is external storage with a reference the better model?
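The summary-vs-JWT agreement check that a linting tool would perform is itself simple. A sketch, assuming the field names proposed above (category in the summary, provenanceCategory in the credentialSubject); it decodes the JWT payload without verifying the signature, since the lint check is about consistency, not trust:

```python
import base64
import json
import re

def decode_jwt_payload(jwt: str) -> dict:
    """Decode (without verifying) the payload segment of a compact JWT."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def summary_matches_vc(docstring: str, jwt: str) -> bool:
    """Check that the human-readable 'category:' line in the Provenance
    summary agrees with provenanceCategory in the VC payload. Field
    names follow the proposal above; signature checking is omitted."""
    m = re.search(r"category:\s*(\S+)", docstring)
    if m is None:
        return False
    claims = decode_jwt_payload(jwt)
    subject = claims.get("credentialSubject", {})
    return m.group(1) == subject.get("provenanceCategory")
```

Signature verification and revocation checking would sit in a separate, heavier tool; the linter only guarantees that humans and machines are reading the same claim.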

Q4: Prior art

Is the group aware of existing work on using W3C VCs for sub-file, function-level provenance in source code? Supply chain security tools address file- and package-level signing; nothing I have found addresses individual functions within a file using the W3C VC format.

A note on the human coder objection

The obvious objection is that this tagging discipline would be cumbersome for human coders to produce. This is true, but the burden falls asymmetrically.

For reading, humans benefit directly. A reviewer opening an unfamiliar file can immediately see, for each function, whether its provenance is verified against a specification, derived from a named pattern, or a WAG. That distinction is currently invisible in code review. Making it visible is the primary purpose of the tagging system, and it serves human reviewers as much as automated tools. It also enables automated procedures that exploit the tags: dependency scanners, verification checkers, and audit tools can all operate on structured provenance claims in ways they cannot on informal comments.

For writing, LLMs do not find tagging cumbersome. An LLM that generated a function already holds its provenance in context; emitting a structured Provenance section alongside the code is essentially free. A human coder can also develop a lightweight review workflow: write the code, then ask an LLM to inspect it and produce the Provenance section. This is a tractable use of LLM capability — analyzing existing code against known sources rather than generating speculatively.

The tagging requirement also creates a useful feedback loop: an LLM that must produce a citable derivedFrom reference is implicitly constrained to derive from something citable. It cannot generate a WAG and truthfully tag it as pattern-derived. The discipline of tagging improves the quality of generation, not only the quality of documentation.

Why this matters

The EU AI Act and similar regulations are creating pressure to document the provenance of AI-generated artifacts including code. Current practice is either no documentation or informal comments (“// generated by ChatGPT”). Neither is auditable. A VC-based tag makes provenance machine-readable, cryptographically bound to a specific implementation, and revocable if the source is found to be incorrect.

The pattern-derived category is interesting. For verified implementations the VC makes a strong, checkable claim. For pattern-derived synthesis the VC makes a weaker but meaningful claim: “this is not a WAG — it was synthesized by following these specific, citable sources, and the reasoning can be checked even if automated verification is not yet possible.”

Thank you for any guidance or other comments.

bob wyman

Received on Wednesday, 1 April 2026 01:34:26 UTC