Re: Using VCs for source code provenance: embedding, subject identification, and claim strength. from Adrian Gropper on 2026-04-01 (public-credentials@w3.org from April 2026)

From: Adrian Gropper <agropper@healthurl.com>
Date: Tue, 31 Mar 2026 20:41:33 -0400
To: Bob Wyman <bob@wyman.us>
Cc: "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CANYRo8gmJd651p1DXD4T7dsdeozyKao519xuVGLWqh5T=U2H3w@mail.gmail.com>
I think this is rapidly becoming irrelevant. When code is open source,
dollar for dollar, coding LLMs are already better than human reviewers and
code provenance becomes less important. At least for open source code, we
are moving from trust to verify.

Coding LLMs also benefit from access to the context in which the code will
be used and the ability to search the web for real-world experience with
the code as well as improvements and modifications applicable to open
source code to further reduce the attack surface.

Adrian

On Tue, Mar 31, 2026 at 7:44 PM Bob Wyman <bob@wyman.us> wrote:

> I have been experimenting with using Verifiable Credentials to tag source
> code with provenance information, as a way of improving the reliability of
> LLM-generated code. I would welcome the group’s guidance on some open
> technical questions.
>
> Background
>
> When an LLM generates or assembles code, the result falls into one of
> three provenance categories:
>
>    1. Verified implementation: the code is drawn from a trusted library and
>       implements a known, citable specification (e.g., “this function
>       implements RFC 7033 §9.3”). Strong claim; verifiable.
>    2. Pattern-derived synthesis: the code is not from a library, but was
>       synthesized by explicitly applying a named design pattern or algorithm.
>       For example: “implements the Observer pattern [Gamma et al., Design
>       Patterns, 1994, pp. 293-303]” or “sliding-window rate limiter following
>       Kallmes, Towsley & Cassandras, IEEE CDC 1989.” The reasoning is
>       traceable to a citable source even though the code is synthesized.
>       Medium claim; checkable.
>    3. WAG (Wild-Assed Guess): generated from training data with no specific
>       basis. Weak claim; honest, but not auditable.
>
> I would like to embed a VC in the source file that records which category
> applies and, for categories 1 and 2, which specific sources were the basis.
> The goals are several: six months later a reviewer can answer “where did
> this come from?” without reconstructing the original context; WAGs are
> flagged for mandatory human review before any code review is considered
> complete; pattern-derived code is flagged for review as well, though less
> urgently; and shipping code contains no WAGs — the tag makes this a
> checkable, enforceable policy rather than an aspiration.
>
> Proposed embedding format (Python)
>
> The natural location for the VC in Python is the docstring, which is the
> canonical metadata location for a function, survives most code
> transformations that strip comments, and is where a human reviewer
> naturally looks for “what is this and why does it exist?” The VC appears in
> a structured Provenance section at the end of the docstring:
>
> def rate_limit_by_ip(parent_values, params, get_state,
>                      set_state, publish, service_emit=None):
>     """
>     Sliding-window rate limiter per client IP address.
>     RFC 7033 Section 9.3: servers should rate-limit by IP
>     to prevent abuse and email harvesting.
>
>     Provenance:
>         category: verified
>         implements: RFC 7033 Section 9.3
>         verified-by: did:key:z6Mk...
>         vc: eyJhbGciOiJFZERTQSJ9...   (compact JWT)
>         subject: ni:///sha-256;a3f8c2d1...?ct=python-ast-no-vc
>     """
>
> The ct=python-ast-no-vc content type signals that the subject hash was
> computed over the function’s normalized AST with the Provenance section
> stripped. A verifier applies the same stripping before recomputing the
> hash. This avoids the self-referential problem of hashing content that
> includes the hash.
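
The strip-then-hash step might look roughly like the sketch below. It is an illustration under stated assumptions, not a definition of python-ast-no-vc: the helper names strip_provenance and subject_uri are hypothetical, "normalized AST" is approximated with ast.dump, and the Provenance section is assumed to run from its heading to the end of the docstring. Note that ast.dump output can differ between Python versions, so a real scheme would need a pinned canonical form. RFC 6920 encodes the digest as unpadded base64url, which is what the sketch emits.

```python
import ast
import base64
import hashlib


def strip_provenance(docstring: str) -> str:
    """Drop everything from the 'Provenance:' line to the end of the docstring."""
    kept = []
    for line in docstring.splitlines():
        if line.strip() == "Provenance:":
            break
        kept.append(line)
    return "\n".join(kept).strip()


def subject_uri(source: str, func_name: str) -> str:
    """Return an ni: URI for the named function, ignoring its Provenance section."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            doc = ast.get_docstring(node)
            if doc is not None:
                # The docstring is the first statement; replace its text in place.
                node.body[0].value.value = strip_provenance(doc)
            # ast.dump omits line/column attributes by default, so comments
            # and layout do not affect the result.
            normalized = ast.dump(node)
            digest = hashlib.sha256(normalized.encode("utf-8")).digest()
            b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
            return f"ni:///sha-256;{b64}?ct=python-ast-no-vc"
    raise ValueError(f"function {func_name!r} not found")
```

Under these assumptions, adding or editing the Provenance section leaves the URI unchanged, while any change to the function body produces a different one.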
>
> The VC payload (decoded) for a verified case — the human-readable summary
> lines in the docstring are redundant with these fields, allowing a reader
> to see the category at a glance without decoding the JWT:
>
> {
>   "@context": ["https://www.w3.org/ns/credentials/v2"],
>   "type": ["VerifiableCredential", "CodeProvenanceCredential"],
>   "issuer": "did:key:z6Mk...",
>   "validFrom": "2026-03-31T00:00:00Z",
>   "credentialSubject": {
>     "id": "ni:///sha-256;a3f8c2d1e4b7...?ct=python-ast-no-vc",
>     "provenanceCategory": "verified",
>     "implements": "https://www.rfc-editor.org/rfc/rfc7033#section-9.3",
>     "verifiedBy": "did:key:z6Mk..."
>   }
> }
>
> For a pattern-derived case (docstring summary would read “category:
> pattern-derived / derivedFrom: Observer pattern”):
>
> {
>   "credentialSubject": {
>     "id": "ni:///sha-256;b7c9e1f3...?ct=python-ast-no-vc",
>     "provenanceCategory": "pattern-derived",
>     "derivedFrom": [{
>       "type": "DesignPattern",
>       "name": "Observer",
>       "canonicalRef": "Gamma et al., Design Patterns, 1994, pp. 293-303"
>     }],
>     "synthesizedBy": "did:key:z6Mk...",
>     "humanReviewed": false
>   }
> }
>
> For a WAG (docstring summary would read “category: WAG” — immediately
> visible to any reviewer). Note the absence of any derivedFrom or implements
> claim; the VC is honest about what it cannot assert:
>
> {
>   "credentialSubject": {
>     "id": "ni:///sha-256;c2d4f6a8...?ct=python-ast-no-vc",
>     "provenanceCategory": "WAG",
>     "synthesizedBy": "did:key:z6Mk...",
>     "humanReviewed": false,
>     "warning": "no traceable basis; review before trusting"
>   }
> }
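
The three payload shapes above imply a simple consistency rule per category, which a checker might express along the following lines. This is a hedged sketch: the function name check_subject is hypothetical, and the exact rules (for example, that "verified" requires "implements") are inferred from the examples rather than stated as a schema.

```python
from typing import List


def check_subject(subject: dict) -> List[str]:
    """Return a list of problems found in a credentialSubject; empty means OK."""
    problems = []
    category = subject.get("provenanceCategory")
    has_implements = "implements" in subject
    has_derived = bool(subject.get("derivedFrom"))

    if category == "verified":
        if not has_implements:
            problems.append("verified requires 'implements'")
    elif category == "pattern-derived":
        if not has_derived:
            problems.append("pattern-derived requires a non-empty 'derivedFrom'")
    elif category == "WAG":
        # A WAG is honest only if it asserts no basis at all.
        if has_implements or has_derived:
            problems.append("WAG must not carry 'implements' or 'derivedFrom'")
    else:
        problems.append(f"unknown provenanceCategory: {category!r}")
    return problems
```

A formal JSON Schema or SHACL shape would be the more conventional way to express the same constraints once the vocabulary settles.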
>
> Open questions
>
> Q1: credentialSubject.id for a function-level code artifact
>
> I am using the ni: Named Information scheme (RFC 6920) with a hash of the
> function’s normalized AST (Provenance section excluded). Is there
> established practice for content-addressed URIs as VC subject identifiers
> for non-person, non-document artifacts such as code? The VC Data Model is
> clear that subjects need not be people, but the examples are almost
> entirely person- or organization-centric.
>
> Q2: Claim strength vocabulary for the verified / pattern-derived / WAG
> hierarchy
>
> The three categories differ in verifiability:
>
>    - “Verified” makes a verifiable claim: tests exist that verify
>      correspondence with the cited specification.
>    - “Pattern-derived” makes a traceable but not automatically verifiable
>      claim: synthesis followed specific, citable sources.
>    - “WAG” makes no claim beyond “an LLM generated this.”
>
> Does existing VC vocabulary — perhaps the Confidence Method specification
> currently in development, or existing credential schema work — cover this
> kind of graduated evidence hierarchy? Or is a custom vocabulary extension
> the right path?
>
> Q3: Embedding location
>
> The docstring approach above is inline and durable but requires the
> stripping convention to avoid the self-reference problem. Including a
> human-readable summary of the key VC claims alongside the compact JWT (as
> shown) makes the provenance category immediately visible to code reviewers
> without requiring JWT decoding — the summary is for humans, the JWT is for
> machines, and a linting tool can verify that they agree. The alternative is
> a sidecar file or registry endpoint, where the code carries only a
> reference URI and the VC is stored and revoked externally. The supply chain
> security ecosystem (Sigstore, SLSA, in-toto) stores attestations externally
> referenced by artifact digest. Is there a reason to prefer inline embedding
> for source code specifically, or is external storage with a reference the
> better model?
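
The lint check mentioned in Q3 (summary and JWT must agree) could be sketched as below. Assumptions are labeled: the function names are hypothetical, the JWT payload is taken to be the credential itself with a top-level credentialSubject as in the decoded examples, and the summary is assumed to be already parsed into a dict keyed by the docstring field names. The sketch decodes the payload without verifying the signature, so it checks agreement only, not authenticity.

```python
import base64
import json


def jwt_payload(compact_jwt: str) -> dict:
    """Decode the payload of a compact JWT WITHOUT verifying its signature."""
    _header, payload_b64, _signature = compact_jwt.split(".")
    # base64url segments in a JWT are unpadded; restore padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def summary_matches_vc(summary: dict, compact_jwt: str) -> bool:
    """True if the docstring summary agrees with the VC on category and subject."""
    subject = jwt_payload(compact_jwt)["credentialSubject"]
    return (
        summary.get("category") == subject.get("provenanceCategory")
        and summary.get("subject") == subject.get("id")
    )
```

Only the category and subject are compared here because the summary's other fields (for example "implements: RFC 7033 Section 9.3") are written differently from the URL form in the payload and would need a mapping before they could be checked.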
>
> Q4: Prior art
>
> Is the group aware of existing work on using W3C VCs for sub-file,
> function-level provenance in source code? Supply chain security tools
> address file- and package-level signing; nothing I have found addresses
> individual functions within a file using the W3C VC format.
>
> A note on the human coder objection
>
> The obvious objection is that this tagging discipline would be cumbersome
> for human coders to produce. This is true, but the burden falls
> asymmetrically.
>
> For reading, humans benefit directly. A reviewer opening an unfamiliar
> file can immediately see, for each function, whether its provenance is
> verified against a specification, derived from a named pattern, or a WAG.
> That distinction is currently invisible in code review. Making it visible
> is the primary purpose of the tagging system, and it serves human reviewers
> as much as automated tools. It also enables automated procedures that
> exploit the tags: dependency scanners, verification checkers, and audit
> tools can all operate on structured provenance claims in ways they cannot
> on informal comments.
>
> For writing, LLMs do not find tagging cumbersome. An LLM that generated a
> function already holds its provenance in context; emitting a structured
> Provenance section alongside the code is essentially free. A human coder
> can also develop a lightweight review workflow: write the code, then ask an
> LLM to inspect it and produce the Provenance section. This is a tractable
> use of LLM capability — analyzing existing code against known sources
> rather than generating speculatively.
>
> The tagging requirement also creates a useful feedback loop: an LLM that
> must produce a citable derivedFrom reference is implicitly constrained to
> derive from something citable. It cannot generate a WAG and truthfully tag
> it as pattern-derived. The discipline of tagging improves the quality of
> generation, not only the quality of documentation.
>
> Why this matters
>
> The EU AI Act and similar regulations are creating pressure to document
> the provenance of AI-generated artifacts including code. Current practice
> is either no documentation or informal comments (“// generated by
> ChatGPT”). Neither is auditable. A VC-based tag makes provenance
> machine-readable, cryptographically bound to a specific implementation, and
> revocable if the source is found to be incorrect.
>
> The pattern-derived category is interesting. For verified implementations
> the VC makes a strong, checkable claim. For pattern-derived synthesis the
> VC makes a weaker but meaningful claim: “this is not a WAG — it was
> synthesized by following these specific, citable sources, and the reasoning
> can be checked even if automated verification is not yet possible.”
>
> Thank you for any guidance or other comments.
>
> bob wyman
>
>
>
Received on Wednesday, 1 April 2026 00:41:51 UTC