Using VCs for source code provenance: embedding, subject identification, and claim strength. from Bob Wyman on 2026-03-31 (public-credentials@w3.org from March 2026)

From: Bob Wyman <bob@wyman.us>
Date: Tue, 31 Mar 2026 19:41:46 -0400
To: "W3C Credentials CG (Public List)" <public-credentials@w3.org>
Message-ID: <CAA1s49WJshttjXxqaLpOxfoyjLh6J73vuz=owqzMFbRupb96Eg@mail.gmail.com>
I have been experimenting with using Verifiable Credentials to tag source
code with provenance information, as a way of improving the reliability of
LLM-generated code. I would welcome the group’s guidance on some open
technical questions.

Background

When an LLM generates or assembles code, the result falls into one of three
provenance categories:

   1. Verified implementation: the code is drawn from a trusted library and
      implements a known, citable specification (e.g., “this function
      implements RFC 7033 §9.3”). Strong claim; verifiable.

   2. Pattern-derived synthesis: the code is not from a library, but was
      synthesized by explicitly applying a named design pattern or algorithm.
      For example: “implements the Observer pattern [Gamma et al., Design
      Patterns, 1994, pp. 293-303]” or “sliding-window rate limiter following
      Kallmes, Towsley & Cassandras, IEEE CDC 1989.” The reasoning is
      traceable to a citable source even though the code is synthesized.
      Medium claim; checkable.

   3. WAG (Wild-Assed Guess): generated from training data with no specific
      basis. Weak claim; honest, but not auditable.

I would like to embed a VC in the source file that records which category
applies and, for categories 1 and 2, which specific sources were the basis.
The goals are several: six months later a reviewer can answer “where did
this come from?” without reconstructing the original context; WAGs are
flagged for mandatory human review before any code review is considered
complete; pattern-derived code is flagged for review as well, though less
urgently; and shipping code contains no WAGs — the tag makes this a
checkable, enforceable policy rather than an aspiration.

Proposed embedding format (Python)

The natural location for the VC in Python is the docstring, which is the
canonical metadata location for a function, survives most code
transformations that strip comments, and is where a human reviewer
naturally looks for “what is this and why does it exist?” The VC appears in
a structured Provenance section at the end of the docstring:

def rate_limit_by_ip(parent_values, params, get_state,
                     set_state, publish, service_emit=None):
    """
    Sliding-window rate limiter per client IP address.
    RFC 7033 Section 9.3: servers should rate-limit by IP
    to prevent abuse and email harvesting.

    Provenance:
        category: verified
        implements: RFC 7033 Section 9.3
        verified-by: did:key:z6Mk...
        vc: eyJhbGciOiJFZERTQSJ9...   (compact JWT)
        subject: ni:///sha-256;a3f8c2d1...?ct=python-ast-no-vc
    """

The ct=python-ast-no-vc content type signals that the subject hash was
computed over the function’s normalized AST with the Provenance section
stripped. A verifier applies the same stripping before recomputing the
hash. This avoids the self-referential problem of hashing content that
includes the hash.

The VC payload (decoded) for a verified case — the human-readable summary
lines in the docstring are redundant with these fields, allowing a reader
to see the category at a glance without decoding the JWT:

{
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  "type": ["VerifiableCredential", "CodeProvenanceCredential"],
  "issuer": "did:key:z6Mk...",
  "validFrom": "2026-03-31T00:00:00Z",
  "credentialSubject": {
    "id": "ni:///sha-256;a3f8c2d1e4b7...?ct=python-ast-no-vc",
    "provenanceCategory": "verified",
    "implements": "https://www.rfc-editor.org/rfc/rfc7033#section-9.3",
    "verifiedBy": "did:key:z6Mk..."
  }
}

For a pattern-derived case (docstring summary would read “category:
pattern-derived / derivedFrom: Observer pattern”):

{
  "credentialSubject": {
    "id": "ni:///sha-256;b7c9e1f3...?ct=python-ast-no-vc",
    "provenanceCategory": "pattern-derived",
    "derivedFrom": [{
      "type": "DesignPattern",
      "name": "Observer",
      "canonicalRef": "Gamma et al., Design Patterns, 1994, pp. 293-303"
    }],
    "synthesizedBy": "did:key:z6Mk...",
    "humanReviewed": false
  }
}

For a WAG (docstring summary would read “category: WAG” — immediately
visible to any reviewer). Note the absence of any derivedFrom or implements
claim; the VC is honest about what it cannot assert:

{
  "credentialSubject": {
    "id": "ni:///sha-256;c2d4f6a8...?ct=python-ast-no-vc",
    "provenanceCategory": "WAG",
    "synthesizedBy": "did:key:z6Mk...",
    "humanReviewed": false,
    "warning": "no traceable basis; review before trusting"
  }
}

Open questions

Q1: credentialSubject.id for a function-level code artifact

I am using the ni: Named Information scheme (RFC 6920) with a hash of the
function’s normalized AST (Provenance section excluded). Is there
established practice for content-addressed URIs as VC subject identifiers
for non-person, non-document artifacts such as code? The VC Data Model is
clear that subjects need not be people, but the examples are almost
entirely person- or organization-centric.

Q2: Claim strength vocabulary for the verified / pattern-derived / WAG
hierarchy

The three categories differ in verifiability:

   - “Verified” makes a verifiable claim: tests exist that verify
     correspondence with the cited specification.
   - “Pattern-derived” makes a traceable but not automatically verifiable
     claim: synthesis followed specific, citable sources.
   - “WAG” makes no claim beyond “an LLM generated this.”

Does existing VC vocabulary — perhaps the Confidence Method specification
currently in development, or existing credential schema work — cover this
kind of graduated evidence hierarchy? Or is a custom vocabulary extension
the right path?

Q3: Embedding location

The docstring approach above is inline and durable but requires the
stripping convention to avoid the self-reference problem. Including a
human-readable summary of the key VC claims alongside the compact JWT (as
shown) makes the provenance category immediately visible to code reviewers
without requiring JWT decoding — the summary is for humans, the JWT is for
machines, and a linting tool can verify that they agree. The alternative is
a sidecar file or registry endpoint, where the code carries only a
reference URI and the VC is stored and revoked externally. The supply chain
security ecosystem (Sigstore, SLSA, in-toto) stores attestations externally
referenced by artifact digest. Is there a reason to prefer inline embedding
for source code specifically, or is external storage with a reference the
better model?
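
To make the linting idea concrete, here is a minimal sketch of a check that the docstring summary agrees with the embedded JWT. It rests on several assumptions: the compact JWT payload is the credential itself (as in the decoded examples above), the summary has already been parsed into a dict with "category" and "subject" keys, and the function names are hypothetical. It deliberately does NOT verify the signature; a real verifier would have to do that first, using the issuer's key.

```python
import base64
import json


def decode_jwt_payload(compact_jwt: str) -> dict:
    """Decode the payload of a compact JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature = compact_jwt.split(".")
    # base64url in JWTs omits padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def summary_mismatches(summary: dict, compact_jwt: str) -> list[str]:
    """Return the summary keys whose values disagree with the VC payload.

    `summary` is assumed to hold the parsed docstring lines, e.g.
    {"category": "verified", "subject": "ni:///sha-256;..."}.
    """
    subject = decode_jwt_payload(compact_jwt)["credentialSubject"]
    expected = {
        "category": subject.get("provenanceCategory"),
        "subject": subject.get("id"),
    }
    return [key for key, value in expected.items() if summary.get(key) != value]
```

An empty result means the human-readable summary and the machine-readable payload tell the same story; any returned key names a line a reviewer should not trust until the two are reconciled.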

Q4: Prior art

Is the group aware of existing work on using W3C VCs for sub-file,
function-level provenance in source code? Supply chain security tools
address file- and package-level signing; nothing I have found addresses
individual functions within a file using the W3C VC format.

A note on the human coder objection

The obvious objection is that this tagging discipline would be cumbersome
for human coders to produce. This is true, but the burden falls
asymmetrically.

For reading, humans benefit directly. A reviewer opening an unfamiliar file
can immediately see, for each function, whether its provenance is verified
against a specification, derived from a named pattern, or a WAG. That
distinction is currently invisible in code review. Making it visible is the
primary purpose of the tagging system, and it serves human reviewers as
much as automated tools. It also enables automated procedures that exploit
the tags: dependency scanners, verification checkers, and audit tools can
all operate on structured provenance claims in ways they cannot on informal
comments.

For writing, LLMs do not find tagging cumbersome. An LLM that generated a
function already holds its provenance in context; emitting a structured
Provenance section alongside the code is essentially free. A human coder
can also develop a lightweight review workflow: write the code, then ask an
LLM to inspect it and produce the Provenance section. This is a tractable
use of LLM capability — analyzing existing code against known sources
rather than generating speculatively.

The tagging requirement also creates a useful feedback loop: an LLM that
must produce a citable derivedFrom reference is implicitly constrained to
derive from something citable. It cannot generate a WAG and truthfully tag
it as pattern-derived. The discipline of tagging improves the quality of
generation, not only the quality of documentation.

Why this matters

The EU AI Act and similar regulations are creating pressure to document the
provenance of AI-generated artifacts including code. Current practice is
either no documentation or informal comments (“// generated by ChatGPT”).
Neither is auditable. A VC-based tag makes provenance machine-readable,
cryptographically bound to a specific implementation, and revocable if the
source is found to be incorrect.

The pattern-derived category is the interesting middle case. For verified implementations
the VC makes a strong, checkable claim. For pattern-derived synthesis the
VC makes a weaker but meaningful claim: “this is not a WAG — it was
synthesized by following these specific, citable sources, and the reasoning
can be checked even if automated verification is not yet possible.”

Thank you for any guidance or other comments.

bob wyman
Received on Tuesday, 31 March 2026 23:42:04 UTC