Re: Using VCs for source code provenance: embedding, subject identification, and claim strength.

Bob,

I think we mostly agree, and I will go back to lurking after one more remark:

Imagine you are Huawei, desperate to sell telecom and data center hardware
to Europe and the US. How would verification help? The Chinese have already
figured this out.
https://www.uscc.gov/research/two-loops-how-chinas-open-ai-strategy-reinforces-its-industrial-dominance

Adrian

On Tue, Mar 31, 2026 at 9:29 PM Bob Wyman <bob@wyman.us> wrote:

> Adrian,
>
> Before addressing your points, I should note that the original message
> asked a specific technical question: *if* we tag source code with
> provenance, what is the right mechanism to use? Whether such a library is
> economically viable long-term is a separate and interesting debate, but it
> doesn't change the technical question. Even in a world where LLMs are
> dominant and open source is universal, the question of how to express and
> verify the provenance of generated code at the function level remains open.
> The VC embedding questions I raised stand regardless of how that broader
> debate resolves.
>
> On your specific points:
>
> These are fair challenges. Let me address each briefly.
>
> On reasoning beyond training data: agreed, and the point strengthens
> rather than weakens the case for a verified library. A more capable
> reasoner working from verified, RFC-traceable components produces better
> results than one working from an unverified aggregate. The library is not a
> constraint on reasoning — it is a grounded starting point that reasoning
> can extend.
>
> On verification cost: the cost is front-loaded and then amortized across
> every use. Verifying a rate-limiting component against RFC 7033 §9.3 once
> costs effort. Using that verified component across thousands of projects
> costs nothing additional. More importantly, the library itself is built
> using LLM-assisted processes — LLMs read RFCs, extract requirements, and
> generate candidate components that humans verify. The library grows at LLM
> speed, not human speed.
>
> On the decline of proprietary code: even in an all-open-source world the
> problem doesn't disappear. It shifts to "which open source component
> correctly implements this requirement, and how do I know?" A verified
> library answers exactly that question regardless of whether the code is
> proprietary or open.
>
> The deeper point may be this: you are describing a world where LLMs get
> better faster than any human-driven standardization process. I agree. The
> verified component library is not a human-driven standardization process
> competing with LLMs on speed — it is an LLM-assisted knowledge base that
> gets better as LLMs get better, and that makes LLMs better in return.
>
> One small piece of supporting evidence: I have discussed this process in
> detail with Claude, Gemini, and ChatGPT. Each independently concluded that
> being directed to draw first from a verified component library, and to
> justify any synthesis against a citable source, would improve the quality
> of their outputs. The models themselves may be the most credible witnesses
> to their own improvements.
>
> bob wyman
>
> On Tue, Mar 31, 2026 at 9:16 PM Adrian Gropper <agropper@healthurl.com>
> wrote:
>
>> Modern LLMs increasingly depend on reasoning rather than being limited to
>> their training data. In medicine and probably most other domains, research
>> consistently shows that more generic training and knowledge gives better
>> results than more fine-tuning for any particular domain.
>>
>> Also, verification requires time, money, and structure, especially if
>> humans are still involved. The generic LLMs are changing much faster than
>> any current standardization or verification process I can think of (but I
>> am not an expert in verification).
>>
>> I see a declining future for any proprietary code. It's just too easy for
>> LLMs to create hidden issues (bias, back doors) in proprietary code.
>>
>> Simply put, verified component libraries are nice-to-haves but I can't
>> see them being economically viable for much longer.
>>
>> Adrian
>>
>> On Tue, Mar 31, 2026 at 9:00 PM Bob Wyman <bob@wyman.us> wrote:
>>
>>> Adrian,
>>> You're right that LLMs are increasingly effective at verifying open
>>> source code, and that "trust to verify" changes the economics of review.
>>> But LLM verification still requires a standard to verify against. The
>>> approach I'm working with builds a curated library of verified
>>> computational components — each one traceable to a specific RFC section or
>>> named design pattern, carrying compliance tests, a proof of correctness,
>>> and a documented list of known limitations. An LLM directed to draw first
>>> from this library, and to fall back on its own synthesis only when no
>>> verified component exists, works within a constrained and grounded search
>>> space for common cases rather than across the full range of its training
>>> data. Without such discipline, LLM verification is circular: the model
>>> checks code against what it has seen before, which includes both good and
>>> bad implementations with no way to distinguish them. And if LLMs are
>>> already better than human reviewers without this discipline, they will be
>>> better still with it — the library gives them a verified knowledge base to
>>> work from rather than a statistical aggregate.
>>>
>>> The provenance tagging in the original message is a natural extension of
>>> that library discipline rather than a separate mechanism. An LLM directed
>>> to produce a citable derivedFrom reference is constrained to derive from
>>> something real and checkable. WAG-tagged code is not just flagged for
>>> review — it signals that the generation process lacked a verified basis,
>>> which is actionable for the next iteration.
>>>
>>> The library approach also applies beyond open source. Proprietary code,
>>> domain-specific logic, and novel protocol implementations cannot be
>>> verified by searching the web for real-world experience. A verified
>>> component library is the mechanism for making that knowledge reliably
>>> available to LLMs.
>>>
>>> bob wyman
>>>
>>> On Tue, Mar 31, 2026 at 8:41 PM Adrian Gropper <agropper@healthurl.com>
>>> wrote:
>>>
>>>> I think this is rapidly becoming irrelevant. When code is open source,
>>>> dollar for dollar, coding LLMs are already better than human reviewers and
>>>> code provenance becomes less important. At least for open source code, we
>>>> are moving from trust to verify.
>>>>
>>>> Coding LLMs also benefit from access to the context in which the code
>>>> will be used, and from the ability to search the web for real-world
>>>> experience with the code, as well as for improvements and modifications
>>>> applicable to open source code, all of which further reduces the attack
>>>> surface.
>>>>
>>>> Adrian
>>>>
>>>> On Tue, Mar 31, 2026 at 7:44 PM Bob Wyman <bob@wyman.us> wrote:
>>>>
>>>>> I have been experimenting with using Verifiable Credentials to tag
>>>>> source code with provenance information, as a way of improving the
>>>>> reliability of LLM-generated code. I would welcome the group’s guidance on
>>>>> some open technical questions.
>>>>>
>>>>> Background
>>>>>
>>>>> When an LLM generates or assembles code, the result falls into one of
>>>>> three provenance categories:
>>>>>
>>>>>    1. Verified implementation — the code is drawn from a trusted
>>>>>    library and implements a known, citable specification (e.g., “this
>>>>>    function implements RFC 7033 §9.3”). Strong claim; verifiable.
>>>>>
>>>>>    2. Pattern-derived synthesis — the code is not from a library, but
>>>>>    was synthesized by explicitly applying a named design pattern or
>>>>>    algorithm. For example: “implements the Observer pattern [Gamma et
>>>>>    al., Design Patterns, 1994, pp. 293-303]” or “sliding-window rate
>>>>>    limiter following Kallmes, Towsley & Cassandras, IEEE CDC 1989.”
>>>>>    The reasoning is traceable to a citable source even though the code
>>>>>    is synthesized. Medium claim; checkable.
>>>>>
>>>>>    3. WAG (Wild-Assed Guess) — generated from training data with no
>>>>>    specific basis. Weak claim; honest, but not auditable.
>>>>>
>>>>> I would like to embed a VC in the source file that records which
>>>>> category applies and, for categories 1 and 2, which specific sources were
>>>>> the basis. The goals are several: six months later a reviewer can answer
>>>>> “where did this come from?” without reconstructing the original context;
>>>>> WAGs are flagged for mandatory human review before any code review is
>>>>> considered complete; pattern-derived code is flagged for review as well,
>>>>> though less urgently; and shipping code contains no WAGs — the tag makes
>>>>> this a checkable, enforceable policy rather than an aspiration.
>>>>>
>>>>> Proposed embedding format (Python)
>>>>>
>>>>> The natural location for the VC in Python is the docstring, which is
>>>>> the canonical metadata location for a function, survives most code
>>>>> transformations that strip comments, and is where a human reviewer
>>>>> naturally looks for “what is this and why does it exist?” The VC appears in
>>>>> a structured Provenance section at the end of the docstring:
>>>>>
>>>>> def rate_limit_by_ip(parent_values, params, get_state,
>>>>>                      set_state, publish, service_emit=None):
>>>>>     """
>>>>>     Sliding-window rate limiter per client IP address.
>>>>>     RFC 7033 Section 9.3: servers should rate-limit by IP
>>>>>     to prevent abuse and email harvesting.
>>>>>
>>>>>     Provenance:
>>>>>         category: verified
>>>>>         implements: RFC 7033 Section 9.3
>>>>>         verified-by: did:key:z6Mk...
>>>>>         vc: eyJhbGciOiJFZERTQSJ9...   (compact JWT)
>>>>>         subject: ni:///sha-256;a3f8c2d1...?ct=python-ast-no-vc
>>>>>     """
>>>>>
>>>>> The ct=python-ast-no-vc content type signals that the subject hash was
>>>>> computed over the function’s normalized AST with the Provenance section
>>>>> stripped. A verifier applies the same stripping before recomputing the
>>>>> hash. This avoids the self-referential problem of hashing content that
>>>>> includes the hash.
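>>>>>
>>>>> To make the stripping convention concrete, here is a minimal sketch of
>>>>> how a verifier might compute the subject hash. The normalization
>>>>> choices here (ast.dump as the canonical form, a hex digest matching the
>>>>> style of the examples, a regex that removes everything from the
>>>>> Provenance heading to the end of the docstring) are illustrative
>>>>> assumptions, not a settled scheme — note that RFC 6920 itself specifies
>>>>> base64url-encoded digests:

```python
import ast
import hashlib
import re

# Everything from the "Provenance:" heading to the end of the docstring is
# treated as the embedded VC section and removed before hashing.
PROVENANCE_RE = re.compile(r"\n\s*Provenance:.*\Z", re.DOTALL)

def subject_hash(func_source: str) -> str:
    """Hash a function's AST with its Provenance docstring section stripped."""
    tree = ast.parse(func_source)
    func = tree.body[0]
    doc = ast.get_docstring(func, clean=False)
    if doc is not None and "Provenance:" in doc:
        # Rewrite the docstring constant in place, minus the Provenance block.
        func.body[0].value.value = PROVENANCE_RE.sub("", doc)
    # ast.dump() is a stand-in canonical form; a real scheme would pin the
    # normalization down precisely (comments are already gone at AST level).
    digest = hashlib.sha256(ast.dump(tree).encode()).hexdigest()
    return f"ni:///sha-256;{digest}?ct=python-ast-no-vc"
```

>>>>> Two copies of a function that differ only in their Provenance section
>>>>> hash to the same subject identifier, which is exactly the property the
>>>>> ct=python-ast-no-vc convention requires.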
>>>>>
>>>>> The VC payload (decoded) for a verified case — the human-readable
>>>>> summary lines in the docstring are redundant with these fields, allowing a
>>>>> reader to see the category at a glance without decoding the JWT:
>>>>>
>>>>> {
>>>>>   "@context": ["https://www.w3.org/ns/credentials/v2"],
>>>>>   "type": ["VerifiableCredential", "CodeProvenanceCredential"],
>>>>>   "issuer": "did:key:z6Mk...",
>>>>>   "validFrom": "2026-03-31T00:00:00Z",
>>>>>   "credentialSubject": {
>>>>>     "id": "ni:///sha-256;a3f8c2d1e4b7...?ct=python-ast-no-vc",
>>>>>     "provenanceCategory": "verified",
>>>>>     "implements": "https://www.rfc-editor.org/rfc/rfc7033#section-9.3",
>>>>>     "verifiedBy": "did:key:z6Mk..."
>>>>>   }
>>>>> }
>>>>>
>>>>> For a pattern-derived case (docstring summary would read “category:
>>>>> pattern-derived / derivedFrom: Observer pattern”):
>>>>>
>>>>> {
>>>>>   "credentialSubject": {
>>>>>     "id": "ni:///sha-256;b7c9e1f3...?ct=python-ast-no-vc",
>>>>>     "provenanceCategory": "pattern-derived",
>>>>>     "derivedFrom": [{
>>>>>       "type": "DesignPattern",
>>>>>       "name": "Observer",
>>>>>       "canonicalRef": "Gamma et al., Design Patterns, 1994, pp. 293-303"
>>>>>     }],
>>>>>     "synthesizedBy": "did:key:z6Mk...",
>>>>>     "humanReviewed": false
>>>>>   }
>>>>> }
>>>>>
>>>>> For a WAG (docstring summary would read “category: WAG” — immediately
>>>>> visible to any reviewer). Note the absence of any derivedFrom or implements
>>>>> claim; the VC is honest about what it cannot assert:
>>>>>
>>>>> {
>>>>>   "credentialSubject": {
>>>>>     "id": "ni:///sha-256;c2d4f6a8...?ct=python-ast-no-vc",
>>>>>     "provenanceCategory": "WAG",
>>>>>     "synthesizedBy": "did:key:z6Mk...",
>>>>>     "humanReviewed": false,
>>>>>     "warning": "no traceable basis; review before trusting"
>>>>>   }
>>>>> }
>>>>>
>>>>> Open questions
>>>>>
>>>>> Q1: credentialSubject.id for a function-level code artifact
>>>>>
>>>>> I am using the ni: Named Information scheme (RFC 6920) with a hash of
>>>>> the function’s normalized AST (Provenance section excluded). Is there
>>>>> established practice for content-addressed URIs as VC subject identifiers
>>>>> for non-person, non-document artifacts such as code? The VC Data Model is
>>>>> clear that subjects need not be people, but the examples are almost
>>>>> entirely person- or organization-centric.
>>>>>
>>>>> Q2: Claim strength vocabulary for the verified / pattern-derived / WAG
>>>>> hierarchy
>>>>>
>>>>> The three categories differ in verifiability:
>>>>>
>>>>> - “Verified” makes a verifiable claim: tests exist that verify
>>>>> correspondence with the cited specification.
>>>>> - “Pattern-derived” makes a traceable but not automatically verifiable
>>>>> claim: synthesis followed specific, citable sources.
>>>>> - “WAG” makes no claim beyond “an LLM generated this.”
>>>>>
>>>>> Does existing VC vocabulary — perhaps the Confidence Method
>>>>> specification currently in development, or existing credential schema work
>>>>> — cover this kind of graduated evidence hierarchy? Or is a custom
>>>>> vocabulary extension the right path?
>>>>>
>>>>> Q3: Embedding location
>>>>>
>>>>> The docstring approach above is inline and durable but requires the
>>>>> stripping convention to avoid the self-reference problem. Including a
>>>>> human-readable summary of the key VC claims alongside the compact JWT (as
>>>>> shown) makes the provenance category immediately visible to code reviewers
>>>>> without requiring JWT decoding — the summary is for humans, the JWT is for
>>>>> machines, and a linting tool can verify that they agree. The alternative is
>>>>> a sidecar file or registry endpoint, where the code carries only a
>>>>> reference URI and the VC is stored and revoked externally. The supply chain
>>>>> security ecosystem (Sigstore, SLSA, in-toto) stores attestations externally
>>>>> referenced by artifact digest. Is there a reason to prefer inline embedding
>>>>> for source code specifically, or is external storage with a reference the
>>>>> better model?
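>>>>>
>>>>> For the linting step, here is a sketch of the agreement check, assuming
>>>>> the docstring convention above. It decodes only the compact JWT payload
>>>>> and does not verify the signature; the function names are mine, and
>>>>> real policy enforcement would of course verify the proof as well:

```python
import base64
import json
import re

def jwt_payload(compact_jwt: str) -> dict:
    """Decode a compact JWT's payload segment. No signature verification:
    this is only the consistency half of the lint check."""
    segment = compact_jwt.split(".")[1]
    segment += "=" * (-len(segment) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(segment))

def summary_agrees_with_vc(docstring: str, compact_jwt: str) -> bool:
    """Check that the human-readable 'category:' line matches the VC claim."""
    match = re.search(r"category:\s*(\S+)", docstring)
    if match is None:
        return False
    subject = jwt_payload(compact_jwt).get("credentialSubject", {})
    return match.group(1) == subject.get("provenanceCategory")
```

>>>>> The same check works unchanged whether the VC is embedded inline or
>>>>> fetched from a sidecar file or registry by subject hash.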
>>>>>
>>>>> Q4: Prior art
>>>>>
>>>>> Is the group aware of existing work on using W3C VCs for sub-file,
>>>>> function-level provenance in source code? Supply chain security tools
>>>>> address file- and package-level signing; nothing I have found addresses
>>>>> individual functions within a file using the W3C VC format.
>>>>>
>>>>> A note on the human coder objection
>>>>>
>>>>> The obvious objection is that this tagging discipline would be
>>>>> cumbersome for human coders to produce. This is true, but the burden falls
>>>>> asymmetrically.
>>>>>
>>>>> For reading, humans benefit directly. A reviewer opening an unfamiliar
>>>>> file can immediately see, for each function, whether its provenance is
>>>>> verified against a specification, derived from a named pattern, or a WAG.
>>>>> That distinction is currently invisible in code review. Making it visible
>>>>> is the primary purpose of the tagging system, and it serves human reviewers
>>>>> as much as automated tools. It also enables automated procedures that
>>>>> exploit the tags: dependency scanners, verification checkers, and audit
>>>>> tools can all operate on structured provenance claims in ways they cannot
>>>>> on informal comments.
>>>>>
>>>>> For writing, LLMs do not find tagging cumbersome. An LLM that
>>>>> generated a function already holds its provenance in context; emitting a
>>>>> structured Provenance section alongside the code is essentially free. A
>>>>> human coder can also develop a lightweight review workflow: write the code,
>>>>> then ask an LLM to inspect it and produce the Provenance section. This is a
>>>>> tractable use of LLM capability — analyzing existing code against known
>>>>> sources rather than generating speculatively.
>>>>>
>>>>> The tagging requirement also creates a useful feedback loop: an LLM
>>>>> that must produce a citable derivedFrom reference is implicitly constrained
>>>>> to derive from something citable. It cannot generate a WAG and truthfully
>>>>> tag it as pattern-derived. The discipline of tagging improves the quality
>>>>> of generation, not only the quality of documentation.
>>>>>
>>>>> Why this matters
>>>>>
>>>>> The EU AI Act and similar regulations are creating pressure to
>>>>> document the provenance of AI-generated artifacts including code. Current
>>>>> practice is either no documentation or informal comments (“// generated by
>>>>> ChatGPT”). Neither is auditable. A VC-based tag makes provenance
>>>>> machine-readable, cryptographically bound to a specific implementation, and
>>>>> revocable if the source is found to be incorrect.
>>>>>
>>>>> The pattern-derived category is interesting. For verified
>>>>> implementations the VC makes a strong, checkable claim. For pattern-derived
>>>>> synthesis the VC makes a weaker but meaningful claim: “this is not a WAG —
>>>>> it was synthesized by following these specific, citable sources, and the
>>>>> reasoning can be checked even if automated verification is not yet
>>>>> possible.”
>>>>>
>>>>> Thank you for any guidance or other comments.
>>>>>
>>>>> bob wyman
>>>>>

Received on Wednesday, 1 April 2026 02:09:05 UTC