Re: Blockchain Commons Known Values Registry: Compact Integer Identifiers for Ontological Concepts

In theory, with 64 bits, one could let the high bit toggle between reserved/system and application-specific codepoint spaces. That is, one could partition the 2^64 possible values into two spaces of 2^63 values each. One could then register and assign up to 2^63 system codepoints while still letting software developers use application-specific dictionaries (up to 2^63 entries) at the same time.
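
A minimal sketch of that partitioning, assuming the convention that a set high bit means "application-specific" (the names APPLICATION_BIT, is_application, and to_application are hypothetical and not part of any registry or spec):

```python
# Hypothetical sketch: the high bit of a 64-bit codepoint selects the space.
APPLICATION_BIT = 1 << 63      # assumed convention: high bit set = application-specific
MASK_64 = (1 << 64) - 1        # Python ints are unbounded, so bound them explicitly


def is_application(codepoint: int) -> bool:
    """Return True if the codepoint lies in the application-specific half."""
    if not 0 <= codepoint <= MASK_64:
        raise ValueError("codepoint must fit in 64 unsigned bits")
    return bool(codepoint & APPLICATION_BIT)


def to_application(local_id: int) -> int:
    """Map a local dictionary index (0 .. 2^63 - 1) into the application half."""
    if not 0 <= local_id < APPLICATION_BIT:
        raise ValueError("local_id must be below 2^63")
    return local_id | APPLICATION_BIT
```

With this convention a registered system codepoint and an application-local index can never collide, because they differ in the top bit.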

Benefits of registering system codepoint entries would, I think, include faster knowledgebase initialization. Encoding and mapping of core system entries (e.g., RDF, RDFS, OWL, Schema.org, and more) could be performed once, a priori. I'm envisioning a set of community-standard, reusable binary blobs: resources that clients could download and cache before Web developers load them and add application-specific knowledge-graph contents atop them. More concretely, software developers could construct knowledgebase instances with one constructor parameter being an enumerated value whose flags indicate which community-standard bundles (which binary blob resources) to load efficiently during initialization.
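
To make the constructor idea a little more concrete, here is a rough sketch. The Bundle flags and the KnowledgeBase class are hypothetical names chosen for illustration, and the sketch only records which bundles were requested rather than loading any real binary resource:

```python
from enum import Flag, auto


class Bundle(Flag):
    """Hypothetical flags naming community-standard, pre-encoded bundles."""
    NONE = 0
    RDF = auto()
    RDFS = auto()
    OWL = auto()
    SCHEMA_ORG = auto()


class KnowledgeBase:
    """Hypothetical knowledgebase that notes which bundles to preload."""

    def __init__(self, bundles: Bundle = Bundle.NONE) -> None:
        # A real implementation would read cached binary blobs here;
        # this sketch only records the names of the requested bundles.
        self.loaded = [b.name for b in Bundle if b.value and b in bundles]


kb = KnowledgeBase(Bundle.RDF | Bundle.RDFS | Bundle.OWL)
print(kb.loaded)
```

Combining flags with `|` keeps the constructor signature stable as new bundles are added.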

In addition to storage- and transmission-related benefits, mapping URIs to 64-bit values brings computation-related benefits, including some GPGPU possibilities. Perhaps the techniques under discussion can enable exploration into "WASM x WebGPU" software libraries for knowledge-graph processing. Interesting topics!


Best regards,
Adam


________________________________
From: Filip Kolarik <filip26@gmail.com>
Sent: Tuesday, February 3, 2026 6:52 PM
To: Melvin Carvalho <melvincarvalho@gmail.com>
Cc: Christopher Allen <ChristopherA@lifewithalacrity.com>; Credentials Community Group <public-credentials@w3.org>; Wolf McNally <wolf@wolfmcnally.com>; Shannon Appelcline <shannon.appelcline@gmail.com>
Subject: Re: Blockchain Commons Known Values Registry: Compact Integer Identifiers for Ontological Concepts

Hi,
CBOR-LD [1] uses shared dictionaries to improve compression ratios. These dictionaries can be generated from JSON-LD contexts or provided externally, and this information is encoded in the final CBOR-LD output.

Why maintain a fixed registry for compact identifiers? Are these compact identifiers, which represent ontological terms, intended to be used standalone, or embedded within a larger representation? In the second case, a fixed registry may be unnecessary; it could be replaced with a dereferenceable "context" that maps terms to integers.
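
As a rough illustration of the second case, a context could be nothing more than a term-to-integer table that travels with, or is referenced by, the document. The integers below are arbitrary and the helper names compress and expand are hypothetical; this is not how CBOR-LD itself is specified:

```python
# Hypothetical context: a term-to-integer table that a consumer would
# dereference instead of consulting a fixed, global registry.
context = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": 1,
    "http://xmlns.com/foaf/0.1/name": 2,
}
# Inverse table used when expanding integers back into terms.
reverse = {code: term for term, code in context.items()}


def compress(terms):
    """Replace each term with its integer from the context."""
    return [context[t] for t in terms]


def expand(codes):
    """Replace each integer with its term; unknown codes raise KeyError."""
    return [reverse[c] for c in codes]
```

Because the table is supplied per document, two producers can use different integers for the same term without conflict, as long as each names its context.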

Best regards,
Filip
https://www.linkedin.com/in/filipkolarik/


[1] https://github.com/filip26/iridium-cbor-ld



On Wed, Feb 4, 2026 at 12:36 AM Melvin Carvalho <melvincarvalho@gmail.com> wrote:


On Tue, Feb 3, 2026 at 11:51 PM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
TL;DR: I'm seeking CCG community input on a compact identifier registry we've developed at Blockchain Commons. Our Known Values Registry (BCR-2023-002) maps ontological concepts — predicates, classes, properties — to 64-bit integers, providing a compact binary representation while preserving semantic meaning.

We've already mapped several vocabularies this community uses (RDF, RDFS, Dublin Core, FOAF, SKOS, Verifiable Credentials, Schema.org), and we're developing new schemas for areas like principal authority, signature context, and peer endorsements.

Three questions for the community:

- Are there other ontologies or vocabularies CCG uses that we should prioritize mapping?

- Would schemas for principal authority (who directed vs who performed), signature context (the capacity in which someone signs), and peer endorsements be useful for VC implementations?

- Is anyone working on similar compact-identifier approaches?

Here's the detail on what we've built:

The Known Values Registry

BCR-2023-002 defines a namespace of 64-bit unsigned integers representing ontological concepts — relationships, classes, properties, and enumerated values. Each integer maps to a canonical name and equivalent URIs.

    https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2023-002-known-value.md


We needed compact binary representation and deterministic encoding, but the registry itself is independent of any particular encoding. While we serialize these as CBOR (#6.40000) for use with Gordian Envelope, the codepoint-to-concept mappings stand alone and can be used in any format or protocol.

For example, rdf:type (codepoint 1) encodes as:

    CBOR diagnostic: 40000(1)
    Bytes: d9 9c 40 01  (4 bytes)

Compare that to the 47-byte URI "http://www.w3.org/1999/02/22-rdf-syntax-ns#type". For documents with many predicates, this adds up.
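
The byte counts above can be reproduced with a few lines of code. This is a minimal sketch that hand-encodes the CBOR head as described in RFC 8949 rather than using any particular CBOR library; the function names encode_head and encode_known_value are hypothetical:

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if not 0 <= value < 1 << 64:
        raise ValueError("argument must fit in 64 unsigned bits")
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)


def encode_known_value(codepoint: int) -> bytes:
    """Tag 40000 (major type 6) wrapping an unsigned integer (major type 0)."""
    return encode_head(6, 40000) + encode_head(0, codepoint)


encoded = encode_known_value(1)
uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
print(encoded.hex(" "), len(encoded), len(uri.encode("utf-8")))
# prints: d9 9c 40 01 4 47
```

Larger codepoints cost more: any value from 65536 up to 2^32 - 1 needs a 5-byte integer after the 3-byte tag head.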

What's Already Mapped

We've assigned codepoints for several vocabularies this community uses:

- RDF (2000-2049): 21 entries
- RDFS (2050-2099): 15 entries
- OWL 2 (2100-2199): 75 entries
- Dublin Core Elements (2200-2299): 15 entries
- Dublin Core Terms (2300-2499): 89 entries
- FOAF (2500-2699): 75 entries
- SKOS (2700-2799): 32 entries
- Solid (2800-2899): 33 entries
- W3C Verifiable Credentials (2900-2999): 28 entries
- GS1 Web Vocabulary (3000-3999): 609 entries
- Schema.org (10000-19999): 2450 entries

These are 1:1 mappings — the Known Value codepoints reference the canonical URIs from each ontology.
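
Because each vocabulary occupies a contiguous block, a consumer can tell which vocabulary a codepoint belongs to with a simple range check. A small sketch, with the ranges copied from the list above and a hypothetical helper name vocabulary_for:

```python
# (first, last, vocabulary) ranges as listed above.
RANGES = [
    (2000, 2049, "RDF"),
    (2050, 2099, "RDFS"),
    (2100, 2199, "OWL 2"),
    (2200, 2299, "Dublin Core Elements"),
    (2300, 2499, "Dublin Core Terms"),
    (2500, 2699, "FOAF"),
    (2700, 2799, "SKOS"),
    (2800, 2899, "Solid"),
    (2900, 2999, "W3C Verifiable Credentials"),
    (3000, 3999, "GS1 Web Vocabulary"),
    (10000, 19999, "Schema.org"),
]


def vocabulary_for(codepoint: int):
    """Return the vocabulary whose block contains the codepoint, else None."""
    for first, last, name in RANGES:
        if first <= codepoint <= last:
            return name
    return None
```

Codepoints outside these blocks (for example the core range or the community range) simply return None in this sketch.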

Emerging Schemas

We're also developing predicates for areas where we haven't found existing schemas to leverage. These are currently in community review (see the current PRs in the repository):

- Principal Authority — predicates for expressing who directed a work vs who performed it (e.g., human holds principalAuthority over AI-generated content)

- Signature Context — the capacity in which someone signs (e.g., CFO signs onBehalfOf their corporation, not personally)

- Fair Witness — neutral third-party observation attestations (e.g., notary attesting they observed a signature ceremony)

- Peer Endorsement — skill and collaboration endorsements distinct from formal credentials (e.g., colleague endorsing another's security expertise based on project work)

- CreativeWork Roles — contribution roles mapped to CRediT with ONIX, MARC (e.g., distinguishing Author from Editor from Reviewer on a collaborative work)

I'm planning to make these available as schemas useable with JSON-LD and other formats at https://assertions.info for those working outside CBOR/Envelope contexts, if the W3C CCG community finds them useful.

Community Registry

Codepoints 100,000+ are open for community registration via automated GitHub workflow — submit a JSON file, validation runs, and upon merge you have registered codepoints. No gatekeeping beyond schema conformance and uniqueness checks.
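
To give a feel for what such checks might involve, here is an illustrative sketch. The JSON shape (an "entries" list with "codepoint" and "name" fields) and the function validate_submission are assumptions made for this example; they are not the repository's actual schema or workflow:

```python
import json

COMMUNITY_START = 100_000  # from the text: codepoints 100,000+ are open


def validate_submission(raw: str, taken: set) -> list:
    """Return a list of problems; an empty list means the submission passes."""
    problems = []
    seen = set()
    for entry in json.loads(raw)["entries"]:
        cp, name = entry.get("codepoint"), entry.get("name")
        # Conformance: codepoint must be an integer in the community range.
        if not isinstance(cp, int) or isinstance(cp, bool) or cp < COMMUNITY_START:
            problems.append(f"bad codepoint: {cp!r}")
            continue
        if not isinstance(name, str) or not name:
            problems.append(f"missing name for {cp}")
        # Uniqueness: reject codepoints already registered or repeated.
        if cp in taken or cp in seen:
            problems.append(f"duplicate codepoint: {cp}")
        seen.add(cp)
    return problems
```

Here `taken` stands in for the set of codepoints already present in the registry at merge time.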

Resources

Full registry with JSON exports:

  https://github.com/BlockchainCommons/Research/tree/master/known-value-assignments


We also presented on Known Values at our January Gordian Community meeting:

Video:
    https://youtu.be/FiLNhx9BOuk?t=2658 (Known Value discussion starts at 44:18)

Transcript:
    https://developer.blockchaincommons.com/meetings/2026-01-gordian/transcript/#known-values-discussion


Mapping URIs to integers saves a few hundred bytes for typical documents, far less than general-purpose compression delivers for free, and general-purpose compression does not require all implementations to stay synchronized against a centrally managed registry. When systems inevitably drift, integer codepoints fail silently (the same number meaning different things), whereas URIs fail loudly. The proposed ontology work on principal authority and peer endorsement may have merit, but bundling it with a bespoke compression mechanism couples two unrelated design decisions and makes both harder to evaluate on their own terms.



-- Christopher Allen
   Blockchain Commons

Received on Wednesday, 4 February 2026 03:30:16 UTC