Re: Kristals v3 spec (zip) — verifiable offline knowledge packs (Wikidata/RDF evolution) + v4 upgrade before integration from Kurt Cagle on 2026-01-13 (public-prov-comments@w3.org from January 2026)

From: Kurt Cagle <kurt.cagle@gmail.com>
Date: Tue, 13 Jan 2026 02:23:30 -0800
To: Réjean McCormick <boatbuilder610@gmail.com>
Cc: denny@wikimedia.org, richard.hotte@teluq.ca, rhotte@teluq.ca, ruben.verborgh@ugent.be, "lemire@gmail.com" <lemire@gmail.com>, ruben@inrupt.com, jfergar83@gmail.com, migumar2@infor.uva.es, cgutierr@dcc.uchile.cl, axel.polleres@siemens.com, mario.arias@deri.org, press@google.com, research-awards@google.com, velangve@microsoft.com, mwarner@microsoft.com, dekirk@microsoft.com, ukprteam@microsoft.com, w3t-pr@w3.org, public-json-ld-wg@w3.org, public-shacl@w3.org, public-rch-wg@w3.org, public-prov@w3.org, public-prov-comments@w3.org, support@ietf.org, media@ietf.org, press@wikimedia.org, research-wmf@lists.wikimedia.org, dev@parquet.apache.org, dev@arrow.apache.org
Message-ID: <CALm0LSGNg_4=KxwyJREjJ_kt_TzgqqscCtHkzMk+xFS5+5wRhQ@mail.gmail.com>

Looking at this now. Very intriguing idea that I hadn't heard about before,
but it makes a great deal of sense.

One quick note: You include a text file attachment in your email. Can you
also include the same file with a .md suffix?

*Kurt Cagle*
Editor in Chief
The Cagle Report
kurt.cagle@gmail.com
443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>


On Tue, Jan 13, 2026 at 12:13 AM Réjean McCormick <boatbuilder610@gmail.com>
wrote:

> Hello,
>
> I’m sending the *Kristals v3 spec bundle* (attached .zip). Kristals are a
> practical evolution of the *RDF/Wikibase/Wikidata* model into a modern
> distribution artifact: *verifiable, content-addressed knowledge packs*
> that run offline and stay reproducible across toolchains.
>
> What Kristals are
>
> A *Kristal* is a compiled knowledge unit (not a document, not free text).
> Each release produces:
>
>    - *Kristal Exchange* — canonical, auditable “source of truth” for
>      validated statements
>      (Wikibase-shaped: QIDs/PIDs, typed values, qualifiers,
>      references/evidence)
>    - *Kristal Runtime Pack* — derived, *offline-executable* indexed form
>      (predictable constrained queries; no SPARQL endpoint, no network
>      dependency, no LLM dependency)
>
> Pipeline boundary (strict):
> *Claim-IR (schema proposals with uncertainty + evidence) → resolution →
> deterministic validation (“no compile on fail”) → Exchange → Runtime Pack →
> deterministic rendering (no new facts).*
>
> Why this matters (AI + systems)
>
> Kristals are designed as a high-signal substrate for AI and data systems:
>
>    - strict schema boundaries (no “free text becomes truth”)
>    - evidence + uncertainty are first-class
>    - stable IDs enable dataset versioning and reproducible experiments
>    - offline packs enable low-latency retrieval and edge deployments
>
> What v3 locks in
>
>    - *Normative canonical JSON*: JCS (RFC 8785) for portable content
>      addressing
>    - *Fail-closed integrity*: declared hashes/signatures must verify or
>      consumers hard-fail
>    - *Reproducible runtime packs*: portable, recorded policy selections
>      (ordering, row-groups, bitmap conventions, membership filters)
>    - Optional profiles: *JSON-LD / RDF exports*, *RDFC integrity* (limits +
>      CI gating), *PROV-O/nanopubs*, *SHACL/ShEx*, *TPF-like pagination*
>
> Where this is integrated first (production)
>
> Kristals are being integrated across *Konnaxion × Orgo × Architect ×
> SenTient*:
>
>    - *Orgo* orchestrates ingest → extract → resolve → validate → publish;
>      audits + distribution status
>    - *SenTient* reconciles surfaces → ranked QIDs/PIDs; normalizes values;
>      preserves ambiguity
>    - *Konnaxion* distributes Runtime Packs for offline search/navigation
>      and low-bandwidth UX
>    - *Architect* renders deterministic multilingual text from validated
>      knowledge with full traceability
>
> What I want from you (v4 upgrade before I freeze production)
>
> I’m collecting technical review and support to upgrade this into *v4*
> before integration is frozen.
>
> I want direct judgment on:
>
>    1. Is the *normative core* tight and unambiguous enough for
>       interoperability?
>    2. Do the reproducibility rules avoid “rebuildable but incomparable”
>       packs?
>    3. Is the offline query surface (TPF-like pagination profile) correctly
>       scoped and stable?
>    4. Any non-obvious pitfalls in canonicalization/hashing/signing and
>       deterministic Parquet/index construction?
>
> ------------------------------
>
> Recipient-specific notes
>
> Daniel Lemire
>
> Your work on Roaring bitmaps and membership structures maps directly onto
> runtime-pack indexing. I want your judgment on portable defaults, which
> parameters must be recorded for reproducibility, and comparability across
> implementations.
>
> Ruben Verborgh
>
> I’m implementing a constrained offline query surface inspired by TPF
> (cursor paging, stable ordering, cache-friendly responses). I want your
> assessment of cursor semantics and the boundaries needed to keep it
> composable without drifting into SPARQL semantics.
>
> HDT authors (Fernández / Martínez-Prieto / Gutiérrez / Polleres / Arias)
>
> This targets the same objective as HDT—compact, distributable, queryable
> RDF-class knowledge—while adding a reproducible “pack + manifest” layer and
> offline execution constraints. I want your critique on where this should
> converge with HDT ideas vs where divergence is correct.
>
> W3C lists (JSON-LD / SHACL / RCH / PROV)
>
> I’m using these specs as explicit profiles (not core requirements). I want
> feedback on profile boundaries (what is covered/hashed), conformance
> language, and practical resource limits—especially for RDFC.
>
> Google / Microsoft research routing
>
> Route this to the right teams (knowledge graphs, data management,
> verifiable data, offline/edge search). I’m seeking technical review and v4
> upgrade input.
>
> Wikimedia Research / contacts
>
> This is an operational evolution of Wikidata-class knowledge distribution:
> verified, portable, offline packs. I want feedback on ecosystem fit,
> model/interop concerns, and what would make this useful at scale.
>
> Apache Parquet / Arrow lists
>
> I’m constraining to a small enumerated set of output policies for
> deterministic, comparable packs. I want concrete guidance on determinism
> pitfalls (ordering/stats/encoding) and what must be recorded to make
> rebuilds and verification rigorous.
>
> Truly,
> Réjean McCormick
>
> Socio-Technical Architect
> kOA
> okido.wiki
> https://github.com/Rejean-McCormick?tab=repositories
>
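
For readers following the quoted v3 points on canonical JSON (JCS, RFC 8785) and fail-closed integrity, here is a minimal Python sketch of content addressing with a hard failure on mismatch. It is an illustration under stated assumptions, not the Kristals implementation: the function names, the IntegrityError class, and the "sha256:" digest prefix are hypothetical, and the serializer only approximates RFC 8785 (it refuses floats, and it sorts keys by code point where JCS sorts by UTF-16 code units, which differs for some non-BMP keys).

```python
import hashlib
import json


class IntegrityError(Exception):
    """Raised when a declared digest does not match the computed one."""


def _reject_floats(value):
    # RFC 8785 mandates ECMAScript number formatting, which json.dumps does
    # not guarantee for floats, so this sketch refuses them instead of guessing.
    if isinstance(value, float):
        raise TypeError("floats are outside the scope of this sketch")
    if isinstance(value, dict):
        for item in value.values():
            _reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_floats(item)


def canonical_bytes(value) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8 output."""
    _reject_floats(value)
    text = json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def content_id(value) -> str:
    """Hypothetical content address: SHA-256 over the canonical bytes."""
    return "sha256:" + hashlib.sha256(canonical_bytes(value)).hexdigest()


def verify_or_fail(value, declared: str) -> None:
    """Fail closed: any mismatch raises rather than returning a status flag."""
    actual = content_id(value)
    if actual != declared:
        raise IntegrityError(f"declared {declared}, computed {actual}")
```

Raising instead of returning a boolean means a consumer cannot ignore a failed check by accident, which is one way to read "consumers hard-fail". A production consumer would presumably use a complete RFC 8785 implementation and verify signatures as well as hashes.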

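The quoted note to Ruben Verborgh mentions cursor paging over a stable ordering. One common way to get that behaviour is keyset pagination, sketched below in Python. This is an assumption about how such a query surface could work, not the cursor format of the Kristals spec and not TPF's own paging mechanism; the page function, the cursor encoding, and the (subject, property, value) row shape are all hypothetical.

```python
import base64
import json
from bisect import bisect_right


def encode_cursor(key) -> str:
    """Opaque cursor: the last row returned, JSON-encoded then base64url."""
    raw = json.dumps(list(key)).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return tuple(json.loads(raw))


def page(rows, cursor=None, limit=2):
    """Return (chunk, next_cursor) from rows.

    rows must be a list of unique tuples of strings, sorted ascending.
    That fixed total order is what keeps a cursor meaningful between calls.
    """
    # Resume strictly after the last row that the previous page returned.
    start = 0 if cursor is None else bisect_right(rows, decode_cursor(cursor))
    chunk = rows[start:start + limit]
    has_more = start + len(chunk) < len(rows)
    next_cursor = encode_cursor(chunk[-1]) if chunk and has_more else None
    return chunk, next_cursor


# Hypothetical rows shaped as (subject, property, value).
rows = sorted([
    ("Q1", "P31", "Q5"),
    ("Q1", "P569", "1952"),
    ("Q2", "P19", "Q60"),
    ("Q2", "P31", "Q5"),
    ("Q3", "P31", "Q5"),
])
first, cur = page(rows, None, limit=2)
second, cur2 = page(rows, cur, limit=2)
```

Because the cursor encodes a position in the ordering rather than an offset, the same cursor yields the same page for as long as the pack is immutable, which also makes responses cacheable.
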
Received on Tuesday, 13 January 2026 10:24:05 UTC