- From: Kurt Cagle <kurt.cagle@gmail.com>
- Date: Tue, 13 Jan 2026 02:23:30 -0800
- To: Réjean McCormick <boatbuilder610@gmail.com>
- Cc: denny@wikimedia.org, richard.hotte@teluq.ca, rhotte@teluq.ca, ruben.verborgh@ugent.be, "lemire@gmail.com" <lemire@gmail.com>, ruben@inrupt.com, jfergar83@gmail.com, migumar2@infor.uva.es, cgutierr@dcc.uchile.cl, axel.polleres@siemens.com, mario.arias@deri.org, press@google.com, research-awards@google.com, velangve@microsoft.com, mwarner@microsoft.com, dekirk@microsoft.com, ukprteam@microsoft.com, w3t-pr@w3.org, public-json-ld-wg@w3.org, public-shacl@w3.org, public-rch-wg@w3.org, public-prov@w3.org, public-prov-comments@w3.org, support@ietf.org, media@ietf.org, press@wikimedia.org, research-wmf@lists.wikimedia.org, dev@parquet.apache.org, dev@arrow.apache.org
- Message-ID: <CALm0LSGNg_4=KxwyJREjJ_kt_TzgqqscCtHkzMk+xFS5+5wRhQ@mail.gmail.com>
Looking at this now. Very intriguing idea that I hadn't heard about before, but it makes a great deal of sense.

One quick note: You include a text file attachment in your email. Can you also include the same file with a .md suffix?

*Kurt Cagle*
Editor in Chief
The Cagle Report
kurt.cagle@gmail.com
443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>

On Tue, Jan 13, 2026 at 12:13 AM Réjean McCormick <boatbuilder610@gmail.com> wrote:

> Hello,
>
> I’m sending the *Kristals v3 spec bundle* (attached .zip). Kristals are a
> practical evolution of the *RDF/Wikibase/Wikidata* model into a modern
> distribution artifact: *verifiable, content-addressed knowledge packs*
> that run offline and stay reproducible across toolchains.
>
> What Kristals are
>
> A *Kristal* is a compiled knowledge unit (not a document, not free text).
> Each release produces:
>
> - *Kristal Exchange* — canonical, auditable “source of truth” for
>   validated statements (Wikibase-shaped: QIDs/PIDs, typed values,
>   qualifiers, references/evidence)
> - *Kristal Runtime Pack* — derived, *offline-executable* indexed form
>   (predictable constrained queries; no SPARQL endpoint, no network
>   dependency, no LLM dependency)
>
> Pipeline boundary (strict):
> *Claim-IR (schema proposals with uncertainty + evidence) → resolution →
> deterministic validation (“no compile on fail”) → Exchange → Runtime Pack →
> deterministic rendering (no new facts).*
>
> Why this matters (AI + systems)
>
> Kristals are designed as a high-signal substrate for AI and data systems:
>
> - strict schema boundaries (no “free text becomes truth”)
> - evidence + uncertainty are first-class
> - stable IDs enable dataset versioning and reproducible experiments
> - offline packs enable low-latency retrieval and edge deployments
>
> What v3 locks in
>
> - *Normative canonical JSON*: JCS (RFC 8785) for portable content
>   addressing
> - *Fail-closed integrity*: declared hashes/signatures must verify or
>   consumers hard-fail
> - *Reproducible runtime packs*: portable, recorded policy selections
>   (ordering, row-groups, bitmap conventions, membership filters)
> - Optional profiles: *JSON-LD / RDF exports*, *RDFC integrity* (limits +
>   CI gating), *PROV-O/nanopubs*, *SHACL/ShEx*, *TPF-like pagination*
>
> Where this is integrated first (production)
>
> Kristals are being integrated across *Konnaxion × Orgo × Architect ×
> SenTient*:
>
> - *Orgo* orchestrates ingest → extract → resolve → validate → publish;
>   audits + distribution status
> - *SenTient* reconciles surfaces → ranked QIDs/PIDs; normalizes values;
>   preserves ambiguity
> - *Konnaxion* distributes Runtime Packs for offline search/navigation
>   and low-bandwidth UX
> - *Architect* renders deterministic multilingual text from validated
>   knowledge with full traceability
>
> What I want from you (v4 upgrade before I freeze production)
>
> I’m collecting technical review and support to upgrade this into *v4*
> before integration is frozen.
>
> I want direct judgment on:
>
> 1. Is the *normative core* tight and unambiguous enough for
>    interoperability?
> 2. Do the reproducibility rules avoid “rebuildable but incomparable”
>    packs?
> 3. Is the offline query surface (TPF-like pagination profile) correctly
>    scoped and stable?
> 4. Any non-obvious pitfalls in canonicalization/hashing/signing and
>    deterministic Parquet/index construction?
>
> ------------------------------
>
> Recipient-specific notes
>
> Daniel Lemire
>
> Your work on Roaring bitmaps and membership structures maps directly onto
> runtime-pack indexing. I want your judgment on portable defaults, which
> parameters must be recorded for reproducibility, and comparability across
> implementations.
>
> Ruben Verborgh
>
> I’m implementing a constrained offline query surface inspired by TPF
> (cursor paging, stable ordering, cache-friendly responses). I want your
> assessment of cursor semantics and the boundaries needed to keep it
> composable without drifting into SPARQL semantics.
>
> HDT authors (Fernández / Martínez-Prieto / Gutiérrez / Polleres / Arias)
>
> This targets the same objective as HDT—compact, distributable, queryable
> RDF-class knowledge—while adding a reproducible “pack + manifest” layer and
> offline execution constraints. I want your critique on where this should
> converge with HDT ideas vs where divergence is correct.
>
> W3C lists (JSON-LD / SHACL / RCH / PROV)
>
> I’m using these specs as explicit profiles (not core requirements). I want
> feedback on profile boundaries (what is covered/hashed), conformance
> language, and practical resource limits—especially for RDFC.
>
> Google / Microsoft research routing
>
> Route this to the right teams (knowledge graphs, data management,
> verifiable data, offline/edge search). I’m seeking technical review and v4
> upgrade input.
>
> Wikimedia Research / contacts
>
> This is an operational evolution of Wikidata-class knowledge distribution:
> verified, portable, offline packs. I want feedback on ecosystem fit,
> model/interop concerns, and what would make this useful at scale.
>
> Apache Parquet / Arrow lists
>
> I’m constraining to a small enumerated set of output policies for
> deterministic, comparable packs. I want concrete guidance on determinism
> pitfalls (ordering/stats/encoding) and what must be recorded to make
> rebuilds and verification rigorous.
>
> Truly,
> Réjean McCormick
>
> Socio-Technical Architect
> kOA
> okido.wiki
> https://github.com/Rejean-McCormick?tab=repositories

Received on Tuesday, 13 January 2026 10:24:05 UTC