- From: Réjean McCormick <boatbuilder610@gmail.com>
- Date: Wed, 7 Jan 2026 10:58:41 -0500
- To: denny@wikimedia.org, richard.hotte@teluq.ca, rhotte@teluq.ca, ruben.verborgh@ugent.be, "lemire@gmail.com" <lemire@gmail.com>, ruben@inrupt.com, jfergar83@gmail.com, migumar2@infor.uva.es, cgutierr@dcc.uchile.cl, axel.polleres@siemens.com, mario.arias@deri.org, press@google.com, research-awards@google.com, velangve@microsoft.com, mwarner@microsoft.com, dekirk@microsoft.com, ukprteam@microsoft.com, w3t-pr@w3.org, public-json-ld-wg@w3.org, public-shacl@w3.org, public-rch-wg@w3.org, public-prov@w3.org, public-prov-comments@w3.org, support@ietf.org, media@ietf.org, press@wikimedia.org, research-wmf@lists.wikimedia.org, dev@parquet.apache.org, dev@arrow.apache.org
- Message-ID: <CABW1y4ah4g1h+GmhFZbiBFdDG1+EyeUkFWWFDCos2g0JBNEGRw@mail.gmail.com>
Hello,

I’m sending the *Kristals v3 spec bundle* (attached .zip).

Kristals are a practical evolution of the *RDF/Wikibase/Wikidata* model into a modern distribution artifact: *verifiable, content-addressed knowledge packs* that run offline and stay reproducible across toolchains.

What Kristals are

A *Kristal* is a compiled knowledge unit (not a document, not free text). Each release produces:

- *Kristal Exchange*: canonical, auditable “source of truth” for validated statements (Wikibase-shaped: QIDs/PIDs, typed values, qualifiers, references/evidence)
- *Kristal Runtime Pack*: derived, *offline-executable* indexed form (predictable constrained queries; no SPARQL endpoint, no network dependency, no LLM dependency)

Pipeline boundary (strict):

*Claim-IR (schema proposals with uncertainty + evidence) → resolution → deterministic validation (“no compile on fail”) → Exchange → Runtime Pack → deterministic rendering (no new facts).*

Why this matters (AI + systems)

Kristals are designed as a high-signal substrate for AI and data systems:

- strict schema boundaries (no “free text becomes truth”)
- evidence + uncertainty are first-class
- stable IDs enable dataset versioning and reproducible experiments
- offline packs enable low-latency retrieval and edge deployments

What v3 locks in

- *Normative canonical JSON*: JCS (RFC 8785) for portable content addressing
- *Fail-closed integrity*: declared hashes/signatures must verify or consumers hard-fail
- *Reproducible runtime packs*: portable, recorded policy selections (ordering, row-groups, bitmap conventions, membership filters)
- Optional profiles: *JSON-LD / RDF exports*, *RDFC integrity* (limits + CI gating), *PROV-O/nanopubs*, *SHACL/ShEx*, *TPF-like pagination*

Where this is integrated first (production)

Kristals are being integrated across *Konnaxion × Orgo × Architect × SenTient*:

- *Orgo* orchestrates ingest → extract → resolve → validate → publish; audits + distribution status
- *SenTient* reconciles surfaces → ranked QIDs/PIDs; normalizes values; preserves ambiguity
- *Konnaxion* distributes Runtime Packs for offline search/navigation and low-bandwidth UX
- *Architect* renders deterministic multilingual text from validated knowledge with full traceability

What I want from you (v4 upgrade before I freeze production)

I’m collecting technical review and support to upgrade this into *v4* before integration is frozen. I want direct judgment on:

1. Is the *normative core* tight and unambiguous enough for interoperability?
2. Do the reproducibility rules avoid “rebuildable but incomparable” packs?
3. Is the offline query surface (TPF-like pagination profile) correctly scoped and stable?
4. Any non-obvious pitfalls in canonicalization/hashing/signing and deterministic Parquet/index construction?

------------------------------

Recipient-specific notes

Daniel Lemire

Your work on Roaring bitmaps and membership structures maps directly onto runtime-pack indexing. I want your judgment on portable defaults, which parameters must be recorded for reproducibility, and comparability across implementations.

Ruben Verborgh

I’m implementing a constrained offline query surface inspired by TPF (cursor paging, stable ordering, cache-friendly responses). I want your assessment of cursor semantics and the boundaries needed to keep it composable without drifting into SPARQL semantics.

HDT authors (Fernández / Martínez-Prieto / Gutiérrez / Polleres / Arias)

This targets the same objective as HDT (compact, distributable, queryable RDF-class knowledge) while adding a reproducible “pack + manifest” layer and offline execution constraints. I want your critique on where this should converge with HDT ideas vs where divergence is correct.

W3C lists (JSON-LD / SHACL / RCH / PROV)

I’m using these specs as explicit profiles (not core requirements). I want feedback on profile boundaries (what is covered/hashed), conformance language, and practical resource limits, especially for RDFC.

Google / Microsoft research routing

Route this to the right teams (knowledge graphs, data management, verifiable data, offline/edge search). I’m seeking technical review and v4 upgrade input.

Wikimedia Research / contacts

This is an operational evolution of Wikidata-class knowledge distribution: verified, portable, offline packs. I want feedback on ecosystem fit, model/interop concerns, and what would make this useful at scale.

Apache Parquet / Arrow lists

I’m constraining to a small enumerated set of output policies for deterministic, comparable packs. I want concrete guidance on determinism pitfalls (ordering/stats/encoding) and what must be recorded to make rebuilds and verification rigorous.

Truly,
Réjean McCormick
Socio-Technical Architect
kOA okido.wiki
https://github.com/Rejean-McCormick?tab=repositories
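
For readers without the bundle at hand, the two v3 rules "canonical JSON for portable content addressing" and "fail-closed integrity" can be illustrated with a minimal sketch. It is not taken from the Kristals spec. The hash algorithm (SHA-256), the `sha256:` prefix, the function names, and the statement fields are all assumptions for illustration, and the canonicalization only approximates JCS (RFC 8785).

```python
import hashlib
import json


class IntegrityError(Exception):
    """Raised when a declared content ID does not match the computed one."""


def canonical_bytes(obj) -> bytes:
    # Approximation of JCS (RFC 8785): sorted keys, no insignificant
    # whitespace, UTF-8 output. Full JCS also fixes number serialization
    # and sorts keys by UTF-16 code units, which this sketch does not do.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_id(obj) -> str:
    # Hypothetical ID format: "sha256:" + hex digest of the canonical bytes.
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def verify_or_fail(obj, declared: str):
    # Fail closed: a mismatch raises instead of returning unverified data.
    actual = content_id(obj)
    if actual != declared:
        raise IntegrityError(f"declared {declared}, computed {actual}")
    return obj


# Hypothetical Wikibase-shaped statement; key order must not affect the ID.
statement = {"subject": "Q42", "property": "P31", "value": "Q5"}
reordered = {"value": "Q5", "subject": "Q42", "property": "P31"}
declared = content_id(statement)
verify_or_fail(reordered, declared)  # passes: same canonical bytes
```

A consumer built this way never sees data whose declared ID fails to verify, which is the behaviour the "hard-fail" wording asks for. A conforming implementation would replace `canonical_bytes` with a full RFC 8785 serializer.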
Attachments
- text/plain attachment: Docv3Kristals.txt
- application/x-zip-compressed attachment: kristal-docs-v3.zip
Received on Tuesday, 13 January 2026 11:23:42 UTC