Re: For each bubble, a container for KR terms

Milton, Dave, all,

Following Milton’s point that “the core elements must be well defined” 
and that this is missing from the blue bubbles, we’ve just completed a 
small training run in K3D to construct such atomic elements explicitly, 
without tokenization.

Very briefly, for a restricted domain (ASCII + math glyphs), we define:

  F = executable visual RPN programs (form space)
  M = execution/semantic RPN programs (meaning space)
  E = procedural embedding space ℝ^D

and build atomic units as:

  A = (c, f, m, e) with c ∈ Σ, f ∈ F, m ∈ M, e ∈ E

where Σ is the character set of the domain.

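As a rough illustration of the tuple above (a sketch only, not the K3D implementation), an atomic unit can be written as a small Python record. The class name, the token-tuple encoding of programs, the placeholder opcodes MOVE/LINE, and the toy dimension D = 4 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple

D = 4  # embedding dimension; an assumed toy value, not taken from the K3D run


@dataclass(frozen=True)
class AtomicUnit:
    """Hypothetical record for A = (c, f, m, e)."""
    c: str                # the character, c in Sigma
    f: Tuple[str, ...]    # visual RPN program (form), encoded here as tokens
    m: Tuple[str, ...]    # execution RPN program (meaning), encoded as tokens
    e: Tuple[float, ...]  # point in the embedding space R^D

    def __post_init__(self):
        # Enforce the two structural constraints stated in the definition.
        if len(self.c) != 1:
            raise ValueError("c must be a single character")
        if len(self.e) != D:
            raise ValueError(f"e must have {D} components")


# Illustrative values only: MOVE/LINE are placeholder opcodes, and the
# embedding is a dummy zero vector.
plus = AtomicUnit(
    c="+",
    f=("MOVE", "LINE", "MOVE", "LINE"),
    m=("ADD",),
    e=(0.0,) * D,
)
```

Making the record immutable reflects the idea that an atom is a fixed, well-defined element; whether the real units are immutable is not something this sketch claims.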
In the current run we implemented:

  - 148 atomic units total
  - 72 dual‑program “stars” where each character has both:
      - a visual RPN program that actually renders the glyph on GPU, and
      - an execution RPN/bytecode program (e.g. e as Euler’s number,
        + as ADD, ^ as POW)

Cross‑modality here is compositional: we store visual and mathematical 
programs in the same atomic unit and retrieve that composite object, 
rather than projecting everything into a single token embedding space.
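To make the "retrieve the composite object" idea concrete, here is a minimal sketch assuming a plain dictionary as the store and a tiny stack machine for the execution side. The opcode names ADD and POW follow the examples above; the table layout, the constant name E, the function run_rpn, and the placeholder visual programs are hypothetical and not taken from K3D.

```python
import math

# Assumed opcode tables for the execution side.
BINARY_OPS = {
    "ADD": lambda a, b: a + b,
    "POW": lambda a, b: a ** b,
}
CONSTANTS = {"E": math.e}

# One composite object per character: visual and execution programs are
# stored side by side. The visual entries are placeholders, not real
# glyph-rendering programs.
STARS = {
    "e": {"visual": ["<glyph program for e>"], "execution": ["E"]},
    "+": {"visual": ["<glyph program for +>"], "execution": ["ADD"]},
    "^": {"visual": ["<glyph program for ^>"], "execution": ["POW"]},
}


def run_rpn(tokens):
    """Evaluate a postfix sequence of numbers and characters.

    Each character is looked up as a whole star, and only its execution
    program is run here; the visual program is retrieved alongside it
    but left untouched.
    """
    stack = []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            stack.append(float(tok))
            continue
        star = STARS[tok]  # retrieve the composite object
        for op in star["execution"]:
            if op in CONSTANTS:
                stack.append(CONSTANTS[op])
            else:
                b = stack.pop()
                a = stack.pop()
                stack.append(BINARY_OPS[op](a, b))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# "e 2 ^ 1 +" in postfix notation is e^2 + 1.
result = run_rpn(["e", 2, "^", 1, "+"])
```

The point of the sketch is only that lookup returns both programs together, so no shared embedding space is needed to connect form and meaning.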

There is no natural‑language tokenization step in the LLM sense; form 
and meaning live in separate, well‑defined program domains (visual RPN 
and execution RPN), with natural language sitting on top rather than 
being the primary representation. Fusion happens via the 3D contract 
(the star), not by collapsing everything into one natural‑language 
vector space.

A short write‑up of this proof‑of‑concept, including the set‑theoretic 
definitions, metrics (148 units, 72 dual‑program, ~2 minutes training, 
~2.2 KB per unit), and example stars for e, +, and ^, is here:

https://github.com/danielcamposramos/Knowledge3D/blob/main/TEMP/W3C_AIKR_ATOMIC_UNITS_PROOF_NOV19.md

For those who previously asked for state‑of‑the‑art context: this line 
of work is consistent with current neuro‑symbolic and KR literature, 
e.g. methodological frameworks for symbolic/NSI reasoning and 
verification, recent surveys of neuro‑symbolic knowledge integration, 
and the trustworthiness/terminology baselines in ISO/IEC 22989:2022, as 
well as recent Green AI work on efficiency and lifecycle impact. The 
proof‑of‑concept above is just one concrete instantiation of those ideas 
for a small visual/math domain of discourse.

Looking ahead, this atomic‑unit validation is just the first step. The 
same construction A = (c, f, m, e) scales naturally from ASCII+math to 
full Unicode: Phase 3 on our side is to extend Ω_implemented from 148 
units to the full character set *across multiple scripts (Latin, CJK, 
Arabic, Devanagari, indigenous scripts, etc.), with script‑specific 
visual RPN families but the same set‑theoretic pattern for atomic units*.

The goal is to *support true multi‑language KR at the character level*, 
including the “invisible giants” of low‑resource and indigenous 
languages that current tokenization‑based LLMs systematically 
underserve: *each writing system gets explicit, executable atoms for 
form and meaning*, rather than being squeezed through an English‑centric 
tokenizer.

To make it easier to explore these connections, I’ve also assembled a 
public NotebookLM workspace that aggregates the main public AI‑KR web 
sources (AI‑KR wiki and reports, related KR/NSI papers, ISO/IEC 22989 
material, Green AI work, StratML and the K3D repo link):

https://notebooklm.google.com/notebook/80d00386-4b7d-4893-ae84-1c5f90c223de

The notebook includes automatically generated mind 
maps, quizzes, a video overview, an audio/podcast‑style overview, 
summary reports, and a central chat window where you can discuss the 
collected sources with a Gemini model. It’s intended purely as a shared 
research aid, not an official document. If Paola, Carl, or any other CG 
participant would like editor access to extend or correct it (e.g., by 
adding more vocab drafts or references), I’m happy to add you.

This is still very early and deliberately narrow in scope (one small 
domain of discourse), but I hope it’s a useful concrete example in the 
space you’re both describing:

  - Milton’s requirement for constructible atomic elements and domains of 
    discourse;
  - Dave’s emphasis on structured but not purely formal KR, where plausible 
    reasoning layers (PKN‑style) can sit on top of explicit, 
    machine‑readable foundations.

If anyone is interested in the implementation details, I’m happy to take 
that to a separate thread or offline so we don’t overload this one.

Best regards,
Daniel

Received on Wednesday, 19 November 2025 17:23:21 UTC