- From: Daniel Ramos <capitain_jack@yahoo.com>
- Date: Thu, 26 Feb 2026 13:14:03 -0300
- To: public-pm-kr@w3.org
- Message-ID: <187ea682-cf9f-411d-90bd-882de10952ed@yahoo.com>
Hi all,
Following up on our discussions about PM-KR's compression capabilities
(70%+ reduction via symlink-style composition), I want to share the
technical foundation: **procedural codecs**.
**Key insight:** Just as TrueType stores glyphs as procedural programs
(Bézier curves + hinting instructions) instead of pixel bitmaps, PM-KR
extends this pattern to **all knowledge domains**.
## What Are Procedural Codecs?
**Definition:** A codec (coder-decoder) that represents data as
**executable procedures** rather than static values.
**Traditional codec:**
```
Input: Raw data → Encoder → Compressed static data → Decoder →
Reconstructed data
```
**Procedural codec:**
```
Input: Raw data → Analyzer → Procedural program → Executor →
Reconstructed data
```
**The difference:** Instead of storing compressed bits, we store
**instructions to recreate the data**.
## Real-World Example: TrueType Fonts
**Traditional bitmap font (1980s):**
- Store every glyph at every size (12pt, 14pt, 16pt, 18pt...)
- Massive duplication: "A" at 12pt vs. "A" at 14pt are separate bitmaps
- Result: ~1 MB per font × 10 sizes = 10 MB
**TrueType procedural font (1991):**
- Store ONE program per glyph (Bézier control points + hinting)
- Execute program at ANY size (12pt, 14pt, 100pt, 1000pt...)
- Result: ~100 KB for entire font (100× compression!)
**This is procedural compression in action.**
## PM-KR's Procedural Codecs
K3D (PM-KR reference implementation) uses **3 main procedural codecs**:
### 1. PD04: Adaptive Procedural Embedding Codec
**What it does:** Compresses embeddings (vector representations)
procedurally instead of storing dense float arrays.
**Traditional embedding:**
```python
# 2048D embedding = 8192 bytes (2048 floats × 4 bytes)
embedding = [0.234, -0.891, 0.456, ..., 0.123] # 2048 floats
```
**PD04 procedural embedding:**
```json
{
"@type": "ProceduralEmbedding",
"program": [
"BASIS_VECTOR", "0", // Start with basis vector 0
"SCALE", "0.234", // Scale by first coefficient
"BASIS_VECTOR", "1", // Add basis vector 1
"SCALE", "-0.891", // Scale and accumulate
"ADD", // Combine
...
],
"dimensions": 2048,
"compression": "ternary_quant" // -1, 0, +1 (2 bits vs 32 bits)
}
```
**Compression achieved:**
- Ternary quantization: 2 bits/dimension vs. 32 bits (16× reduction)
- Sparse representation: Only non-zero coefficients stored
- Adaptive dimensions: 64D-2048D based on content complexity
- **Overall: 12-80× compression with 99.96-99.998% fidelity**
**Specification:**
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/ADAPTIVE_PROCEDURAL_COMPRESSION_SPECIFICATION.md
### 2. VectorDotMap: Procedural Image Codec
**What it does:** Represents images as procedural programs instead of
pixel grids.
**Traditional image (PNG/JPEG):**
```
1920×1080 image = 2,073,600 pixels × 3 bytes (RGB) = ~6 MB raw
PNG compression: ~500 KB - 2 MB (lossy/lossless)
```
**VectorDotMap procedural image:**
```json
{
"@type": "ProceduralImage",
"program": [
"CIRCLE", "x=512", "y=384", "r=200", "color=#FF5733",
"LINE", "x1=100", "y1=100", "x2=900", "y2=500", "width=5",
"GRADIENT", "start=#000000", "end=#FFFFFF", "angle=45"
],
"infinite_LOD": true
}
```
**Compression achieved:**
- **~2 KB for entire image** (3000× smaller than PNG!)
- **Infinite Level of Detail** (zoom to any resolution, no pixelation)
- **Semantic queries** ("find all red circles in image")
**Trade-off:** Not suitable for photorealistic images (best for
diagrams, UI, technical drawings).
**Specification:**
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/PROCEDURAL_VISUAL_SPECIFICATION.md
(Section 7: VectorDotMap)
### 3. Procedural Fonts (Character Galaxy)
**What it does:** Extends TrueType's procedural glyph approach with
semantic metadata.
**K3D Character Galaxy:**
```json
{
"@type": "ProceduralCharacter",
"char": "A",
"glyph_program": [
"MOVE_TO", "0,0",
"LINE_TO", "50,100",
"LINE_TO", "100,0",
"MOVE_TO", "25,50",
"LINE_TO", "75,50"
],
"metadata": {
"unicode": "U+0041",
"language": "Latin",
"pronunciation": "/eɪ/",
"meaning": ["first letter", "indefinite article", "grade_A"]
}
}
```
**Compression achieved:**
- 87.7 MB raw font data → 26.3 MB procedural (70% reduction)
- **Zero duplication:** "A" stored once, referenced infinitely
- **Multi-client rendering:** Same program → visual glyph (human) +
semantic metadata (AI)
**Specification:**
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/DUAL_CLIENT_CONTRACT_SPECIFICATION.md
(Section 1.6: Procedural Fonts)
## Compression Patterns Across Codecs
All three codecs share **common procedural compression patterns**:
### 1. Canonical Forms + References (Symlink Pattern)
**Anti-pattern (static duplication):**
```json
{
"word1": {"chars": ["A", "P", "P", "L", "E"]}, // Stores "APPLE"
"word2": {"chars": ["A", "P", "P", "R", "O", "V", "E"]} //
Duplicates "A", "P", "P"
}
```
**Procedural pattern (canonical references):**
```json
{
"canonical_chars": {
"A": {"glyph": "...", "id": "char_A"},
"P": {"glyph": "...", "id": "char_P"},
"L": {"glyph": "...", "id": "char_L"},
"E": {"glyph": "...", "id": "char_E"}
},
"word1": ["@char_A", "@char_P", "@char_P", "@char_L", "@char_E"], //
References!
"word2": ["@char_A", "@char_P", "@char_P", "@char_R", "@char_O",
"@char_V", "@char_E"]
}
```
**Result:** "A" stored once, referenced infinitely → ~70% reduction
across Character Galaxy.
### 2. Adaptive Complexity (Matryoshka Pattern)
**Traditional:** One-size-fits-all (e.g., always 2048D embeddings, even
for simple concepts).
**Procedural:** Match complexity to content.
| Content Type | Dimensions | Compression |
|--------------|------------|-------------|
| Simple shapes (circle, line) | 64D | 32× |
| Common words ("cat", "dog") | 256D | 8× |
| Complex math (integrals) | 1024D | 2× |
| Rare knowledge (quantum mechanics) | 2048D | 1× (full fidelity) |
**Result:** Average 12-30× compression across entire knowledge base.
### 3. Sparse Representation (RPN Stack Efficiency)
**Dense array (wasteful):**
```python
embedding = [0.234, 0, 0, -0.891, 0, 0, 0, 0.456, ...] # 90% zeros!
```
**Sparse RPN program (efficient):**
```json
["BASIS_0", "SCALE", "0.234", "BASIS_3", "SCALE", "-0.891", "ADD", ...]
```
**Result:** Only non-zero coefficients stored → 10-50× reduction for
sparse data.
## Connection to Manu Sporny's CBOR-LD Work
Manu mentioned **CBOR-LD** for JSON-LD compression (binary encoding,
term dictionaries).
**Procedural codecs complement CBOR-LD perfectly:**
| Layer | Technology | What It Compresses |
|-------|------------|-------------------|
| **Serialization** | CBOR-LD | JSON syntax overhead (keys, brackets,
whitespace) |
| **Representation** | PM-KR Procedural Codecs | Knowledge duplication
(canonical forms, references) |
| **Execution** | RPN Programs | Computation (compact stack operations) |
**Combined pipeline:**
```
Raw Knowledge
→ Procedural Codecs (70% reduction)
→ JSON-LD (semantic structure)
→ CBOR-LD (binary encoding)
→ Final: 80-90% total compression!
```
**This is orthogonal compression:**
- CBOR-LD = syntax compression (how data is serialized)
- PM-KR = semantic compression (how knowledge is represented)
## Empirical Validation (K3D Snapshot)
| Codec | Input Size | Output Size | Compression | Fidelity |
|-------|-----------|-------------|-------------|----------|
| **PD04 (Embeddings)** | 100 MB (50K nodes × 2048D × 4 bytes) | 8.3 MB
| 12× | 99.96% |
| **VectorDotMap (Images)** | 6 MB (PNG diagram) | 2 KB | 3000× |
Lossless (vector) |
| **Procedural Fonts** | 87.7 MB (Character Galaxy raw) | 26.3 MB | 70%
| Lossless (procedural) |
| **Overall (Galaxy Universe)** | 300 MB (raw knowledge base) | 90 MB
(procedural) | 70% | 99.95% avg |
**Source:**
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_EVIDENCE_VALIDATION_MATRIX.md
## Design Principles for Procedural Codecs
Based on K3D's experience, here are **5 principles** for designing
procedural codecs:
### 1. **Canonical-First Design**
- Identify canonical forms (glyphs, shapes, symbols)
- Store canonical form once
- All instances reference canonical form
- **Never duplicate what can be referenced**
### 2. **Adaptive Complexity**
- Match representation complexity to content complexity
- Simple content → low-dimensional procedures
- Complex content → high-dimensional procedures
- **Don't waste bits on simple concepts**
### 3. **Sparse Representation**
- Most real-world data is sparse (90%+ zeros)
- Only encode non-zero elements
- Use stack-based execution (RPN) for efficiency
- **Zero is free (don't store it)**
### 4. **Dual-Client Rendering**
- Same procedural program serves multiple clients
- Human client: Visual rendering (glyphs, images)
- AI client: Semantic execution (embeddings, graphs)
- **One canonical source, multiple interpretations**
### 5. **Lossy-Optional**
- Lossless by default (exact reconstruction)
- Lossy available when acceptable (e.g., quantization)
- User/system controls fidelity vs. compression tradeoff
- **Transparency: Always know what you lost**
## Open Questions for Community
### 1. **Codec Standardization:**
Should PM-KR define standard codec interfaces? (e.g.,
`ProceduralImageCodec`, `ProceduralEmbeddingCodec`)
### 2. **Fidelity Thresholds:**
What's an acceptable fidelity floor for procedural compression? (K3D
uses 99.95% avg, but is that universal?)
### 3. **Hybrid Approaches:**
When should we use procedural vs. static? (VectorDotMap is great for
diagrams, terrible for photos)
### 4. **CBOR-LD Integration:**
Should PM-KR mandate CBOR-LD for serialization, or remain
codec-agnostic? (Manu's input valuable here!)
### 5. **Inverse Procedures:**
Can we design codecs with **inverse programs** for perfect
reconstruction? (e.g., `ENCODE` + `DECODE` as dual procedures)
## Next Steps
1. **Feedback Welcome:**
- Are these codec patterns generalizable beyond K3D?
- What other domains need procedural codecs? (audio? video? 3D models?)
2. **Proposed Specification:**
- Should we draft `PM_KR_PROCEDURAL_CODEC_STANDARD.md`?
- Define normative interfaces for codec design?
3. **Benchmark Suite:**
- Compare procedural codecs vs. traditional codecs (PNG, JPEG, CBOR,
Protobuf)
- Metrics: compression ratio, fidelity, encode/decode latency
4. **Interoperability:**
- How do procedural codecs integrate with existing formats?
- Can we provide **bidirectional translators** (e.g., PNG ↔
VectorDotMap)?
## Conclusion: Procedural Codecs = PM-KR's Compression Engine
**TrueType proved it in 1991:** Procedural glyphs compress better than
bitmaps.
**PM-KR extends it to all knowledge:** Why limit this pattern to fonts?
**The thesis:**
- Static data duplicates → bloat
- Procedural programs reference → compression
- Canonical forms + symlinks = 70%+ reduction
**This is not theoretical** — K3D validates it with:
- 12-80× embedding compression (PD04)
- 3000× image compression (VectorDotMap)
- 70% font compression (Character Galaxy)
**Question for the group:** Should procedural codecs be a **first-class
concern** in PM-KR standardization?
If yes, let's design the normative interfaces and benchmark suite together.
Thoughts?
**Daniel Ramos**
Co-Chair, W3C PM-KR Community Group
AI Knowledge Architect
## References
**K3D Codec Specifications:**
- PD04 Codec:
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/ADAPTIVE_PROCEDURAL_COMPRESSION_SPECIFICATION.md
- VectorDotMap:
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/PROCEDURAL_VISUAL_SPECIFICATION.md
(Section 7)
- Procedural Fonts:
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/DUAL_CLIENT_CONTRACT_SPECIFICATION.md
(Section 1.6)
**Empirical Evidence:**
- Compression validation:
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_EVIDENCE_VALIDATION_MATRIX.md
- Strategic steering:
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_STRATEGIC_STEERING.md
**Related Standards:**
- CBOR-LD: https://www.w3.org/TR/cbor-ld/
- TrueType Specification:
https://developer.apple.com/fonts/TrueType-Reference-Manual/
- JSON-LD: https://www.w3.org/TR/json-ld/
**Historical Inspiration:**
- TrueType Fonts (Apple, 1991): Procedural glyph compression
- PostScript (Adobe, 1985): Page description as procedural programs
- LISP (McCarthy, 1960): Code as data, data as code (homoiconicity)
Received on Thursday, 26 February 2026 16:14:18 UTC