Procedural Codecs: How PM-KR Achieves 70%+ Compression

Hi all,

Following up on our discussions about PM-KR's compression capabilities 
(70%+ reduction via symlink-style composition), I want to share the 
technical foundation: **procedural codecs**.

**Key insight:** Just as TrueType stores glyphs as procedural programs 
(Bézier curves + hinting instructions) instead of pixel bitmaps, PM-KR 
extends this pattern to **all knowledge domains**.

## What Are Procedural Codecs?

**Definition:** A codec (coder-decoder) that represents data as 
**executable procedures** rather than static values.

**Traditional codec:**
```
Input: Raw data → Encoder → Compressed static data → Decoder → 
Reconstructed data
```

**Procedural codec:**
```
Input: Raw data → Analyzer → Procedural program → Executor → 
Reconstructed data
```

**The difference:** Instead of storing compressed bits, we store 
**instructions to recreate the data**.

## Real-World Example: TrueType Fonts

**Traditional bitmap font (1980s):**
- Store every glyph at every size (12pt, 14pt, 16pt, 18pt...)
- Massive duplication: "A" at 12pt vs. "A" at 14pt are separate bitmaps
- Result: ~1 MB per font × 10 sizes = 10 MB

**TrueType procedural font (1991):**
- Store ONE program per glyph (Bézier control points + hinting)
- Execute program at ANY size (12pt, 14pt, 100pt, 1000pt...)
- Result: ~100 KB for entire font (100× compression!)

**This is procedural compression in action.**

## PM-KR's Procedural Codecs

K3D (PM-KR reference implementation) uses **3 main procedural codecs**:

### 1. PD04: Adaptive Procedural Embedding Codec

**What it does:** Compresses embeddings (vector representations) 
procedurally instead of storing dense float arrays.

**Traditional embedding:**
```python
# 2048D embedding = 8192 bytes (2048 floats × 4 bytes)
embedding = [0.234, -0.891, 0.456, ..., 0.123]  # 2048 floats
```

**PD04 procedural embedding:**
```json
{
   "@type": "ProceduralEmbedding",
   "program": [
     "BASIS_VECTOR", "0",    // Start with basis vector 0
     "SCALE", "0.234",       // Scale by first coefficient
     "BASIS_VECTOR", "1",    // Add basis vector 1
     "SCALE", "-0.891",      // Scale and accumulate
     "ADD",                  // Combine
     ...
   ],
   "dimensions": 2048,
   "compression": "ternary_quant"  // -1, 0, +1 (2 bits vs 32 bits)
}
```

**Compression achieved:**
- Ternary quantization: 2 bits/dimension vs. 32 bits (16× reduction)
- Sparse representation: Only non-zero coefficients stored
- Adaptive dimensions: 64D-2048D based on content complexity
- **Overall: 12-80× compression with 99.96-99.998% fidelity**

**Specification:** 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/ADAPTIVE_PROCEDURAL_COMPRESSION_SPECIFICATION.md

### 2. VectorDotMap: Procedural Image Codec

**What it does:** Represents images as procedural programs instead of 
pixel grids.

**Traditional image (PNG/JPEG):**
```
1920×1080 image = 2,073,600 pixels × 3 bytes (RGB) = ~6 MB raw
PNG compression: ~500 KB - 2 MB (lossy/lossless)
```

**VectorDotMap procedural image:**
```json
{
   "@type": "ProceduralImage",
   "program": [
     "CIRCLE", "x=512", "y=384", "r=200", "color=#FF5733",
     "LINE", "x1=100", "y1=100", "x2=900", "y2=500", "width=5",
     "GRADIENT", "start=#000000", "end=#FFFFFF", "angle=45"
   ],
   "infinite_LOD": true
}
```

**Compression achieved:**
- **~2 KB for entire image** (3000× smaller than PNG!)
- **Infinite Level of Detail** (zoom to any resolution, no pixelation)
- **Semantic queries** ("find all red circles in image")

**Trade-off:** Not suitable for photorealistic images (best for 
diagrams, UI, technical drawings).

**Specification:** 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/PROCEDURAL_VISUAL_SPECIFICATION.md 
(Section 7: VectorDotMap)

### 3. Procedural Fonts (Character Galaxy)

**What it does:** Extends TrueType's procedural glyph approach with 
semantic metadata.

**K3D Character Galaxy:**
```json
{
   "@type": "ProceduralCharacter",
   "char": "A",
   "glyph_program": [
     "MOVE_TO", "0,0",
     "LINE_TO", "50,100",
     "LINE_TO", "100,0",
     "MOVE_TO", "25,50",
     "LINE_TO", "75,50"
   ],
   "metadata": {
     "unicode": "U+0041",
     "language": "Latin",
     "pronunciation": "/eɪ/",
     "meaning": ["first letter", "indefinite article", "grade_A"]
   }
}
```

**Compression achieved:**
- 87.7 MB raw font data → 26.3 MB procedural (70% reduction)
- **Zero duplication:** "A" stored once, referenced infinitely
- **Multi-client rendering:** Same program → visual glyph (human) + 
semantic metadata (AI)

**Specification:** 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/DUAL_CLIENT_CONTRACT_SPECIFICATION.md 
(Section 1.6: Procedural Fonts)

## Compression Patterns Across Codecs

All three codecs share **common procedural compression patterns**:

### 1. Canonical Forms + References (Symlink Pattern)

**Anti-pattern (static duplication):**
```json
{
   "word1": {"chars": ["A", "P", "P", "L", "E"]},  // Stores "APPLE"
   "word2": {"chars": ["A", "P", "P", "R", "O", "V", "E"]}  // 
Duplicates "A", "P", "P"
}
```

**Procedural pattern (canonical references):**
```json
{
   "canonical_chars": {
     "A": {"glyph": "...", "id": "char_A"},
     "P": {"glyph": "...", "id": "char_P"},
     "L": {"glyph": "...", "id": "char_L"},
     "E": {"glyph": "...", "id": "char_E"}
   },
   "word1": ["@char_A", "@char_P", "@char_P", "@char_L", "@char_E"],  // 
References!
   "word2": ["@char_A", "@char_P", "@char_P", "@char_R", "@char_O", 
"@char_V", "@char_E"]
}
```

**Result:** "A" stored once, referenced infinitely → ~70% reduction 
across Character Galaxy.

### 2. Adaptive Complexity (Matryoshka Pattern)

**Traditional:** One-size-fits-all (e.g., always 2048D embeddings, even 
for simple concepts).

**Procedural:** Match complexity to content.

| Content Type | Dimensions | Compression |
|--------------|------------|-------------|
| Simple shapes (circle, line) | 64D | 32× |
| Common words ("cat", "dog") | 256D | 8× |
| Complex math (integrals) | 1024D | 2× |
| Rare knowledge (quantum mechanics) | 2048D | 1× (full fidelity) |

**Result:** Average 12-30× compression across entire knowledge base.

### 3. Sparse Representation (RPN Stack Efficiency)

**Dense array (wasteful):**
```python
embedding = [0.234, 0, 0, -0.891, 0, 0, 0, 0.456, ...]  # 90% zeros!
```

**Sparse RPN program (efficient):**
```json
["BASIS_0", "SCALE", "0.234", "BASIS_3", "SCALE", "-0.891", "ADD", ...]
```

**Result:** Only non-zero coefficients stored → 10-50× reduction for 
sparse data.

## Connection to Manu Sporny's CBOR-LD Work

Manu mentioned **CBOR-LD** for JSON-LD compression (binary encoding, 
term dictionaries).

**Procedural codecs complement CBOR-LD perfectly:**

| Layer | Technology | What It Compresses |
|-------|------------|-------------------|
| **Serialization** | CBOR-LD | JSON syntax overhead (keys, brackets, 
whitespace) |
| **Representation** | PM-KR Procedural Codecs | Knowledge duplication 
(canonical forms, references) |
| **Execution** | RPN Programs | Computation (compact stack operations) |

**Combined pipeline:**
```
Raw Knowledge
   → Procedural Codecs (70% reduction)
   → JSON-LD (semantic structure)
   → CBOR-LD (binary encoding)
   → Final: 80-90% total compression!
```

**This is orthogonal compression:**
- CBOR-LD = syntax compression (how data is serialized)
- PM-KR = semantic compression (how knowledge is represented)

## Empirical Validation (K3D Snapshot)

| Codec | Input Size | Output Size | Compression | Fidelity |
|-------|-----------|-------------|-------------|----------|
| **PD04 (Embeddings)** | 100 MB (50K nodes × 2048D × 4 bytes) | 8.3 MB 
| 12× | 99.96% |
| **VectorDotMap (Images)** | 6 MB (PNG diagram) | 2 KB | 3000× | 
Lossless (vector) |
| **Procedural Fonts** | 87.7 MB (Character Galaxy raw) | 26.3 MB | 70% 
| Lossless (procedural) |
| **Overall (Galaxy Universe)** | 300 MB (raw knowledge base) | 90 MB 
(procedural) | 70% | 99.95% avg |

**Source:** 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_EVIDENCE_VALIDATION_MATRIX.md

## Design Principles for Procedural Codecs

Based on K3D's experience, here are **5 principles** for designing 
procedural codecs:

### 1. **Canonical-First Design**
- Identify canonical forms (glyphs, shapes, symbols)
- Store canonical form once
- All instances reference canonical form
- **Never duplicate what can be referenced**

### 2. **Adaptive Complexity**
- Match representation complexity to content complexity
- Simple content → low-dimensional procedures
- Complex content → high-dimensional procedures
- **Don't waste bits on simple concepts**

### 3. **Sparse Representation**
- Most real-world data is sparse (90%+ zeros)
- Only encode non-zero elements
- Use stack-based execution (RPN) for efficiency
- **Zero is free (don't store it)**

### 4. **Dual-Client Rendering**
- Same procedural program serves multiple clients
- Human client: Visual rendering (glyphs, images)
- AI client: Semantic execution (embeddings, graphs)
- **One canonical source, multiple interpretations**

### 5. **Lossy-Optional**
- Lossless by default (exact reconstruction)
- Lossy available when acceptable (e.g., quantization)
- User/system controls fidelity vs. compression tradeoff
- **Transparency: Always know what you lost**

## Open Questions for Community

### 1. **Codec Standardization:**
Should PM-KR define standard codec interfaces? (e.g., 
`ProceduralImageCodec`, `ProceduralEmbeddingCodec`)

### 2. **Fidelity Thresholds:**
What's an acceptable fidelity floor for procedural compression? (K3D 
uses 99.95% avg, but is that universal?)

### 3. **Hybrid Approaches:**
When should we use procedural vs. static? (VectorDotMap is great for 
diagrams, terrible for photos)

### 4. **CBOR-LD Integration:**
Should PM-KR mandate CBOR-LD for serialization, or remain 
codec-agnostic? (Manu's input valuable here!)

### 5. **Inverse Procedures:**
Can we design codecs with **inverse programs** for perfect 
reconstruction? (e.g., `ENCODE` + `DECODE` as dual procedures)

## Next Steps

1. **Feedback Welcome:**
    - Are these codec patterns generalizable beyond K3D?
    - What other domains need procedural codecs? (audio? video? 3D models?)

2. **Proposed Specification:**
    - Should we draft `PM_KR_PROCEDURAL_CODEC_STANDARD.md`?
    - Define normative interfaces for codec design?

3. **Benchmark Suite:**
    - Compare procedural codecs vs. traditional codecs (PNG, JPEG, CBOR, 
Protobuf)
    - Metrics: compression ratio, fidelity, encode/decode latency

4. **Interoperability:**
    - How do procedural codecs integrate with existing formats?
    - Can we provide **bidirectional translators** (e.g., PNG ↔ 
VectorDotMap)?

## Conclusion: Procedural Codecs = PM-KR's Compression Engine

**TrueType proved it in 1991:** Procedural glyphs compress better than 
bitmaps.

**PM-KR extends it to all knowledge:** Why limit this pattern to fonts?

**The thesis:**
- Static data duplicates → bloat
- Procedural programs reference → compression
- Canonical forms + symlinks = 70%+ reduction

**This is not theoretical** — K3D validates it with:
- 12-80× embedding compression (PD04)
- 3000× image compression (VectorDotMap)
- 70% font compression (Character Galaxy)

**Question for the group:** Should procedural codecs be a **first-class 
concern** in PM-KR standardization?

If yes, let's design the normative interfaces and benchmark suite together.

Thoughts?

**Daniel Ramos**
Co-Chair, W3C PM-KR Community Group
AI Knowledge Architect

## References

**K3D Codec Specifications:**
- PD04 Codec: 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/ADAPTIVE_PROCEDURAL_COMPRESSION_SPECIFICATION.md
- VectorDotMap: 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/PROCEDURAL_VISUAL_SPECIFICATION.md 
(Section 7)
- Procedural Fonts: 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/vocabulary/DUAL_CLIENT_CONTRACT_SPECIFICATION.md 
(Section 1.6)

**Empirical Evidence:**
- Compression validation: 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_EVIDENCE_VALIDATION_MATRIX.md
- Strategic steering: 
https://github.com/danielcamposramos/Knowledge3D/blob/main/docs/W3C/PM_KR_STRATEGIC_STEERING.md

**Related Standards:**
- CBOR-LD: https://www.w3.org/TR/cbor-ld/
- TrueType Specification: 
https://developer.apple.com/fonts/TrueType-Reference-Manual/
- JSON-LD: https://www.w3.org/TR/json-ld/

**Historical Inspiration:**
- TrueType Fonts (Apple, 1991): Procedural glyph compression
- PostScript (Adobe, 1985): Page description as procedural programs
- LISP (McCarthy, 1960): Code as data, data as code (homoiconicity)

Received on Thursday, 26 February 2026 16:14:18 UTC