Re: [whatwg/encoding] Visualization tables has lack of descriptions (Issue #292) from Mingun on 2022-08-23 (public-webapps-github@w3.org from August 2022)

From: Mingun <notifications@github.com>
Date: Tue, 23 Aug 2022 09:04:04 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/292/1224280623@github.com>

I actually found answers to some of my questions myself while working through `indexes.json`, but it would be nice to have them answered on the visualization pages:

"BMP coverage" pages (such as https://encoding.spec.whatwg.org/big5-bmp.html) contain a 256 x 256 table with the following information:
- External header row (`00 01 02 ...`) is the low byte of the code point (`U+__XX`) in decimal form
- Internal header row contains the same value in hexadecimal
- External header column (`00 01 02 ...`) is the high byte of the code point (`U+XX__`) in decimal form
- Internal header column contains the same value in hexadecimal
- Each cell contains
  - at the top, the code point in the form `U+xxxx`
  - in the middle, its glyph, or the glyph for `U+FFFD` (REPLACEMENT CHARACTER) if the cell has no mapped code point
  - at the bottom, its position in the index (the array index in the JSON array of code points), which the specification calls a [_pointer_](https://encoding.spec.whatwg.org/#index-pointer)

The table represents `256 x 256 = 0x10000` (65,536) code points, i.e. the whole Basic Multilingual Plane from `U+0000` to `U+FFFF` (no surprise there).
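To make the layout concrete, here is a minimal sketch, assuming an index is loaded from `indexes.json` as a plain list whose entries are code points or `None`. The helper names `bmp_cell` and `index_pointer` are hypothetical. The pointer lookup takes the first matching pointer, which follows the generic "index pointer" definition; the extra rules the specification has for Big5 and Shift_JIS are deliberately ignored here.

```python
def bmp_cell(code_point: int) -> tuple[int, int]:
    """Return (row, column) of a code point in the 256 x 256 BMP table.

    The row is the high byte (U+XX__), the column is the low byte (U+__XX).
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code point")
    return code_point >> 8, code_point & 0xFF


def index_pointer(index: list, code_point: int):
    """Return the first pointer whose entry equals code_point, or None.

    Simplified: ignores the special Big5 / Shift_JIS rules in the spec.
    """
    for pointer, value in enumerate(index):
        if value == code_point:
            return pointer
    return None


# 256 rows x 256 columns cover U+0000..U+FFFF, i.e. 0x10000 code points.
assert 256 * 256 == 0x10000
print(bmp_cell(0x4E00))  # -> (78, 0)
```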

----

"Index" pages contain tables whose structure differs slightly depending on the encoding. They show the following information:
- Each cell contains
  - at the top, the position in the index (the array index in the JSON array of code points), which the specification calls a [_pointer_](https://encoding.spec.whatwg.org/#index-pointer)
  - in the middle, the glyph, or the glyph for `U+FFFD` (REPLACEMENT CHARACTER) if the cell has no mapped code point
  - at the bottom, the code point in the form `U+xxxx`, if the cell has a mapped value
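A rough sketch of how one "Index" page cell could be derived from an index list. The function `index_cell` and the toy index are hypothetical (the toy entries are made up, not taken from a real index), and treating a pointer past the end of the list as unmapped is an assumption.

```python
from typing import Optional

REPLACEMENT_CHARACTER = "\uFFFD"


def index_cell(index: list, pointer: int) -> tuple[int, str, Optional[str]]:
    """Return (pointer, glyph, label) for one cell of an "Index" page.

    label is "U+XXXX" for a mapped pointer and None otherwise; unmapped
    cells show U+FFFD as their glyph.
    """
    code_point = index[pointer] if pointer < len(index) else None
    if code_point is None:
        return pointer, REPLACEMENT_CHARACTER, None
    return pointer, chr(code_point), f"U+{code_point:04X}"


toy_index = [0x4E00, None, 0x4E8C]  # made-up entries, not a real index
print(index_cell(toy_index, 0))  # -> (0, '一', 'U+4E00')
```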

Single-byte encodings (such as https://encoding.spec.whatwg.org/ibm866.html) show only the upper half of the encoding: they are all ASCII-compatible, so bytes `00-7F` are the same as in ASCII. The table is therefore always 16 x 8 and covers bytes `80-FF`:
- Header row (`00 01 02 ...`) is the low nibble of the encoded byte (`0x_X`) in hexadecimal form
- Header column (`08 09 0A ...`) is the high nibble of the encoded byte (`0xX_`) in hexadecimal form
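For single-byte encodings the pointer is simply the byte minus `0x80`, which is why only 128 cells are drawn. A minimal sketch (hypothetical helper names; the 128-entry toy index is made up) that locates a byte in the 16 x 8 table and decodes with replacement:

```python
def single_byte_cell(byte: int) -> tuple[int, int, int]:
    """Return (high_nibble, low_nibble, pointer) for a byte in 0x80..0xFF."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("bytes 0x00-0x7F are ASCII and are not shown")
    return byte >> 4, byte & 0x0F, byte - 0x80


def decode_single_byte(index: list, data: bytes) -> str:
    """Decode data using a 128-entry index; unmapped bytes become U+FFFD."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through unchanged
        else:
            code_point = index[b - 0x80]
            out.append("\uFFFD" if code_point is None else chr(code_point))
    return "".join(out)


print(single_byte_cell(0x9A))  # -> (9, 10, 26)
```

For example, byte `9A` sits in row `09`, column `0A` and has pointer 26.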

Multi-byte encodings (such as https://encoding.spec.whatwg.org/big5.html) are more complicated. The parts of these encodings that are visualized use 1 or 2 bytes per code point. In most cases only [ASCII code points](https://infra.spec.whatwg.org/#ascii-code-point) take 1 byte, so they are not included in the visualization; the other code points take 2 bytes:
- External header row (grey) is the low (second) byte of the two-byte sequence (`__XX`) in hexadecimal form
- Internal header row (white) is just a row index in hexadecimal form
- External header column (grey) is the high (first) byte of the two-byte sequence (`XX__`) in hexadecimal form
- Internal header column (white) is just a column index in hexadecimal form
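A simplified sketch of the 1-byte/2-byte split described above, assuming well-formed input in a Big5-like encoding where any byte `>= 0x80` starts a two-byte sequence. The name `split_units` is hypothetical, and real decoders also handle invalid lead and trail bytes, which this does not.

```python
def split_units(data: bytes) -> list[tuple[int, ...]]:
    """Split well-formed input into 1-byte ASCII units and (lead, trail) pairs."""
    units = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            units.append((data[i],))  # ASCII: single byte, not in the tables
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            # The lead byte corresponds to the grey header column (XX__),
            # the trail byte to the grey header row (__XX).
            units.append((data[i], data[i + 1]))
            i += 2
    return units


print(split_units(b"A\xa4\x40"))  # -> [(65,), (164, 64)]
```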

Table dimensions depend on the encoding and reflect the constants used in the encoding and decoding process.
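The constants in question are presumably the per-lead-byte row widths from the decoders' pointer formulas: 157 for Big5, 190 for EUC-KR and two-byte gb18030, 188 for Shift_JIS and 94 for the jis0208 part of EUC-JP. The sketch below is a hypothetical helper (`pointer_for`) transcribing those formulas. It assumes the trail byte is already valid and makes no claim about how the pages lay out their columns.

```python
def pointer_for(encoding: str, lead: int, trail: int) -> int:
    """Pointer for a two-byte sequence; assumes lead and trail are valid."""
    if encoding == "big5":
        offset = 0x40 if trail < 0x7F else 0x62
        return (lead - 0x81) * 157 + (trail - offset)
    if encoding == "euc-kr":
        return (lead - 0x81) * 190 + (trail - 0x41)
    if encoding == "gb18030":  # two-byte sequences only
        offset = 0x40 if trail < 0x7F else 0x41
        return (lead - 0x81) * 190 + (trail - offset)
    if encoding == "shift_jis":
        lead_offset = 0x81 if lead < 0xA0 else 0xC1
        offset = 0x40 if trail < 0x7F else 0x41
        return (lead - lead_offset) * 188 + (trail - offset)
    if encoding == "euc-jp":  # index jis0208 part only
        return (lead - 0xA1) * 94 + (trail - 0xA1)
    raise ValueError(f"no two-byte formula here for {encoding!r}")


print(pointer_for("big5", 0xA4, 0x40))  # -> 5495
```

As a sanity check, Shift_JIS `88 9F` and EUC-JP `B0 A1` both map to pointer 1410 in index jis0208.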

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/292#issuecomment-1224280623

Received on Tuesday, 23 August 2022 16:04:17 UTC