- From: Mingun <notifications@github.com>
- Date: Tue, 23 Aug 2022 09:04:04 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/292/1224280623@github.com>
I actually found answers to some of my questions myself while working through `indexes.json`, but it would be nice to have them answered on the visualization pages:

"BMP coverage" pages (such as https://encoding.spec.whatwg.org/big5-bmp.html) contain a 256 x 256 table with the following information:

- External header row (`00 01 02 ...`) is the low byte of the code point `U+__XX` in decimal form
- Internal header row contains the same but in hexadecimal
- External header column (`00 01 02 ...`) is the high byte of the code point `U+XX__` in decimal form
- Internal header column contains the same but in hexadecimal
- Each cell contains
  - Code point value at the top in the form `U+xxxx`
  - Glyph in the middle, or the glyph for `U+FFFD` (Replacement character) if that cell does not contain any mapped code point
  - Position in the index (array index in the JSON array of code points), which is called [_pointer_](https://encoding.spec.whatwg.org/#index-pointer) in the specification, at the bottom

The table represents the `256 x 256 = 0x10000` code points (`U+0000` to `U+FFFF`) of the Basic Multilingual Plane (who would doubt).
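To make the cell layout described above concrete, here is a minimal Python sketch of how a BMP code point maps to a row and column of the coverage table and what a cell would show. The function names and the tiny `sample_index` are hypothetical illustration data, not values from `indexes.json`, and picking the first pointer when a code point occurs more than once in an index is my assumption, not something the page documents.

```python
from typing import Dict, List, Optional, Tuple


def bmp_cell_position(code_point: int) -> Tuple[int, int]:
    """Return (row, column): row is the high byte, column is the low byte."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code point")
    return code_point >> 8, code_point & 0xFF


def pointers_by_code_point(index: List[Optional[int]]) -> Dict[int, int]:
    """Map each code point to a pointer (its array position in the index).

    Assumption: if a code point appears more than once, keep the first pointer.
    """
    result: Dict[int, int] = {}
    for pointer, code_point in enumerate(index):
        if code_point is not None:  # None stands for a JSON null (unmapped)
            result.setdefault(code_point, pointer)
    return result


def describe_cell(code_point: int, index: List[Optional[int]]):
    """Return (label, glyph, pointer) as laid out top to bottom in a cell."""
    pointer = pointers_by_code_point(index).get(code_point)
    # Unmapped cells show the replacement character instead of a glyph.
    glyph = chr(code_point) if pointer is not None else "\uFFFD"
    return f"U+{code_point:04X}", glyph, pointer


# Hypothetical three-entry index: pointer 0 is unmapped.
sample_index = [None, 0x4E00, 0x00E9]
print(bmp_cell_position(0x4E2D))            # row 0x4E, column 0x2D
print(describe_cell(0x4E00, sample_index))  # mapped cell
print(describe_cell(0x4E01, sample_index))  # unmapped cell
```

The table covers `256 * 256 == 0x10000` cells, one for each code point from `U+0000` to `U+FFFF`.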
----

"Index" pages contain tables whose structure differs slightly depending on the encoding, with the following information:

- Each cell contains
  - Position in the index (array index in the JSON array of code points), which is called [_pointer_](https://encoding.spec.whatwg.org/#index-pointer) in the specification
  - Glyph in the middle, or the glyph for `U+FFFD` (Replacement character) if that cell does not contain any mapped code point
  - Code point value at the bottom in the form `U+xxxx` if the cell represents a mapped value

Single-byte encodings (such as https://encoding.spec.whatwg.org/ibm866.html) show only the high half of the encoding (because they are all ASCII-compatible and entries `00-7F` are the same as in ASCII), so the table is always `16 x 8` and represents bytes `80-FF`:

- Header row (`00 01 02 ...`) is the low nibble of an encoded byte (`0x_X`) in hexadecimal form
- Header column (`08 09 0A ...`) is the high nibble of an encoded byte (`0xX_`) in hexadecimal form

Multi-byte encodings (such as https://encoding.spec.whatwg.org/big5.html) are more complicated. All such encodings (those that are visualized) occupy 1 or 2 bytes per code point. In most cases only [ASCII code points](https://infra.spec.whatwg.org/#ascii-code-point) occupy 1 byte, so they are not included in the visualization; other code points occupy two bytes:

- External header row (grey) is the low byte of the code point in the encoding (`__XX`) in hexadecimal form
- Internal header row (white) is just a row index in hexadecimal form
- External header column (grey) is the high byte of the code point in the encoding (`XX__`) in hexadecimal form
- Internal header column (white) is just a column index in hexadecimal form

Table dimensions depend on the encoding and represent constants that are used in the encoding process.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/292#issuecomment-1224280623
Received on Tuesday, 23 August 2022 16:04:17 UTC