- From: Anne van Kesteren <notifications@github.com>
- Date: Thu, 19 Mar 2020 06:53:35 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/pull/203/review/377731961@github.com>
annevk commented on this pull request.

I wish we had some kind of auto-formatting of source code.

> @@ -853,17 +853,17 @@ different format here, to be able to represent ranges.)
 <div class=note>
 <p>The algorithms defined below (<a>decode</a>, <a>UTF-8 decode</a>,
- <a>UTF-8 decode without BOM</a>, <a>UTF-8 decode without BOM or fail</a>, <a for=/>encode</a>, and
- <a>UTF-8 encode</a>) are intended for usage by other standards.
+ <a>UTF-8 decode without BOM</a>, <a>UTF-8 decode without BOM or fail</a>, <a for=/>encode</a>,
+ <a>UTF-8 encode</a> and <a>BOM sniff</a>) are intended for usage by other standards.

```suggestion
 <a>UTF-8 encode</a>, and <a>BOM sniff</a>) are intended for usage by other standards.
```

> + <p>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a> and <a>BOM
+ sniff</a>, except as needed for compatibility.

```suggestion
 <p>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a>, and <a>BOM
 sniff</a>, except as needed for compatibility.
```

> - <a>read</a> returns <a>end-of-stream</a>.
-
- <li>
- <p>For each of the rows in the table below, starting with the first
- one and going down, if the first bytes of <var>buffer</var> match
- all the bytes given in the first column, then set <var>encoding</var>
- to the <a for=/>encoding</a> given in the cell in the second column of
- that row and <var>BOMSeen</var> to true.
-
- <table>
- <tbody><tr><th>Byte order mark<th>Encoding
- <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
- <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
- <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
- </table>
+ <li><p><a>BOM sniff</a> <var>stream</var>. If the result is not null, set <var>encoding</var> to

Let's make this something like "Let `BOMEncoding` be the result of BOM sniffing stream" and then drop the BOMSeen variable. And override `encoding` in a subsequent step.

> - <a>read</a> returns <a>end-of-stream</a>.
-
- <li>
- <p>For each of the rows in the table below, starting with the first
- one and going down, if the first bytes of <var>buffer</var> match
- all the bytes given in the first column, then set <var>encoding</var>
- to the <a for=/>encoding</a> given in the cell in the second column of
- that row and <var>BOMSeen</var> to true.
-
- <table>
- <tbody><tr><th>Byte order mark<th>Encoding
- <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
- <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
- <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
- </table>
+ <li><p><a>BOM sniff</a> <var>stream</var>. If the result is not null, set <var>encoding</var> to

Also, the p will need to be on its own line as the li has two children.

> @@ -995,7 +971,34 @@ steps:
 <p>To <dfn export>UTF-8 encode</dfn> a scalar value stream <var>stream</var>, return the result of <a lt=encode for=/>encoding</a> <var>stream</var> using encoding <a>UTF-8</a>.
+<hr>
+
+<p>To <dfn export>BOM sniff</dfn> a byte stream <var>stream</var>, run these steps:
+
+<ol>
+ <li><p>Wait until <var>stream</var> has three bytes available or the <a>end-of-stream</a> has been

One space for markup indentation here and below. (And more indentation for the children of the table, to match what it was before.)

> +<p class=note>This hook is a workaround for the fact that <a>decode</a> has no way to communicate
+back to the caller that it has found a byte order mark and is therefore not using the provided
+encoding. The hook is to be invoked before <a>decode</a>, and it will return an encoding
+corresponding to the byte order mark found, or null otherwise.

You need to add some newlines here to preserve the original number of newlines before a heading.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/203#pullrequestreview-377731961
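
For readers following the thread, the shape being discussed (sniff the BOM first, keep the result in its own variable, then override the encoding in a later step) can be sketched roughly as below. This is an illustrative Python sketch, not text from the pull request: the names `BOM_TABLE`, `bom_sniff`, and `pick_encoding` are hypothetical, and passing the already-buffered leading bytes as `prefix` is an assumption that stands in for "wait until three bytes are available or end-of-stream". Only the byte order mark table is taken from the quoted diff.

```python
from typing import Optional

# Byte order mark table as quoted in the removed diff lines, checked top to bottom.
BOM_TABLE = [
    (b"\xEF\xBB\xBF", "UTF-8"),
    (b"\xFE\xFF", "UTF-16BE"),
    (b"\xFF\xFE", "UTF-16LE"),
]


def bom_sniff(prefix: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none.

    `prefix` is assumed to hold the first bytes of the stream (up to three,
    fewer if the stream ended early); extra bytes are ignored.
    """
    head = prefix[:3]
    for bom, encoding in BOM_TABLE:
        if head.startswith(bom):
            return encoding
    return None


def pick_encoding(prefix: bytes, fallback: str) -> str:
    """Hypothetical caller: a BOM, when present, overrides the fallback encoding."""
    bom_encoding = bom_sniff(prefix)  # plays the role of `BOMEncoding` in the review
    if bom_encoding is not None:
        return bom_encoding
    return fallback
```

With this arrangement no separate "BOM seen" flag is needed: a non-null sniff result carries both the fact that a BOM was found and which encoding it names.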
Received on Thursday, 19 March 2020 13:53:48 UTC