Re: [whatwg/encoding] Add a BOM sniffing hook for better integration with HTML (#203)

annevk commented on this pull request.

I wish we had some kind of auto-formatting of source code.

> @@ -853,17 +853,17 @@ different format here, to be able to represent ranges.)
 
 <div class=note>
  <p>The algorithms defined below (<a>decode</a>, <a>UTF-8 decode</a>,
- <a>UTF-8 decode without BOM</a>, <a>UTF-8 decode without BOM or fail</a>, <a for=/>encode</a>, and
- <a>UTF-8 encode</a>) are intended for usage by other standards.
+ <a>UTF-8 decode without BOM</a>, <a>UTF-8 decode without BOM or fail</a>, <a for=/>encode</a>,
+ <a>UTF-8 encode</a> and <a>BOM sniff</a>) are intended for usage by other standards.

```suggestion
 <a>UTF-8 encode</a>, and <a>BOM sniff</a>) are intended for usage by other standards.
```

> + <p>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a> and <a>BOM
+ sniff</a>, except as needed for compatibility.

```suggestion
 <p>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a>, and
 <a>BOM sniff</a>, except as needed for compatibility.
```

> - <a>read</a> returns <a>end-of-stream</a>.
-
- <li>
-  <p>For each of the rows in the table below, starting with the first
-  one and going down, if the first bytes of <var>buffer</var> match
-  all the bytes given in the first column, then set <var>encoding</var>
-  to the <a for=/>encoding</a> given in the cell in the second column of
-  that row and <var>BOMSeen</var> to true.
-
-  <table>
-   <tbody><tr><th>Byte order mark<th>Encoding
-   <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
-   <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
-   <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
-  </table>
+ <li><p><a>BOM sniff</a> <var>stream</var>. If the result is not null, set <var>encoding</var> to

Let's make this something like "Let `BOMEncoding` be the result of BOM sniffing `stream`." Then drop the `BOMSeen` variable and override `encoding` in a subsequent step.
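
To illustrate the shape I have in mind, here is a rough sketch in Python rather than spec prose. It is not the proposed wording: `bom_sniff`, `pick_encoding`, and `BOM_TABLE` are hypothetical names, the stream is assumed to be fully buffered as `bytes` (so the "wait until three bytes are available" step is skipped), and the table is the one from the removed step.

```python
from typing import Optional

# Byte order marks from the removed table, checked top to bottom.
BOM_TABLE = [
    (b"\xEF\xBB\xBF", "UTF-8"),
    (b"\xFE\xFF", "UTF-16BE"),
    (b"\xFF\xFE", "UTF-16LE"),
]


def bom_sniff(stream: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none.

    Assumption: `stream` is already buffered, so only its first three
    bytes are inspected and nothing is consumed.
    """
    prefix = stream[:3]
    for bom, encoding in BOM_TABLE:
        if prefix.startswith(bom):
            return encoding
    return None


def pick_encoding(stream: bytes, encoding: str) -> str:
    """Hypothetical helper mirroring the two suggested steps."""
    # Step 1: let BOMEncoding be the result of BOM sniffing stream.
    bom_encoding = bom_sniff(stream)
    # Step 2: override encoding only when a BOM was found.
    if bom_encoding is not None:
        encoding = bom_encoding
    return encoding
```

With this split there is no separate `BOMSeen` flag in the sketch: whether a BOM was seen is the same question as whether `bom_encoding` is non-null.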

> - <a>read</a> returns <a>end-of-stream</a>.
-
- <li>
-  <p>For each of the rows in the table below, starting with the first
-  one and going down, if the first bytes of <var>buffer</var> match
-  all the bytes given in the first column, then set <var>encoding</var>
-  to the <a for=/>encoding</a> given in the cell in the second column of
-  that row and <var>BOMSeen</var> to true.
-
-  <table>
-   <tbody><tr><th>Byte order mark<th>Encoding
-   <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
-   <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
-   <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
-  </table>
+ <li><p><a>BOM sniff</a> <var>stream</var>. If the result is not null, set <var>encoding</var> to

Also, the `<p>` will need to be on its own line, as the `<li>` has two children.

> @@ -995,7 +971,34 @@ steps:
 <p>To <dfn export>UTF-8 encode</dfn> a scalar value stream <var>stream</var>, return the result of
 <a lt=encode for=/>encoding</a> <var>stream</var> using encoding <a>UTF-8</a>.
 
+<hr>
+
+<p>To <dfn export>BOM sniff</dfn> a byte stream <var>stream</var>, run these steps:
+
+<ol>
+  <li><p>Wait until <var>stream</var> has three bytes available or the <a>end-of-stream</a> has been

One space for markup indentation here and below. (And more indentation for the children of the table, to match what it was before.)

>  
+<p class=note>This hook is a workaround for the fact that <a>decode</a> has no way to communicate
+back to the caller that it has found a byte order mark and is therefore not using the provided
+encoding. The hook is to be invoked before <a>decode</a>, and it will return an encoding
+corresponding to the byte order mark found, or null otherwise.

You need to add some newlines here to preserve the original number of newlines before a heading.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/203#pullrequestreview-377731961

Received on Thursday, 19 March 2020 13:53:48 UTC