Re: [whatwg/encoding] Make TextEncoder and TextDecoder be transform streams (#127)

annevk commented on this pull request.

I have many nits that I'm also willing to help address once you consider things done.

Reading through #72 again it seems that with @jakearchibald you agreed on mode locking, but I cannot find that in the current draft. Did I miss a comment saying why we went back on that?

It would also be nice to add at least one example.

> @@ -28,6 +28,18 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo
 spec:infra; type:dfn;
     text:code point
     text:ascii case-insensitive
+spec:streams;
+    type:interface; text:ReadableStream
+    type:dfn; text:chunk
+    type:dfn; text:readable stream
+    type:dfn; text:writable stream

Why are these needed? Do you get some conflict that cannot be resolved?

>    USVString decode(optional BufferSource input, optional TextDecodeOptions options);
 };</pre>
 
 <p>A {{TextDecoder}} object has an associated <dfn for=TextDecoder>encoding</dfn>,
 <dfn for=TextDecoder>decoder</dfn>, <dfn for=TextDecoder>stream</dfn>,
 <dfn for=TextDecoder>ignore BOM flag</dfn> (initially unset),
 <dfn for=TextDecoder>BOM seen flag</dfn> (initially unset),
-<dfn for=TextDecoder>error mode</dfn> (initially "<code>replacement</code>"), and
-<dfn for=TextDecoder>do not flush flag</dfn> (initially unset).
+<dfn for=TextDecoder>error mode</dfn> (initially "<code>replacement</code>"),
+<dfn for=TextDecoder>do not flush flag</dfn> (initially unset), and
+<dfn for=TextDecoder>transform</dfn> (a {{TransformStream}} object).

It seems best to not say anything here between parenthesis as that's only used for initial values currently (which is inconsistent with other specifications, I realize). We could do a follow-up where we state the type for all of these and then put the initial value elsewhere.

> @@ -1135,6 +1150,31 @@ control.
  <dt><code><var>decoder</var> . <a attribute for=TextDecoder>ignoreBOM</a></code>
  <dd><p>Returns true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise.
 
+ <dt><code><var>decoder</var> . <a attribute for=TextDecoder>readable</a></code>
+ <dd>
+  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running <a

Please don't wrap inline/phrasing elements in any manner.

> @@ -1135,6 +1150,31 @@ control.
  <dt><code><var>decoder</var> . <a attribute for=TextDecoder>ignoreBOM</a></code>
  <dd><p>Returns true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise.
 
+ <dt><code><var>decoder</var> . <a attribute for=TextDecoder>readable</a></code>
+ <dd>
+  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running <a
+  for=TextDecoder>encoding</a>'s <a for=/>decoder</a> on the chunks written to
+  {{TextDecoder/writable}}.
+
+ <dt><code><var>decoder</var> . <a attribute for=TextDecoder>writable</a></code>
+ <dd>
+  <p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through <a

Same here.

> + <dd>
+  <p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through <a
+  for=TextDecoder>encoding</a>'s <a for=/>decoder</a> before making them available to
+  {{TextDecoder/readable}}.
+
+  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
+  {{ReadableStream}} source.
+
+  <pre class=example id=example-textdecode-writable><code class=lang-javascript>
+var decoder = new TextDecoder(encoding);
+byteReadable
+  .pipeThrough(decoder)
+  .pipeTo(textWritable);</code></pre>
+
+  <p>If the <a for=TextDecoder>error mode</a> is "<code>fatal</code>" and
+  <a for=TextDecoder>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>, both {{TextDecoder/readable}}

This exceeds 100 columns.

> @@ -1174,6 +1214,20 @@ constructor, when invoked, must run these steps:
  <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s
  <a for=TextDecoder>ignore BOM flag</a>.
 
+ <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing.
+
+ <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
+ and runs the <a>decode and enqueue a chunk</a> algorithm with <var>dec</var> and <var>chunk</var>.
+
+ <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the <a>flush

No wrapping inside phrasing elements.

> @@ -1174,6 +1214,20 @@ constructor, when invoked, must run these steps:
  <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s
  <a for=TextDecoder>ignore BOM flag</a>.
 
+ <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing.
+
+ <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
+ and runs the <a>decode and enqueue a chunk</a> algorithm with <var>dec</var> and <var>chunk</var>.
+
+ <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the <a>flush
+ and enqueue</a> algorithm with <var>dec</var>.
+
+ <li><p>Let <var>transform</var> be the result of calling <a

No wrapping.

>  <p>The <dfn method for=TextDecoder><code>decode(<var>input</var>, <var>options</var>)</code></dfn>
 method, when invoked, must run these steps:
 
 <ol>
+ <li><p>Let <var>readable</var> be <a for=TextDecoder>transform</a>.\[[readable]].
+
+ <li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, then throw a
+ {{TypeError}} exception.
+
+ <li><p>Let <var>writable</var> be <a for=TextDecoder>transform</a>.\[[writable]].
+
+ <li><p>If <a abstract-op>IsWritableStreamLocked</a>(<var>writable</var>) is true, then throw a
+ {{TypeError}} exception.
+
+ <p class="note">These steps ensure that the state of the <a for=/>decoder</a> is not simultaneously modified by

Longer than a 100 columns.

> +
+ <li><p>Let <var>output</var> be a new <a for=/>stream</a>.
+
+ <li>
+  <p>While true, run these steps:
+
+  <ol>
+   <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>dec</var>'s <a
+   for=TextDecoder>stream</a>.
+
+   <li>
+    <p>If <var>token</var> is <a>end-of-stream</a>, run these steps:
+    <ol>
+     <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>.
+
+     <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>).

Here you can wrap before `<var>outputChunk</var>).`

> +
+   <li>
+    <p>If <var>token</var> is <a>end-of-stream</a>, run these steps:
+    <ol>
+     <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>.
+
+     <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>).
+
+     <li><p>Return a new promise resolved with undefined.
+    </ol>
+
+   <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for
+   <var>dec</var>'s <a for=TextDecoder>decoder</a>, <var>dec</var>'s <a for=TextDecoder>stream</a>,
+   <var>output</var>, and <var>dec's</var> <a for=TextDecoder>error mode</a>.
+
+   <li><p>If <var>result</var> is <a>error</a>, return a new promise rejected with a

then return

> +    </ol>
+
+   <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for
+   <var>dec</var>'s <a for=TextDecoder>decoder</a>, <var>dec</var>'s <a for=TextDecoder>stream</a>,
+   <var>output</var>, and <var>dec's</var> <a for=TextDecoder>error mode</a>.
+
+   <li><p>If <var>result</var> is <a>error</a>, return a new promise rejected with a
+   {{TypeError}} exception.
+  </ol>
+</ol>
+
+<p>The <dfn id=concept-td-flush-and-enqueue>flush and enqueue</dfn> algorithm, which handles the end
+of data from the input {{ReadableStream}}, given a {{TextDecoder}} <var>dec</var>, runs these steps:
+
+<ol>
+ <li><p>If the <a for=TextDecoder>do not flush flag</a> for <var>dec</var> is unset, set

then set*

> + <li><p>Unset <var>dec</var>'s <a for=TextDecoder>do not flush flag</a>.
+
+ <li><p>Let <var>output</var> be a new <a for=/>stream</a>.
+
+ <li><p>Let <var>result</var> be the result of <a>processing</a> <a>end-of-stream</a> for
+ <var>dec</var>'s <a for=TextDecoder>decoder</a> and <var>dec</var>'s <a for=TextDecoder>stream</a>,
+ <var>output</var>, and <var>dec</var>'s <a for=TextDecoder>error mode</a>.
+
+ <li><p>If <var>result</var> is <a>finished</a>, run these steps:
+ <ol>
+  <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>.
+
+  <li><p>Let <var>controller</var> be <var>dec</var>'s
+  <a for=TextDecoder>transform</a>.\[[transformStreamController]].
+
+  <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>).

Needs wrapping.

>    [NewObject] Uint8Array encode(optional USVString input = "");
 };</pre>
 
-<p>A {{TextEncoder}} object has an associated <dfn for=TextEncoder>encoder</dfn>.
+<p>A {{TextEncoder}} object has an associated <dfn for=TextEncoder>encoder</dfn>, <dfn
+for=TextEncoder>transform</dfn> (a {{TransformStream}} object) and <dfn for=TextEncoder>pending high
+surrogate</dfn> (initially unset).

No wrapping inside phrasing elements. This also has the same problem where types and initial values are mixed. I recommend omitting types for now and filing a follow-up issue.

> @@ -1267,6 +1426,24 @@ requires buffering of scalar values.
  <dt><code><var>encoder</var> . <a attribute for=TextEncoder>encoding</a></code>
  <dd><p>Returns "<code>utf-8</code>".
 
+ <dt><code><var>encoder</var> . <a attribute for=TextEncoder>readable</a></code>
+ <dd>
+  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are {{Uint8Array}}s resulting from running
+  <a>UTF-8</a>'s <a for=/>encoder</a> on the chunks written to {{TextEncoder/writable}}.
+
+ <dt><code><var>encoder</var> . <a attribute for=TextEncoder>writable</a></code>
+ <dd>
+  <p>Returns a <a>writable stream</a> which accepts string chunks and runs them through
+  <a>UTF-8</a>'s <a for=/>encoder</a> before making them available to {{TextEncoder/readable}}.
+
+  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source.
+
+

One newline can go here.

> @@ -1279,16 +1456,49 @@ constructor, when invoked, must run these steps:
 
  <li><p>Set <var>enc</var>'s <a for=TextEncoder>encoder</a> to <a>UTF-8</a>'s <a for=/>encoder</a>.
 
+ <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing.
+
+ <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
+ and runs the <a>encode and enqueue a chunk</a> algorithm with <var>enc</var> and <var>chunk</var>.
+
+ <li><p>Let <var>flushAlgorithm</var> be an algorithm which runs the <a>encode and flush</a>
+ algorithm with <var>enc</var>.
+
+ <li><p>Let <var>transform</var> be the result of calling <a
+ abstract-op>CreateTransformStream</a>(<var>startAlgorithm</var>, <var>transformAlgorithm</var>,
+ <var>flushAlgorithm</var>).
+
+ <li><p>Set <var>enc</var>'s <a for=TextEncoder>transform</a> to <var>transform</var>

Needs a trailing dot.

>   <li><p>Return <var>enc</var>.
 </ol>
 
 <p>The <dfn attribute for=TextEncoder><code>encoding</code></dfn> attribute's getter must return
 "<code>utf-8</code>".
 
+<p>The <dfn attribute for=TextEncoder><code>readable</code></dfn> attribute's getter must return <a

No wrapping phrasing elements.

>   <li><p>Return <var>enc</var>.
 </ol>
 
 <p>The <dfn attribute for=TextEncoder><code>encoding</code></dfn> attribute's getter must return
 "<code>utf-8</code>".
 
+<p>The <dfn attribute for=TextEncoder><code>readable</code></dfn> attribute's getter must return <a
+for=TextEncoder>transform</a>.\[[readable]].
+
+<p>The <dfn attribute for=TextEncoder><code>writable</code></dfn> attribute's getter must return <a

Same.

>  <p>The <dfn method for=TextEncoder><code>encode(<var>input</var>)</code></dfn> method, when invoked,
 must run these steps:
 
 <ol>
+ <li><p>Let <var>readable</var> be <a for=TextEncoder>transform</a>.\[[readable]].
+
+ <li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, then throw a
+ {{TypeError}} exception.
+
+ <li><p>Let <var>writable</var> be <a for=TextEncoder>transform</a>.\[[writable]].
+
+ <li><p>If <a abstract-op>IsWritableStreamLocked</a>(<var>writable</var>) is true, then throw a
+ {{TypeError}} exception.
+
+ <p class="note">These steps are for consistent behaviour with the {{TextDecoder}} <a method

behavior*

Added to https://wiki.whatwg.org/wiki/Style

> @@ -1314,6 +1524,113 @@ must run these steps:
   </ol>
 </ol>
 
+<p>The <dfn id=concept-te-encode-and-enqueue>encode and enqueue a chunk</dfn> algorithm, given a
+{{TextEncoder}} <var>enc</var> and <var>chunk</var>, runs these steps:
+
+<ol>
+ <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a>
+ <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with

then return

> @@ -1314,6 +1524,113 @@ must run these steps:
   </ol>
 </ol>
 
+<p>The <dfn id=concept-te-encode-and-enqueue>encode and enqueue a chunk</dfn> algorithm, given a
+{{TextEncoder}} <var>enc</var> and <var>chunk</var>, runs these steps:
+
+<ol>
+ <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a>
+ <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with
+ that exception.
+
+ <li><p>Convert <var>input</var> to a <a for=/>stream</a>.
+
+ <li><p>Let <var>output</var> be a new <a for=/>stream</a>.
+
+ <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>.
+
+ <li><p>Let <var>controller</var> be <var>enc</var>'s <a

No wrapping inside phrasing elements.

> +
+ <li><p>Convert <var>input</var> to a <a for=/>stream</a>.
+
+ <li><p>Let <var>output</var> be a new <a for=/>stream</a>.
+
+ <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>.
+
+ <li><p>Let <var>controller</var> be <var>enc</var>'s <a
+ for=TextEncoder>transform</a>.\[[transformStreamController]].
+
+ <li><p>While true, run these steps:
+
+ <ol>
+  <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>input</var>.
+
+  <li><p>If <var>token</var> is <a>end-of-stream</a>, then run these steps:

Since this `<li>` has multiple children the `<p>` needs to go on a newline indented 1 (and the same for the following `<ol>`).

> +
+  <ol>
+   <li><p>Convert <var>output</var> into a byte sequence.
+
+   <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
+   <var>output</var>).
+
+   <li><p>Return a new promise resolved with undefined.
+  </ol>
+
+  <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar
+  value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>.
+
+  <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for
+  <var>encoder</var>, <var>input</var>, <var>output</var>.
+

This newline can go.

> +
+<ol>
+ <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a>
+ <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with
+ that exception.
+
+ <li><p>Convert <var>input</var> to a <a for=/>stream</a>.
+
+ <li><p>Let <var>output</var> be a new <a for=/>stream</a>.
+
+ <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>.
+
+ <li><p>Let <var>controller</var> be <var>enc</var>'s <a
+ for=TextEncoder>transform</a>.\[[transformStreamController]].
+
+ <li><p>While true, run these steps:

Since this `<li>` has multiple children the `<p>` needs to go on a newline indented 1 (and the same for the following `<ol>`).

> +   <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
+   <var>output</var>).
+
+   <li><p>Return a new promise resolved with undefined.
+  </ol>
+
+  <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar
+  value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>.
+
+  <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for
+  <var>encoder</var>, <var>input</var>, <var>output</var>.
+
+ </ol>
+</ol>
+
+

This newline can go. Or if you think that they should be separated even visually use `<hr>`.

> +  <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar
+  value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>.
+
+  <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for
+  <var>encoder</var>, <var>input</var>, <var>output</var>.
+
+ </ol>
+</ol>
+
+
+<p>The <dfn id=concept-te-convert-code-unit-to-scalar-value>convert code unit to scalar value</dfn>
+algorithm, given a {{TextEncoder}} <var>enc</var>, <var>token</var> and <var>input</var> stream,
+runs these steps:
+
+<ol>
+ <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps:

`<li>` with multiple children again.

> +
+ <ol>
+  <li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>.
+
+  <li><p>Unset <var>enc</var>'s <a>pending high surrogate</a>.
+
+  <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return a code point whose
+  value is 0x10000 + ((<var>high surrogate</var> &minus; 0xD800) &lt;&lt; 10) + (<var>token</var>
+  &minus; 0xDC00).
+
+  <li><p><a>Prepend</a> <var>token</var> to <var>input</var>.
+
+  <li><p>Return U+FFFD.
+ </ol>
+
+ <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high

No newline inside a phrasing element.

> +  <li><p>Return U+FFFD.
+ </ol>
+
+ <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high
+ surrogate</a> to <var>token</var> and return <a>continue</a>.
+
+ <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD.
+
+ <li><p>Return <var>token</var>.
+</ol>
+
+<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode
+scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between
+strings.
+
+

No second newline here (or potentially `<hr>`).

> +
+ <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD.
+
+ <li><p>Return <var>token</var>.
+</ol>
+
+<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode
+scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between
+strings.
+
+
+<p>The <dfn id=concept-te-encode-and-flush>encode and flush</dfn> algorithm, given a
+{{TextEncoder}} <var>enc</var>, runs these steps:
+
+<ol>
+ <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps:

`<li>` with multiple children again.

> @@ -1241,6 +1314,88 @@ method, when invoked, must run these steps:
   </ol>
 </ol>
 

Having an `<hr>` here would be good I think, to separate it a bit from the member definitions.

> +  <li><p><a>Prepend</a> <var>token</var> to <var>input</var>.
+
+  <li><p>Return U+FFFD.
+ </ol>
+
+ <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high
+ surrogate</a> to <var>token</var> and return <a>continue</a>.
+
+ <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD.
+
+ <li><p>Return <var>token</var>.
+</ol>
+
+<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode
+scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between
+strings.

I was thinking, it would be better to reference https://infra.spec.whatwg.org/#javascript-string-convert here since eventually we'll update IDL to use that too and then this would need to be updated.

> +
+ </ol>
+</ol>
+
+
+<p>The <dfn id=concept-te-convert-code-unit-to-scalar-value>convert code unit to scalar value</dfn>
+algorithm, given a {{TextEncoder}} <var>enc</var>, <var>token</var> and <var>input</var> stream,
+runs these steps:
+
+<ol>
+ <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps:
+
+ <ol>
+  <li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>.
+
+  <li><p>Unset <var>enc</var>'s <a>pending high surrogate</a>.

I think it would be slightly clearer if we instead made pending high surrogate's type "null or a code unit" instead of the current mixture of flag and code unit. So check for non-null and set it to null when done.

> @@ -1314,6 +1524,113 @@ must run these steps:
   </ol>
 </ol>
 

An `<hr>` would be good here too.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/127#pullrequestreview-91102477

Received on Wednesday, 24 January 2018 09:33:27 UTC