Re: [whatwg/encoding] Add TextEncoderStream and TextDecoderStream transform streams (#149)

domenic commented on this pull request.

So sorry on the delay on this. Will be more responsive in the future.

I left editorial comments about spec organization. The algorithms seem solid to me from the streams perspective, but I am not an expert on the encoding side of things.

Overall this looks pretty good. Hopefully @annevk can review as well.

> @@ -15,6 +15,19 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo
 spec:infra; type:dfn;
     text:code point
     text:ascii case-insensitive
+spec:streams;
+    type:interface; text:ReadableStream
+    type:dfn; text:chunk

We should export these in streams so that this is not necessary, for the dfns at least. For the interface, that's a separate issue: https://github.com/whatwg/fetch/issues/780 so we can leave it here for now.

> +
+<pre class=idl>
+interface mixin TextDecoderAttributes {
+  readonly attribute DOMString encoding;
+  readonly attribute boolean fatal;
+  readonly attribute boolean ignoreBOM;
+};
+</pre>
+
+<p>The {{TextDecoderAttributes}} mixin defines common attributes that are shared between
+{{TextDecoder}} and {{TextDecoderStream}}. These objects have an associated
+<dfn for=TextDecoderAttributes>encoding</dfn>, <dfn for=TextDecoderAttributes>ignore BOM flag</dfn>,
+and <dfn for=TextDecoderAttributes>error mode</dfn>.
+
+<p>The <dfn attribute for=TextDecoderAttributes><code>encoding</code></dfn> attribute's getter must
+return <a for=TextDecoderAttributes>encoding</a>'s <a for=encoding>name</a> in <a>ASCII

While moving things around maybe use the modern phrasing "this object's encoding's" or "this TextDecoderAttribute object's encoding's". Similarly for the rest.

> @@ -1038,6 +1051,33 @@ function decodeArrayOfStrings(buffer, encoding) {
 </div>
 
 
+<h3 id=interface-mixin-textdecoderattributes>Interface Mixin {{TextDecoderAttributes}}</h3>

Nit: lowercase "mixin" in headings, here and below. (Headings are sentence case.)

> @@ -1305,6 +1350,404 @@ must run these steps:
 </ol>
 
 
+<h3 id=interface-mixin-generictransformstream>Interface Mixin {{GenericTransformStream}}</h3>
+
+<p>{{GenericTransformStream}} represents the concept of a <a>transform stream</a> in IDL. It is not
+a {{TransformStream}}, though it has the same interface and it delegates to one.
+
+<pre class=idl>
+interface mixin GenericTransformStream {
+  readonly attribute ReadableStream readable;
+  readonly attribute WritableStream writable;
+};
+</pre>
+
+<p>An object that includes GenericTransformStream always has an associated
+<dfn for=GenericTransformStream>transform</dfn>.

Probably worth stating its type.

> @@ -1305,6 +1350,404 @@ must run these steps:
 </ol>
 
 
+<h3 id=interface-mixin-generictransformstream>Interface Mixin {{GenericTransformStream}}</h3>
+
+<p>{{GenericTransformStream}} represents the concept of a <a>transform stream</a> in IDL. It is not
+a {{TransformStream}}, though it has the same interface and it delegates to one.
+
+<pre class=idl>
+interface mixin GenericTransformStream {
+  readonly attribute ReadableStream readable;
+  readonly attribute WritableStream writable;
+};
+</pre>
+
+<p>An object that includes GenericTransformStream always has an associated

`{{GenericTransformStream}}`

> + Exposed=(Window,Worker)]
+interface TextDecoderStream {
+  USVString decode(optional BufferSource input, optional TextDecodeOptions options);
+};
+
+TextDecoderStream includes TextDecoderAttributes;
+TextDecoderStream includes GenericTransformStream;
+</pre>
+
+<p>A {{TextDecoderStream}} object has an associated <dfn for=TextDecoderStream>encoding</dfn>,
+<dfn for=TextDecoderStream>decoder</dfn>, <dfn for=TextDecoderStream>stream</dfn>,
+<dfn for=TextDecoderStream>ignore BOM flag</dfn> (initially unset),
+<dfn for=TextDecoderStream>BOM seen flag</dfn> (initially unset),
+<dfn for=TextDecoderStream>error mode</dfn> (initially "<code>replacement</code>"),
+<dfn for=TextDecoderStream>do not flush flag</dfn> (initially unset), and
+<dfn for=TextDecoderStream>transform</dfn>.

For those things provided by the mixins, you should use the mixin `dfn`s, and for=TextDecoderAttributes on references to them. No need to have separate dfns/internal slots for the TextDecoder and TextDecoderStream themselves.

> +
+TextDecoderStream includes TextDecoderAttributes;
+TextDecoderStream includes GenericTransformStream;
+</pre>
+
+<p>A {{TextDecoderStream}} object has an associated <dfn for=TextDecoderStream>encoding</dfn>,
+<dfn for=TextDecoderStream>decoder</dfn>, <dfn for=TextDecoderStream>stream</dfn>,
+<dfn for=TextDecoderStream>ignore BOM flag</dfn> (initially unset),
+<dfn for=TextDecoderStream>BOM seen flag</dfn> (initially unset),
+<dfn for=TextDecoderStream>error mode</dfn> (initially "<code>replacement</code>"),
+<dfn for=TextDecoderStream>do not flush flag</dfn> (initially unset), and
+<dfn for=TextDecoderStream>transform</dfn>.
+
+<p>A {{TextDecoderStream}} object also has an associated <dfn id=concept-tds-serialize
+for=TextDecoderStream>serialize stream</dfn> algorithm, that is identical to the <a
+for=TextDecoder>serialize stream</a> algorithm for the equivalent {{TextDecoder}}.

Can this move to TextDecoderAttributes since it's shared? Not sure myself.

> + <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>readable</a></code>
+ <dd>
+  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running
+  <a for=TextDecoderStream>encoding</a>'s <a for=/>decoder</a> on the chunks written to
+  {{GenericTransformStream/writable}}.
+
+ <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>writable</a></code>
+ <dd>
+  <p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through
+  <a for=TextDecoderStream>encoding</a>'s <a for=/>decoder</a> before making them available to
+  {{GenericTransformStream/readable}}.
+
+  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
+  {{ReadableStream}} source.
+
+  <pre class=example id=example-textdecoderstream-writable><code class=lang-javascript>

FWIW in Bikeshed you can use any indentation/whitespacing you want.

> + <li><p>Set <var>dec</var>'s <a for=TextDecoderStream>transform</a> to <var>transform</var>.
+
+ <li><p>Return <var>dec</var>.
+</ol>
+
+<p>The <dfn id=concept-tds-decode-and-enqueue>decode and enqueue a chunk</dfn> algorithm, given a
+{{TextDecoderStream}} <var>dec</var> and a <var>chunk</var>, runs these steps:
+
+<ol>
+ <li><p>If <a>Type</a>(<var>chunk</var>) is not Object, or <var>chunk</var> does not have an
+ \[[ArrayBufferData]] internal slot, or <a abstract-op>IsDetachedBuffer</a>(<var>chunk</var>) is
+ true, or <a abstract-op>IsSharedArrayBuffer</a>(<var>chunk</var>) is true, then return a new
+ promise rejected with a {{TypeError}} exception.
+
+ <li><p><a>Push</a> a <a lt="get a copy of the buffer source">copy of</a> <var>chunk</var> to
+ <var>dec</var>'s <a for=TextDecoderStream>stream</a>.

Hmm one day we should do a transfer variant, of this and all other web platform APIs...

> + Exposed=(Window,Worker)]
+interface TextEncoderStream {
+};
+
+TextEncoderStream includes TextEncoderAttributes;
+TextEncoderStream includes GenericTransformStream;
+</pre>
+
+<p>A {{TextEncoderStream}} object has an associated <dfn for=TextEncoderStream>encoder</dfn>,
+<dfn for=TextEncoderStream>transform</dfn> and <dfn for=TextEncoderStream>pending high
+surrogate</dfn> (initially null).
+
+<p class="note no-backref">A {{TextEncoderStream}} object offers no <var>label</var> argument as it
+only supports <a>UTF-8</a>.
+
+<hr>

Not so sure about this `<hr>`

> +   <li><p><a>Prepend</a> <var>token</var> to <var>input</var>.
+
+   <li><p>Return U+FFFD.
+  </ol>
+
+ <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, then set <a>pending high
+ surrogate</a> to <var>token</var> and return <a>continue</a>.
+
+ <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.
+
+ <li><p>Return <var>token</var>.
+</ol>
+
+<p class=note>This is equivalent to the "<a>convert</a> a <a>JavaScript string</a> into a <a>scalar
+value string</a>" algorithm from the Infra Standard, but allows for surrogate pairs that are split
+between strings.

Add `[[INFRA]]` for a nice stylistic flourish

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/149#pullrequestreview-142027900

Received on Tuesday, 31 July 2018 16:20:02 UTC