- From: Anne van Kesteren <notifications@github.com>
- Date: Wed, 24 Jan 2018 01:33:04 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/pull/127/review/91102477@github.com>
annevk commented on this pull request. I have many nits that I'm also willing to help address once you consider things done. Reading through #72 again it seems that with @jakearchibald you agreed on mode locking, but I cannot find that in the current draft. Did I miss a comment saying why we went back on that? It would also be nice to add at least one example. > @@ -28,6 +28,18 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo spec:infra; type:dfn; text:code point text:ascii case-insensitive +spec:streams; + type:interface; text:ReadableStream + type:dfn; text:chunk + type:dfn; text:readable stream + type:dfn; text:writable stream Why are these needed? Do you get some conflict that cannot be resolved? > USVString decode(optional BufferSource input, optional TextDecodeOptions options); };</pre> <p>A {{TextDecoder}} object has an associated <dfn for=TextDecoder>encoding</dfn>, <dfn for=TextDecoder>decoder</dfn>, <dfn for=TextDecoder>stream</dfn>, <dfn for=TextDecoder>ignore BOM flag</dfn> (initially unset), <dfn for=TextDecoder>BOM seen flag</dfn> (initially unset), -<dfn for=TextDecoder>error mode</dfn> (initially "<code>replacement</code>"), and -<dfn for=TextDecoder>do not flush flag</dfn> (initially unset). +<dfn for=TextDecoder>error mode</dfn> (initially "<code>replacement</code>"), +<dfn for=TextDecoder>do not flush flag</dfn> (initially unset), and +<dfn for=TextDecoder>transform</dfn> (a {{TransformStream}} object). It seems best to not say anything here between parenthesis as that's only used for initial values currently (which is inconsistent with other specifications, I realize). We could do a follow-up where we state the type for all of these and then put the initial value elsewhere. > @@ -1135,6 +1150,31 @@ control. <dt><code><var>decoder</var> . <a attribute for=TextDecoder>ignoreBOM</a></code> <dd><p>Returns true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise. + <dt><code><var>decoder</var> . <a attribute for=TextDecoder>readable</a></code> + <dd> + <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running <a Please don't wrap inline/phrasing elements in any manner. > @@ -1135,6 +1150,31 @@ control. <dt><code><var>decoder</var> . <a attribute for=TextDecoder>ignoreBOM</a></code> <dd><p>Returns true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise. + <dt><code><var>decoder</var> . <a attribute for=TextDecoder>readable</a></code> + <dd> + <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running <a + for=TextDecoder>encoding</a>'s <a for=/>decoder</a> on the chunks written to + {{TextDecoder/writable}}. + + <dt><code><var>decoder</var> . <a attribute for=TextDecoder>writable</a></code> + <dd> + <p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through <a Same here. > + <dd> + <p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through <a + for=TextDecoder>encoding</a>'s <a for=/>decoder</a> before making them available to + {{TextDecoder/readable}}. + + <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a + {{ReadableStream}} source. + + <pre class=example id=example-textdecode-writable><code class=lang-javascript> +var decoder = new TextDecoder(encoding); +byteReadable + .pipeThrough(decoder) + .pipeTo(textWritable);</code></pre> + + <p>If the <a for=TextDecoder>error mode</a> is "<code>fatal</code>" and + <a for=TextDecoder>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>, both {{TextDecoder/readable}} This exceeds 100 columns. > @@ -1174,6 +1214,20 @@ constructor, when invoked, must run these steps: <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s <a for=TextDecoder>ignore BOM flag</a>. + <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing. + + <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument + and runs the <a>decode and enqueue a chunk</a> algorithm with <var>dec</var> and <var>chunk</var>. + + <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the <a>flush No wrapping inside phrasing elements. > @@ -1174,6 +1214,20 @@ constructor, when invoked, must run these steps: <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s <a for=TextDecoder>ignore BOM flag</a>. + <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing. + + <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument + and runs the <a>decode and enqueue a chunk</a> algorithm with <var>dec</var> and <var>chunk</var>. + + <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the <a>flush + and enqueue</a> algorithm with <var>dec</var>. + + <li><p>Let <var>transform</var> be the result of calling <a No wrapping. > <p>The <dfn method for=TextDecoder><code>decode(<var>input</var>, <var>options</var>)</code></dfn> method, when invoked, must run these steps: <ol> + <li><p>Let <var>readable</var> be <a for=TextDecoder>transform</a>.\[[readable]]. + + <li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, then throw a + {{TypeError}} exception. + + <li><p>Let <var>writable</var> be <a for=TextDecoder>transform</a>.\[[writable]]. + + <li><p>If <a abstract-op>IsWritableStreamLocked</a>(<var>writable</var>) is true, then throw a + {{TypeError}} exception. + + <p class="note">These steps ensure that the state of the <a for=/>decoder</a> is not simultaneously modified by Longer than a 100 columns. > + + <li><p>Let <var>output</var> be a new <a for=/>stream</a>. + + <li> + <p>While true, run these steps: + + <ol> + <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>dec</var>'s <a + for=TextDecoder>stream</a>. + + <li> + <p>If <var>token</var> is <a>end-of-stream</a>, run these steps: + <ol> + <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>. + + <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>). Here you can wrap before `<var>outputChunk</var>).` > + + <li> + <p>If <var>token</var> is <a>end-of-stream</a>, run these steps: + <ol> + <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>. + + <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>). + + <li><p>Return a new promise resolved with undefined. + </ol> + + <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for + <var>dec</var>'s <a for=TextDecoder>decoder</a>, <var>dec</var>'s <a for=TextDecoder>stream</a>, + <var>output</var>, and <var>dec's</var> <a for=TextDecoder>error mode</a>. + + <li><p>If <var>result</var> is <a>error</a>, return a new promise rejected with a then return > + </ol> + + <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for + <var>dec</var>'s <a for=TextDecoder>decoder</a>, <var>dec</var>'s <a for=TextDecoder>stream</a>, + <var>output</var>, and <var>dec's</var> <a for=TextDecoder>error mode</a>. + + <li><p>If <var>result</var> is <a>error</a>, return a new promise rejected with a + {{TypeError}} exception. + </ol> +</ol> + +<p>The <dfn id=concept-td-flush-and-enqueue>flush and enqueue</dfn> algorithm, which handles the end +of data from the input {{ReadableStream}}, given a {{TextDecoder}} <var>dec</var>, runs these steps: + +<ol> + <li><p>If the <a for=TextDecoder>do not flush flag</a> for <var>dec</var> is unset, set then set* > + <li><p>Unset <var>dec</var>'s <a for=TextDecoder>do not flush flag</a>. + + <li><p>Let <var>output</var> be a new <a for=/>stream</a>. + + <li><p>Let <var>result</var> be the result of <a>processing</a> <a>end-of-stream</a> for + <var>dec</var>'s <a for=TextDecoder>decoder</a> and <var>dec</var>'s <a for=TextDecoder>stream</a>, + <var>output</var>, and <var>dec</var>'s <a for=TextDecoder>error mode</a>. + + <li><p>If <var>result</var> is <a>finished</a>, run these steps: + <ol> + <li><p>Let <var>outputChunk</var> be <var>output</var>, <a lt="serialize stream">serialized</a>. + + <li><p>Let <var>controller</var> be <var>dec</var>'s + <a for=TextDecoder>transform</a>.\[[transformStreamController]]. + + <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, <var>outputChunk</var>). Needs wrapping. > [NewObject] Uint8Array encode(optional USVString input = ""); };</pre> -<p>A {{TextEncoder}} object has an associated <dfn for=TextEncoder>encoder</dfn>. +<p>A {{TextEncoder}} object has an associated <dfn for=TextEncoder>encoder</dfn>, <dfn +for=TextEncoder>transform</dfn> (a {{TransformStream}} object) and <dfn for=TextEncoder>pending high +surrogate</dfn> (initially unset). No wrapping inside phrasing elements. This also has the same problem where types and initial values are mixed. I recommend omitting types for now and filing a follow-up issue. > @@ -1267,6 +1426,24 @@ requires buffering of scalar values. <dt><code><var>encoder</var> . <a attribute for=TextEncoder>encoding</a></code> <dd><p>Returns "<code>utf-8</code>". + <dt><code><var>encoder</var> . <a attribute for=TextEncoder>readable</a></code> + <dd> + <p>Returns a <a>readable stream</a> whose <a>chunks</a> are {{Uint8Array}}s resulting from running + <a>UTF-8</a>'s <a for=/>encoder</a> on the chunks written to {{TextEncoder/writable}}. + + <dt><code><var>encoder</var> . <a attribute for=TextEncoder>writable</a></code> + <dd> + <p>Returns a <a>writable stream</a> which accepts string chunks and runs them through + <a>UTF-8</a>'s <a for=/>encoder</a> before making them available to {{TextEncoder/readable}}. + + <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. + + One newline can go here. > @@ -1279,16 +1456,49 @@ constructor, when invoked, must run these steps: <li><p>Set <var>enc</var>'s <a for=TextEncoder>encoder</a> to <a>UTF-8</a>'s <a for=/>encoder</a>. + <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing. + + <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument + and runs the <a>encode and enqueue a chunk</a> algorithm with <var>enc</var> and <var>chunk</var>. + + <li><p>Let <var>flushAlgorithm</var> be an algorithm which runs the <a>encode and flush</a> + algorithm with <var>enc</var>. + + <li><p>Let <var>transform</var> be the result of calling <a + abstract-op>CreateTransformStream</a>(<var>startAlgorithm</var>, <var>transformAlgorithm</var>, + <var>flushAlgorithm</var>). + + <li><p>Set <var>enc</var>'s <a for=TextEncoder>transform</a> to <var>transform</var> Needs a trailing dot. > <li><p>Return <var>enc</var>. </ol> <p>The <dfn attribute for=TextEncoder><code>encoding</code></dfn> attribute's getter must return "<code>utf-8</code>". +<p>The <dfn attribute for=TextEncoder><code>readable</code></dfn> attribute's getter must return <a No wrapping phrasing elements. > <li><p>Return <var>enc</var>. </ol> <p>The <dfn attribute for=TextEncoder><code>encoding</code></dfn> attribute's getter must return "<code>utf-8</code>". +<p>The <dfn attribute for=TextEncoder><code>readable</code></dfn> attribute's getter must return <a +for=TextEncoder>transform</a>.\[[readable]]. + +<p>The <dfn attribute for=TextEncoder><code>writable</code></dfn> attribute's getter must return <a Same. > <p>The <dfn method for=TextEncoder><code>encode(<var>input</var>)</code></dfn> method, when invoked, must run these steps: <ol> + <li><p>Let <var>readable</var> be <a for=TextEncoder>transform</a>.\[[readable]]. + + <li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, then throw a + {{TypeError}} exception. + + <li><p>Let <var>writable</var> be <a for=TextEncoder>transform</a>.\[[writable]]. + + <li><p>If <a abstract-op>IsWritableStreamLocked</a>(<var>writable</var>) is true, then throw a + {{TypeError}} exception. + + <p class="note">These steps are for consistent behaviour with the {{TextDecoder}} <a method behavior* Added to https://wiki.whatwg.org/wiki/Style > @@ -1314,6 +1524,113 @@ must run these steps: </ol> </ol> +<p>The <dfn id=concept-te-encode-and-enqueue>encode and enqueue a chunk</dfn> algorithm, given a +{{TextEncoder}} <var>enc</var> and <var>chunk</var>, runs these steps: + +<ol> + <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a> + <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with then return > @@ -1314,6 +1524,113 @@ must run these steps: </ol> </ol> +<p>The <dfn id=concept-te-encode-and-enqueue>encode and enqueue a chunk</dfn> algorithm, given a +{{TextEncoder}} <var>enc</var> and <var>chunk</var>, runs these steps: + +<ol> + <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a> + <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with + that exception. + + <li><p>Convert <var>input</var> to a <a for=/>stream</a>. + + <li><p>Let <var>output</var> be a new <a for=/>stream</a>. + + <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>. + + <li><p>Let <var>controller</var> be <var>enc</var>'s <a No wrapping inside phrasing elements. > + + <li><p>Convert <var>input</var> to a <a for=/>stream</a>. + + <li><p>Let <var>output</var> be a new <a for=/>stream</a>. + + <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>. + + <li><p>Let <var>controller</var> be <var>enc</var>'s <a + for=TextEncoder>transform</a>.\[[transformStreamController]]. + + <li><p>While true, run these steps: + + <ol> + <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>input</var>. + + <li><p>If <var>token</var> is <a>end-of-stream</a>, then run these steps: Since this `<li>` has multiple children the `<p>` needs to go on a newline indented 1 (and the same for the following `<ol>`). > + + <ol> + <li><p>Convert <var>output</var> into a byte sequence. + + <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, + <var>output</var>). + + <li><p>Return a new promise resolved with undefined. + </ol> + + <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar + value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>. + + <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for + <var>encoder</var>, <var>input</var>, <var>output</var>. + This newline can go. > + +<ol> + <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a> + <var>chunk</var> to a {{DOMString}}. If this throws an exception, return a promise rejected with + that exception. + + <li><p>Convert <var>input</var> to a <a for=/>stream</a>. + + <li><p>Let <var>output</var> be a new <a for=/>stream</a>. + + <li><p>Let <var>encoder</var> be <a>UTF-8</a>'s <a for=/>encoder</a>. + + <li><p>Let <var>controller</var> be <var>enc</var>'s <a + for=TextEncoder>transform</a>.\[[transformStreamController]]. + + <li><p>While true, run these steps: Since this `<li>` has multiple children the `<p>` needs to go on a newline indented 1 (and the same for the following `<ol>`). > + <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>, + <var>output</var>). + + <li><p>Return a new promise resolved with undefined. + </ol> + + <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar + value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>. + + <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for + <var>encoder</var>, <var>input</var>, <var>output</var>. + + </ol> +</ol> + + This newline can go. Or if you think that they should be separated even visually use `<hr>`. > + <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar + value</a> algorithm with <var>enc</var>, <var>token</var> and <var>input</var>. + + <li><p>If <var>result</var> is not <a>continue</a>, <a>process</a> <var>result</var> for + <var>encoder</var>, <var>input</var>, <var>output</var>. + + </ol> +</ol> + + +<p>The <dfn id=concept-te-convert-code-unit-to-scalar-value>convert code unit to scalar value</dfn> +algorithm, given a {{TextEncoder}} <var>enc</var>, <var>token</var> and <var>input</var> stream, +runs these steps: + +<ol> + <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps: `<li>` with multiple children again. > + + <ol> + <li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>. + + <li><p>Unset <var>enc</var>'s <a>pending high surrogate</a>. + + <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return a code point whose + value is 0x10000 + ((<var>high surrogate</var> − 0xD800) << 10) + (<var>token</var> + − 0xDC00). + + <li><p><a>Prepend</a> <var>token</var> to <var>input</var>. + + <li><p>Return U+FFFD. + </ol> + + <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high No newline inside a phrasing element. > + <li><p>Return U+FFFD. + </ol> + + <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high + surrogate</a> to <var>token</var> and return <a>continue</a>. + + <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD. + + <li><p>Return <var>token</var>. +</ol> + +<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode +scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between +strings. + + No second newline here (or potentially `<hr>`). > + + <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD. + + <li><p>Return <var>token</var>. +</ol> + +<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode +scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between +strings. + + +<p>The <dfn id=concept-te-encode-and-flush>encode and flush</dfn> algorithm, given a +{{TextEncoder}} <var>enc</var>, runs these steps: + +<ol> + <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps: `<li>` with multiple children again. > @@ -1241,6 +1314,88 @@ method, when invoked, must run these steps: </ol> </ol> Having an `<hr>` here would be good I think, to separate it a bit from the member definitions. > + <li><p><a>Prepend</a> <var>token</var> to <var>input</var>. + + <li><p>Return U+FFFD. + </ol> + + <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, set <a>pending high + surrogate</a> to <var>token</var> and return <a>continue</a>. + + <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD. + + <li><p>Return <var>token</var>. +</ol> + +<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode +scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between +strings. I was thinking, it would be better to reference https://infra.spec.whatwg.org/#javascript-string-convert here since eventually we'll update IDL to use that too and then this would need to be updated. > + + </ol> +</ol> + + +<p>The <dfn id=concept-te-convert-code-unit-to-scalar-value>convert code unit to scalar value</dfn> +algorithm, given a {{TextEncoder}} <var>enc</var>, <var>token</var> and <var>input</var> stream, +runs these steps: + +<ol> + <li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps: + + <ol> + <li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>. + + <li><p>Unset <var>enc</var>'s <a>pending high surrogate</a>. I think it would be slightly clearer if we instead made pending high surrogate's type "null or a code unit" instead of the current mixture of flag and code unit. So check for non-null and set it to null when done. > @@ -1314,6 +1524,113 @@ must run these steps: </ol> </ol> An `<hr>` would be good here too. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/whatwg/encoding/pull/127#pullrequestreview-91102477
Received on Wednesday, 24 January 2018 09:33:27 UTC