Re: [whatwg/encoding] Add get an encoder and encode or fail for URLs (#238)

@hsivonen commented on this pull request.



> +
+<ol>
+ <li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with
+ <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
+
+ <li><p>If <var>potentialError</var> is an <a>error</a>, then return <a>error</a>'s
+ <a>code point</a>'s <a for="code point">value</a>.
+
+ <li><p>Return null.
+</ol>
+
+<div class=note>
+ <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as
+ the <a>ISO-2022-JP encoder</a> can be in two different states when returning an <a>error</a>. That
+ also means that if the caller emits bytes to encode the error in some way, these have to be in the
+ range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E. [[URL]]

I guess this could even continue:

...or pass the replacement syntax through the encoder.

Also, it might make sense to add an example ("e.g. `\u2603`").

Hence:

Notably, if the ISO-2022-JP encoder is in the Roman state when it returns an error and the caller inserts the byte 0x5C into the output, that byte, unlike in all other cases, will not decode to U+005C (\) when the byte stream is decoded; it decodes to U+00A5 (¥) instead. For this reason, applications that use the non-UTF-8 encoder facilities of this specification for purposes other than the intended ones (form submission and URL parsing in the Web Platform) ought to either avoid combining the ISO-2022-JP encoder with replacement schemes that use U+005C (\) as part of the replacement syntax (e.g. `\u2603`), such as those of JavaScript and CSS, or make sure to pass the replacement syntax through the encoder (unlike in the URL parsing case).
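To make the hazard concrete, here is a minimal sketch in Python. It is not the spec's algorithm: `decode_ascii_roman` is a hypothetical toy that models only the ASCII and Roman states of an ISO-2022-JP decoder (escape sequences `ESC ( B` and `ESC ( J`), and the byte strings are hand-built on the assumption that the encoder switched to Roman to emit U+00A5 just before failing on U+2603.

```python
ESC = 0x1B


def decode_ascii_roman(data: bytes) -> str:
    """Toy decoder covering only the ASCII and Roman states (illustrative)."""
    roman = False
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            seq = data[i + 1:i + 3]
            if seq == b"(B":      # ESC ( B switches to ASCII
                roman = False
            elif seq == b"(J":    # ESC ( J switches to Roman
                roman = True
            else:
                raise ValueError("escape sequence outside this sketch")
            i += 3
            continue
        if byte > 0x7F or byte in (0x0E, 0x0F):
            raise ValueError("byte outside this sketch")
        if roman and byte == 0x5C:
            out.append("\u00A5")  # Roman: 0x5C is the yen sign, not backslash
        elif roman and byte == 0x7E:
            out.append("\u203E")  # Roman: 0x7E is the overline, not tilde
        else:
            out.append(chr(byte))
        i += 1
    return "".join(out)


# Assumed encoder output so far: ESC ( J 0x5C, i.e. U+00A5 in the Roman state.
roman_prefix = b"\x1b(J\x5c"

# The caller appends a raw backslash escape without going through the encoder.
raw_backslash = decode_ascii_roman(roman_prefix + b"\\u2603")

# The caller appends a URL-style escape built only from state-independent bytes.
percent_escaped = decode_ascii_roman(roman_prefix + b"%26%239731%3B")
```

In this model `raw_backslash` starts with two yen signs and contains no U+005C at all, while `percent_escaped` keeps its replacement bytes intact, because `%`, digits, and letters decode identically in the ASCII and Roman states.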

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/238#discussion_r509308188

Received on Wednesday, 21 October 2020 13:55:53 UTC