Re: [whatwg/encoding] Editorial: revamp the way we deal with code points and bytes (#247) from Addison Phillips on 2020-11-02 (public-webapps-github@w3.org from November 2020)

From: Addison Phillips <notifications@github.com>
Date: Mon, 02 Nov 2020 11:29:20 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/pull/247/review/521920184@github.com>

@aphillips commented on this pull request.

I like this generally as a direction, but made some "food for thought" comments below.

> @@ -1915,7 +1916,7 @@ constructor steps are:
  <p class=note>{{DOMString}}, as well as an <a for=/>I/O queue</a> of code units rather than scalar
  values, are used here so that a surrogate pair that is split between chunks can be reassembled into
  the appropriate scalar value. The behavior is otherwise identical to {{USVString}}. In particular,
- lone surrogates will be replaced with U+FFFD.
+ lone surrogates will be replaced with U+FFFD (�).

In [Charmod](http://aphillips.github.io/charmod-norm/#charmod_n11n_combining_marks) we often followed the convention:

> � [U+FFFD REPLACEMENT CHARACTER]

(with the `[U+xxxx character name]` part styled distinctly). I say "often" because I willfully ignored the convention whenever it reduced clarity, particularly with long sequences used in this or that example. For examples like this, might you consider something similar, since it makes the text unambiguous?

OTOH, I find this pretty clear and am not sure that the charmod style adds that much. I like quoting the character like this when it's printable.
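A minimal Python sketch of the behavior the quoted note describes, assuming the input arrives as chunks of UTF-16 code units (plain lists of ints): a lead surrogate at the end of one chunk is held until the next chunk, and any surrogate that cannot be paired becomes U+FFFD (�). `SurrogateReassembler`, `push`, and `flush` are hypothetical names, not spec terms.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


class SurrogateReassembler:
    """Hypothetical helper: turns chunks of UTF-16 code units into scalar values."""

    def __init__(self):
        self.pending = None  # lead surrogate carried over from an earlier unit or chunk

    def push(self, chunk):
        out = []
        for unit in chunk:
            if self.pending is not None:
                if 0xDC00 <= unit <= 0xDFFF:
                    # Trail surrogate completes the held pair.
                    out.append(0x10000 + ((self.pending - 0xD800) << 10) + (unit - 0xDC00))
                    self.pending = None
                    continue
                # Held lead surrogate was not followed by a trail: it is lone.
                out.append(REPLACEMENT)
                self.pending = None
            if 0xD800 <= unit <= 0xDBFF:
                self.pending = unit  # may be completed by the next unit, even in the next chunk
            elif 0xDC00 <= unit <= 0xDFFF:
                out.append(REPLACEMENT)  # trail surrogate with no preceding lead
            else:
                out.append(unit)  # ordinary BMP scalar value
        return out

    def flush(self):
        # At end of stream, a leftover lead surrogate is lone.
        if self.pending is not None:
            self.pending = None
            return [REPLACEMENT]
        return []
```

For example, pushing `[0x61, 0xD83D]` and then `[0xDE00]` yields `[0x61]` and then `[0x1F600]`, while a lead surrogate still held at `flush()` comes out as `0xFFFD`.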



>  
- <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
- a code point whose value is <var>byte</var>.
+ <li><p>Let <var>byteValue</var> be <var>byte</var>'s <a for=byte>value</a>.

Is `byteValue` really needed, vs. just saying something like:

> If byte is an ASCII byte, then return a code point whose value is byte's value.

I realize that "code point's value" is a different integer type than "byte's value", but we mean the number in any case.

>  
- <li><p>Return a code point whose value is 0xF780 + <var>byte</var> &minus; 0x80.
+ <li><p>If <var>byte</var> is an <a>ASCII byte</a>, then return a <a>code point</a> whose
+ <a for="code point">value</a> is <var>byteValue</var>.
+
+ <li><p>Return a <a>code point</a> whose <a for="code point">value</a> is
+ 0xF780 + <var>byteValue</var> &minus; 0x80.

I see the problem. You don't want prose here. But can't we just say `0xF780 + byte - 0x80`?

Is there a reason I'm not seeing why we don't just make the number `0xF700`? Is it to emphasize that we're trying to get to/from bytes >= 0x80?
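For what it's worth, the two spellings are the same arithmetic. A minimal Python sketch of the quoted decoder steps (the function name `decode_byte` is hypothetical), with a check that `0xF780 + byte - 0x80` equals `0xF700 + byte` for every non-ASCII byte:

```python
def decode_byte(byte: int) -> int:
    """Hypothetical helper mirroring the quoted decoder steps; returns a code point value."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("not a byte")
    if byte <= 0x7F:
        # ASCII byte: the code point has the same value.
        return byte
    # Non-ASCII byte: map 0x80..0xFF onto U+F780..U+F7FF.
    return 0xF780 + byte - 0x80


# The two spellings of the offset agree for every byte in 0x80..0xFF.
assert all(0xF780 + b - 0x80 == 0xF700 + b for b in range(0x80, 0x100))
```

So the choice is presentational; one guess (an assumption about intent) is that spelling out `0xF780` and `0x80` keeps both range endpoints visible in the text.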

>  
- <li><p>If <var>code point</var> is in the range U+F780 to U+F7FF, inclusive, return
- a byte whose value is <var>code point</var> &minus; 0xF780 + 0x80.
+ <li><p>If <var>codePointValue</var> is in the range 0xF780 to 0xF7FF, inclusive, then return a
+ <a>byte</a> whose <a for=byte>value</a> is <var>codePointValue</var> &minus; 0xF780 + 0x80.

Etc.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/247#pullrequestreview-521920184

Received on Monday, 2 November 2020 19:29:33 UTC