Re: [whatwg/url] Formatting conventions (#47)

I think Infra should cover the formatting conventions for code points. In fact, it already does, saying

> The code point rendered as 🤔 is represented as U+1F914.
>
> When referring to that code point, we might instead say "U+1F914 THINKING FACE (🤔)", instead of just "U+1F914", to provide extra context.
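
As a rough illustration of the two forms quoted above, here is a minimal Python sketch. The helper name `format_code_point` and its `context` flag are made up for this example and are not part of Infra; the character names come from Python's `unicodedata` module, so they depend on the Unicode version the interpreter ships with.

```python
import unicodedata


def format_code_point(ch: str, context: bool = False) -> str:
    """Render one code point as "U+XXXX" or "U+XXXX NAME (glyph)".

    Hypothetical helper for illustration only.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits, as in U+0020 or U+1F914.
    label = f"U+{ord(ch):04X}"
    if not context:
        return label
    # Fall back to the bare label if this Unicode database has no name.
    name = unicodedata.name(ch, "")
    if not name:
        return label
    return f"{label} {name} ({ch})"


print(format_code_point("🤔"))
print(format_code_point("🤔", context=True))
```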

For bytes, it only gives a notation for byte sequences; for a single byte it expects you to always use the 0x20 form:

> A byte is a sequence of eight bits, represented as a double-digit hexadecimal number in the range 0x00 to 0xFF, inclusive. 

It sounds like we have a few proposals to extend this. Let me summarize what I see as the most straightforward proposals, mixing in a few of my own by analogy:

- For code points:
  - Allow also the form "U+1F914 (🤔)". Seems fine.
  - Allow the form "🤔". I'm not a fan; this is ambiguous with a length-1 string containing the single code point.
  - For control characters, allow the form "U+0020 (␠)". I'm not a huge fan, since ␠ is actually U+2420.
  - Allow control characters inside strings in the form "abc␠def". Maybe this is OK, with appropriate verbiage in Infra explaining how to reinterpret certain characters in strings.
- For bytes:
  - Allow the analogous form "0x2F SOLIDUS (/)". Not sure this is a good idea, since aren't those names (like SOLIDUS) about code points, not bytes?
  - Allow also the form "0x2F (/)". Probably fine.
  - Allow also the form \`/\`. Again ambiguous with the length-1 byte sequence...
  - Allow also the form "0x20 (␠)". Maybe less of a problem than for code points, since there is no single byte corresponding to U+2420?
  - Allow control characters inside byte sequences in the form \`abc␠def\`. Probably OK, with appropriate verbiage in Infra.
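
To make the byte proposals concrete, here is a small Python sketch of what a "0x2F (/)" style renderer might look like. The function `format_byte` is hypothetical, and mapping 0x00 to 0x20 and 0x7F onto the Unicode Control Pictures block (U+2400 to U+2421) is my assumption about how the ␠ convention would generalize, not something Infra defines.

```python
def format_byte(b: int) -> str:
    """Render a byte as "0xNN (glyph)", or "0xNN" if no glyph applies.

    Hypothetical helper for illustration only.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("a byte must be in the range 0x00 to 0xFF")
    label = f"0x{b:02X}"
    if 0x21 <= b <= 0x7E:
        # Printable ASCII: show the character itself.
        return f"{label} ({chr(b)})"
    if b <= 0x20:
        # Assumed convention: control pictures U+2400..U+2420.
        return f"{label} ({chr(0x2400 + b)})"
    if b == 0x7F:
        # U+2421 SYMBOL FOR DELETE.
        return f"{label} ({chr(0x2421)})"
    # Non-ASCII bytes get no glyph.
    return label


print(format_byte(0x2F))
print(format_byte(0x20))

# The picture ␠ is its own code point, and its UTF-8 form is three bytes,
# so no single byte collides with it.
print(f"U+{ord('␠'):04X}", "␠".encode("utf-8"))
```

The last line shows why the ␠ form is arguably less confusing for bytes than for code points: the glyph is U+2420, distinct from U+0020, and it has no one-byte encoding in UTF-8.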


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/47#issuecomment-270169507

Received on Tuesday, 3 January 2017 17:22:50 UTC