Re: [whatwg/url] Restructure URL rendering section and add additional guidance (#434)

jyasskin commented on this pull request.

> @@ -2476,39 +2476,83 @@ background information. [[!HTML]]
 <h3 id=url-rendering>URL rendering</h3>
 <!-- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641 for context -->
 
-<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a>
-form, with these modifications:
+<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a> form, with
+modifications described below when the primary purpose of displaying a URL is to have the user make
+a security decision (e.g., users are expected to make trust decisions based on a URL rendered in the

The current phrasing looks like you're giving an example of a particular security decision, but the example itself is a description of a purpose, so it might be a bit easier to read as:

```suggestion
a security decision.  For example, users are expected to make trust decisions based on a URL rendered in the
```

>  
-<ul class=brief>
- <li><p>A <a for=/>URL</a>'s <a for=url>username</a> and <a for=url>password</a> should
- not be rendered as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a>.
- E.g., consider <code>https://examplecorp.com@attacker.example/</code>.
+<h4 id=url-rendering-simplification>Simplify non-human-readable or irrelevant components</h4>
+
+<p>Remove components that may provide opportunities for spoofing or distract from security-relevant
+information:
+
+<ul>
+ <li><p>Browsers are encouraged to only render a URL’s <a for=url>host</a> in places where it is

The sentence structure here is different from the other bullets, which use "URLs should be rendered" instead of "Browsers should render URLs". You should keep them parallel.

Also, could "are encouraged to" be replaced with "should"?

>  </ul>
 
-<p>For the purposes of bidirectional text it should be rendered as if it were in a
-left-to-right embedding. [[!BIDI]]
+<h4 id=url-rendering-i18n>Internationalization and special characters</h4>
+
+<p>International domain names (IDNs), special characters, and bidirectional text should be handled
+with care to prevent spoofing:
+
+<ul>

You're missing the `</ul>` for this.

>  
-<ul class=brief>
- <li><p>A <a for=/>URL</a>'s <a for=url>username</a> and <a for=url>password</a> should
- not be rendered as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a>.
- E.g., consider <code>https://examplecorp.com@attacker.example/</code>.
+<h4 id=url-rendering-simplification>Simplify non-human-readable or irrelevant components</h4>
+
+<p>Remove components that may provide opportunities for spoofing or distract from security-relevant
+information:
+
+<ul>
+ <li><p>Browsers are encouraged to only render a URL’s <a for=url>host</a> in places where it is

"only render a URL's host in places where" could be interpreted as "don't render the host in other places" or "don't render other parts of the URL in these places". I think you mean the second, which could be achieved with

```suggestion
 <li><p>Browsers are encouraged to render only a URL’s <a for=url>host</a> in places where it is
```
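Here is a minimal Python sketch of the host-only rendering the suggested wording describes. It uses `urllib.parse.urlsplit` as a stand-in for the URL Standard's parser, which it does not implement. For example, `.hostname` lowercases the host and strips IPv6 brackets. `render_host_only` is a hypothetical helper name, not an API from the spec or any browser.

```python
from urllib.parse import urlsplit


def render_host_only(url: str) -> str:
    """Return only the host of `url`, for surfaces where host vs. path matters.

    Hypothetical helper: urlsplit approximates, but does not implement,
    the WHATWG URL parser.
    """
    host = urlsplit(url).hostname  # lowercased; userinfo and port are dropped
    if host is None:
        # No authority component (e.g. "mailto:"), so there is no host to show.
        return ""
    return host
```

With the userinfo-spoofing URL from the diff, `render_host_only("https://examplecorp.com@attacker.example/login")` gives `attacker.example`, so the misleading `examplecorp.com` never reaches the display.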

> +
+<p>Remove components that may provide opportunities for spoofing or distract from security-relevant
+information:
+
+<ul>
+ <li><p>Browsers are encouraged to only render a URL’s <a for=url>host</a> in places where it is
+ important for users to distinguish between the host and other parts of the URL such as the <a
+ for=url>path</a>. Browsers may further consider rendering only the URL’s host's <a
+ for=host>registrable domain</a> to remove spoofing opportunities posed by subdomains (e.g.,
+ <code>https://examplecorp.attacker.com/</code>).
+
+ <li><p>A <a for=/>URL</a>'s <a for=url>username</a> and <a for=url>password</a> should not be
+ rendered as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a> (as in, e.g.,
+ <code>https://examplecorp.com@attacker.example/</code>).
+
+ <li><p>A URL can be rendered without its <a for=url>scheme</a> if the display surface only ever

"can" implies that this guidance isn't normative, but the rest of the section uses the normative "should" in most places. Use "may" instead? See https://infra.spec.whatwg.org/#conformance
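The username/password bullet quoted above can be illustrated with a minimal Python sketch that re-serializes a URL without its userinfo. As before, `urlsplit`/`urlunsplit` only approximate the URL Standard's parser and serializer (for instance, they keep explicit default ports). `render_without_userinfo` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def render_without_userinfo(url: str) -> str:
    """Serialize `url` again with any username and password removed (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes "user:pass@" and is lowercased
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `render_without_userinfo("https://examplecorp.com@attacker.example/")` yields `https://attacker.example/`.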

>  
- <li><p>Other parts of the <a for=/>URL</a> should have their sequences of
- <a>percent-encoded bytes</a> replaced with code points resulting from
- <a>percent decoding</a> those sequences converted to bytes, unless that renders those
- sequences invisible.
+<p>In a space-constrained display, URLs should be elided carefully to avoid misleading the user when
+making a security decision:
+
+<ul>
+ <li><p>Ensure that at least the <a for=host>registrable domain</a> can be shown when the URL is
+ rendered (to avoid showing, e.g., <code>...examplecorp.com</code> when loading
+ <code>https://not-really-examplecorp.com/</code>).
+
+ <li><p>When the full <a for=url>host</a> cannot be rendered, elide domain labels starting from the
+ front. For example, <code>examplecorp.com.evil.com</code> should be elided as

Nit: Can you use a word that's more precise than "front"? The example helps a lot, but the note about bidirectional text made me wonder about cases where the least-significant parts of the domain aren't first in my normal reading direction...
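One way to define "front" without relying on reading direction is to use domain labels in their logical (serialized) order. Under that reading, the leftmost and most specific labels are dropped first, and the registrable domain is never cut into. Here is a minimal sketch under that assumption. The registrable domain is passed in because computing it requires the Public Suffix List, which is out of scope here. `elide_host`, its parameters, and the choice of a leading "…" are all hypothetical. Length is counted in code points, not rendered width.

```python
ELLIPSIS = "\u2026"  # "…"


def elide_host(host: str, registrable_domain: str, max_len: int) -> str:
    """Drop leftmost labels of `host` until it fits in `max_len` code points.

    The registrable domain is always kept whole, even if it alone is
    longer than `max_len`.
    """
    labels = host.split(".")
    keep_labels = registrable_domain.split(".")
    keep = len(keep_labels)
    if labels[-keep:] != keep_labels:
        raise ValueError("registrable_domain must be a suffix of host")
    shown = host
    # dropped = number of leftmost labels removed so far.
    for dropped in range(len(labels) - keep + 1):
        rest = ".".join(labels[dropped:])
        shown = rest if dropped == 0 else ELLIPSIS + rest
        if len(shown) <= max_len:
            return shown
    # Nothing fit: show the full registrable domain rather than truncate it.
    return shown
```

With `host="examplecorp.com.evil.com"` and `registrable_domain="evil.com"`, a 14-code-point limit gives `…com.evil.com`. A 5-code-point limit still gives `…evil.com`, so the registrable domain always stays visible.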

>  
-<p>Due to the confusion that can arise between a <a for=/>URL</a>'s <a for=url>host</a>
-and <a for=url>path</a> with bidirectional text, browsers are encouraged to only render a
-<a for=/>URL</a>'s <a for=url>host</a> in places where it is important for users to
-distinguish between the two. E.g., users are expected to make trust decisions based on a
-<a for=/>URL</a>'s <a for=url>host</a> rendered in the address bar.
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. Other parts of the <a for=/>URL</a>, if rendered, should have their sequences of
+ <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent decoding</a> those

I probably haven't thought about this enough, but I've lost track of how replacing %-encoded bytes with their decodings helps disambiguate between the host and path. An example might help?
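To make the transformation under discussion concrete, here is a minimal sketch of the percent-decoding step for the non-host parts of a URL. It does not settle the question above. It assumes the decoded bytes are UTF-8. It also approximates "renders those sequences invisible" as "decodes to a control, format, or separator character" (the format category includes bidi controls such as U+202E). Both assumptions are mine, not the spec's. `render_path` is a hypothetical name. The Hebrew test case shows decoding turning opaque `%D7…` sequences into right-to-left text in the path, which is the kind of bidirectional content the bullet is concerned about.

```python
import re
import unicodedata

# A maximal run of percent-encoded bytes, e.g. "%C3%A9".
_PERCENT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

# Hypothetical stand-in for "invisible": control (Cc), format (Cf, which
# includes bidi controls), and separator (Zs, Zl, Zp) characters.
_INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def _decode_run(match: re.Match) -> str:
    run = match.group(0)
    try:
        decoded = bytes.fromhex(run.replace("%", "")).decode("utf-8")
    except UnicodeDecodeError:
        return run  # Not valid UTF-8: leave the whole run percent-encoded.
    if any(unicodedata.category(ch) in _INVISIBLE_CATEGORIES for ch in decoded):
        return run  # Decoding would hide something: keep the encoded form.
    return decoded


def render_path(path: str) -> str:
    """Percent-decode each run of encoded bytes in `path` unless that hides it."""
    return _PERCENT_RUN.sub(_decode_run, path)
```

This sketch decides per run: if any character in a run would be invisible, the entire run stays encoded. That is simpler than decoding character by character, at the cost of sometimes leaving visible characters encoded.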

-- 
View it on GitHub:
https://github.com/whatwg/url/pull/434#pullrequestreview-216904401

Received on Wednesday, 20 March 2019 20:04:35 UTC