Re: [whatwg/url] Restructure URL rendering section and add additional guidance (#434)

annevk commented on this pull request.

I'm really happy with this. Big improvement over the status quo.

I left a lot of nits that I'm happy to push as a fixup commit, but for now I've left them as comments, along with a couple of final questions.

> @@ -2476,39 +2476,82 @@ background information. [[!HTML]]
 <h3 id=url-rendering>URL rendering</h3>
 <!-- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641 for context -->
 
-<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a>
-form, with these modifications:
+<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a> form, with
+modifications described below, when the primary purpose of displaying a URL is to have the user make
+a security decision. For example, users are expected to make trust decisions based on a URL rendered

Should we call it a security or trust decision? Or maybe instead we could say "of displaying a URL is to assist the user in making decisions".

> + users to distinguish between the host and other parts of the URL such as the
+ <a for=url>path</a>. Browsers may consider simplifying the host further to draw attention to the
+ <a for=host>registrable domain</a>. For example, browsers may omit a leading <code>www</code> or
+ <code>m</code> domain label to simplify the host, or display the registrable domain only to remove
+ spoofing opportunities posted by subdomains (e.g., <code>https://examplecorp.attacker.com/</code>).
+
+ <li><p>Browsers should not render a <a for=/>URL</a>'s <a for=url>username</a> and <a
+ for=url>password</a>, as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a> (as in,
+ e.g., <code>https://examplecorp.com@attacker.example/</code>).
+
+ <li><p>Browsers may render a URL without its <a for=url>scheme</a> if the display surface only ever
+ permits a single scheme (such as a browser feature that omits <code>https://</code> because it is
+ only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a
+ human-readable string (e.g., "Not secure"), a security indicator icon, or both.
+
+ <li><p>As described in <a>URL serializer</a>, browsers should not serialize null ports.

Having rephrased it, do you still think this is worth keeping? It seems redundant to me at this point.
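
For illustration only, here is a minimal Python sketch of the username/password item in the hunk above. It uses the standard library's `urllib.parse`, which is not a conforming implementation of this standard's parser, and `strip_credentials` is a hypothetical helper name. It also emits the port only when one is present, which is the behavior the last item restates.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Return `url` without its username and password, for display only.

    Assumption: urllib.parse stands in for a real URL parser here.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # urlsplit strips the brackets from IPv6 literals; put them back.
    if ":" in host:
        host = f"[{host}]"
    # Emit the port only when the URL actually carries one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_credentials("https://examplecorp.com@attacker.example/"))
```

With the example from the hunk, the rendered string starts with the real host, `attacker.example`, rather than the misleading userinfo.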

> -<p>For the purposes of bidirectional text it should be rendered as if it were in a
-left-to-right embedding. [[!BIDI]]
+<h4 id=url-rendering-elision>Elision</h4>
+
+<p>In a space-constrained display, URLs should be elided carefully to avoid misleading the user when
+making a security decision:
+
+<ul>
+ <li><p>Browsers should ensure that at least the <a for=host>registrable domain</a> can be shown
+ when the URL is rendered (to avoid showing, e.g., <code>...examplecorp.com</code> when loading
+ <code>https://not-really-examplecorp.com/</code>).
+
+ <li><p>When the full <a for=url>host</a> cannot be rendered, browsers should elide domain labels
+ starting from the lowest-level domain label. For example, <code>examplecorp.com.evil.com</code>
+ should be elided as <code>...com.evil.com</code>, not <code>examplecorp.com...</code>. (Note that
+ bidirectional text means that the lowest-level label may not appear at the left.)

I'm not a native speaker, but "on the left" sounds more natural to me. Or "at the left side" perhaps.
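
As a rough sketch of the elision rule in the hunk above (not code from the PR), the following Python drops whole labels starting from the lowest-level one. `elide_host` and its parameters are hypothetical, and the `registrable_labels=2` default is an assumption that stands in for a Public Suffix List lookup.

```python
def elide_host(host: str, max_labels: int, registrable_labels: int = 2) -> str:
    """Elide `host` from its lowest-level label, keeping the rightmost labels.

    Assumption: `registrable_labels` is supplied by the caller; real code
    would derive it from the Public Suffix List.
    """
    labels = host.split(".")
    # Never keep fewer labels than the registrable domain needs.
    keep = max(max_labels, registrable_labels)
    if len(labels) <= keep:
        return host
    # Cut only on label boundaries, so a label is never shown partially.
    return "..." + ".".join(labels[-keep:])


print(elide_host("examplecorp.com.evil.com", 3))
```

This works on the logical order of labels. As the parenthetical in the hunk notes, with bidirectional text the lowest-level label is not necessarily the visually leftmost one.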

>  
-<p>Due to the confusion that can arise between a <a for=/>URL</a>'s <a for=url>host</a>
-and <a for=url>path</a> with bidirectional text, browsers are encouraged to only render a
-<a for=/>URL</a>'s <a for=url>host</a> in places where it is important for users to
-distinguish between the two. E.g., users are expected to make trust decisions based on a
-<a for=/>URL</a>'s <a for=url>host</a> rendered in the address bar.
+<p>International domain names (IDNs), special characters, and bidirectional text should be handled

Editorial (I can fix this before merging): "Internationalized domain name" seems to be the canonical expansion of this abbreviation.

>  
+<ul>
+ <li><p>Browsers should render a <a for=/>URL</a>'s <a for=url>host</a> using

Editorial (I can fix this before merging): as this `<li>` contains multiple elements, those need to be on their own lines.

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).
+
+ <li><p>Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]]

Editorial (I can fix this before merging): as this `<li>` contains multiple elements, those need to be on their own lines.

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a

Editorial (I can fix this before merging): we use normal single quotes in the source.

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a

Editorial (I can fix this before merging): no newlines inside inline elements. (Happens a few times.)

> +
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).
+
+ <li><p>Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]]
+
+ <p class="note no-backref">Unfortunately, as rendered <a for=/>URLs</a> are strings and can appear
+ anywhere, a specific bidirectional algorithm for rendered <a for=/>URLs</a> would not see wide
+ adoption. Bidirectional text interacts with the parts of a <a for=/>URL</a> in ways that can cause
+ the rendering to be different from the model. Users of bidirectional languages are thus cautioned
+ that this is to be expected, particularly in plain text environments.

I wonder if we should rephrase this a bit, as I doubt users read this document or take advice from it. Perhaps state it more as a matter of fact: that users of bidirectional languages already expect this, or will come to expect it.
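
For reference, one way to read "as if it were in a left-to-right embedding" in a plain-text environment is to wrap the rendered string in the Unicode embedding controls U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202C (POP DIRECTIONAL FORMATTING). This Python sketch is an illustration under that assumption, and `embed_ltr` is a hypothetical name.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_ltr(rendered_url: str) -> str:
    """Wrap an already rendered URL so it is laid out in an LTR embedding."""
    return f"{LRE}{rendered_url}{PDF}"


wrapped = embed_ltr("https://example.com/path")
print(len(wrapped))
```

The controls fix only the base direction of the embedded run. Right-to-left runs inside the URL are still reordered, so the visual order can still differ from the logical order of the URL's parts, as the note in the hunk describes.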

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).

Editorial (I can fix this before merging): write this as "U+1F512 (🔒)" per Infra.
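
To make the decoding item in the hunk above concrete, here is a hedged Python sketch of selective percent-decoding for display. The names `SPOOFABLE` and `decode_for_display` are hypothetical, and the denylist contains only the U+1F512 example from the text. A real browser would use its own list and its own notion of "invisible".

```python
import re
from urllib.parse import unquote

# Assumption: a one-entry denylist taken from the example in the text.
SPOOFABLE = {"\U0001F512"}

# A run of one or more consecutive percent-encoded bytes.
_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def decode_for_display(component: str) -> str:
    """Percent-decode runs for readability, unless that hides or spoofs."""

    def replace(match: re.Match) -> str:
        run = match.group(0)
        try:
            decoded = unquote(run, errors="strict")
        except UnicodeDecodeError:
            return run  # not valid UTF-8: leave the bytes encoded
        for ch in decoded:
            # Conservative: keep the whole run encoded if any code point
            # is denylisted, non-printable, or whitespace.
            if ch in SPOOFABLE or not ch.isprintable() or ch.isspace():
                return run
        return decoded

    return _RUN.sub(replace, component)


print(decode_for_display("/caf%C3%A9"))
```

Here `%F0%9F%94%92` is the UTF-8 encoding of U+1F512, so a path such as `/%F0%9F%94%92secure` stays encoded while `/caf%C3%A9` is shown decoded.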

-- 
View it on GitHub:
https://github.com/whatwg/url/pull/434#pullrequestreview-218074118

Received on Sunday, 24 March 2019 10:17:04 UTC