Re: [whatwg/url] Restructure URL rendering section and add additional guidance (#434)

estark37 commented on this pull request.



> @@ -2476,39 +2476,82 @@ background information. [[!HTML]]
 <h3 id=url-rendering>URL rendering</h3>
 <!-- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641 for context -->
 
-<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a>
-form, with these modifications:
+<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a> form, with
+modifications described below, when the primary purpose of displaying a URL is to have the user make
+a security decision. For example, users are expected to make trust decisions based on a URL rendered

I'm not sure if you were suggesting literally "security or trust decision", or asking whether it should be "security" or "trust" -- but I went with the former as that makes the most sense to me. I don't think it should just be "assist the user in making decisions" because some of these guidelines don't apply to non-security situations IMO. (For example, I wouldn't advise a browser to display the registrable domain only in a bookmarks list.)

> + users to distinguish between the host and other parts of the URL such as the
+ <a for=url>path</a>. Browsers may consider simplifying the host further to draw attention to the
+ <a for=host>registrable domain</a>. For example, browsers may omit a leading <code>www</code> or
+ <code>m</code> domain label to simplify the host, or display the registrable domain only to remove
+ spoofing opportunities posed by subdomains (e.g., <code>https://examplecorp.attacker.com/</code>).
+
+ <li><p>Browsers should not render a <a for=/>URL</a>'s <a for=url>username</a> and <a
+ for=url>password</a>, as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a> (as in,
+ e.g., <code>https://examplecorp.com@attacker.example/</code>).
+
+ <li><p>Browsers may render a URL without its <a for=url>scheme</a> if the display surface only ever
+ permits a single scheme (such as a browser feature that omits <code>https://</code> because it is
+ only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a
+ human-readable string (e.g., "Not secure"), a security indicator icon, or both.
+
+ <li><p>As described in <a>URL serializer</a>, browsers should not serialize null ports.

Done
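
For illustration only (not part of the PR): a minimal sketch of the username/password guidance in the hunk above. It assumes Python's `urllib.parse`, which is not a WHATWG URL parser, so edge cases may differ from the spec's URL serializer. The helper name `strip_credentials` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Return url without its username and password, for display purposes only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # urlsplit().hostname drops the brackets around IPv6 literals; restore them.
    if ":" in host:
        host = f"[{host}]"
    # Rebuild the authority from host and port only, leaving out any userinfo.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

With the spec's example, `strip_credentials("https://examplecorp.com@attacker.example/")` leaves only the real host, `attacker.example`, visible.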

> -<p>For the purposes of bidirectional text it should be rendered as if it were in a
-left-to-right embedding. [[!BIDI]]
+<h4 id=url-rendering-elision>Elision</h4>
+
+<p>In a space-constrained display, URLs should be elided carefully to avoid misleading the user when
+making a security decision:
+
+<ul>
+ <li><p>Browsers should ensure that at least the <a for=host>registrable domain</a> can be shown
+ when the URL is rendered (to avoid showing, e.g., <code>...examplecorp.com</code> when loading
+ <code>https://not-really-examplecorp.com/</code>).
+
+ <li><p>When the full <a for=url>host</a> cannot be rendered, browsers should elide domain labels
+ starting from the lowest-level domain label. For example, <code>examplecorp.com.evil.com</code>
+ should be elided as <code>...com.evil.com</code>, not <code>examplecorp.com...</code>. (Note that
+ bidirectional text means that the lowest-level label may not appear at the left.)

Done
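
For illustration only (not part of the PR): a rough sketch of the elision rule in the hunk above, dropping labels from the lowest-level end first and never cutting into the registrable domain. The function name and the `registrable_label_count` parameter are hypothetical. A real implementation would look up the registrable domain in the Public Suffix List and would also need to handle bidirectional text and IDNs, which this sketch ignores.

```python
def elide_host(host: str, max_chars: int, registrable_label_count: int = 2) -> str:
    """Elide a host for a narrow display, lowest-level labels first.

    registrable_label_count is a stand-in assumption (must be >= 1) for a
    Public Suffix List lookup of the registrable domain.
    """
    labels = host.split(".")
    # Nothing to elide if the host fits or is already just the registrable domain.
    if len(host) <= max_chars or len(labels) <= registrable_label_count:
        return host
    # Try keeping as many of the highest-level labels as will fit.
    for keep in range(len(labels) - 1, registrable_label_count - 1, -1):
        candidate = "..." + ".".join(labels[-keep:])
        if len(candidate) <= max_chars:
            return candidate
    # Even if it overflows, show the whole registrable domain instead of truncating it.
    return "..." + ".".join(labels[-registrable_label_count:])
```

With a 16-character budget, `elide_host("examplecorp.com.evil.com", 16)` gives `...com.evil.com`, matching the example in the spec text.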

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a

Done

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or

Done. Chrome is currently working on a warning for some confusable situations instead of just punycode, so I'm on board with "and".

> +
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).
+
+ <li><p>Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]]
+
+ <p class="note no-backref">Unfortunately, as rendered <a for=/>URLs</a> are strings and can appear
+ anywhere, a specific bidirectional algorithm for rendered <a for=/>URLs</a> would not see wide
+ adoption. Bidirectional text interacts with the parts of a <a for=/>URL</a> in ways that can cause
+ the rendering to be different from the model. Users of bidirectional languages are thus cautioned
+ that this is to be expected, particularly in plain text environments.

Rephrased. This whole paragraph is woefully unsatisfying, but I think this is the best we can do for now.
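
For illustration only (not part of the PR): a sketch of the percent-decoding guidance quoted above. It decodes runs of percent-encoded bytes for display, but leaves a run encoded if it is not valid UTF-8, would render invisibly, or contains a code point on a block list. The names `BLOCKED` and `decode_for_display` are hypothetical, and the block list holds only the LOCK example from the text. Treating Unicode categories C and Z as "invisible" is an assumption.

```python
import re
import unicodedata

# Hypothetical block list; a browser's real list would be longer.
BLOCKED = {"\U0001F512"}  # U+1F512 LOCK
_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match: re.Match) -> str:
    raw = match.group(0)
    try:
        text = bytes.fromhex(raw.replace("%", "")).decode("utf-8")
    except UnicodeDecodeError:
        return raw  # not valid UTF-8: keep the encoded form
    for ch in text:
        # Keep the whole run encoded if any code point is blocked, a control or
        # format character, or a separator (including a plain space).
        if ch in BLOCKED or unicodedata.category(ch)[0] in ("C", "Z"):
            return raw
    return text


def decode_for_display(component: str) -> str:
    """Decode percent-encoded runs in a path, query, or fragment for display."""
    return _RUN.sub(_decode_run, component)
```

Keeping the whole run encoded when any single code point is risky is a deliberately coarse choice; a finer-grained version could decode the safe code points around it.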

>  
+ <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or

Thank you! Done.

https://github.com/whatwg/url/pull/434#discussion_r269405267

Received on Wednesday, 27 March 2019 05:02:19 UTC