- From: Anne van Kesteren <notifications@github.com>
- Date: Sun, 24 Mar 2019 03:16:40 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/pull/434/review/218074118@github.com>
annevk commented on this pull request.

I'm really happy with this. Big improvement over the status quo. I left a lot of nits I'm happy to push as a fixup commit, but I left them as comments for now. And a couple final questions.

> @@ -2476,39 +2476,82 @@ background information. [[!HTML]]
 <h3 id=url-rendering>URL rendering</h3>
 <!-- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641 for context -->
-<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a>
-form, with these modifications:
+<p>A <a for=/>URL</a> should be rendered in its <a lt="URL serializer">serialized</a> form, with
+modifications described below, when the primary purpose of displaying a URL is to have the user make
+a security decision. For example, users are expected to make trust decisions based on a URL rendered

Should we call it a security or trust decision? Or maybe instead we could say "of displaying a URL is to assist the user in making decisions".

> + users to distinguish between the host and other parts of the URL such as the
+ <a for=url>path</a>. Browsers may consider simplifying the host further to draw attention to the
+ <a for=host>registrable domain</a>. For example, browsers may omit a leading <code>www</code> or
+ <code>m</code> domain label to simplify the host, or display the registrable domain only to remove
+ spoofing opportunities posted by subdomains (e.g., <code>https://examplecorp.attacker.com/</code>).
+
+ <li><p>Browsers should not render a <a for=/>URL</a>'s <a for=url>username</a> and <a
+ for=url>password</a>, as they can be mistaken for a <a for=/>URL</a>'s <a for=url>host</a> (as in,
+ e.g., <code>https://examplecorp.com@attacker.example/</code>).
+
+ <li><p>Browsers may render a URL without its <a for=url>scheme</a> if the display surface only ever
+ permits a single scheme (such as a browser feature that omits <code>https://</code> because it is
+ only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a
+ human-readable string (e.g., "Not secure"), a security indicator icon, or both.
+
+ <li><p>As described in <a>URL serializer</a>, browsers should not serialize null ports.

Having rephrased it, do you still think this is worth keeping? It seems redundant to me at this point.

> -<p>For the purposes of bidirectional text it should be rendered as if it were in a
-left-to-right embedding. [[!BIDI]]
+<h4 id=url-rendering-elision>Elision</h4>
+
+<p>In a space-constrained display, URLs should be elided carefully to avoid misleading the user when
+making a security decision:
+
+<ul>
+ <li><p>Browsers should ensure that at least the <a for=host>registrable domain</a> can be shown
+ when the URL is rendered (to avoid showing, e.g., <code>...examplecorp.com</code> when loading
+ <code>https://not-really-examplecorp.com/</code>).
+
+ <li><p>When the full <a for=url>host</a> cannot be rendered, browsers should elide domain labels
+ starting from the lowest-level domain label. For example, <code>examplecorp.com.evil.com</code>
+ should be elided as <code>...com.evil.com</code>, not <code>examplecorp.com...</code>. (Note that
+ bidirectional text means that the lowest-level label may not appear at the left.)

I'm not a native speaker, but "on the left" sounds more natural to me. Or "at the left side" perhaps.

> -<p>Due to the confusion that can arise between a <a for=/>URL</a>'s <a for=url>host</a>
-and <a for=url>path</a> with bidirectional text, browsers are encouraged to only render a
-<a for=/>URL</a>'s <a for=url>host</a> in places where it is important for users to
-distinguish between the two. E.g., users are expected to make trust decisions based on a
-<a for=/>URL</a>'s <a for=url>host</a> rendered in the address bar.
+<p>International domain names (IDNs), special characters, and bidirectional text should be handled

Editorial (I can fix this before merging): "Internationalized domain name" seems to be the canonical expansion of this abbreviation.

> +<ul>
+ <li><p>Browsers should render a <a for=/>URL</a>'s <a for=url>host</a> using

Editorial (I can fix this before merging): as this `<li>` contains multiple elements those need to be on their own lines.

> + <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).
+
+ <li><p>Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]]

Editorial (I can fix this before merging): as this `<li>` contains multiple elements those need to be on their own lines.

> + <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a

Editorial (I can fix this before merging): we use normal single quotes in the source.

> + <p class="note no-backref">Note that non-ASCII characters can be used in <a

Editorial (I can fix this before merging): no newlines inside inline elements. (Happens a few times.)

> +
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).
+
+ <li><p>Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]]
+
+ <p class="note no-backref">Unfortunately, as rendered <a for=/>URLs</a> are strings and can appear
+ anywhere, a specific bidirectional algorithm for rendered <a for=/>URLs</a> would not see wide
+ adoption. Bidirectional text interacts with the parts of a <a for=/>URL</a> in ways that can cause
+ the rendering to be different from the model. Users of bidirectional languages are thus cautioned
+ that this is to be expected, particularly in plain text environments.

I wonder if we should rephrase this a bit as I doubt users read this document or take advice from it. Perhaps state it more as a matter of fact, that users will come to expect this or are expecting this.
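
For illustration only, the percent-decoding guidance quoted above (decode for readability, unless the result would be invisible or presents a spoofing risk) might be sketched roughly as follows. This is not code from the pull request or from any browser. The function name `decode_for_display`, the `SPOOFING_RISKS` set, and the rule that treats Unicode categories C and Z as "invisible" are all assumptions made for the sketch.

```python
import re
import unicodedata
from urllib.parse import unquote_to_bytes

# Hypothetical blocklist of code points a UI might prefer to leave encoded.
SPOOFING_RISKS = {"\U0001F512"}  # U+1F512 (🔒)

# One or more consecutive percent-encoded bytes, e.g. "%C3%A9".
_PERCENT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _is_invisible(ch: str) -> bool:
    # Assumption: control/format/unassigned (C*) and separator (Z*)
    # characters count as "invisible" for display purposes.
    return unicodedata.category(ch)[0] in ("C", "Z")


def decode_for_display(part: str) -> str:
    """Decode percent-encoded runs in a non-host URL part for display.

    A run is left encoded if it is not valid UTF-8, or if any decoded
    code point is invisible or on the spoofing blocklist. The decision
    is made per run, not per code point.
    """
    def replace(match: re.Match) -> str:
        raw = match.group(0)
        try:
            text = unquote_to_bytes(raw).decode("utf-8")
        except UnicodeDecodeError:
            return raw
        if any(ch in SPOOFING_RISKS or _is_invisible(ch) for ch in text):
            return raw
        return text

    return _PERCENT_RUN.sub(replace, part)
```

The sketch deliberately ignores other concerns a real renderer would have to weigh, such as whether decoding `%2F` or `%25` changes how the path reads.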
> + <p class="note no-backref">Note that non-ASCII characters can be used in <a
+ href="http://unicode.org/faq/idn.html#26">homograph</a> spoofing attacks. Consider detecting <a
+ href="http://www.unicode.org/reports/tr39/#Confusable_Detection">confusable characters</a> or
+ warning when they are in use.
+
+ <li><p>URLs are particularly prone to confusion between host and path when they contain
+ bidirectional text, so in this case it is particularly advisable to only render a URL’s <a
+ for=url>host</a>. For readability, other parts of the <a for=/>URL</a>, if rendered, should have
+ their sequences of <a>percent-encoded bytes</a> replaced with code points resulting from <a>percent
+ decoding</a> those sequences converted to bytes, unless that renders those sequences
+ invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g.,
+ the Unicode LOCK character U+1F512).

Editorial (I can fix this before merging): write this as "U+1F512 (🔒)" per Infra.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/434#pullrequestreview-218074118
Received on Sunday, 24 March 2019 10:17:04 UTC