- From: Outvi V <notifications@github.com>
- Date: Sat, 02 Jan 2021 06:33:12 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/568@github.com>
I'm creating this issue as a comment on the whatwg/url section [4.8.1. Simplify non-human-readable or irrelevant components](https://url.spec.whatwg.org/#url-rendering-simplification). The related part was introduced in <https://github.com/whatwg/url/commit/8809598ddfd1d935432c8a0cad53f13d70e24bc6>, which is part of the "Guidelines for URL Display" added [here](https://chromium-review.googlesource.com/c/chromium/src/+/1402344).

### Quote

This issue is a proposal to delete or modify the following part(s) in whatwg/url:

> ... For example, browsers may omit a leading www or m domain label to simplify the host, ...

### Reasons

#### Omitting `www` (or `m`) does not help prevent spoofing

The purpose of this part is to avoid spoofing or security-relevant distractions. "Spoofing" harms users only when they mistake one domain for another. In the case of `examplecorp.com@attacker.example`, users might mistake `attacker.example` for `examplecorp.com`, which is harmful and which vendors may reasonably want to prevent. However, whether the user is visiting `www.example.com` or `example.com` does not seem to be a security problem, since both are controlled by the same "registrant".

#### `www.example.com` is not the same as `example.com`, which creates confusion

`www` is a commonly used subdomain for websites. A subdomain is distinct from its apex domain, so it is possible (and easy) to host different content on the two. As a consequence, `www.example.com` being reachable does not mean that `example.com` is reachable, and vice versa.

I'm not sure whether any standard implies that `www.example.com` and `example.com` should be treated as the same site. However, the confusion seems to be a practical problem. I tested against the list of [The Majestic Million](https://majestic.com/reports/majestic-million) (Why not Alexa? [Because the Alexa list is expensive](https://aws.amazon.com/marketplace/pp/B07QK2XWNV).)
and found that many sites (at least roughly 6%, or 61,222 out of 1,000,000) don't treat `www` and the apex domain as the same.

### Data

We only detect differences in DNS resolvability, by trying to resolve each domain in The Majestic Million with and without `www`. Therefore, the lists do not include domains that serve different content on `www` and the apex domain when both are resolvable. All the lists shown below are filtered against the [Public Suffix List](https://publicsuffix.org/) to ensure that they contain apex domains rather than subdomains.

The list of domains that have resolvable DNS records on the apex domain but not on `www` is here: [WNoNYes.ps.txt](https://github.com/outloudvi/whatwg-url-481/blob/master/WNoNYes.ps.txt). This list contains 37,027 domains (3.70%). The top 20 domains in this list:

```
youtu.be
www.gov.uk
wa.me
icio.us
www.gov.cn
www.nhs.uk
flic.kr
netdna-ssl.com
1drv.ms
pinimg.com
brightcove.net
campaign-archive1.com
campaign-archive2.com
hwg.org
bufferapp.com
campaign-archive.com
t.cn
lnkd.in
rapidshare.com
aliyuncs.com
```

The list of domains that have resolvable DNS records on the `www` subdomain but not on the apex domain is here: [WYesNNo.ps.txt](https://github.com/outloudvi/whatwg-url-481/blob/master/WYesNNo.ps.txt). This list contains 24,196 domains (2.41%). The top 20 domains in this list:

```
wixsite.com
googleusercontent.com
fda.gov
miit.gov.cn
bbb.org
jiathis.com
army.mil
navy.mil
securityfocus.com
vatican.va
filesusr.com
nhk.or.jp
gwu.edu
af.mil
ec-lyon.fr
freetds.org
specbench.org
golux.com
clickbank.net
apachetutor.org
```

Any comments, problems, or suggestions are welcome.
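The resolution check described under Data can be sketched roughly as follows. This is not the script that produced the linked lists; it is a minimal Python sketch that assumes "resolvable" means the system resolver returns an address through `socket.getaddrinfo`, and it uses a tiny hand-picked stand-in for the Public Suffix List. All names here (`SAMPLE_SUFFIXES`, `registrable_domain`, `is_apex`, `resolves_via_system_dns`, `classify`) are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical, tiny subset of the Public Suffix List. The real list is far
# larger and also has wildcard ("*.") and exception ("!") rules, ignored here.
SAMPLE_SUFFIXES: Set[str] = {"com", "be", "uk", "gov.uk", "co.uk", "cn", "gov.cn"}


def registrable_domain(name: str, suffixes: Set[str]) -> Optional[str]:
    """Return the matched public suffix plus one label, or None if name is a suffix."""
    labels = name.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No rule matched: apply the implicit "*" rule (the last label is the suffix).
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


def is_apex(name: str, suffixes: Set[str]) -> bool:
    """True if name is exactly a registrable ("apex") domain, not a subdomain."""
    return registrable_domain(name, suffixes) == name.lower().rstrip(".")


def resolves_via_system_dns(name: str) -> bool:
    """Assumed definition of "resolvable": the system resolver returns an address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except (socket.gaierror, UnicodeError):  # NXDOMAIN etc., or an invalid label
        return False


def classify(apexes: Iterable[str], resolves: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Bucket apex domains by whether the apex and its www. name resolve."""
    buckets: Dict[str, List[str]] = {"apex_only": [], "www_only": [], "both": [], "neither": []}
    for apex in apexes:
        apex_ok = resolves(apex)
        www_ok = resolves("www." + apex)
        if apex_ok and not www_ok:
            key = "apex_only"  # comparable to WNoNYes.ps.txt
        elif www_ok and not apex_ok:
            key = "www_only"   # comparable to WYesNNo.ps.txt
        elif apex_ok:
            key = "both"       # may still serve different content; not detected
        else:
            key = "neither"
        buckets[key].append(apex)
    return buckets


if __name__ == "__main__":
    candidates = ["youtu.be", "www.gov.uk", "www.example.com", "example.com"]
    apexes = [d for d in candidates if is_apex(d, SAMPLE_SUFFIXES)]
    # Fake resolver so the demo runs offline; pass resolves_via_system_dns for real lookups.
    fake_records = {"youtu.be", "www.gov.uk", "www.example.com", "example.com"}
    print(classify(apexes, lambda name: name in fake_records))
    # prints {'apex_only': ['youtu.be', 'www.gov.uk'], 'www_only': [], 'both': ['example.com'], 'neither': []}
```

The `both` bucket is why the counts above are a lower bound: a domain whose apex and `www` names both resolve but serve different content ends up there and is not counted. The resolver is passed in as a function so the classification logic can be exercised without network access.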
Received on Saturday, 2 January 2021 14:33:29 UTC