[whatwg/url] Omitting www might be confusing for users (#568)

I'm creating this issue as a comment on whatwg/url [4.8.1. Simplify non-human-readable or irrelevant components](https://url.spec.whatwg.org/#url-rendering-simplification).

The related part is introduced in <https://github.com/whatwg/url/commit/8809598ddfd1d935432c8a0cad53f13d70e24bc6>, which is a part of "Guidelines for URL Display" added [here](https://chromium-review.googlesource.com/c/chromium/src/+/1402344).

### Quote

This issue is a proposal to delete or modify the following part(s) in whatwg/url:

> ... For example, browsers may omit a leading www or m domain label to simplify the host, ...

### Reasons

#### Omitting `www` (or `m`) does not help against spoofing

The goal of this part is to avoid spoofing or security-relevant distractions. Spoofing harms users only when they mistake one domain for another. In the case of `examplecorp.com@attacker.example`, users might mistake `attacker.example` for `examplecorp.com`, which is harmful and is something vendors may reasonably want to prevent. However, it doesn't seem to be a security problem whether the user is visiting `www.example.com` or `example.com`, since both are controlled by the same registrant.
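To make the userinfo case concrete, here is a minimal sketch that extracts the host a URL actually points at. It uses Python's `urllib.parse`, which is not the WHATWG URL parser, so treat it as an illustration only; the helper name `real_host` and the `/login` path are made up for this example.

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host component of a URL, ignoring any userinfo before '@'."""
    # urlsplit() separates "user@host"; .hostname holds only the host part.
    return urlsplit(url).hostname or ""


# The text before '@' is userinfo, not the host.
print(real_host("https://examplecorp.com@attacker.example/login"))
```

Here the host is `attacker.example`, a different registrant from `examplecorp.com`. Dropping `www` never changes the registrant in this way.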

#### `www.example.com` is not the same as `example.com`, which creates confusion

`www` is a commonly used subdomain for websites. A subdomain is distinct from its apex domain, so it is possible (and simple) to host different content on the two. As a consequence, `www.example.com` being reachable does not mean that `example.com` is reachable, and vice versa.

I'm not sure whether any standard implies that `www.example.com` and `example.com` should be treated as the same site. However, the confusion seems to be a practical problem. I tested against the list of [The Majestic Million](https://majestic.com/reports/majestic-million) (Why not Alexa? [Because the Alexa list is expensive](https://aws.amazon.com/marketplace/pp/B07QK2XWNV).) and found that a lot of sites (at least about 6%, or 61,222 out of 1,000,000) don't treat `www` and the apex domain as the same.

### Data

The differences are detected only by trying to resolve each domain in The Majestic Million with and without `www`. Therefore, the lists do not include domains that host different content on `www` and the apex domain when both are resolvable.
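The check described above can be sketched roughly as follows. This is not the script used to produce the lists; it assumes that "resolvable" means the system resolver returns at least one address via `socket.getaddrinfo`, and the names `resolves` and `classify` are hypothetical.

```python
import socket
from typing import Callable


def resolves(host: str) -> bool:
    """Return True if the system resolver finds at least one address for host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: host is not a valid IDNA name.
        return False


def classify(apex: str, resolver: Callable[[str], bool] = resolves) -> str:
    """Compare resolvability of an apex domain and its www subdomain."""
    apex_ok = resolver(apex)
    www_ok = resolver("www." + apex)
    if apex_ok and www_ok:
        return "both"
    if apex_ok:
        return "apex-only"  # would belong in WNoNYes.ps.txt
    if www_ok:
        return "www-only"   # would belong in WYesNNo.ps.txt
    return "neither"
```

Calling `classify("example.com")` performs live DNS lookups, so the result depends on the network and resolver in use. The `resolver` parameter exists only so the logic can be exercised without network access.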

All the lists shown below are filtered against the [Public Suffix List](https://publicsuffix.org/) to ensure that they contain apex domains rather than subdomains.
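A simplified sketch of such a filter is shown below. It assumes a tiny hand-picked excerpt of the Public Suffix List (`SAMPLE_SUFFIXES`) and ignores the wildcard and exception rules of the real list, so it only illustrates the idea; the function names are hypothetical.

```python
from typing import Optional, Set

# Assumption: a tiny excerpt for illustration, not the full Public Suffix List.
SAMPLE_SUFFIXES = {"com", "be", "uk", "gov.uk"}


def registrable_domain(host: str, suffixes: Set[str] = SAMPLE_SUFFIXES) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is no such domain."""
    labels = host.lower().rstrip(".").split(".")
    # Candidates are tried from longest to shortest, so "gov.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host itself is a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


def is_apex(host: str, suffixes: Set[str] = SAMPLE_SUFFIXES) -> bool:
    """True if host is exactly its own registrable domain."""
    return registrable_domain(host, suffixes) == host.lower().rstrip(".")
```

Under this sketch, `www.gov.uk` counts as an apex domain because `gov.uk` is itself a public suffix, which is consistent with entries such as `www.gov.uk` appearing in the lists.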

The list of domains that have resolvable DNS records on the apex domain, but not `www`, is here: [WNoNYes.ps.txt](https://github.com/outloudvi/whatwg-url-481/blob/master/WNoNYes.ps.txt). This list contains 37,027 domains (3.70%). The top 20 domains in this list:

```
youtu.be
www.gov.uk
wa.me
icio.us
www.gov.cn
www.nhs.uk
flic.kr
netdna-ssl.com
1drv.ms
pinimg.com
brightcove.net
campaign-archive1.com
campaign-archive2.com
hwg.org
bufferapp.com
campaign-archive.com
t.cn
lnkd.in
rapidshare.com
aliyuncs.com
```

The list of domains that have resolvable DNS records on the `www` subdomain, but not the apex domain, is here: [WYesNNo.ps.txt](https://github.com/outloudvi/whatwg-url-481/blob/master/WYesNNo.ps.txt). This list contains 24,196 domains (2.41%). The top 20 domains in this list:

```
wixsite.com
googleusercontent.com
fda.gov
miit.gov.cn
bbb.org
jiathis.com
army.mil
navy.mil
securityfocus.com
vatican.va
filesusr.com
nhk.or.jp
gwu.edu
af.mil
ec-lyon.fr
freetds.org
specbench.org
golux.com
clickbank.net
apachetutor.org
```

Any comments, problems, or suggestions are welcome.


Received on Saturday, 2 January 2021 14:33:29 UTC