Re: [whatwg/url] It's not immediately clear that "URL syntax" and "URL parser" conflict (#118)

To hopefully clear up any confusion: neither the URL Standard nor the RFCs it obsoletes provide an algorithm for interpreting an arbitrary string of Markdown text and finding URLs within it. That seems to be what you're wondering about, @justjanne, with your discussion of the issue tracker. I believe that might be specified by [CommonMark](http://spec.commonmark.org/), but I am not sure.

Both the URL Standard and the RFCs it obsoletes operate only on specific string inputs that are identified as URLs, for example in a `Location:` header or an `<a href="">` element. In other contexts, such as Markdown text, plaintext emails, or location bar entry, different heuristics apply.

For example, as you noted, `https://///url.spec.whatwg.org/` is parsed by `<a href="">` and `Location:` parsers as `https://url.spec.whatwg.org/`. But it isn't parsed that way by GitHub's Markdown parser.
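You can reproduce the `<a href="">` behavior with the `URL` constructor available in browsers, Node.js, and Deno, which implements the URL Standard's parser. The sketch below assumes such a runtime. `parseHref` is a hypothetical helper name, not part of any API.

```typescript
// Hypothetical helper: parse a string the way an href attribute value is
// parsed, using the runtime's WHATWG-conformant URL class.
function parseHref(input: string, base?: string): string {
  // new URL() throws a TypeError if the input cannot be parsed at all.
  return new URL(input, base).href;
}

// For special schemes such as https:, the parser skips any run of
// slashes after "https:" before reading the host.
console.log(parseHref("https://///url.spec.whatwg.org/"));
// → "https://url.spec.whatwg.org/"
```

The extra slashes are a validation error in the Standard's terms, but not a parse failure, so parsing still produces a URL.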

On the other hand, `www.example.com` is parsed by `<a href="">` and `Location:` parsers as a relative URL, so e.g. on this issue page it resolves to `https://github.com/whatwg/url/issues/www.example.com`. But if you enter it into Markdown, it is instead parsed as `http://www.example.com/`, with an implicit `http:` scheme and an added trailing slash. (Example: www.example.com.) Similar considerations apply to the location bar, although it uses different heuristics; e.g., my browser turns the input `url` into `https://www.google.com/search?q=url&ie=utf-8&oe=utf-8`.
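The two readings of `www.example.com` can be contrasted in a small sketch. It makes two assumptions:

- The base URL `https://github.com/whatwg/url/issues/118` stands in for the page's URL.
- `autolinkWww` is a hypothetical heuristic loosely modeled on GitHub-flavored Markdown's `www.` autolinks. It is not defined by the URL Standard and is not GitHub's actual implementation.

```typescript
// Assumed base: the URL of the issue page the text appears on.
const issuePage = "https://github.com/whatwg/url/issues/118";

// 1. href/Location reading: a relative URL resolved against the base.
//    The last path segment ("118") is replaced by the input.
function resolveAsHref(input: string, base: string): string {
  return new URL(input, base).href;
}

// 2. Hypothetical Markdown-style reading: text starting with "www." gets
//    an implicit http: scheme; the URL parser then adds the "/" path.
function autolinkWww(text: string): string | null {
  if (!text.startsWith("www.")) return null; // not a candidate
  return new URL("http://" + text).href;
}

console.log(resolveAsHref("www.example.com", issuePage));
// → "https://github.com/whatwg/url/issues/www.example.com"
console.log(autolinkWww("www.example.com"));
// → "http://www.example.com/"
```

The same string yields two different URLs. Which reading applies depends on the context the string appears in, not on the string itself.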

To reiterate: the URL Standard, like the RFCs before it, considers Markdown and the location bar out of scope. The URL Standard's definition of "valid" can help when building heuristics for situations like these, but won't suffice by itself. (E.g., without a base URL, `www.example.com` is not a valid URL.)
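A hypothetical `parsesWithoutBase` check shows why a standard-based test is only a starting point for such heuristics. It uses the same `URL` constructor. It tests whether the parser succeeds without a base, which is looser than the Standard's "valid URL string" definition. For example, `https://///url.spec.whatwg.org/` parses despite its validation errors.

```typescript
// Hypothetical filter: does the input parse as a URL with no base?
// Parser success is weaker than "valid URL string": inputs with
// (non-fatal) validation errors still pass.
function parsesWithoutBase(input: string): boolean {
  try {
    new URL(input); // no base, so a relative reference throws TypeError
    return true;
  } catch {
    return false;
  }
}

console.log(parsesWithoutBase("www.example.com")); // → false
console.log(parsesWithoutBase("http://www.example.com/")); // → true
console.log(parsesWithoutBase("https://///url.spec.whatwg.org/")); // → true
```

A Markdown linkifier built only on this check would miss `www.example.com` entirely. It would also accept strings a human would not consider well-formed, so extra context-specific rules are still needed.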

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/118#issuecomment-338196199

Received on Friday, 20 October 2017 12:45:36 UTC