Re: [whatwg/url] Explain how syntax relates to the parser for hosts and URLs (#228)

domenic commented on this pull request.

I think this is a step in the right direction, but given the massive confusion we've seen elsewhere, I am concerned about two things:

- Lack of a global explanation of the difference between conformant and parseable. Some abbreviated version of https://html.spec.whatwg.org/multipage/introduction.html#conformance-requirements-for-authors, or maybe just a link to it, might be a good idea. I would envision a sibling section to "Parsers" (into which "syntax violation" would also move). It might also be good to use this section to give concrete examples of the usefulness of conformance checkers for URLs; one that comes to mind is text-entry software that only recognizes or autolinks conformant URLs.
- Lack of local clarity while reading specific sections. I touch on this in the review by suggesting that terms like "URL string" and "URL syntax" be renamed to include "conformant" as a prefix, so that readers who skip the intros are less confused. I also think the URL syntax section should open with a sentence reiterating that it is about conformance and not parsing.

I think there is a lot of value in the work done to separate these and maintain both concepts, but as we've seen, the spec doesn't make it easy for people to appreciate that.
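
To make the conformant-versus-parseable distinction concrete, here is a minimal sketch using the `URL` constructor available in browsers and Node.js. The sample inputs and the `parseOrNull` helper are made up for illustration; they are not taken from the spec or the PR. The first two inputs are not conformant, yet the parser accepts and normalizes them; the third makes the parser return failure.

```typescript
// Hypothetical sample inputs, chosen only for illustration.
const inputs: string[] = [
  "https://example.com/a b",   // unencoded space in the path: parses, not conformant
  "https://example.com\\foo",  // backslash as a path separator: parses, not conformant
  "example.com/path",          // no scheme and no base URL: parser returns failure
];

// Hypothetical helper: returns the serialized URL, or null on parser failure.
function parseOrNull(input: string): string | null {
  try {
    return new URL(input).href;
  } catch {
    return null; // the constructor throws when the parser returns failure
  }
}

const results = inputs.map(parseOrNull);
console.log(results);
// → [ "https://example.com/a%20b", "https://example.com/foo", null ]
```

Note that the constructor does not report syntax violations, so a conformance checker such as the autolinker mentioned above would need a separate check rather than relying on whether parsing succeeds.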

> @@ -226,9 +226,27 @@ point <a for=/>URLs</a> from <var>A</var> can come from untrusted sources.
 
 <h2 id="hosts-(domains-and-ip-addresses)">Hosts (domains and IP addresses)</h2>
 
-<!-- Punycode:
-     https://tools.ietf.org/html/rfc3492
-     https://mothereff.in/punycode -->
+<p>At a high level, a <a for=/>host</a>, <a>host string</a>, <a>host parser</a>, and
+<a>host serializer</a> relate as follows:
+
+<ul>
+ <li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
+ <a for=/>host</a>. (This <a for=/>host</a> cannot be an <a>opaque host</a>, those can only be

The comma should be a semicolon; each half is a complete sentence.

> @@ -226,9 +226,27 @@ point <a for=/>URLs</a> from <var>A</var> can come from untrusted sources.
 
 <h2 id="hosts-(domains-and-ip-addresses)">Hosts (domains and IP addresses)</h2>
 
-<!-- Punycode:
-     https://tools.ietf.org/html/rfc3492
-     https://mothereff.in/punycode -->
+<p>At a high level, a <a for=/>host</a>, <a>host string</a>, <a>host parser</a>, and
+<a>host serializer</a> relate as follows:
+
+<ul>
+ <li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a

I wonder if you want to link to Infra for strings.

> +<p>At a high level, a <a for=/>URL</a>, <a>URL string</a>, <a>URL parser</a>, and
+<a>URL serializer</a> relate as follows:
+
+<ul>
+ <li><p>The <a>URL parser</a> takes an arbitrary string and returns either failure or a
+ <a for=/>URL</a>.
+
+ <li><p>A <a for=/>URL</a> can be seen as the in-memory representation.
+
+ <li><p>A <a>URL string</a> defines what input would not trigger a <a>syntax violation</a> or
+ failure when given to the <a>URL parser</a>. I.e., input that would be considered conforming or
+ valid.
+
+ <li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns a string. (If that string
+ is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the
+ <a lt="URL serializer">serialized</a> <a for=/>host</a>.)

Copypasta: "host" here should be "URL".
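
For reference, the round-trip property that this bullet describes for URLs can be sketched as follows. This assumes the `URL` constructor from browsers or Node.js; `roundTrips` is a hypothetical helper name, and comparing `href` values is used here as a stand-in for the spec's notion of two URLs being equal.

```typescript
// Hypothetical helper: parse, serialize, parse again, and compare serializations.
function roundTrips(input: string): boolean {
  const first = new URL(input);        // parse the arbitrary input string
  const second = new URL(first.href);  // parse the serialization of the result
  return second.href === first.href;   // equality approximated via serialization
}

// Made-up input with uppercase scheme/host, a default port, and a ".." segment.
const sample = "HTTPS://EXAMPLE.com:443/a/../b?q#f";
console.log(new URL(sample).href); // → "https://example.com/b?q#f"
console.log(roundTrips(sample));   // → true
```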

> @@ -823,6 +842,27 @@ unified model would be, please file an issue.
 <!-- History behind URL as term:
      https://lists.w3.org/Archives/Public/uri/2012Oct/0080.html -->
 
+<p>At a high level, a <a for=/>URL</a>, <a>URL string</a>, <a>URL parser</a>, and

Reading this makes me wonder if it should be "conformant URL string" or "valid URL string" instead. A web programmer probably thinks of "a URL string" as "a string that is a URL", with an ambiguous meaning of "is" that doesn't necessarily have the nuance of conformance involved.

> @@ -823,6 +842,27 @@ unified model would be, please file an issue.
 <!-- History behind URL as term:
      https://lists.w3.org/Archives/Public/uri/2012Oct/0080.html -->
 
+<p>At a high level, a <a for=/>URL</a>, <a>URL string</a>, <a>URL parser</a>, and

Similarly, maybe rename the "URL syntax" section to "Conformant URL syntax".

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/228#pullrequestreview-19827044

Received on Thursday, 2 February 2017 16:37:48 UTC