Re: [whatwg/fetch] Define data: URLs (#579)

foolip commented on this pull request.

I reviewed this because it came up in https://docs.google.com/document/d/1FIUk5Y5_VmZ8rqHEsFEoXaysJUCRzf3RbvcyuYpim9c/edit?usp=sharing

Overall this makes a lot of sense; there are just some missing links and a few questions about whitespace.

> @@ -5630,6 +5622,82 @@ if the script checks that the URL has the right hostname.
 
 
 
+<h2 id=data-urls><code>data:</code> URLs</h2>
+
+<p>For an informative description of <code>data:</code> URLs, see RFC 2397. This section replaces
+that RFC's normative processing requirements. [[RFC2397]]
+
+<p>The <dfn export><code>data:</code> URL processor</dfn> takes a <a for=/>URL</a>
+<var>dataURL</var> and then runs these steps:
+
+<ol>
+ <li><p>Assert: <var>dataURL</var>'s <a for=URL>scheme</a> is "<code>data</code>".

scheme ends up not being linked to anything. Is the concept not exported by the URL Standard?

> +
+ <li><p>Remove the leading "<code>data:</code>" string from <var>input</var>.
+
+ <li><p>Let <var>position</var> point at the start of <var>input</var>.
+
+ <li><p>Let <var>encodedMimeType</var> be the result of
+ <a>collecting a sequence of code points</a> that are not equal to U+002C (,), given
+ <var>position</var>.
+
+ <li><p>If <var>position</var> is past the end of <var>input</var>, then return failure.
+
+ <li><p>Advance <var>position</var> by 1.
+
+ <li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.
+
+ <li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of

string percent decoding doesn't get linked either.
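To check my reading of the steps quoted above (strip the prefix, collect code points up to the first U+002C (,), fail if there is no comma, keep the remainder as the body), here is a rough Python sketch. The helper name `split_data_url` is hypothetical, and it assumes its input is an already-serialized URL whose scheme is "data"; it is not part of the PR.

```python
from typing import Optional, Tuple


def split_data_url(serialized_url: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (encodedMimeType, encodedBody), or None for failure."""
    # Assumption: the caller already asserted the scheme is "data".
    assert serialized_url.startswith("data:")
    input_ = serialized_url[len("data:"):]

    # "Collect a sequence of code points" that are not U+002C (,).
    position = 0
    while position < len(input_) and input_[position] != ",":
        position += 1
    encoded_mime_type = input_[:position]

    # No comma at all: position is past the end of input, so this is failure.
    if position >= len(input_):
        return None

    # Advance past the comma; everything after it is the encoded body.
    position += 1
    encoded_body = input_[position:]
    return encoded_mime_type, encoded_body
```

Only the first comma splits; later commas stay in the body.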

> +
+ <li><p>Let <var>position</var> point at the start of <var>input</var>.
+
+ <li><p>Let <var>encodedMimeType</var> be the result of
+ <a>collecting a sequence of code points</a> that are not equal to U+002C (,), given
+ <var>position</var>.
+
+ <li><p>If <var>position</var> is past the end of <var>input</var>, then return failure.
+
+ <li><p>Advance <var>position</var> by 1.
+
+ <li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.
+
+ <li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of
+ <var>encodedMimeType</var>.
+ <!-- Note: implementations leave the percent-encoded bits around. That strikes me as rather broken,

In other words, implementations don't percent-decode the MIME type at all? Are any of the tests failing in every implementation because of this? If so, matching the broken reality seems preferable.
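For reference, a minimal Python sketch of what "string percent decoding" does to encodedMimeType, assuming the URL Standard's definition (UTF-8 encode, then replace each "%" followed by two ASCII hex digits with that byte and copy everything else unchanged). The function names are hypothetical. An implementation that "leaves the percent-encoded bits around" would effectively skip this call for the MIME type.

```python
_HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_decode(data: bytes) -> bytes:
    """Replace each "%XX" (two ASCII hex digits) with byte 0xXX; copy all other bytes."""
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (byte == 0x25  # "%"
                and i + 2 < len(data)
                and data[i + 1] in _HEX_DIGITS
                and data[i + 2] in _HEX_DIGITS):
            output.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            # A lone "%" or one not followed by two hex digits is kept as-is.
            output.append(byte)
            i += 1
    return bytes(output)


def string_percent_decode(s: str) -> bytes:
    # Hypothetical wrapper: UTF-8 encode first, then percent-decode the bytes.
    return percent_decode(s.encode("utf-8"))
```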

> + <li><p>If <var>position</var> is past the end of <var>input</var>, then return failure.
+
+ <li><p>Advance <var>position</var> by 1.
+
+ <li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.
+
+ <li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of
+ <var>encodedMimeType</var>.
+ <!-- Note: implementations leave the percent-encoded bits around. That strikes me as rather broken,
+      but it might be due to web compatibility? -->
+
+ <li><p>Let <var>mimeType</var> be the <a>isomorphic decode</a> of <var>mimeTypeBytes</var>.
+
+ <li><p><a>Strip leading and trailing ASCII whitespace</a> from <var>mimeType</var>.
+
+ <li><p>Let <var>body</var> be the <a>string percent decoding</a> of <var>encodedBody</var>.

string percent decoding is not linked here either.
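A small Python sketch of the two steps between the percent decoding and the base64 check: isomorphic decode (each byte becomes the code point with the same value, which is exactly Python's Latin-1 codec) and stripping ASCII whitespace, assumed here to be TAB, LF, FF, CR and SPACE as in the Infra Standard. Helper names are hypothetical.

```python
ASCII_WHITESPACE = "\t\n\x0c\r "  # TAB, LF, FF, CR, SPACE


def isomorphic_decode(data: bytes) -> str:
    # Byte 0xNN maps to code point U+00NN; Latin-1 is exactly that mapping.
    return data.decode("latin-1")


def mime_type_from_bytes(mime_type_bytes: bytes) -> str:
    # Strip only ASCII whitespace; str.strip() with no argument would also
    # remove e.g. U+000B and U+00A0, which are not ASCII whitespace.
    return isomorphic_decode(mime_type_bytes).strip(ASCII_WHITESPACE)
```

Note that only the ends are stripped; whitespace inside the string (such as around ";base64") survives to the next step.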

> + <li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.
+
+ <li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of
+ <var>encodedMimeType</var>.
+ <!-- Note: implementations leave the percent-encoded bits around. That strikes me as rather broken,
+      but it might be due to web compatibility? -->
+
+ <li><p>Let <var>mimeType</var> be the <a>isomorphic decode</a> of <var>mimeTypeBytes</var>.
+
+ <li><p><a>Strip leading and trailing ASCII whitespace</a> from <var>mimeType</var>.
+
+ <li><p>Let <var>body</var> be the <a>string percent decoding</a> of <var>encodedBody</var>.
+
+ <li>
+  <p>If <var>mimeType</var> contains an <a>ASCII case-insensitive</a> match for
+  "<code>;base64;</code>" or ends with an <a>ASCII case-insensitive</a> match for

Is whitespace around "base64" really not allowed? The URL `data:text/html ; base64 ;,YW5uZXZr` works for me on ChromeOS, so this needs tests.
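To make the whitespace question concrete, here is a Python sketch of the check as drafted (contains ";base64;" or ends with ";base64", ASCII case-insensitively). `has_base64_marker` is a hypothetical name. Under this reading the ChromeOS example above is not treated as base64.

```python
def ascii_lowercase(s: str) -> str:
    # Lowercase only A-Z, leaving every other code point untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def has_base64_marker(mime_type: str) -> bool:
    lowered = ascii_lowercase(mime_type)
    return ";base64;" in lowered or lowered.endswith(";base64")
```

`ascii_lowercase` is used instead of `str.lower()` so that only ASCII letters are folded, which is what an ASCII case-insensitive match requires.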

> +
+ <li><p>Let <var>body</var> be the <a>string percent decoding</a> of <var>encodedBody</var>.
+
+ <li>
+  <p>If <var>mimeType</var> contains an <a>ASCII case-insensitive</a> match for
+  "<code>;base64;</code>" or ends with an <a>ASCII case-insensitive</a> match for
+  "<code>;base64</code>", then:
+
+  <ol>
+   <li><p>Let <var>stringBody</var> be the <a>isomorphic decode</a> of <var>body</var>.
+
+   <li><p>Set <var>body</var> to the <a>forgiving-base64 decode</a> of <var>stringBody</var>.
+
+   <li><p>If <var>body</var> is failure, then return failure.
+
+   <li><p>Remove the code point sequence that is an <a>ASCII case-insensitive</a> match for

Should it say to remove the *first* such sequence, given that it could appear again later?
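A Python sketch of the base64 branch, under two assumptions: forgiving-base64 decode follows the Infra steps as I understand them (drop ASCII whitespace; if the length is a multiple of 4, drop one or two trailing "="; fail on length % 4 == 1 or on characters outside A-Z, a-z, 0-9, "+", "/"), and the removal targets the occurrence that actually triggered the match. Simply removing the *first* ";base64" would hit ";base64x" in `text/plain;base64x=1;base64`, so "first" alone may not be enough either. All names are hypothetical.

```python
import base64
import re
from typing import Optional

ASCII_WHITESPACE = "\t\n\x0c\r "
_BASE64_ALPHABET = re.compile(r"[A-Za-z0-9+/]*")


def forgiving_base64_decode(data: str) -> Optional[bytes]:
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    if len(data) % 4 == 0:
        # Drop one or two trailing "=" only when the length is a multiple of 4.
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    if len(data) % 4 == 1:
        return None
    if not _BASE64_ALPHABET.fullmatch(data):
        return None
    # Re-pad so Python's decoder accepts the input; leftover bits are ignored.
    return base64.b64decode(data + "=" * (-len(data) % 4))


def ascii_lowercase(s: str) -> str:
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def remove_base64_marker(mime_type: str) -> str:
    # One possible reading: remove the ";base64" that made the check succeed.
    lowered = ascii_lowercase(mime_type)  # same length, so indices line up
    index = lowered.find(";base64;")
    if index == -1:
        assert lowered.endswith(";base64")
        index = len(mime_type) - len(";base64")
    return mime_type[:index] + mime_type[index + len(";base64"):]
```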

> +  "<code>;base64;</code>" or ends with an <a>ASCII case-insensitive</a> match for
+  "<code>;base64</code>", then:
+
+  <ol>
+   <li><p>Let <var>stringBody</var> be the <a>isomorphic decode</a> of <var>body</var>.
+
+   <li><p>Set <var>body</var> to the <a>forgiving-base64 decode</a> of <var>stringBody</var>.
+
+   <li><p>If <var>body</var> is failure, then return failure.
+
+   <li><p>Remove the code point sequence that is an <a>ASCII case-insensitive</a> match for
+   "<code>;base64</code>" from <var>mimeType</var>.
+  </ol>
+
+ <li><p>If <var>mimeType</var> starts with an <a>ASCII case-insensitive</a> match for
+ "<code>;charset=</code>", then prepend "<code>text/plain</code>" to <var>mimeType</var>.

A question of whitespace here as well.
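A Python sketch of the "text/plain" defaulting step as drafted, showing where whitespace matters: a space after the semicolon means no match. `default_to_text_plain` is a hypothetical name.

```python
def ascii_lowercase(s: str) -> str:
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def default_to_text_plain(mime_type: str) -> str:
    # Prepend "text/plain" only for an ASCII case-insensitive ";charset=" prefix.
    if ascii_lowercase(mime_type).startswith(";charset="):
        return "text/plain" + mime_type
    return mime_type
```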

-- 
View it on GitHub:
https://github.com/whatwg/fetch/pull/579#pullrequestreview-59575098

Received on Wednesday, 30 August 2017 14:53:10 UTC