Re: [whatwg/url] Record whether the URL parser removed newlines. (#284) from Mike West on 2017-05-25 (public-webapps-github@w3.org from May 2017)

From: Mike West <notifications@github.com>
Date: Thu, 25 May 2017 10:06:49 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/pull/284/c304065231@github.com>

> my initial thought is that I would just pre-scan the String given from the HTML/SVG parser to the URL parser

Scanning every URL up front is expensive; as I noted above, a very naive implementation caused a ~30% regression in one of Blink's parsing benchmarks. A cleverer implementation would likely have less impact, but the cost still seems non-negligible.

I could also imagine tagging an attribute as containing newlines and less-than characters during HTML tokenization/parsing, but that a) also seems expensive, and b) would require holding a boolean on each attribute, which seems more costly than holding one on each URL.

Do you have a different approach in mind, @achristensen07? I'm not at all philosophically tied to putting this into the URL parser, but it seems like the most efficient place to do the work, especially since we're _already_ required to remove newline characters from URLs. Why not recognize other properties at the same time?
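To make the "recognize other properties at the same time" idea concrete, here is a minimal Python sketch, assuming the property of interest is "a tab or newline was removed *and* the input contains `<`". The names `strip_tab_and_newline`, `StripResult`, and `potentially_dangling_markup` are hypothetical illustrations, not APIs from the URL Standard or from Blink. The point it shows: URLs with nothing to strip take a fast path and never pay for a `<` check, while URLs that already need copying pick up the check for one extra comparison per character instead of a separate scan.

```python
from typing import NamedTuple

# The URL Standard removes all ASCII tab or newline code points
# (U+0009, U+000A, U+000D) from the parser's input.
TAB_OR_NEWLINE = frozenset("\t\n\r")


class StripResult(NamedTuple):
    # Hypothetical result type for this sketch only.
    text: str
    removed_tab_or_newline: bool
    # True only when a tab/newline was removed AND the input contains '<'.
    potentially_dangling_markup: bool


def strip_tab_and_newline(raw: str) -> StripResult:
    """Strip tabs/newlines and record properties of the input in one pass."""
    # Fast path: most URLs contain nothing to strip, so return the input
    # untouched and skip the '<' check entirely.
    if not any(c in TAB_OR_NEWLINE for c in raw):
        return StripResult(raw, False, False)

    # Slow path: the input must be copied anyway to drop the removed code
    # points, so noting '<' here costs one comparison per kept character.
    kept = []
    saw_less_than = False
    for c in raw:
        if c in TAB_OR_NEWLINE:
            continue
        if c == "<":
            saw_less_than = True
        kept.append(c)
    return StripResult("".join(kept), True, saw_less_than)
```

For example, `strip_tab_and_newline("https://example.com/?q=<img\nsrc")` yields the text `"https://example.com/?q=<imgsrc"` with both flags set, while `"https://example.com/a<b"` (no newline) comes back unchanged with both flags false.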

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/284#issuecomment-304065231

Received on Thursday, 25 May 2017 17:07:24 UTC