Re: [whatwg/url] Record whether the URL parser removed newlines. (#284)

> Newline and tab checking is not done in a separate and simple loop, but rather every time the iterator on the input is incremented. 

That's a clever implementation, thanks for sharing the link!

I agree that it would be a little more work for y'all to get the behavior specced here, but I think you can still do it in a single pass. For example, I could imagine that the two-flag proposal from earlier in the thread might be reasonable:

```
++iterator;
while (UNLIKELY(!iterator.atEnd() && isTabOrNewline(*iterator))) {
    m_whitespaceEncountered = true;
    if (reportSyntaxViolation == ReportSyntaxViolation::Yes)
        syntaxViolation(iteratorForSyntaxViolationPosition);
    ++iterator;
}
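// Check atEnd() first: the loop above may have consumed the rest of the input.
if (UNLIKELY(!iterator.atEnd() && *iterator == '<'))
    m_lessThanEncountered = true;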
```

with a corresponding value on `URL` that you'd set in either `URL`'s or `URLParser`'s constructor after parsing. That comes at a cost of an extra `if` for every call to `advance()`, and three bools (two on `URLParser` and one on `URL`).
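
To make the bookkeeping concrete, here is a minimal, self-contained sketch of the same idea outside WebKit. Everything in it (`SketchParser`, `SketchURL`, `parseSketch`, `skipAndRecord`, `blockable`) is a hypothetical name made up for illustration, not WebKit's or the spec's API, and it assumes the rule under discussion is "flag the URL if the parser both stripped a tab/newline and saw a `<`":

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: the single bool that would live on URL.
struct SketchURL {
    std::string cleaned;     // input with tabs and newlines removed
    bool blockable = false;  // stripped whitespace AND saw '<'
};

// Hypothetical stand-in for the parser; it holds the other two bools.
class SketchParser {
public:
    explicit SketchParser(const std::string& input) : m_input(input) {
        skipAndRecord();  // the first code unit needs the same checks
        while (m_pos < m_input.size()) {
            m_cleaned.push_back(m_input[m_pos]);
            advance();
        }
    }

    SketchURL result() const {
        SketchURL url;
        url.cleaned = m_cleaned;
        url.blockable = m_whitespaceEncountered && m_lessThanEncountered;
        return url;
    }

private:
    static bool isTabOrNewline(char c) { return c == '\t' || c == '\n' || c == '\r'; }

    // Single pass: every increment skips tabs/newlines, then does one extra
    // comparison against '<' on the code unit it lands on.
    void advance() {
        ++m_pos;
        skipAndRecord();
    }

    void skipAndRecord() {
        while (m_pos < m_input.size() && isTabOrNewline(m_input[m_pos])) {
            m_whitespaceEncountered = true;
            ++m_pos;
        }
        if (m_pos < m_input.size() && m_input[m_pos] == '<')
            m_lessThanEncountered = true;
    }

    std::string m_input;
    std::string m_cleaned;
    std::size_t m_pos = 0;
    bool m_whitespaceEncountered = false;
    bool m_lessThanEncountered = false;
};

inline SketchURL parseSketch(const std::string& input) {
    return SketchParser(input).result();
}

// Small usage check: only the input with both a newline and '<' is flagged.
inline void sketchSelfCheck() {
    assert(parseSketch("https://example.com/a\nb<c").blockable);
    assert(!parseSketch("https://example.com/a<b").blockable);
    assert(!parseSketch("https://example.com/a\nb").blockable);
}
```

The sketch keeps the flags separate during parsing and only combines them once at the end, so the per-character cost stays at one additional comparison.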

> All this is just an implementation detail and people can implement the spec however they see fit, but I oppose to adding necessary data fields and additional operations everywhere in efficient implementations of an algorithm, especially if the benefit is just to guess whether fetched data is malicious.

I agree that the benefit is something we'd need to weigh against the performance costs, and that's a judgement call that I can't make for the WebKit project. From my perspective, dangling markup is a well-known and fairly soft target for attackers. As we continue to close off other avenues for code injection (via CSP, etc.), the impact of dangling markup attacks is only going to increase. Killing off a substantial portion of this class of attack seems like it's worth some cost to URL processing speed.

-- 
https://github.com/whatwg/url/pull/284#issuecomment-303350010

Received on Tuesday, 23 May 2017 09:55:48 UTC