Re: [whatwg/dom] Proposal - Update XPath to (at least) v2.0 (#903)

@annevk XPath 2 (and, more to the point these days, 3.1) are highly backward compatible with XPath 1. There _are_ some differences. Example: in XPath 1, the string value of a node-set is the string value of its first node only; that was crazy and caused lots of bugs in people's XPath expressions.

The XPath 2 and 3 specs include notes for people implementing XPath 2 and 3 on how to handle those cases. They are very small edge cases & many are unlikely to apply to Web browser usage anyway.
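
To make the first-item difference concrete, here is a toy Python model; it is not a real XPath engine, and the function names are made up for illustration. The XPath 2+ branch assumes XPath 1.0 compatibility mode is off, in which case handing a multi-item sequence to something that expects a single string is a type error rather than a silent truncation.

```python
def string_value_xpath1(node_strings):
    """XPath 1 style: silently use the first node, or "" if nothing was selected."""
    return node_strings[0] if node_strings else ""


def string_value_xpath2(items):
    """XPath 2+ style (no 1.0 compatibility mode): more than one item is an error."""
    if len(items) > 1:
        raise TypeError("a sequence of more than one item is not allowed here")
    return items[0] if items else ""


# Hypothetical result of selecting several elements, e.g. with //author
authors = ["Ann", "Bob", "Cy"]

print(string_value_xpath1(authors))  # Ann (Bob and Cy are silently ignored)

try:
    string_value_xpath2(authors)
except TypeError as err:
    print("XPath 2+:", err)
```

The XPath 1 behaviour hides the bug (the expression "works" but ignores most of the data), whereas the later versions surface it.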

Possible implementation approaches include (1) make a standard API that includes the desired XPath version (this is badly needed in any case); (2) use a JavaScript-based implementation (see e.g. frameless.io); (3) write or reuse a C/Rust/C++ one, most likely starting with an XQuery implementation, as that's an extension of XPath (XQuery 1 extends XPath 2, confusingly; XQuery 3.1 extends XPath 3.1).

Where XPath 1 was based on node lists, XPath 2 moved to being based on sequences; it's much more powerful for users, and a lot of things that were tricky became a lot clearer, but the underlying code is likely very different.

A CSS xpath('expr' [, version]) function would be super useful e.g. in the content property, as it can do string processing on text in the document - even if only in the "slow" profile of CSS.

@WebReflection the security issues in XPath are that there are functions (starting in XPath 1) that allow file access. The same security issues that XMLHttpRequest has apply. There are also common extensions in XPath implementations to allow extended file access, but those make no sense in a Web browser - see e.g. expath.org. In XPath 3 it's possible to write recursive functions, as with JavaScript, so you could create infinite loops, and an implementation needs to detect this. There's also the possibility - again as with JavaScript - of building up variables, e.g. with the string concatenation operator || like this:
`let $a := "socks socks socks socks", $b := $a || $a || $a || $a, $c := $b || $b || $b || $b return $c || $c || $c || $c`
which makes lots of socks. Or you can write
`string-join((1 to 99999999), ", ")`
to make "1, 2, 3, ..."
As with JavaScript, a sensible workaround is limits on variable size & sequence length. So the security issues are known and manageable.
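
As a rough illustration of such limits, here is a Python sketch. The limit values, the exception class, and the helper names are all hypothetical and chosen only for the demo; no real XPath engine is being described. The point is that each `let` step above multiplies the length by 4 (23, 92, 368, 1472 characters), and that both checks can be made before anything large is allocated.

```python
MAX_STRING_LENGTH = 1_000      # hypothetical per-value limit
MAX_SEQUENCE_LENGTH = 10_000   # hypothetical per-sequence limit


class LimitExceeded(Exception):
    """Raised instead of building an oversized value."""


def checked_concat(*parts):
    """Like XPath's ||, but refuse to build a string over the limit."""
    total = sum(len(p) for p in parts)  # computed before any joining happens
    if total > MAX_STRING_LENGTH:
        raise LimitExceeded(f"string of {total} chars exceeds {MAX_STRING_LENGTH}")
    return "".join(parts)


def checked_range(start, end):
    """Like XPath's (start to end), but check the length up front."""
    length = max(0, end - start + 1)
    if length > MAX_SEQUENCE_LENGTH:
        raise LimitExceeded(f"sequence of {length} items exceeds {MAX_SEQUENCE_LENGTH}")
    return range(start, end + 1)


a = "socks socks socks socks"       # 23 characters
b = checked_concat(a, a, a, a)      # 92 characters
c = checked_concat(b, b, b, b)      # 368 characters
print(len(a), len(b), len(c))       # 23 92 368

try:
    checked_concat(c, c, c, c)      # would be 1472 characters
except LimitExceeded as err:
    print("rejected:", err)

try:
    checked_range(1, 99999999)
except LimitExceeded as err:
    print("rejected:", err)
```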
But that's different from what @domenic meant, which is that there were security issues in the XML pipeline - that is, in the C libraries they have been using, which are large, complex, and hard to fix.

Yes, CSS could be extended to be comparable - e.g. to be able to do string matching & processing on text content, date/time arithmetic, joins, union/intersection, and so forth. It'd be a lot of work, although just adding matches() and replace() would go a long way -
`td.matches("^-\d+") { color: red; }`
(to invent a syntax in selectors)
although,
`span.price.xpath(. gt 0 and . lt 100 and not(preceding-sibling::span[. = 'special'])) { color: green; }`
would go further. I'd guess that in the next 10 or 15 years CSS will get there; in the meantime, custom CSS functions and selectors may give a way to do some of the things you can do with XPath, albeit more slowly.
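
For readers who want to see what those two invented selectors would pick out, here is a small Python model over plain lists of strings. It is only my reading of the intent: Python's `re` stands in for XPath's matches() (the regex dialects differ in detail), text is treated as a number where it parses as one, and the helper names and sample data are made up.

```python
import re


def matches(text, pattern):
    """Rough stand-in for XPath matches(): true if the pattern is found anywhere."""
    return re.search(pattern, text) is not None


# Text content of some hypothetical <td> cells
cells = ["-12.50", "40", "- 3", "-7"]
negative = [c for c in cells if matches(c, r"^-\d+")]
print(negative)  # ['-12.50', '-7']


def cheap_non_special(spans):
    """spans: texts of sibling <span> elements in document order (toy model).

    Keeps numeric spans strictly between 0 and 100 that have no earlier
    sibling whose text is exactly 'special'.
    """
    selected = []
    seen_special = False
    for text in spans:
        try:
            value = float(text)
        except ValueError:
            value = None
        if value is not None and 0 < value < 100 and not seen_special:
            selected.append(text)
        if text == "special":
            seen_special = True
    return selected


print(cheap_non_special(["25", "150", "special", "30"]))  # ['25']
```

The second case is the one plain selectors struggle with: the decision depends both on a numeric comparison of the text and on what came earlier among the siblings.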



https://github.com/whatwg/dom/issues/903#issuecomment-707937246

Received on Tuesday, 13 October 2020 18:45:46 UTC