- From: Liam R E Quin <liam@w3.org>
- Date: Tue, 29 Nov 2011 00:33:16 -0500
- To: public-webapps@w3.org
Wearing my XML Activity Lead hat, I want to give some information that may help people decide here. The actual answer isn't my concern, but only that it's based on clear information.

(0) XPath

XPath is a language for selecting from XML (or HTML or SGML) document trees. It is used by some other specs, including XML Schema and XSLT, and it's extended by XQuery.

XPath is very widely used in the XML world, e.g. in servers and on desktops and in shoes :-) I've lost track of the number of implementations of XPath, even though there are just a few dozen major implementations of course.

XPath is popular because it has a regular syntax that's easy to learn and a good fit for XML.

(1) XPath 1, 2 and 3 compatibility

XPath 2 is backwards compatible with XPath 1; there _are_ some very minor differences, most of which would not affect Web browsers at all because they depend on DTDs or Schemas.

Similarly, an XSLT 2 engine will interpret XSLT 1 transformations. There are some exceptions listed in the Backward Compatibility section of the XSLT 2 spec, but they are very minor.

(2) Not a dead end

XSLT 1 and XPath 1 are not "evolutionary dead ends" although it's true that neither the xt nor the libxml2 library supports XSLT 2 and XPath 2. There's some support (along with XQuery) in the Qt libraries, and also in C++ with XQilla and Zorba. There are maybe 50 implementations of XPath 2 and/or XQuery 2 that I've encountered.

XQuery 3.0 and XPath 3.0 are about to go to Last Call, we hope, and XSLT 3.0 to follow next year. The work is very much active and alive.

(3) XPath and efficiency

XPath can be implemented very efficiently. In most cases in practice, O(1) or O(log n) can be achieved. Some of the techniques modern XPath libraries use are also used by Web browsers for CSS selectors - e.g. keep an index of elements, and evaluate from the right-hand end (most specific) or start with whichever element occurs the fewest times.
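A minimal sketch of the indexing idea just described, not how any particular XPath library or browser does it: it builds a name-to-elements index once, then answers a path like //nav/a by starting from the right-hand step and checking parents upward. The toy tree shape and the names el, buildIndex and matchChildPath are made up for illustration.

```javascript
// Toy element tree (hypothetical shape, not a real DOM): { name, children, parent }.
function el(name, ...children) {
  const node = { name, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// One pass over the tree: map each element name to the elements bearing it.
function buildIndex(root) {
  const index = new Map();
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (!index.has(node.name)) index.set(node.name, []);
    index.get(node.name).push(node);
    // Push children in reverse so elements are recorded in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return index;
}

// Evaluate "//s1/s2/.../sn" (child steps only) from the right-hand end:
// take the candidates for sn from the index, then walk up the parent chain.
function matchChildPath(index, steps) {
  const candidates = index.get(steps[steps.length - 1]) || [];
  return candidates.filter((node) => {
    let current = node;
    for (let i = steps.length - 2; i >= 0; i--) {
      current = current.parent;
      if (current === null || current.name !== steps[i]) return false;
    }
    return true;
  });
}

const doc = el("html",
  el("body",
    el("nav", el("a"), el("a")),
    el("div", el("a"))));
const index = buildIndex(doc);
const hits = matchChildPath(index, ["nav", "a"]); // roughly //nav/a
console.log(hits.length); // 2
```

Only the three "a" elements are ever inspected here; the rest of the tree is not visited at query time, which is the point of starting from the most specific step.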
There are implementations of XQuery (an extension of XPath) being used with petabytes of XML data. That is not to say you couldn't also use CSS selectors on petabytes of data -- it's not an either-or or a battle between the two languages.

XQuery response times are generally measured in milliseconds, although, as with SQL or JavaScript or Java or C, you can write infinite loops :-)

The trick is to notice that there are idioms in XPath that can be optimised much more easily than the corresponding JavaScript code. There have been some papers at VLDB on XPath and XQuery optimization. XPath was written with efficiency and optimization in mind, and drew on implementation experience.

(4) XMLness

XPath 2 is actually defined in terms of a data model, and can work over non-XML sources - e.g. not just HTML and XML, but also relational data and anything else that can be represented usefully with a similar data model. It lacks arrays and hashes/maps, which makes JSON support somewhat inconvenient, but there are people working on extending XPath to handle JSON more gracefully (e.g. via "JSONIQ").

(5) Orthogonality

I think this was mentioned in discussions but may not've been clear. In general, XPath is an expression language - anywhere you can have an expression, you can have any expression, and the expressions all work together. For example, predicates can contain any XPath expression, recursively:

    /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong

This is all "strong" elements in p elements that are direct children of div elements that are direct children of the body element, and whose p parent has an "id" attribute that has the same value as the src attribute of a link element in the head which has rel="me". (This is a microformat-style query on a document, of course.)

XPath selectors give a different way of looking at finding things than CSS selectors and probably appeal in differing amounts to different people.
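To make the nesting in the (5) example concrete, here is a rough hand-written JavaScript expansion of that expression, run over a toy tree rather than a real DOM. The node shape and the helper names are assumptions for illustration only; a real XPath engine would of course evaluate the expression string directly.

```javascript
// Toy node shape (hypothetical): { name, attrs, children }.
const node = (name, attrs = {}, children = []) => ({ name, attrs, children });

// Child step: every child with the given name, for each node in `nodes`.
const children = (nodes, name) =>
  nodes.flatMap((n) => n.children.filter((c) => c.name === name));

// Hand expansion of:
//   /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong
function strongInMatchingParagraphs(root) {
  const html = root.name === "html" ? [root] : [];

  // Inner path used inside the predicate: /html/head/link[@rel = 'me']/@src
  const srcValues = children(children(html, "head"), "link")
    .filter((link) => link.attrs.rel === "me")
    .map((link) => link.attrs.src)
    .filter((src) => src !== undefined);

  // Outer path. "=" against a node-set is true if any member matches,
  // hence includes() rather than a single comparison.
  const paragraphs = children(children(children(html, "body"), "div"), "p")
    .filter((p) => p.attrs.id !== undefined && srcValues.includes(p.attrs.id));

  return children(paragraphs, "strong");
}

const page = node("html", {}, [
  node("head", {}, [node("link", { rel: "me", src: "about" })]),
  node("body", {}, [
    node("div", {}, [
      node("p", { id: "about" }, [node("strong"), node("em")]),
      node("p", { id: "other" }, [node("strong")]),
    ]),
  ]),
]);

console.log(strongInMatchingParagraphs(page).length); // 1
```

The one-line XPath expression and the twenty-odd lines of JavaScript select the same elements; the expression form is also what gives an engine room to optimise.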
(6) Note on History

Not really important today, but someone mentioned it, so I'll note that XPath came out of SGML and (later) HyTime work dating back long before the World Wide Web and CSS; that work really ended with the publication of DSSSL and HyTime, but many of the same people were (and in some cases are) involved with XML and XPath.

XPath has different goals from CSS selectors, and there's not actually a battle between them. XSLT and XQuery are widely used on the "back end" of Web apps, and less often in the browser, but in some environments the browser-based support can be very useful, depending on the division of labour.

(7) XPath Selectors and CSS Selectors

There's a huge overlap of functionality. Some claims were made based on misunderstandings (in both directions probably, but I can only correct the ones about XPath)...

"XPath can't handle things like :hover or :first-line" -- not true. XPath has a mechanism by which a browser would support them, using a functional notation:

    //a[hover()]

It's perfectly standard for an implementation to add functions. You can add them in your own namespace if you're prepared to accept more syntax:

    //a[css:hover()]

for example. This wouldn't make sense in most XPath environments (e.g. inside SQL or Java) so it's left for an extension function, but that seems reasonable to me.

In the other direction, there's no reason in principle why you couldn't have a CSS selector that looked something like

    xpath("some xpath expression here") { CSS properties here }

if you wanted to. Maybe this would relieve CSS selectors from the burden of solving some relatively complex use cases.

(8) Note on the suggested API

Martin Kadlec proposed findAll(query, use_xpath):

    CSS:   findAll("nav a:first-child");
    XPATH: findAll("//nav/a[1]", true);

I'd suggest rather,

    findAll(expression [, language [, version]])

so as to be able to support languages other than CSS selectors or XPath in the future - e.g.
XQuery or linq or dart or whatever - and to be able to say which version of those languages was the minimum needed.

Better (and more JavaScriptIsh) might be a factory that returned a function that would evaluate XPath queries against a given DOM.

    xqe = document.makeDocumentQueryEngine("XPath", "1.0");
    xqe.query("//nav/a[1]");

as then the returned function could also take optional arguments, e.g. a prefix/URI object to handle namespaces in an XML DOM.

It seems useful to me especially since the XPath engine is already in the browsers, even if it's a pretty old version of XPath. But that's my opinion. Really, I just want people to have some more information about XPath.

Sorry for a long message - I hope this is helpful.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
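A rough sketch of how the factory idea in (8) could sit on top of the XPath 1.0 engine browsers already ship. makeDocumentQueryEngine is hypothetical and is written here as a free function taking the document as its first argument; only evaluate() and the snapshot result interface (snapshotLength, snapshotItem) are existing DOM APIs. The sketch returns a plain function, so it is called as xqe(...) rather than xqe.query(...).

```javascript
// Hypothetical sketch: makeDocumentQueryEngine is NOT a real DOM method.
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

function makeDocumentQueryEngine(doc, language, minVersion) {
  // Assumption: only XPath 1.0 is available; "version" is the minimum needed.
  if (language !== "XPath" || Number(minVersion) > 1.0) {
    throw new Error(`unsupported query language: ${language} ${minVersion}`);
  }

  // The returned function takes the expression plus an optional
  // prefix -> namespace URI object, for use with XML DOMs.
  return function query(expression, namespaces = null) {
    const resolver = namespaces === null
      ? null
      : (prefix) => namespaces[prefix] || null;
    const result = doc.evaluate(
      expression, doc, resolver, ORDERED_NODE_SNAPSHOT_TYPE, null);
    const nodes = [];
    for (let i = 0; i < result.snapshotLength; i++) {
      nodes.push(result.snapshotItem(i));
    }
    return nodes;
  };
}

// In a browser one might then write:
//   const xqe = makeDocumentQueryEngine(document, "XPath", "1.0");
//   const firstLinks = xqe("//nav/a[1]");
//   const titles = xqe("//atom:title", { atom: "http://www.w3.org/2005/Atom" });
```

Returning a plain array of nodes is a design choice of this sketch, made so callers do not have to deal with the snapshot interface themselves.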
Received on Tuesday, 29 November 2011 05:34:48 UTC