Re: XPath and find/findAll methods from Liam R E Quin on 2011-11-30 (public-webapps@w3.org from October to December 2011)

From: Liam R E Quin <liam@w3.org>
Date: Wed, 30 Nov 2011 14:18:15 -0500
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-webapps@w3.org
Message-ID: <1322680695.11286.161.camel@desktop.barefootcomputing.com>
On Tue, 2011-11-29 at 18:09 +0200, Henri Sivonen wrote:
> On Tue, Nov 29, 2011 at 7:33 AM, Liam R E Quin <liam@w3.org> wrote:
> > (2) Not a dead end
[...]

Thanks for responding, Henri.

A detailed reply follows, but the short answer is:
(1) yes, browsers could be using the latest XPath. It would help authors
greatly.
(2) yes, there are some issues to resolve. The way to resolve issues is
liaison, and working together.  We should do more of that.
(3) the example I gave can in fact work in an XPath 2 environment.
(4) Backwards Compatibility Mode is probably rather badly named; just as
HTML has legacy content, so does XSLT.

This is why I suggest that, if a findAll() is introduced that can do
languages other than CSS Selectors, it allow a version number, with the
meaning, "I'm expecting at least this version of the language". Then one
could write code to fall back if necessary, maybe downloading emulation
code in JavaScript if e.g. CSS Selectors 5 or XPath 4 or whatever wasn't
available.
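
To make the "at least this version" idea concrete, here is a minimal
TypeScript sketch of the fallback decision. Everything in it is an
assumption for illustration: no browser exposes pickEngine(), the Engine
shape, or these version tables, and the emulation library is imagined.

```typescript
// Hypothetical sketch: none of these names come from a real browser API.
type QueryLanguage = "css" | "xpath";

interface Engine {
  language: QueryLanguage;
  version: number; // highest version of the language this engine implements
  source: "native" | "emulated";
}

// What an imaginary browser implements natively (assumed values).
const nativeEngines: Engine[] = [
  { language: "css", version: 3, source: "native" },
  { language: "xpath", version: 1, source: "native" },
];

// Emulation code a page could download on demand (assumed to exist).
const emulatedEngines: Engine[] = [
  { language: "xpath", version: 2, source: "emulated" },
];

// Return the first engine implementing at least minVersion of the language,
// preferring native engines; undefined means nothing suitable is available.
function pickEngine(
  language: QueryLanguage,
  minVersion: number
): Engine | undefined {
  for (const engine of nativeEngines.concat(emulatedEngines)) {
    if (engine.language === language && engine.version >= minVersion) {
      return engine;
    }
  }
  return undefined;
}
```

With these assumed tables, asking for XPath 1 selects the native engine,
asking for XPath 2 selects the emulated one, and asking for XPath 4
returns undefined so the caller can report that nothing suitable exists.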

None of this is a reason not to make the existing XPath in Web browsers
easier to use today, though. I felt I needed to post to balance things a
bit because the status of XPath seemed unclear to some people on the
list.

Slightly more detailed reply...

> Sure, XPath and XSLT keep being developed. What I meant by
> evolutionary dead end is that the XPath 1.0-compatible evolutionary
> path has been relegated to a separate mode instead of XPath 2.0 and
> newer being compatible by design. So the new development you cite
> happens with Compatibility Mode set to false.

We can't change history; the compatibility mode is about compatibility
e.g. with pre-existing XSLT 1 stylesheets.  I don't see HTML 5 adding
the canvas element to RFC 1866 (HTML 2) either - and wouldn't expect it.
Just as you don't expect people working on Mosaic or Cello to need new
features in the HTML 2 spec, we're not adding new features to XPath 1,
because we already did that and called it XPath 2 :-)

Rather, the XPath spec is careful to document the differences in
behaviour.

We should really stop calling it compatibility mode and have
specific feature tests instead -- it'd be a lot clearer.  The main
differences are
* XPath 2 introduces sequences, so you can have sequences of arbitrary
values, not just nodelists; the empty sequence is returned in some cases
where it makes sense but XPath 1 used NaN or silently failed in some
other way; this could be a "sequence-available" feature.

* XPath 2 introduced named typing, rather like the C Programming
Language - e.g. a sockSize can be treated differently from a shoeSize
even though they are both numeric. Amongst other things, this allows a
saner interpretation of A = B, in the case that A and B are xs:boolean.
Although these are factored out into compatibility mode, in fact, they
are mostly cases that could never arise in XPath 1, as it didn't have
the typing system, so we could probably merge them rather than having a
type-system-available feature test.

* a number of error cases now raise errors instead of either failing or
doing an obviously wrong thing.  E.g. if $n is a nodelist of 3
paragraphs having content "11", "2", and "3" respectively, in XPath 1,
$n + 6 gives 17 (only the first node, "11", is converted to a number),
and in XPath 2 it gives an error. But a Web browser could plausibly use
the XPath 1 behaviour and also emit a warning in the developer console,
I think.

* XPath 2 allows implementations much more freedom in rearranging
evaluation order, greatly improving performance.

* there are some other minor changes listed in appendix I of the current
XPath 2.0 draft [1]. Most of these result in errors, so a browser could
easily allow them and produce helpful warnings. E.g. "1 < $a < 6" will
always be true, if $a is numeric, in XPath 1 (it's evaluated left to
right and you get 0 or 1 for "1 < $a"), and is an error in XPath 2.

* there are some changes to do with DTD and Schema handling that do not
affect Web browsers.
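
The two concrete cases in the list above ($n + 6 and "1 < $a < 6") can be
imitated in a few lines of TypeScript. This is only a rough emulation of
the conversion rules, not an XPath engine: node-sets are stood in for by
arrays of string-values, all function names are made up for this sketch,
and the strict variant simplifies what XPath 2 accepts as a number.

```typescript
// Illustrative emulation only; a real XPath engine does far more than this.

// XPath 1 number() on a string: optional whitespace, optional minus sign,
// digits with an optional fraction; anything else becomes NaN.
function xpath1NumberFromString(s: string): number {
  const t = s.replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, "");
  return /^-?(\d+(\.\d*)?|\.\d+)$/.test(t) ? Number(t) : NaN;
}

// XPath 1 converts a node-set to a number using the string-value of its
// first node only; the remaining nodes are silently ignored.
function xpath1NumberFromNodeSet(stringValues: string[]): number {
  const first = stringValues[0];
  return first === undefined ? NaN : xpath1NumberFromString(first);
}

// Simplified XPath 2 style operand (compatibility mode off): an empty
// sequence stays empty (null here), more than one item is an error, and a
// non-numeric value is an error instead of NaN.
function xpath2StrictOperand(stringValues: string[]): number | null {
  const [first, ...rest] = stringValues;
  if (first === undefined) {
    return null;
  }
  if (rest.length > 0) {
    throw new Error("type error: arithmetic operand has more than one item");
  }
  const value = xpath1NumberFromString(first);
  if (isNaN(value)) {
    throw new Error("cast error: operand is not numeric");
  }
  return value;
}

// XPath 1 reading of "low < a < high": evaluate left to right, turning the
// boolean result of the first comparison into 0 or 1 before comparing again.
function xpath1ChainedLessThan(low: number, a: number, high: number): boolean {
  const firstResult = low < a;
  const asNumber = firstResult ? 1 : 0;
  return asNumber < high;
}
```

Here xpath1NumberFromNodeSet(["11", "2", "3"]) + 6 is 17, while
xpath2StrictOperand(["11", "2", "3"]) throws; and
xpath1ChainedLessThan(1, 100, 6) is true even though 100 is not below 6,
which is the sort of case where a console warning would help authors.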

Frankly the differences are probably comparable to differences between
versions of HTML or editions of CSS -- the later specs get more precise,
and in the case of ambiguities or weird corner cases sometimes it means
a change.  The HTML 5 parsing goal was that all browsers would produce
the same DOM for a given document, and since that wasn't true
previously, some documents clearly now generate a different DOM.


> I don't have enough data about existing XPath-using Web content to
> know how badly the Web would break if browsers started interpreting
> existing XPath (1.x) expressions as XPath 2.x expression with
> Compatibility Mode set to false, but the fact that the WG felt that it
> needed to define a compatibility mode suggests that the WG itself
> believed the changes to be breaking ones.

They are breaking in a sense - the culture of the XML Activity is to be
very detailed and increasingly precise, so yes, there _are_ possible
XPath expressions which changed meaning.

The answer to this is not to avoid any spec that changes, as that would
also mean avoiding HTML 4, HTML 5, CSS, Java, C, or any other language
that is in any sense living.  The answer is active liaison.

In practice no, it's very unlikely that the Web will break, because the
changes (regardless of XPath compatibility mode being true or false) are
mostly small edge cases; and if it does, we can work together to fix it.
I think there'd be quite a bit of interest from the XML world if Web
browsers were willing to move to a newer XPath and XSLT.


> >    /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong
> 
> This example depends on unprefixed name expressions matching the
> (X)HTML namespace when tested against an element and no namespace when
> tested against attributes. And that trick only works with (X)HTML
> nodes.

XML attributes are not in any namespace unless explicitly put there.

The example does depend on a default namespace for XPath expressions
being that of the document, a feature added for exactly this sort of
situation.
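
As a rough sketch of that rule, the TypeScript below models how an
unprefixed name test could be matched when the expression has a default
element namespace. The NodeName shape and matchesUnprefixedName() are
invented for this illustration and are not part of any XPath or DOM API.

```typescript
// Hypothetical model of name matching; not a real XPath or DOM API.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

interface NodeName {
  kind: "element" | "attribute";
  namespaceURI: string | null; // null means "in no namespace"
  localName: string;
}

// An unprefixed name test picks up the default element namespace only when
// matching elements; for attributes it always means "no namespace".
function matchesUnprefixedName(
  node: NodeName,
  name: string,
  defaultElementNamespace: string | null
): boolean {
  const expectedNamespace =
    node.kind === "element" ? defaultElementNamespace : null;
  return node.localName === name && node.namespaceURI === expectedNamespace;
}

// Sample names as they might appear in an XHTML document.
const paragraph: NodeName = {
  kind: "element",
  namespaceURI: XHTML_NS,
  localName: "p",
};
const idAttribute: NodeName = {
  kind: "attribute",
  namespaceURI: null,
  localName: "id",
};
```

With the default element namespace set to XHTML_NS, the test "p" matches
the paragraph element and "id" matches the attribute; with it set to null
(the XPath 1 situation), "p" no longer matches the XHTML element while
"id" still matches the attribute.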

I hope this helps.

Best,

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Received on Wednesday, 30 November 2011 19:19:41 UTC