
[Bug 7059] [blocked on xpathwg] Forking XPath

From: <bugzilla@wiggum.w3.org>
Date: Mon, 29 Jun 2009 12:15:44 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1MLFlw-0003Qw-Ee@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7059





--- Comment #13 from Henri Sivonen <hsivonen@iki.fi>  2009-06-29 12:15:43 ---
(In reply to comment #3)
> Both XQuery and XSLT are frequently applied to HTML documents. The document is
> first parsed to create an XDM instance (often using tools like Tagsoup to deal
> with cruft), then processed appropriately. 

But that's different. TagSoup assigns HTML elements to the
http://www.w3.org/1999/xhtml namespace *like an HTML5 parser does*, but legacy
browser-based HTML parsers didn't.

If you have an app that currently uses TagSoup and any version of XPath, you
don't need to change anything in the XPath layer if you move from TagSoup to an
HTML5-compliant parser.

> Screen scraping and data integration are one important class of applications
> that do this.

Indeed, but they have nothing to do with this special case in the spec.

In the screen scraping scenario, the XPath expressions are supplied by the
scraper developer, not by the remote Web content. The issue at hand is entirely
about the case where the XPath 1.0 expressions are supplied by existing content
in JavaScript programs using the document.evaluate() API.

Hixie, I think the spec should make it clearer that the willful violation of
XPath 1.0 applies only to UAs that support scripting and let scripts in content
evaluate XPath expressions against the DOM.
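A rough analogy for the mismatch, sketched with Python's standard
xml.etree.ElementTree. Its path syntax is only a limited XPath subset, not an
XPath 1.0 engine and not document.evaluate(), so treat this purely as an
illustration of the namespace semantics. Once elements are in the XHTML
namespace (as TagSoup or an HTML5 parser puts them), a prefixless test stops
matching; binding a prefix works; and treating XHTML as the default element
namespace (the empty-string key, accepted by ElementTree since Python 3.8)
makes the prefixless test match again.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Elements end up in the XHTML namespace, as after parsing with TagSoup or an
# HTML5 parser (written here as XHTML so the stdlib XML parser accepts it).
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>one</p><p>two</p></body></html>"
)

# XPath 1.0-style semantics: a prefixless name test only matches elements in
# no namespace, so an existing expression like //p finds nothing.
plain = doc.findall(".//p")

# Binding a prefix to the XHTML namespace works, as in XSLT/XQuery pipelines.
prefixed = doc.findall(".//h:p", {"h": XHTML_NS})

# Treating XHTML as the default element namespace makes //p match again,
# roughly the effect the special case aims for with document.evaluate().
defaulted = doc.findall(".//p", {"": XHTML_NS})

print(len(plain), len(prefixed), len(defaulted))  # 0 2 2
```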

> XPath is used in both XQuery and XSLT. It's going to be extremely confusing if
> XPath expressions are interpreted differently when executed inside a browser
> environment, especially since the documents that define the XPath standard do
> not support this interpretation.

Frankly, I think most users of XPath will never even realize that this hack is
in place and, therefore, won't be confused by it.

> I suggest that you define a profile of XPath 2.0 that corresponds to the
> functionality of XPath 1.0 plus default namespaces, and also define the mapping
> of your XML documents to XDM (you have to do this regardless, because XPath is
> defined in terms of the XDM, not the DOM). 

The point of having this in the spec is to provide advice to implementors who
have XPath 1.0 engines but haven't upgraded to DOM5 yet. When I implemented
this for Gecko, I first ran into test case failures and then had to go find
out what WebKit does. The only reason I'm pursuing this is that I want to do
unto the next implementor what I wish the previous implementor had done unto
me.

(In reply to comment #6)
> If you want a language that has different semantics from XPath, I think the
> clean thing to do would be to create a completely different syntax.

That's completely infeasible, since the whole point is to keep existing XPath
1.0 expressions, which are already part of existing scripts out there, working.

(In reply to comment #11)
> (In reply to comment #10)
> I think it would be helpful to get a small group together from your Working
> Group and from the XSL and XQuery Working Groups to make sure we understand the
> requirements on both sides and look for solutions.

Here are the requirements for the case where the UA accepts XPath 1.0
expressions from Web content through scripting:

 1) Prefixless name expressions in XPath 1.0 expressions passed to
document.evaluate() must match against HTML element nodes in HTML documents
(for existing expressions). This requirement is not negotiable. It's a
non-starter to suggest that a browser vendor whose previous release exhibits
this behavior drop it in its next release.

 2) Name expressions whose namespace is http://www.w3.org/1999/xhtml should
match against HTML element nodes in HTML documents (for prospective
expressions). This isn't a hard requirement, but not having this property would
hinder expression portability between HTML and XHTML.

 3) The solution must not require browser vendors who currently ship XPath 1.0
engines to upgrade to an XPath 2.x engine. This is practically a hard
requirement.

 4) HTML element nodes in the DOM should report http://www.w3.org/1999/xhtml as
their namespace. (Note that giving up on this point would require special-casing
all over the place, whereas putting the hack in the XPath matcher isolates it.
Also note that this property removes the need for a hack in Selectors. As a
consequence, it's safe to consider this a pretty serious requirement at this
point.)

 5) It's more important for different browsers to do the same thing than for
some browsers to be more purely XPath 2.0-like.

 6) The XPath engine shouldn't have to modify its behavior depending on whether
the expression came in via document.evaluate() or other means. This is a fairly
hard requirement.

Here are the requirements for other cases (already satisfied by TagSoup +
off-the-shelf XPath library):

 A) Name expressions whose namespace is http://www.w3.org/1999/xhtml should
also match against HTML element nodes in HTML documents.

 B) HTML elements should be in the http://www.w3.org/1999/xhtml namespace.

- -

As you can see, the only degree of freedom here for UAs that support scripting
and document.evaluate() is whether no-namespace expressions match against
no-namespace element nodes *in addition to* matching against HTML nodes. And
even in that case, uniformity between browsers is more important than being a
purer subset of XPath 2.0.
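A minimal sketch of an element name-test rule that would satisfy requirements
1, 2, 4 and 6, with the one remaining degree of freedom exposed as a flag. The
Element class and name_test_matches() are hypothetical, not any browser's API;
the toy node model covers only element name tests, not full XPath.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass
class Element:
    namespace: Optional[str]  # namespace the DOM reports (requirement 4)
    local_name: str
    in_html_document: bool    # True if the owner document is an HTML document


def name_test_matches(test_local: str, test_ns: Optional[str], node: Element,
                      also_match_no_ns: bool) -> bool:
    """test_ns is the namespace the test's prefix resolves to, or None for a
    prefixless test. Nothing here depends on where the expression came from
    (requirement 6)."""
    if node.local_name != test_local:
        return False
    if test_ns is not None:
        # Prefixed test: plain namespace comparison. Because HTML elements
        # report the XHTML namespace, h:p matches them (requirement 2 / A).
        return node.namespace == test_ns
    # Prefixless test against an HTML element in an HTML document: match it
    # (requirement 1), although strict XPath 1.0 would demand no namespace.
    if node.in_html_document and node.namespace == XHTML_NS:
        return True
    # The degree of freedom: do prefixless tests also match elements that
    # really are in no namespace?
    return also_match_no_ns and node.namespace is None


html_p = Element(XHTML_NS, "p", in_html_document=True)
svg_rect = Element("http://www.w3.org/2000/svg", "rect", in_html_document=True)
bare_p = Element(None, "p", in_html_document=False)  # no-namespace element

for flag in (True, False):
    print(flag,
          name_test_matches("p", None, html_p, flag),      # //p, HTML <p>
          name_test_matches("p", XHTML_NS, html_p, flag),  # //h:p, HTML <p>
          name_test_matches("rect", None, svg_rect, flag),  # //rect, SVG
          name_test_matches("p", None, bare_p, flag))      # //p, no namespace
# True True True False True
# False True True False False
```

Both settings agree on HTML element nodes; they differ only for elements that
are genuinely in no namespace, which is why picking one setting uniformly
across browsers matters more than which one is picked.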

There's no impact on applications whose XPath expressions are supplied by the
application developer rather than obtained from Web content.


Received on Monday, 29 June 2009 12:15:59 GMT
