- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 29 Jun 2009 12:15:44 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7059

--- Comment #13 from Henri Sivonen <hsivonen@iki.fi> 2009-06-29 12:15:43 ---
(In reply to comment #3)
> Both XQuery and XSLT are frequently applied to HTML documents. The document is
> first parsed to create an XDM instance (often using tools like Tagsoup to deal
> with cruft), then processed appropriately.

But that's different. TagSoup assigns HTML elements into the http://www.w3.org/1999/xhtml namespace *like an HTML5 parser does* but legacy browser-based HTML parsers didn't. If you have an app that currently uses TagSoup and any version of XPath, you don't need to change anything on the XPath layer if you move from TagSoup to an HTML5-compliant parser.

> Screen scraping and data integration are one important class of applications
> that do this.

Indeed, but they have nothing to do with this special case in the spec. In the screen scraping scenario, the XPath expressions are supplied by the scraper developer--not by the remote Web content. The issue at hand has everything to do with the case where the XPath 1.0 expressions are supplied by existing content in JavaScript programs using the document.evaluate API.

Hixie, I think the spec should make it clearer that the willful violation of XPath 1.0 only applies to UAs that support scripting and let scripts in content evaluate XPath expressions against the DOM.

> XPath is used in both XQuery and XSLT. It's going to be extremely confusing if
> XPath expressions are interpreted differently when executed inside a browser
> environment, especially since the documents that define the XPath standard do
> not support this interpretation.

Frankly, I think most users of XPath will never even realize that this hack is in place and, therefore, won't be confused by it.
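As an illustration of the TagSoup point above, here is a minimal sketch in Python. It assumes a hand-built tree whose elements are already in the http://www.w3.org/1999/xhtml namespace, standing in for the output of TagSoup or an HTML5 parser (neither is actually run), and it uses the limited XPath subset of `xml.etree.ElementTree` as a stand-in for an off-the-shelf XPath engine. It is not what any browser does.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hand-built tree shaped like the output of a parser that assigns HTML
# elements to the XHTML namespace (an assumption; no real parser is involved).
html = ET.Element(f"{{{XHTML_NS}}}html")
body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
ET.SubElement(body, f"{{{XHTML_NS}}}p").text = "first"
ET.SubElement(body, f"{{{XHTML_NS}}}p").text = "second"

# Prefixless name test: under standard semantics it selects only
# no-namespace elements, so it finds none of the XHTML-namespace <p> nodes.
unprefixed = html.findall(".//p")

# Name test with a prefix bound to the XHTML namespace: selects both <p> nodes.
prefixed = html.findall(".//h:p", {"h": XHTML_NS})

print(len(unprefixed), len(prefixed))
```

An application written this way already binds a prefix to the XHTML namespace, which is why swapping the parser underneath it would not require touching the expressions.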
> I suggest that you define a profile of XPath 2.0 that corresponds to the
> functionality of XPath 1.0 plus default namespaces, and also define the mapping
> of your XML documents to XDM (you have to do this regardless, because XPath is
> defined in terms of the XDM, not the DOM).

The point of having this in the spec is to provide advice to implementors who have XPath 1.0 engines but haven't upgraded to DOM5 yet. When I implemented this for Gecko, I first had to experience test case failures and then go find out what WebKit does. The only reason I'm pursuing this is that I want to do unto the next implementor what I wish the previous implementor had done unto me.

(In reply to comment #6)
> If you want a language that has different semantics from XPath, I think the
> clean thing to do would be to create a completely different syntax.

That's completely infeasible, since the whole point is to keep existing XPath 1.0 expressions, which are already part of existing script out there, working.

(In reply to comment #11)
> (In reply to comment #10)
> I think it would be helpful to get a small group together from your Working
> Group and from the XSL and XQuery Working Groups to make sure we understand the
> requirements on both sides and look for solutions.

Here are the requirements for the case where the UA accepts XPath 1.0 expressions from Web content through scripting:

1) Prefixless name expressions in XPath 1.0 expressions passed to document.evaluate() must match against HTML element nodes in HTML documents (for existing expressions). This requirement is not negotiable. It's a non-starter to suggest that a browser vendor whose previous release exhibits this behavior make their next release not exhibit this behavior.

2) Name expressions whose namespace is http://www.w3.org/1999/xhtml should match against HTML element nodes in HTML documents (for prospective expressions). This isn't a hard requirement, but not having this property would hinder expression portability between HTML and XHTML.

3) The solution must not require browser vendors who currently ship XPath 1.0 engines to upgrade to an XPath 2.x engine. This is practically a hard requirement.

4) HTML element nodes in the DOM should report http://www.w3.org/1999/xhtml as their namespace. (Note that giving up on this point would require special casing all over, while putting the hack in the XPath matcher isolates the hack. Also note that this property removes the need for a hack in Selectors. As a consequence, it's safe to consider this a pretty serious requirement at this point.)

5) It's more important for different browsers to do the same thing than for some browsers to be more purely XPath 2.0-like.

6) The XPath engine shouldn't have to modify its behavior depending on whether the expression came in via document.evaluate() or other means. This is a fairly hard requirement.

Here are the requirements for other cases (already satisfied by TagSoup + off-the-shelf XPath library):

A) Name expressions whose namespace is http://www.w3.org/1999/xhtml should also match against HTML element nodes in HTML documents.

B) HTML elements should be in the http://www.w3.org/1999/xhtml namespace.

As you can see, the only degree of freedom here for UAs that support scripting and document.evaluate() is whether no-namespace expressions match against no-namespace element nodes *in addition to* matching against HTML nodes. And even in that case, uniformity between browsers is more important than being a purer subset of XPath 2.0.

There's no impact on applications that don't get their XPath expressions from Web content but whose XPath expressions are supplied by the application developer.
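The name-test behavior that requirements 1, 2, 4 and 6 describe can be sketched as a single matching function. This is a hypothetical model in Python, not Gecko's or WebKit's code: `ElementNode`, `name_test_matches` and the `also_match_no_namespace` flag are invented for illustration, local names are compared exactly (case handling is ignored), and the flag models the one remaining degree of freedom mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass(frozen=True)
class ElementNode:
    """Hypothetical stand-in for a DOM element as seen by the XPath matcher."""
    namespace: Optional[str]  # None means "no namespace"
    local_name: str
    in_html_document: bool    # True if the owner document is an HTML document


def name_test_matches(test_ns: Optional[str], test_local: str,
                      node: ElementNode,
                      also_match_no_namespace: bool = True) -> bool:
    """Decide whether a name test (namespace + local name) selects `node`.

    The decision depends only on the node and its document, never on how
    the expression reached the engine (requirement 6).
    """
    if node.local_name != test_local:
        return False
    if test_ns is not None:
        # Prefixed test: plain namespace comparison (requirements 2 and A).
        return node.namespace == test_ns
    # Prefixless test from here on.
    if node.in_html_document and node.namespace == XHTML_NS:
        # The special case: prefixless tests match HTML elements in
        # HTML documents (requirement 1), even though the elements
        # report the XHTML namespace (requirement 4).
        return True
    if node.namespace is None:
        # Standard XPath 1.0 outside HTML documents; inside HTML
        # documents this is the open degree of freedom.
        return (not node.in_html_document) or also_match_no_namespace
    return False


p_in_html = ElementNode(XHTML_NS, "p", in_html_document=True)
p_in_xhtml = ElementNode(XHTML_NS, "p", in_html_document=False)
print(name_test_matches(None, "p", p_in_html))   # prefixless, HTML document
print(name_test_matches(None, "p", p_in_xhtml))  # prefixless, XML document
```

Keeping the special case inside this one function mirrors the note under requirement 4: the rest of the DOM can report the XHTML namespace consistently, and only the matcher knows about the exception.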
Received on Monday, 29 June 2009 12:15:59 UTC