Re: [web-annotation] XPath Selector

While I am in favor of having an XPath selector, there are some issues 
we should be aware of if the WG accepts this proposal. These are all 
sub-issues that must be reflected, somehow, in the final document. 

### XPath and DOM

Formally, XPath is defined through a separate [XPath 
datamodel](https://www.w3.org/TR/xpath-datamodel-31/) document. That 
document essentially says that it relies on the (XML) infoset 
specification. The infoset describes XML documents, whereas an HTML5 
document is not one. I have asked our staff colleague (Carine), and 
this is what she said:

> The Web Annotation WG can use the XPath/XQuery data model if they 
need to, as long as they carefully study compatibility with the 
constructs to which they want to apply it. We used to have such a 
document for DOM Level 3, 
https://www.w3.org/TR/DOM-Level-3-XPath/xpath.html
>
>That could be a good starting point to evaluate whether DOM4 has 
departed too much from the original tree model. (I doubt it has)

I think the only thing we can/should do is to add a note in the spec 
referring to the DOM 3 document, so that authors/implementers are 
aware of how XPath is used and defined. (Note that there is no 
reference to XPath in the [DOM4 
spec](http://www.w3.org/TR/2015/REC-dom-20151119/).)

### XPath and HTML5

In any case, what this means is that XPath works on top of the DOM and 
*not* on top of the original HTML source. It is important to emphasize 
this in the spec, because the HTML5 parser may slightly rearrange the 
original HTML code, which may affect the validity of an XPath 
expression. A possible reference is: 

https://www.w3.org/TR/html5/syntax.html

which describes the parser (and is therefore hell to read...). 
However, there are some important internal references to that section.
 One is:

https://www.w3.org/TR/html5/syntax.html#optional-tags

which lists the tags that may be missing in the HTML but will be added 
in the DOM (e.g., the `tbody` element if missing). Wherever it says 
that a start tag can be omitted, the parser is going to add the 
element to the DOM, e.g., `html`, `head`, `body`, `colgroup`, or 
`tbody`.
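
To make the `tbody` case concrete, here is a minimal sketch in Python. It does not run a real HTML5 parser: `DOM_LIKE` is a hand-written approximation of the tree a parser would build, and `count_matches`, `PATH_FROM_SOURCE`, and `PATH_FROM_DOM` are names invented for this illustration. The standard library's ElementTree supports only a subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Markup as an author might write it: no explicit <tbody>.
SOURCE = (
    "<html><head></head><body>"
    "<table><tr><td>cell</td></tr></table>"
    "</body></html>"
)

# Hand-written approximation (an assumption of this sketch) of the tree
# an HTML5 parser builds: the implied <tbody> is now a real element.
DOM_LIKE = (
    "<html><head></head><body>"
    "<table><tbody><tr><td>cell</td></tr></tbody></table>"
    "</body></html>"
)

# Path derived by reading the source text, and path derived from the DOM.
PATH_FROM_SOURCE = "./body/table/tr/td"
PATH_FROM_DOM = "./body/table/tbody/tr/td"


def count_matches(markup: str, path: str) -> int:
    """Parse well-formed markup and count the elements the path selects."""
    root = ET.fromstring(markup)
    return len(root.findall(path))


# The source-derived path matches the source-shaped tree only.
print(count_matches(SOURCE, PATH_FROM_SOURCE))    # 1
print(count_matches(DOM_LIKE, PATH_FROM_SOURCE))  # 0
print(count_matches(DOM_LIKE, PATH_FROM_DOM))     # 1
```

An expression written by looking at the source selects nothing once the implied element is present, which is the kind of mismatch a note in the spec could warn about.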
 

Another one is:

https://www.w3.org/TR/html5/syntax.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser

with all kinds of nasty situations that the parser has to take care of 
(and which lead to DOM modifications). 
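
One of the cases discussed in that section is misnested tags. As a rough sketch (again in Python, and again without a real HTML5 parser), the snippet below compares the authored markup with a hand-written equivalent of the resulting tree. `AUTHORED` and `DOM_LIKE` are names chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Misnested markup as authored. It is not well-formed, so this sketch
# keeps it as a plain string and never parses it.
AUTHORED = "<p>1<b>2<i>3</b>4</i>5</p>"

# Hand-written equivalent (an assumption of this sketch) of the tree the
# parser ends up with: the single authored <i> becomes two elements.
DOM_LIKE = "<p>1<b>2<i>3</i></b><i>4</i>5</p>"

root = ET.fromstring(DOM_LIKE)
i_elements = root.findall(".//i")
i_texts = [el.text for el in i_elements]

print(AUTHORED.count("<i>"))  # 1 start tag in the source
print(len(i_elements))        # 2 elements in the tree
print(i_texts)                # ['3', '4']
```

A selector meant to cover "the `i` element" of the source would therefore have to deal with two separate nodes in the DOM.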

Again, what we can/should do is to add a note in the document drawing 
attention to this type of problem. 

### Normative reference issue

Another problem is the status of the XPath documents (I mean the 
latest, 3.1 versions). At the moment, all documents are in CR, 
meaning that they would be inappropriate as normative references from 
a Rec. Some in the reference chain have been in CR for more than a 
year… However, here is the info I got from Carine: 

> … it's expected to go to PR along with the other ones in the near 
future […] Working closely with the developer community, we expect to 
show evidence of implementations by approximately 1 March 2016. […] It
 should be in PR before autumn 2016.

If that happens, then we may be fine. But we will have to keep an eye 
on this to see if there are delays...

-- 
GitHub Notification of comment by iherman
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/95#issuecomment-188822574
 using your GitHub account

Received on Thursday, 25 February 2016 15:00:13 UTC