Re: [web-annotation] XPath Selector from Ivan Herman via GitHub on 2016-02-25 (public-annotation@w3.org from February 2016)

From: Ivan Herman via GitHub <sysbot+gh@w3.org>
Date: Thu, 25 Feb 2016 15:00:11 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-188822574-1456412410-sysbot+gh@w3.org>

While I am in favor of having an XPath selector, there are some issues
we should be aware of if the WG accepts this proposal. These are all
sub-issues that must be reflected, somehow, in the final document.

### XPath and DOM

Formally, XPath is defined through a separate [XPath
datamodel](https://www.w3.org/TR/xpath-datamodel-31/) document. That
document, essentially, says that it relies on the (XML) infoset
specification. The infoset describes XML documents, whereas HTML5 is
not XML. I have asked our staff colleague (Carine), and this is what
she said:

> The Web Annotation WG can use the XPath/XQuery data model if they
need to, as long as they carefully study compatibility with the
constructs to which they want to apply it. We used to have such a
document for DOM Level 3,
https://www.w3.org/TR/DOM-Level-3-XPath/xpath.html
>
> That could be a good starting point to evaluate whether DOM4 has
departed too much from the original tree model. (I doubt it has)

I think the only thing we can/should do is to add a note in the spec,
referring to the DOM 3 document so that authors/implementers are
aware of how XPath is used and defined. (Note that there is no
reference to XPath in the [DOM4
spec](http://www.w3.org/TR/2015/REC-dom-20151119/).)

### XPath and HTML5

In any case, what this means is that XPath works on top of the DOM and
*not* on top of the original HTML source. It is important to emphasize
this in the spec, because the HTML5 parser may slightly rearrange the
original HTML code, which may affect the validity of an XPath
expression written against the source. A possible reference is:

https://www.w3.org/TR/html5/syntax.html

which describes the parser (and is therefore hell to read...).
However, a few subsections of it are worth referencing directly.
One is:

https://www.w3.org/TR/html5/syntax.html#optional-tags

which lists the tags that may be missing in the HTML but will be added
in the DOM (e.g., the `tbody` element if missing). Wherever that
section says a start tag can be omitted, the parser is going to add
the element to the DOM, e.g., `html`, `head`, `body`, `colgroup`, or
`tbody`.
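
To make the `tbody` case concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the standard
library has no HTML5 parser, so the "as parsed" tree below is written
by hand to mimic what the HTML5 parser would build, and the two path
expressions are hypothetical selectors, not anything taken from the
proposal.

```python
import xml.etree.ElementTree as ET

# Markup as an author might write it: no explicit <tbody>.
SOURCE = "<table><tr><td>cell</td></tr></table>"

# ASSUMPTION: hand-written stand-in for the tree an HTML5 parser would
# build from SOURCE, with the implied <tbody> element inserted.
AS_PARSED = "<table><tbody><tr><td>cell</td></tr></tbody></table>"

dom_like_tree = ET.fromstring(AS_PARSED)

# Path someone would write by reading the source markup.
path_from_source = "./tr/td"
# Path that matches the structure of the parsed DOM.
path_from_dom = "./tbody/tr/td"


def count_matches(tree, path):
    """Return how many elements the (limited) XPath selects in tree."""
    return len(tree.findall(path))


# The source-derived path selects nothing in the DOM-like tree,
# while the DOM-derived path finds the cell.
source_path_on_dom = count_matches(dom_like_tree, path_from_source)
dom_path_on_dom = count_matches(dom_like_tree, path_from_dom)
print(source_path_on_dom, dom_path_on_dom)
```

A selector derived from the source text silently selects nothing once
it is evaluated against the DOM, which is the situation the note in
the spec should warn about.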

Another one is:

https://www.w3.org/TR/html5/syntax.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser

with all kinds of nasty situations that the parser has to take care of
(and which lead to DOM modifications).

Again, what we can/should do is to add a note in the document drawing
attention to these types of problems.

### Normative reference issue

Another problem is the status of the XPath documents (I mean the
latest, 3.1 versions). At the moment, all of these documents are in
CR, meaning that they would be inappropriate as normative references
from a Rec. Some in the reference chain have been in CR for more than
a year… However, here is the info I got from Carine:

> … it's expected to go to PR along with the other ones in the near
future […] Working closely with the developer community, we expect to
show evidence of implementations by approximately 1 March 2016. […] It
should be in PR before autumn 2016.

If that happens, then we may be fine. But we will have to keep an eye
on this to see if there are delays...

--
GitHub Notification of comment by iherman
Please view or discuss this issue at
https://github.com/w3c/web-annotation/issues/95#issuecomment-188822574
using your GitHub account

Received on Thursday, 25 February 2016 15:00:13 UTC