
Re: [web-annotation] Functionality: XPath Range Selector

From: Liam Quin via GitHub <sysbot+gh@w3.org>
Date: Wed, 04 Nov 2015 04:20:28 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-153567359-1446610827-sysbot+gh@w3.org>
On 2015-11-03 20:04, Randall Leeds wrote:
[...]
> But my understanding of XPath doesn't include any way to measure text
> offsets from Elements, only within Text Nodes, though I may be missing
> something.

(Sorry if I'm chiming in without enough context here)

I am not sure what you mean by measuring text offsets from elements.
I'll guess that, in

  <p>The <em>happy</em> boy jumped for joy when he saw the <a ...>cheesecake</a>!</p>

if you're annotating "boy", you want to count the number of characters
in "The happy " beforehand? In which case yes, XPath 1 can do that,
for example:

  string-length(substring-before(., "boy"))

You can use string-length() and substring() and (for more robustness
perhaps) substring-before() on the string value of any node, including
the entire subtree.
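
As a rough illustration of that expression, here is a minimal Python
sketch that emulates it using only the standard library. The helper
names string_value and offset_before are hypothetical, and the elided
attributes on the <a> element are simply left out.

```python
import xml.etree.ElementTree as ET

def string_value(node):
    """Concatenate all descendant text, like the XPath string value of an element."""
    return "".join(node.itertext())

def offset_before(node, needle):
    """Emulate string-length(substring-before(., needle)).

    Returns None when needle is absent; XPath itself would return 0 there,
    which is indistinguishable from a match at offset 0.
    """
    before, found, _ = string_value(node).partition(needle)
    return len(before) if found else None

# The example paragraph from above, with the <a> attributes omitted.
p = ET.fromstring(
    "<p>The <em>happy</em> boy jumped for joy when he saw the "
    "<a>cheesecake</a>!</p>"
)
print(offset_before(p, "boy"))
```

Because the offset is taken on the string value of the <p> element,
the <em> markup does not affect the count.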

> The selector can be more robust against formatting changes by ignoring
> inline content.

That makes it robust against text changes that preserve markup, but not
against markup changes. It's a tradeoff. Using @id values can help with
robustness in many (not all) environments.
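
To make the @id idea concrete, here is a sketch in Python (standard
library only) that anchors on an element with a known id and measures
the offset within that element's string value rather than from the
document root. The id value para-7, the surrounding <article> markup,
and the helper name offset_within_id are all invented for illustration.

```python
import xml.etree.ElementTree as ET

def offset_within_id(root, element_id, needle):
    """Find the descendant carrying the given id, then return the character
    offset of needle inside that element's string value (None if either
    the element or the text is missing)."""
    target = root.find(f".//*[@id='{element_id}']")
    if target is None:
        return None
    before, found, _ = "".join(target.itertext()).partition(needle)
    return len(before) if found else None

# Hypothetical document: only the paragraph with the id is measured.
doc = ET.fromstring(
    "<article><h1>Dessert</h1>"
    "<p id='para-7'>The <em>happy</em> boy jumped for joy.</p></article>"
)
print(offset_within_id(doc, "para-7", "boy"))
```

Text added or removed elsewhere in the document no longer shifts the
offset, though edits inside the identified element still do.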

> This requires measuring from some ancestor Element, a
> block element in this example. One might get more particular,
> measuring from the start of an article or main tag, or a p tag
> ancestor.

A traditional way to do this in hypertext, structured editors and
elsewhere is with tumblers; there's considerable implementation
experience, some of which was reflected in XPointer. What does it mean
to "measure" a tree? Remember that (at least in theory) text/html may
get line endings rewritten by proxies, so using normalized text is
essential.
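
A small Python sketch of why normalization matters for offsets. The
normalize_space helper is a hypothetical approximation of XPath's
normalize-space() function, and the two sample strings are made up to
stand in for the same text with different line endings.

```python
import re

def normalize_space(text):
    """Approximate XPath normalize-space(): collapse runs of space, tab,
    CR and LF into a single space, and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

unix_text = "The happy\nboy jumped"
dos_text = "The happy\r\nboy jumped"

# Raw offsets differ by one because of the extra CR...
print(unix_text.index("jumped"), dos_text.index("jumped"))
# ...but agree once both strings are normalized.
print(normalize_space(unix_text).index("jumped"),
      normalize_space(dos_text).index("jumped"))
```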

> 
> Making Range generic and letting XPath stand alone would mean that you
> could describe the boundaries as CSS or XPath and optionally offsets
> therefrom.

OK, that makes sense, I think.


Liam

-- 
Liam Quin, W3C
XML Activity Lead;
Digital publishing; HTML Accessibility


-- 
GitHub notification of comment by liamquin
See https://github.com/w3c/web-annotation/issues/95#issuecomment-153567359
Received on Wednesday, 4 November 2015 04:20:35 UTC
