Re: [web-annotation] Functionality: XPath Range Selector from Benjamin Young on 2015-11-04 (public-annotation@w3.org from November 2015)

From: Benjamin Young <bigbluehat@hypothes.is>
Date: Wed, 4 Nov 2015 14:26:14 -0500
To: Liam Quin <liam@w3.org>
Cc: Frederick Hirsch via GitHub <sysbot+gh@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CAE3H5F+v7P3JuAnzeyGPpLuFiD-t-B_wJ+koQzueOUYkyKCTNg@mail.gmail.com>

On Wed, Nov 4, 2015 at 10:26 AM, Liam Quin <liam@w3.org> wrote:

> On 2015-11-04 09:51, Frederick Hirsch via GitHub wrote:
>
>> should I be getting nervous about XPath and the potential need for
>> possible text normalization and canonicalization?
>>
> No...
>
>> Are there
>> performance costs associated with normalization/canonicalization and
>> can they be avoided?
>>
>
> For Web annotations the document will presumably already have been parsed
> using the HTML 5 rules before the annotations are processed, so the
> necessary degree of normalization is probably already there.

We do currently specify that the text must be normalized before using
TextPositionSelector (etc). We'll need to do the same for XPath and CSS
selectors too, I'd reckon.

Let's not leave it to chance or "interpretation" in any case.
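
To make the offset problem concrete, here's a minimal sketch (Python, purely
illustrative). The normalize() rule below (Unicode NFC plus whitespace
collapsing) is an assumption for the example, not the rule from the spec, and
text_position() is a hypothetical helper that just reports start/end offsets
the way a TextPositionSelector would carry them:

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalization: NFC, then collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def text_position(text: str, quote: str):
    """Return (start, end) character offsets of quote in text, or None."""
    start = text.find(quote)
    if start == -1:
        return None
    return start, start + len(quote)


raw = "Hello,\n   annotated\tworld"

# Offsets computed against the raw text and against the normalized text
# differ, so a producer and a consumer must agree on the same rule.
raw_offsets = text_position(raw, "annotated")
normalized_offsets = text_position(normalize(raw), "annotated")
print(raw_offsets, normalized_offsets)
```

If one client stores offsets from the raw text and another resolves them
against normalized text, the selection lands in the wrong place. The same
kind of mismatch is what I'd want to rule out for XPath and CSS selectors.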

>
>
>> Does findtext eliminate the need for XPath in our use cases? Can it?
>> should it?
>>
>
> findText is built on selectors - both CSS and XPath in the current draft -
> so no, you still need a selection mechanism.
>

It's not actually built on selectors, afaik. It may at some point provide
CSS and/or XPath as an expression of what it found:
http://w3c.github.io/findtext/#h-issue1

But it's really "just" JavaScript access to what the browsers already do
when they find content within a page. I've no idea how those matches are
kept in memory internally while the search is being done, but I doubt it's
as CSS or XPath (as that's an additional calculation step).

Regardless, FindText doesn't eliminate the need for any of these selectors
as it's solely intended for Web Browser scripting APIs
(i.e....JavaScript...for the foreseeable future...). ;)

Even if FindText (or really TextQuoteSelector + edit distance) gets
expressed as a fragment identifier, it still may not be quite what's wanted
from XPath and/or CSS selectors.

XPath and CSS both have the advantage (over TextQuoteSelector anyhow) of
not storing any content (only structure) of the document being annotated.
That means there's no copyright infringement risk involved in creating an
annotation (unlike with TextQuoteSelector which MAY be "misused" for such a
purpose presently). See the NOTE in
http://www.w3.org/TR/annotation-model/#text-quote-selector for more on that
wonderful topic. ;)
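
A rough sketch of that difference (Python, purely illustrative). The
exact/prefix/suffix keys follow TextQuoteSelector in the model draft; the
shape of the XPath selector (a single "value" key) and the copied_text()
helper are assumptions made up for this example:

```python
def copied_text(selector: dict) -> str:
    """Return the document text a selector carries with it (hypothetical helper)."""
    return "".join(selector.get(key, "") for key in ("prefix", "exact", "suffix"))


# A TextQuoteSelector stores pieces of the annotated document itself.
text_quote = {
    "type": "TextQuoteSelector",
    "exact": "annotated",
    "prefix": "Hello, ",
    "suffix": " world",
}

# An XPath-style selector (assumed shape) stores only a structural path.
xpath = {
    "type": "XPathSelector",
    "value": "/html/body/p[2]/em[1]",
}

print(repr(copied_text(text_quote)))
print(repr(copied_text(xpath)))
```

The first selector reproduces a run of the source text; the second carries
none of it, only where to look.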

Thanks for the thoughts, Liam!
Benjamin
--
Developer Advocate
http://hypothes.is/

> Liam
>
>
>> http://w3c.github.io/findtext/
>>
>
> --
> Liam Quin, W3C
> XML Activity Lead;
> Digital publishing; HTML Accessibility
>
>

Received on Wednesday, 4 November 2015 19:26:43 UTC