- From: Benjamin Young <bigbluehat@hypothes.is>
- Date: Wed, 4 Nov 2015 14:26:14 -0500
- To: Liam Quin <liam@w3.org>
- Cc: Frederick Hirsch via GitHub <sysbot+gh@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
- Message-ID: <CAE3H5F+v7P3JuAnzeyGPpLuFiD-t-B_wJ+koQzueOUYkyKCTNg@mail.gmail.com>
On Wed, Nov 4, 2015 at 10:26 AM, Liam Quin <liam@w3.org> wrote:

> On 2015-11-04 09:51, Frederick Hirsch via GitHub wrote:
>
>> should I be getting nervous about XPath and the potential need for
>> possible text normalization and canonicalization?
>
> No...
>
>> Are there performance costs associated with normalization/canonicalization
>> and can they be avoided?
>
> For Web annotations the document will presumably already have been parsed
> using the HTML 5 rules before the annotations are processed, so the
> necessary degree of normalization is probably already there.

We do currently specify that the text must be normalized before using
TextPositionSelector (etc). We'll need to do the same for XPath and CSS
selectors too, I'd reckon. Let's not leave it to chance or "interpretation"
in any case.

>> Does findtext eliminate the need for XPath in our use cases? Can it?
>> should it?
>
> findText is built on selectors - both CSS and XPath in the current draft -
> so no, you still need a selection mechanism.

It's not actually built on selectors, afaik. It may at some point provide
CSS and/or XPath as an expression of what it found:
http://w3c.github.io/findtext/#h-issue1

But it's really "just" JavaScript access to what the browsers already do
when they find content within a page--and I've no idea how those matches are
kept in memory internally while the search is being done...but I doubt it's
as CSS or XPath (as that's an additional calculation step).

Regardless, FindText doesn't eliminate the need for any of these selectors,
as it's solely intended for Web Browser scripting APIs
(i.e....JavaScript...for the foreseeable future...). ;)

Even if FindText (or really TextQuoteSelector + edit distance) gets
expressed as a fragment identifier, it still may not be quite what's wanted
from XPath and/or CSS selectors.

XPath and CSS both have the advantage (over TextQuoteSelector anyhow) of not
storing any content (only structure) of the document being annotated. That
means there's no copyright infringement risk involved in creating an
annotation (unlike with TextQuoteSelector, which MAY be "misused" for such a
purpose presently). See the NOTE in
http://www.w3.org/TR/annotation-model/#text-quote-selector for more on that
wonderful topic. ;)

Thanks for the thoughts, Liam!
Benjamin

--
Developer Advocate
http://hypothes.is/

> Liam
>
>> http://w3c.github.io/findtext/
>
> --
> Liam Quin, W3C
> XML Activity Lead;
> Digital publishing; HTML Accessibility
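
[Illustration, not part of the original message.] The two points made above, normalizing text before computing offsets and the fact that a quote-based selector carries document content while a position-based one does not, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `normalize` rule (Unicode NFC plus whitespace collapsing) is a stand-in chosen for illustration and is not the normalization the annotation model specifies, the helper names are hypothetical, and the dictionary shapes only loosely mirror the selectors discussed in the thread.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """Hypothetical normalization: NFC, then collapse whitespace runs.

    Assumption for illustration only; the spec defines the real rules.
    """
    return re.sub(r"\s+", " ", unicodedata.normalize("NFC", text)).strip()


def text_position_selector(document_text: str, quote: str) -> Optional[dict]:
    """Locate `quote` in the normalized document and return offsets only."""
    doc = normalize(document_text)
    needle = normalize(quote)
    start = doc.find(needle)
    if start < 0:
        return None
    # Only numbers are stored: no document content ends up in the selector.
    return {"type": "TextPositionSelector", "start": start, "end": start + len(needle)}


def text_quote_selector(document_text: str, quote: str, context: int = 10) -> Optional[dict]:
    """Locate `quote` and return it together with surrounding context."""
    doc = normalize(document_text)
    needle = normalize(quote)
    start = doc.find(needle)
    if start < 0:
        return None
    end = start + len(needle)
    # Document text itself is copied into the selector (exact, prefix, suffix).
    return {
        "type": "TextQuoteSelector",
        "exact": needle,
        "prefix": doc[max(0, start - context):start],
        "suffix": doc[end:end + context],
    }


if __name__ == "__main__":
    page = "Hello,\n   wide   world of  annotation"
    print(text_position_selector(page, "wide world"))
    print(text_quote_selector(page, "wide world"))
```

Both helpers normalize the document and the quote the same way before searching, so the offsets only make sense to a consumer applying the identical normalization. That is the reason for specifying the normalization rather than leaving it to interpretation.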
Received on Wednesday, 4 November 2015 19:26:43 UTC