Re: Rough Draft of Robust Anchoring: the RangeFinder API

Hi, Bill–

On 3/25/15 10:37 AM, Bill Kasdorf wrote:
> Small but I think relevant point about
>
>> , then you'd look for instances of the minus sign
>
> That's only one of many ways a negative number could be indicated. It
> is often in parentheses rather than having a minus sign. Or it could
> be red. Etc.

Huh, I don't know that I've ever seen that. Probably now I'll see it 
everywhere, in true Baader-Meinhof phenomenology.

How would you indicate that a number is negative, such that a machine 
could always know that? I suppose you could express it in MathML in some 
way, but even MathML uses the minus sign as an operator, IIUI. And as of 
right now, the use of that in HTML is infinitesimal, so you'd have to 
include a whole set of heuristics in your interpreter to get any kind of 
reliability, and then you'd have false positives:
  "there were a bunch of people there -23 or so- and I couldn't find her"
  "I counted forty-two (42) chickens in the yard"

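To make the false-positive problem concrete, here's a rough sketch in
Python of the kind of naive heuristic I mean. It's purely illustrative:
the pattern and the function name are my own assumptions, not anything
from the RangeFinder draft. It flags "-N" and "(N)" as negative numbers,
and it trips over both sentences above:

```python
import re

# Hypothetical heuristic (an assumption for illustration only):
# treat "-N" or "(N)" as a negative number, with an optional decimal part.
NEGATIVE_PATTERN = re.compile(r"-\d+(?:\.\d+)?|\(\d+(?:\.\d+)?\)")

def find_negative_candidates(text):
    """Return every substring the naive heuristic would flag as negative."""
    return NEGATIVE_PATTERN.findall(text)

# Genuine negatives, written two different ways:
print(find_negative_candidates("Net revenue: -1200"))   # -> ['-1200']
print(find_negative_candidates("Net revenue: (1200)"))  # -> ['(1200)']

# False positives: neither of these is a negative number.
print(find_negative_candidates(
    "there were a bunch of people there -23 or so- and I couldn't find her"
))  # -> ['-23']
print(find_negative_candidates(
    "I counted forty-two (42) chickens in the yard"
))  # -> ['(42)']
```

And of course nothing like this can see a number that is negative only
because it's styled in red.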

> If I'm understanding Liam's example correctly, he's
> properly basing the selection on the value itself (<0) rather than
> trying to second guess how that value is indicated.

Yes, that was an interesting difference to point out. If that's what's 
happening, then there has to be some sort of datatyping going on in 
XPath, which is another key difference. The text-search aspects of 
RangeFinder are only that: text. RangeFinder doesn't evaluate the 
semantics of the content, or try to datatype it. It just searches for 
the strings (modulo case-folding, diacritic-folding, edit distance, and 
other purely textual operations).
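
For illustration only, here's roughly what I mean by "purely textual
operations", sketched in Python. None of these names come from the
RangeFinder draft; fold, edit_distance, and fuzzy_find are hypothetical
helpers, and the fixed-size window scan is a simplification, not a
proposed algorithm:

```python
import unicodedata

def fold(text):
    """Case-fold, then strip combining marks (a simple diacritic folding)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_find(haystack, needle, max_distance=1):
    """Return (start, end) offsets, in the folded haystack, of needle-sized
    windows whose edit distance to the folded needle is <= max_distance.
    Note: folding can change string length, so these offsets are not
    guaranteed to map back onto the original text."""
    h, n = fold(haystack), fold(needle)
    matches = []
    for start in range(len(h) - len(n) + 1):
        if edit_distance(h[start:start + len(n)], n) <= max_distance:
            matches.append((start, start + len(n)))
    return matches

print(fold("Ångström"))                                 # -> angstrom
print(edit_distance("kitten", "sitting"))               # -> 3
print(fuzzy_find("Café menu", "CAFE", max_distance=0))  # -> [(0, 4)]
```

Note that at no point does any of this ask what the text means: "-23"
is just three characters.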

I think that a truly general searching webapp would have to be written 
using multiple APIs, including RangeFinder, the non-DOM text searcher 
Kristof has suggested, and maybe XPath or other pattern-matching 
mechanisms. But the scope of RangeFinder shouldn't be expanded to cover 
all cases; it should just do the bits that it's designed for, which I 
think will be broadly useful (especially in combination with other APIs).

Regards–
–Doug

> -----Original Message-----
> From: Doug Schepers [mailto:schepers@w3.org]
> Sent: Wednesday, March 25, 2015 1:47 AM
> To: Liam R. E. Quin
> Cc: W3C Public Annotation List
> Subject: Re: Rough Draft of Robust Anchoring: the RangeFinder API
>
> Hi, Liam–
>
> Thanks for the use cases.
>
> I'm sorry for being dense, but I'm not sure how this fits in with the
> RangeFinder API.
>
> Both of these cases are about using XPath to locate multiple ranges
> in a single pass, while RangeFinder is an iterative API that
> incrementally finds a single range at a time, within a particular
> scope of the document tree, with an optional initial starting point
> (thus the CSS or XPath selector).
>
> I'm not an expert in XPath, so I'm also not sure how to interpret
> your examples absent markup examples to apply them to.
>
>
> That being said, here's a quick reaction to the prose aspects of the
> use cases:
>
> 1) Find (annotate) all cells in which the net revenue is negative:
> in this case, with the RangeFinder API, you'd narrow the scope to the
> table element, then you'd look for instances of the minus sign, then
> use regex in JS to see if that is followed by a number. If you were
> looking for a specific negative number, that would be more
> straightforward. I considered adding some sort of "wildcard/regex"
> syntax to the search string component, but was discouraged from doing
> that, for performance reasons; it might still be a worthwhile idea to
> explore.
>
> 2) Find all students whose tutor is not listed: this sort of
> operation could be done in a manner similar to the example above
> (finding instances of the student's name, then looking for related
> course information in JS by scanning the DOM, assuming you know the
> DOM structure); but this is not really the point of RangeFinder. It's
> not intended as a generic pattern matcher, but rather as a
> narrowly-focused API to find instances of text, or other known
> ranges, with some ability to apply fuzzy logic around location in the
> document, text edit distance, and a few other factors.
>
> The functionality you're describing sounds interesting, but it
> sounds like a different technology; in fact, since you're describing
> a solution in XPath, is there anything else needed to solve your use
> case?
>
>
> As a side note regarding XPath, I'm most interested in the
> robust/fuzzy aspects that I understand were left out of XPath, but
> which were under consideration; can you share any info on that?
>
> Regards– –Doug
>
> On 3/24/15 7:36 PM, Liam R. E. Quin wrote:
>> On Wed, 2015-02-25 at 00:48 -0500, Doug Schepers wrote:
>>> Hi, folks–
>>>
>>> Just a quick note. Rob asked me to move this file, to keep the
>>> deliverables organized. It's now located at:
>>>
>>> http://w3c.github.io/web-annotation/api/rangefinder/
>>
>> And now at https://specs.webplatform.org/rangefinder/w3c/master/
>>
>> I promised Doug at least a couple of use cases for the XPath
>> selector. I can write them up in more detail if they're felt to be
>> reasonable.
>>
>> (1) consider a table such as a profit/loss statement in an annual
>> report; let's annotate all cells in which the net revenue is
>> negative. The XPath expression might be something like
>>
>>   //table[@id = 'profit-and-loss']//th[. = 'Net Revenue']/following-sibling::td[. < 0]
>>
>> (2) Find all students whose tutor is not listed:
>>
>> //li[@class = 'student'] [ [@class='tutor'] [
>> not(//li[@class='tutor']/@id = concat('#', @href)) ] ]
>>
>> These are both fairly complex examples in the spirit of "make the
>> easy easy and the complex possible". Note that any identifier
>> pointing at actual text will not be possible with CSS selectors,
>> although a combination of selectors and byte ranges within a
>> containing element can be used. But there should also be a checksum
>> and/or text comparison in case the wrong text is highlighted, of
>> course.
>>
>> Hope this helps. I have both simpler and more complex examples of
>> course, if needed.
>>
>> Liam
>>
>>
>>>
>>> Even this is a temporary location, though... I'll be moving it
>>> to specs.webplatform.org soon, and adding the annotation
>>> capability to it.
>>>
>>> Feel free to review, but be aware that the URL is transitory.
>>>
>>> Regards– –Doug
>>>
>>> On 2/24/15 1:33 PM, Doug Schepers wrote:
>>>> Hi, folks–
>>>>
>>>> After talking about Robust Anchoring with many people over the
>>>> course of the last couple years (!), with encouragement and
>>>> good criticisms, I've refined my notion of what's needed for a
>>>> client-side API for Robust Anchoring.
>>>>
>>>> I've drawn up a strawman of my current thinking for an API
>>>> called RangeFinder [1].
>>>>
>>>> It's very rough in places, but I'd appreciate any feedback on
>>>> the spec as it stands. I'd greatly appreciate any thoughts or
>>>> opinions on it at this stage.
>>>>
>>>> I'm not sure it's mature enough for this yet, but at some
>>>> point, I'd like to engage the research and academic communities
>>>> and the experts who've published on text search algorithms, to
>>>> polish this up and make it not quite as embarrassing as it is
>>>> currently. If anyone knows who we should contact in that
>>>> regard, please chime in. This is a great opportunity to
>>>> leverage all that research in the service of Web developers and
>>>> browsers!
>>>>
>>>> [1] http://w3c.github.io/web-annotation/rangefinder-api/
>>>>
>>>> Regards– –Doug
>>>>
>>>
>>>
>>
>

Received on Wednesday, 25 March 2015 18:22:28 UTC