Re: FindText API Updated Editor's Draft from Randall Leeds on 2015-10-09 (public-annotation@w3.org from October 2015)

From: Randall Leeds <randall@bleeds.info>
Date: Fri, 09 Oct 2015 19:07:22 +0000
To: Benjamin Young <bigbluehat@hypothes.is>, Ivan Herman <ivan@w3.org>
Cc: Bill Hunt <bill@opengovfoundation.org>, Doug Schepers <schepers@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CAAL6JQj9_fqmGRntbsG-zcgFPiyUcDaj5YjCk+tX1azmPg9_1w@mail.gmail.com>
I'm not able to follow this thread, or to see any issues with .search().

If you put a wrapper around the search function as Bill has, you can invoke
that wrapper again in a promise resolve callback and this isn't recursion,
it's more like a coroutine call. The resolution closure is calling the
function, not the function itself. You also can aggregate results without
aggregating promises, letting each one resolve.

function searchAll(rf) {
  return new Promise(function (resolve, reject) {
    var results = [];
    function next() {
      rf.search().then(function (match) {
        if (match) {
          results.push(match);
          next();
        } else {
          resolve(results);
        }
      }, reject);
    }
    next();
  });
}

searchAll(rf).then(function (results) { ... })

No recursion. No stack of Promises. I don't see a problem with this.
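
To try the pattern without a browser implementation, here is a minimal sketch
that runs against a stand-in for FindText. The makeFakeFinder helper and the
shape of its match objects are hypothetical, made up purely for illustration;
the only thing assumed about the real API is that search() returns a Promise
that resolves with the next match, or with a falsy value when none are left.
The optional limit argument follows Bill's "find N" suggestion quoted below,
with assumed semantics.

```javascript
// Hypothetical stand-in for a FindText object: search() returns a Promise
// for the next occurrence of `text` in `source`, or null when none remain.
function makeFakeFinder(source, text) {
  var from = 0;
  return {
    search: function () {
      return new Promise(function (resolve) {
        var at = source.indexOf(text, from);
        if (at === -1) {
          resolve(null);
        } else {
          from = at + text.length;
          resolve({ start: at, end: at + text.length });
        }
      });
    }
  };
}

// Same loop as above, plus an optional limit (assumption: stop after
// `limit` matches; omit it to collect every match).
function searchAll(rf, limit) {
  return new Promise(function (resolve, reject) {
    var results = [];
    function next() {
      if (limit !== undefined && results.length >= limit) {
        resolve(results);
        return;
      }
      rf.search().then(function (match) {
        if (match) {
          results.push(match);
          next();
        } else {
          resolve(results);
        }
      }, reject);
    }
    next();
  });
}

var poem = "Rage, rage against the dying of the light. Rage, rage.";
searchAll(makeFakeFinder(poem, "Rage, rage")).then(function (results) {
  console.log(results.length); // number of matches found
});
```

Only one search() is ever in flight here, so the fake finder's cursor advances
one match at a time and a single promise is handed back to the caller.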



On Thu, Oct 8, 2015 at 5:51 AM Benjamin Young <bigbluehat@hypothes.is>
wrote:

> On Thu, Oct 8, 2015 at 5:53 AM, Ivan Herman <ivan@w3.org> wrote:
>
>>
>> On 07 Oct 2015, at 21:43 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>>
>> Yeah...I don't think I'd go so far as to use them together...though one
>> certainly could. I'm afraid the article you linked to tangled the wires for
>> me a bit…
>>
>>
>> Heh... That is good for me, it shows that I am in good company:-)
>>
>> As for the pattern below: as I said, it is not clear it works in the
>> first place. What I did understand, though, is that the combination with
>> generators/iterators does make the cycles with promises a bit more readable
>> and also more efficient: as we saw in Bill's code, without ES6, in fact, we
>> have to execute all the promises before really doing anything (even if that
>> may not be visible in the code), ie, we do not gain anything
>> efficiency-wise. When using generators, a yield would explicitly relinquish
>> control, ie, there is indeed more parallelism.
>>
>> However. Here is a thought but with a huge caveat. The caveat is that I
>> am an old man, ie, old-skool, who grew up with antiquated languages like
>> Python and, God forbid, C, so I did not drink of the Kool-Aid of asynchrony
>> everywhere that seems to permeate the usage of Javascript (sorry if I sound
>> cynical). With that out of the way, here is my question: do we *really*
>> need the FindText interface to be async?
>>
>> After all, the reason we have async methods/functions for things like
>> AJAX is because, indeed, these operations are slow compared to the rest, so
>> some level of asynchrony is necessary in a browser. However… isn't it
>> correct that the typical usage of FindText is to search through an already
>> existing DOM tree in the, say, browser, ie, when the full DOM is already in
>> memory? Because if so, then using async (promises or callbacks) sounds like
>> an unnecessary complication for the user.
>>
>> So… please, somebody convince me that simply having a search() and
>> a searchAll(), returning a match or an array of matches, respectively, is
>> not enough… That we have to impose on the users a thorough understanding of
>> promises, generators, etc, thereby reducing our user base significantly due
>> to complexity?
>>
>
> Good point, Ivan. One thing (related) that we need to be careful of is not
> designing this API for the polyfills, but for the browser. The polyfill
> will (likely) be the slower one, and in a large enough document having an
> asynchronous API may make sense, but...to your point...an asynchronous API
> may not be what one would expect from a `.search()` or even `.searchAll()`
> (though that seems a bit more reasonable).
>
> Here's a current use case (of sorts). In Chrome, when you "find text"
> highlights appear throughout the document (however large or complex),
> you're taken to the first one, you can "page" (next/prev) through the rest
> of them, and they all appear as lines in the scrollbar. Accessing that
> seems to be what we'd be enabling here.
>
> Chrome does the "find text" process as you type (highlighting within the
> page and the scroll bar with each letter you input). I've no idea if their
> code is asynchronous. I'm guessing that it is (at some level), so that it
> doesn't block user input.
>
> If we imagine re-implementing that experience in a cross-browser JS UI
> built on top of this FindText API, then I think we start to get close to
> what we'd need/want from the underlying FindText API with regards to
> (a)synchronicity.
>
> I could see it either way, but likely asynchronous will be better for
> avoiding blocking the user while the search is done (/me ignores Web
> Workers for the moment ;) ).
>
> Thoughts?
>
>
>> My apologies if this is all just stupid grumbling...
>>
>> Ivan
>>
>> P.S. A nice quote from the blog that you referred to (thanks for it!):
>>
>> "That being said, promises aren't perfect. It's true that they're better
>> than callbacks, but that's a lot like saying that a punch in the gut is
>> better than a kick in the teeth. Sure, one is preferable to the other, but
>> if you had a choice, you'd probably avoid them both.
>>
>> While superior to callbacks, promises are still difficult to understand
>> and error-prone[…]. Novices and experts alike will frequently mess this
>> stuff up, and really, it's not their fault. The problem is that promises,
>> while similar to the patterns we use in synchronous code, are a decent
>> substitute but not quite the same.
>>
>> In truth, you shouldn't have to learn a bunch of arcane rules and new
>> APIs to do things that, in the synchronous world, you can do perfectly well
>> with familiar patterns like return, catch, throw, and for-loops. There
>> shouldn't be two parallel systems that you have to keep straight in your
>> head at all times"
>>
>>
>> It seems that ES7 may make this simpler in future. But we should not
>> specify for ES7; its definition and eventual deployment is way too far down
>> the line.
>>
>>
>>
>>
>>
>> Here's the article that's helped me the most wrt Promises (fwiw):
>> http://pouchdb.com/2015/05/18/we-have-a-problem-with-promises.html
>>
>> Maybe that helps a bit. :)
>>
>> On Wed, Oct 7, 2015 at 7:52 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>>> Well… I tried to understand how generators and promises work together.
>>> It is a bit like a lame leading the blind, in the sense that I do not have
>>> a really really comfortable feeling about Promises in complex situations;
>>> as for generators, I am familiar with them in Python, but the ES6 version
>>> is more complex. I have gone through some of the examples and texts around;
>>> my pattern comes from [1]. Based on the patterns I found in [1], here is a
>>> structure that *may* work with the current interface:
>>>
>>> function runSearch( params, generator ) {
>>>     var iterator = generator(params), ret;
>>>     (function iterate(val) {
>>>         // This is where control goes back to the generator passed to
>>>         // "runSearch", and the match result is sent back
>>>         ret = iterator.next(val);
>>>         if( !ret.done ) {
>>>             // ret.value is the Promise set by FindText.search();
>>>             // the 'then' part is when the next match is found;
>>>             // the iteration will get the result back to the 'search'
>>>             // part below
>>>             ret.value.then( function(match) {
>>>                 iterate(match);
>>>             });
>>>         }
>>>     })();
>>> }
>>>
>>> //====
>>>
>>> params = { ..findtext params...};
>>> runSearch( params, function *search() {
>>>     var match;
>>>     do {
>>>         // Note that the result of the yield is what the corresponding
>>>         // 'next' sends, ie, it will be the match result.
>>>         // This is also where the async part kicks in, because the
>>>         // .search(), returning a Promise, leads to it.
>>>         match = yield find_text.search();
>>>         if( match.result ) {
>>>             // do something with match.result
>>>         }
>>>     } while( match.result );
>>> })
>>>
>>>
>>> I have no idea whether this makes sense, ie, whether that works. But
>>> maybe more importantly: I still think it is very complicated, requires a
>>> thorough understanding of complex things, ie, I am not sure that it would
>>> be the right level of abstraction for the API. And it works with ES6,
>>> although I agree that it may be acceptable for the API to rely on ES6.
>>>
>>> Just an idea: what about hiding all this from the end user? Isn't it
>>> possible to say that the result of search() is actually an iterator in the
>>> ES6 sense, and it is up to the API implementation to hide all the async
>>> complexity? Or should we keep to one single searchAll() that would return
>>> an iterator and stop there?
>>>
>>> Ivan
>>>
>>> [1] http://davidwalsh.name/async-generators
>>>
>>>
>>>
>>> On 06 Oct 2015, at 17:17 , Benjamin Young <bigbluehat@hypothes.is>
>>> wrote:
>>>
>>> Could ES6 generators be employed here?
>>>
>>> http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions
>>>
>>> It currently has to be polyfilled, but perhaps the future is not far off.
>>> ;)
>>> http://kangax.github.io/compat-table/es6/#generators
>>>
>>> That could get you something like:
>>> ```
>>> var rf = new FindText({ text: "Rage, rage" });
>>> var result = rf.search()
>>> var next_result = rf.next();
>>> ```
>>>
>>> Which seems to be what one would expect (vs. a promise-based thing).
>>>
>>> `searchAll()` could return a Promise for the purpose of asynchronous
>>> code and avoiding callbacks.
>>>
>>> Great start, though, Doug, regardless!
>>>
>>>
>>>
>>> On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
>>>
>>>> Ah yes! This recursive construction to stack up promises is the
>>>> solution I indeed saw (it may have been in one of the blogs of Jake
>>>> Archibald) and I already forgot; and it always makes me understand it again
>>>> and again:-)
>>>>
>>>> You made me realize that using search() like that gives a fake
>>>> impression of performance gain without being one, right? As you say, in
>>>> fact all the promises can return with success when the last search() has
>>>> also been executed; ie, performance wise, we do not really gain anything
>>>> compared to a searchAll(). Would it mean that we should not use search() at
>>>> all?
>>>>
>>>> Thanks
>>>>
>>>> Ivan
>>>>
>>>>
>>>>
>>>> On 06 Oct 2015, at 16:54 , Bill Hunt <bill@opengovfoundation.org>
>>>> wrote:
>>>>
>>>> Hi Ivan,
>>>>
>>>> Those are actually the precise concerns I brought up to Doug yesterday,
>>>> and agree that searchAll() is a fine solution.  I also proposed that the
>>>> function could take a "limit" parameter, to only get N results instead of
>>>> all.  This makes promises much easier.  Here's the body of my original
>>>> message that illustrates the point in detail.
>>>>
>>>> Cheers,
>>>> -Bill
>>>>
>>>>
>>>>
>>>>
>>>> Here's Example 1, as-is, with promises, to get all until it can't find
>>>> any more results:
>>>>
>>>> var results = [];
>>>> var recurseSearch = function(rf, results) {
>>>>     return new Promise(function(resolve, reject) {
>>>>         rf.search().then(
>>>>             function(matchData) {
>>>>                 if(matchData) {
>>>>                     results.push(matchData);
>>>>                     // Found results, so continue searching.
>>>>
>>>>                     // Aggregate the promise for the remaining
>>>>                     // searches into the one we already returned.
>>>>                     // * Note 1
>>>>                     resolve(Promise.all([matchData,
>>>>                         recurseSearch(rf, results)]));
>>>>                 }
>>>>                 else {
>>>>                     resolve(matchData);
>>>>                 }
>>>>             },
>>>>             function(error) {
>>>>                 reject('There was a problem getting results');
>>>>             }
>>>>         );
>>>>     });
>>>> };
>>>>
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> // The resolved value is the nested tree from Note 1, so read the
>>>> // matches from the shared results array instead.
>>>> recurseSearch(rf, results).then(function() {
>>>>     console.log(results);
>>>> });
>>>>
>>>>
>>>> * Note 1
>>>> Our promise collection looks odd here.  You've got a promise object
>>>> that looks like a lopsided tree:
>>>>
>>>> [ Promise 1,
>>>>     [ Promise 2,
>>>>         [Promise 3,
>>>>             [Promise 4,
>>>>                 etc...
>>>>             ]
>>>>         ]
>>>>     ]
>>>> ]
>>>>
>>>> Which will eventually resolve itself.  Not exactly performant, or
>>>> readable.
>>>>
>>>> ...
>>>>
>>>> The problem, briefly, is that you end up with recursion when you try to
>>>> find all:
>>>>
>>>> Search 1 ->  (returns S1.promise)
>>>> Search 2 -> (appends S2.promise to S1.promise)
>>>> Search 3 -> (appends S3.promise to S1.promise and S2.promise)
>>>> done, resolve S1.promise && S2.promise && S3.promise altogether.
>>>>
>>>> You cannot simply chain promises here in the normal fashion
>>>> (.then().then().then() etc) because we do not know how many promises we'll
>>>> end up with in the end. We have no idea how deep the thread goes, we must
>>>> simply wait for the last one to return the whole stack of promises.  That
>>>> is, effectively, the *first* promise is not resolved until the *last* search
>>>> is done.
>>>>
>>>> Instead, in each step we must return a promise, which is added to the
>>>> chain of promises to be resolved all at once.   This is kind of messy.  This
>>>> also can lead users to make basic mistakes such as this one (the
>>>> Promise.all method collects other promises into a single new promise that
>>>> resolves when all are done) :
>>>>
>>>> var promise = Promise.all([
>>>>     rf.search(),
>>>>     rf.search(),
>>>>     rf.search()
>>>> ]).then(function( results ) {
>>>>     console.log(results);
>>>> });
>>>>
>>>> Where they will think they're getting the first three results, when in
>>>> fact they will receive three copies of the first result, because they
>>>> happen simultaneously.
>>>>
>>>>
>>>> The simple solution is to have a searchAll() method that returns a
>>>> promise that gets all results.  A great addition to this is to provide a
>>>> limit argument, which only finds the first N results and then returns.
>>>> Those three options (find one, find all, find N) should account for the
>>>> majority of use cases nicely, and will provide a single familiar interface
>>>> for users.  Given that, Example 1 becomes much nicer:
>>>>
>>>>
>>>> Without promises, get the third (original example):
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var result = rf.search(); // result is 1st instance of string
>>>>     result = rf.search(); // result is 2nd instance of string
>>>>     result = rf.search(); // result is 3rd instance of string,
>>>>                           // the target instance
>>>>
>>>> get all:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var results = [];
>>>> var result;
>>>> while( (result = rf.search()) ) {
>>>>     results.push(result);
>>>> }
>>>>
>>>> get 3:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var results = [];
>>>> results.push( rf.search() ); // result is 1st instance of string
>>>> results.push( rf.search() ); // result is 2nd instance of string
>>>> results.push( rf.search() ); // result is 3rd instance of string
>>>>
>>>>
>>>>
>>>> With promises and searchAll, get the third:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll(3);
>>>> promise.then( function( results ) {
>>>>     console.log( results[2] );
>>>> } );
>>>>
>>>> get all:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll();
>>>> promise.then( function(results) {
>>>>     console.log(results);
>>>> });
>>>>
>>>> get 3:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll(3);
>>>> promise.then( function( results ) {
>>>>     console.log( results );
>>>> } );
>>>>
>>>> Much cleaner than my previous example, obviously!  Here's a good
>>>> description of promises that shows how they should be used, and covers the
>>>> philosophy a bit better than most tutorials:
>>>>
>>>> https://blog.domenic.me/youre-missing-the-point-of-promises/
>>>>
>>>>
>>>> Bill Hunt
>>>> Senior Developer
>>>> OpenGov Foundation
>>>> http://opengovfoundation.org/
>>>>
>>>> Ph: 20-BILL-HUNT
>>>>        202 455 4868
>>>> bill@opengovfoundation.org
>>>>
>>>> On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>>>>
>>>> Hey Doug,
>>>>
>>>> After a first read, I have two questions/comments.
>>>>
>>>> - (This is minor:) the idea of using an edit distance for suffix/prefix
>>>> is great. However: the way you specify the (maximal) edit distance is
>>>> through a number, ie, the number of editing steps. However, shouldn't this
>>>> edit distance limit be expressed (or at least alternatively expressed)
>>>> through a percentage of the editing distance over the size of the
>>>> suffix/prefix? I mean: if the suffix is 4 characters long, then an edit
>>>> distance of 3 is significant, whereas the same distance is insignificant if
>>>> the suffix is 100 characters long. Would a percentage be a good alternative?
>>>>
>>>> - (This may be major, but may simply be a result of my own ignorance:)
>>>> I have read about, and actually used in a simple setting, Promises, but
>>>> they still twist my mind, I must admit. One thing that seems to be fairly
>>>> complex when using Promises is when one has to create cycles using them,
>>>> primarily when the number of steps in the cycle is unknown in advance. On
>>>> the other hand, using the search() method in the current spec would require
>>>> exactly that: you do some sort of an iterative go through the search
>>>> results. Maybe there is an easy way to express that with promises which I
>>>> simply do not know, but if this really is complex then what this tells me
>>>> is that the searchAll() might become the method of choice (and one could
>>>> then run a traditional cycle on the results). There are, obviously,
>>>> performance issues, though.
>>>>
>>>> B.t.w., I believe that the example:
>>>>
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var result = rf.search(); // result is 1st instance of string
>>>>    result = rf.search(); // result is 2nd instance of string
>>>>    result = rf.search(); // result is 3rd instance of string,
>>>>                          // the target instance
>>>>
>>>> would not work, exactly for this reason. Each rf.search() returns a
>>>> Promise, ie, one has to use a rf.search().then(function(){…}) pattern for
>>>> each entry, and it is not clear in my mind how the iteration materializes
>>>> in the code.
>>>>
>>>> Apologies if I am completely wrong in terms of these Promises...
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>>>>
>>>> Hi, folks–
>>>>
>>>> This weekend, I made substantial changes to the FindText API [1]
>>>> (formerly called the RangeFinder API).
>>>>
>>>> I improved the internationalization aspects and options, based on
>>>> feedback from the I18n WG and from their updated CharMod spec (Character
>>>> Model for the World Wide Web: String Matching and Searching… which seems
>>>> tailor-made for us!).
>>>>
>>>> I also fleshed out the algorithm for search (though it still needs lots
>>>> of work), which was one of two critical changes needed before FPWD.
>>>>
>>>> The remaining critical change is for me to update the examples, which
>>>> is important because those will shape many people's first impressions of
>>>> the spec (because examples are easy to read and understand). This is my
>>>> plan for the rest of the day. This involves describing the workflow in
>>>> terms of Promises, which I'm sad to admit I've never used in running code before.
>>>>
>>>> Luckily, I have two meetings set up for this afternoon with folks to
>>>> help me with that:
>>>>
>>>> * Chris Birk and Bill Hunt, from OpenGov Foundation
>>>> * Alexander Schmidtz, from jQuery
>>>>
>>>> These guys are very familiar with Promises, and so my examples and API
>>>> design will have at least a bit of vetting and validation before pushing
>>>> FPWD. There will always be room for improvements, but we should be ready to
>>>> go by tomorrow.
>>>>
>>>>
>>>> I welcome feedback from any of you on this spec!
>>>>
>>>>
>>>> [1] http://w3c.github.io/findtext/
>>>> [2] http://w3c.github.io/charmod-norm/
>>>>
>>>> Regards–
>>>> –Doug
>>>>
>>>>
>>>>
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Friday, 9 October 2015 19:08:03 UTC