Re: FindText API Updated Editor's Draft from Ivan Herman on 2015-10-12 (public-annotation@w3.org from October 2015)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 12 Oct 2015 10:57:30 +0200
To: Randall Leeds <randall@bleeds.info>
Cc: Benjamin Young <bigbluehat@hypothes.is>, Bill Hunt <bill@opengovfoundation.org>, Doug Schepers <schepers@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <365D7EAB-53FC-4E19-88C1-11102F27E2DD@w3.org>
Randall,

if this works without some hidden issues (I do not see any, so I believe you), it still shows that, from the user's point of view, there seems to be no reason to have both search() and searchAll() entries in the interface. What you have below is, for the user, equivalent to searchAll(), because the returned promise is resolved only when all the matches have been found. I.e., the kind of sequential display that one would imagine doing asynchronously, match by match, while the search proceeds will not happen.

If one wants to really do that sequential update, then I do not see any other way than to go through the generator trick if that works (because then one 'thread' can yield control to another one explicitly; this is what we miss in the current solution). But that approach is pretty complicated for my taste, I must admit (and I am not sure it works).

Ivan


> On 09 Oct 2015, at 21:07 , Randall Leeds <randall@bleeds.info> wrote:
> 
> I'm having trouble following this thread and the issues raised with .search().
> 
> If you put a wrapper around the search function as Bill has, you can invoke that wrapper again in a promise resolve callback, and this isn't recursion; it's more like a coroutine call. The resolution callback calls the function, rather than the function calling itself. You can also aggregate results without aggregating promises, letting each one resolve.
> 
> function searchAll(rf) {
>   return new Promise(function (resolve, reject) {
>     var results = [];
>     function next() {
>       rf.search().then(function (match) {
>         if (match) {
>           results.push(match);
>           next();
>         } else {
>           resolve(results);
>         }
>       }, reject);
>     }
>     next();
>   });
> }
> 
> searchAll(rf).then(function (results) { ... })
> 
> No recursion. No stack of Promises. I don't see a problem with this.
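A self-contained sketch of the wrapper above, under stated assumptions: `InstantFindText` is a hypothetical mock (not part of the FindText draft) whose `search()` returns an already-resolved Promise for the next match, or for `null` when none are left, and matches are plain `{ start, end }` offsets. Driving it through 100,000 matches illustrates the "no recursion" point: each `then()` callback runs later as a microtask on a fresh call stack, so `next()` never nests and the stack does not grow.

```javascript
// Hypothetical mock standing in for FindText. search() resolves immediately,
// which shows that the loop does not rely on real asynchronous delays to
// keep the call stack flat.
function InstantFindText(text, needle) {
  this.text = text;
  this.needle = needle;
  this.cursor = 0;
}
InstantFindText.prototype.search = function () {
  var index = this.text.indexOf(this.needle, this.cursor);
  if (index === -1) {
    return Promise.resolve(null);
  }
  this.cursor = index + this.needle.length;
  return Promise.resolve({ start: index, end: this.cursor });
};

// Randall's wrapper, unchanged in substance: the resolution callback calls
// next(), and it does so from a new microtask each time.
function searchAll(rf) {
  return new Promise(function (resolve, reject) {
    var results = [];
    function next() {
      rf.search().then(function (match) {
        if (match) {
          results.push(match);
          next();
        } else {
          resolve(results);
        }
      }, reject);
    }
    next();
  });
}

// 100,000 single-character matches complete without a stack overflow.
var rf = new InstantFindText("a".repeat(100000), "a");
searchAll(rf).then(function (results) {
  console.log(results.length);  // 100000
});
```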
> 
> 
> 
> On Thu, Oct 8, 2015 at 5:51 AM Benjamin Young <bigbluehat@hypothes.is> wrote:
> On Thu, Oct 8, 2015 at 5:53 AM, Ivan Herman <ivan@w3.org> wrote:
> 
>> On 07 Oct 2015, at 21:43 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>> 
>> Yeah...I don't think I'd go so far as to use them together...though one certainly could. I'm afraid the article you linked to tangled the wires for me a bit…
> 
> Heh... That is good for me, it shows that I am in good company:-)
> 
> As for the pattern below: as I said, it is not clear that it works in the first place. What I did understand, though, is that combining generators/iterators with promises makes cycles of promises a bit more readable and also more efficient: as we saw in Bill's code, without ES6 we in fact have to execute all the promises before really doing anything (even if that may not be visible in the code), i.e., we do not gain anything efficiency-wise. With generators, a yield explicitly relinquishes control, i.e., the search and the processing of its results can indeed be interleaved (though not run in parallel, since JavaScript remains single-threaded).
> 
> However. Here is a thought, but with a huge caveat. The caveat is that I am an old man, i.e., old-skool, who grew up with antiquated languages like Python and, God forbid, C, so I did not drink the Kool-Aid of asynchrony everywhere that seems to permeate the usage of JavaScript (sorry if I sound cynical). With that out of the way, here is my question: do we really need the FindText interface to be async?
> 
> After all, the reason we have async methods/functions for things like AJAX is that these operations are slow compared to the rest, so some level of asynchrony is necessary in a browser. However… isn't it correct that the typical usage of FindText is to search through an already existing DOM tree in, say, a browser, i.e., when the full DOM is already in memory? Because if so, then using async (promises or callbacks) sounds like an unnecessary complication for the user.
> 
> So… please, somebody convince me that simply having a search() and a searchAll(), returning a match or an array of matches, respectively, is not enough… That we have to impose on users a thorough understanding of promises, generators, etc., thereby reducing our user base significantly due to complexity?
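For comparison, a minimal sketch of the purely synchronous interface suggested here, assuming a plain string stands in for the in-memory DOM text. `SyncFindText` is hypothetical and models none of the draft's options (prefix/suffix, edit distance, normalization): `search()` returns the next match or `null`, and `searchAll()` returns an array of the remaining matches.

```javascript
// Hypothetical synchronous finder; a string stands in for the DOM text.
function SyncFindText(text, needle) {
  if (needle === "") {
    throw new Error("needle must be non-empty");  // avoids an endless loop
  }
  this.text = text;
  this.needle = needle;
  this.cursor = 0;  // where the next search() starts
}

// Returns the next non-overlapping match as { start, end }, or null.
SyncFindText.prototype.search = function () {
  var index = this.text.indexOf(this.needle, this.cursor);
  if (index === -1) {
    return null;
  }
  this.cursor = index + this.needle.length;
  return { start: index, end: this.cursor };
};

// Returns an array of all remaining matches.
SyncFindText.prototype.searchAll = function () {
  var results = [];
  var match;
  while ((match = this.search()) !== null) {
    results.push(match);
  }
  return results;
};

var finder = new SyncFindText("rage, rage, rage", "rage");
var first = finder.search();    // { start: 0, end: 4 }
var rest = finder.searchAll();  // the matches starting at 6 and 12
```

The calling code needs no Promises or callbacks; the trade-off is that a long search blocks the page until it returns.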
> 
> Good point, Ivan. One related thing we need to be careful of is not designing this API for the polyfills, but for the browser. The polyfill will (likely) be the slower one and, in a large enough document, having an asynchronous API may make sense, but...to your point...an asynchronous API may not be what one would expect from a `.search()` or even `.searchAll()` (though that seems a bit more reasonable).
> 
> Here's a current use case (of sorts). In Chrome, when you "find text", highlights appear throughout the document (however large or complex), you're taken to the first one, you can "page" (next/prev) through the rest of them, and they all appear as lines in the scrollbar. Accessing that seems to be what we'd be enabling here.
> 
> Chrome does the "find text" process as you type (highlighting within the page and the scroll bar with each letter you input). I've no idea if their code is asynchronous. I'm guessing that it is (at some level), so that it doesn't block user input.
> 
> If we imagine re-implementing that experience in a cross-browser JS UI built on top of this FindText API, then I think we start to get close to what we'd need/want from the underlying FindText API with regards to (a)synchronicity.
> 
> I could see it either way, but likely asynchronous will be better for avoiding blocking the user while the search is done (/me ignores Web Workers for the moment ;) ).
> 
> Thoughts?
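A minimal sketch of one way an asynchronous search can avoid blocking user input, assuming a plain string in place of the DOM: the hypothetical `searchInChunks()` scans a fixed number of characters at a time and yields to the event loop with `setTimeout(..., 0)` between chunks. The optional `onMatch` callback reports each match as soon as it is found, roughly what a find-as-you-type UI would need for progressive highlighting; this is not a description of how Chrome implements its find bar.

```javascript
// Hypothetical helper, not part of the FindText draft. Finds non-overlapping
// occurrences of `needle`, `chunkSize` characters at a time, and returns a
// Promise for the array of { start, end } matches.
function searchInChunks(text, needle, chunkSize, onMatch) {
  if (needle === "" || chunkSize < 1) {
    return Promise.reject(new Error("needle must be non-empty and chunkSize >= 1"));
  }
  return new Promise(function (resolve) {
    var results = [];
    var position = 0;

    function step() {
      var chunkEnd = Math.min(position + chunkSize, text.length);
      // Look needle.length - 1 characters past the chunk so that a match
      // starting inside the chunk but ending after it is still found.
      var windowEnd = Math.min(chunkEnd + needle.length - 1, text.length);
      var segment = text.slice(position, windowEnd);
      var nextPosition = chunkEnd;
      var offset = segment.indexOf(needle);

      // Only accept matches that start inside the current chunk.
      while (offset !== -1 && position + offset < chunkEnd) {
        var match = { start: position + offset, end: position + offset + needle.length };
        results.push(match);
        if (onMatch) {
          onMatch(match);
        }
        nextPosition = Math.max(chunkEnd, match.end);
        offset = segment.indexOf(needle, offset + needle.length);
      }

      position = nextPosition;
      if (position >= text.length) {
        resolve(results);
      } else {
        // Give the browser a chance to handle input before the next chunk.
        setTimeout(step, 0);
      }
    }

    step();
  });
}

searchInChunks("rage, rage, rage", "rage", 5, function (m) {
  console.log("found at", m.start);  // called once per match, as it is found
}).then(function (all) {
  console.log(all.length);  // 3
});
```

Smaller chunks keep the page more responsive at the cost of more timer round-trips.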
> 
> 
> My apologies if this is all just stupid grumbling...
> 
> Ivan
> 
> P.S. A nice quote from the blog that you referred to (thanks for it!):
> 
> "That being said, promises aren't perfect. It's true that they're better than callbacks, but that's a lot like saying that a punch in the gut is better than a kick in the teeth. Sure, one is preferable to the other, but if you had a choice, you'd probably avoid them both.
> 
> While superior to callbacks, promises are still difficult to understand and error-prone[…]. Novices and experts alike will frequently mess this stuff up, and really, it's not their fault. The problem is that promises, while similar to the patterns we use in synchronous code, are a decent substitute but not quite the same.
> 
> In truth, you shouldn't have to learn a bunch of arcane rules and new APIs to do things that, in the synchronous world, you can do perfectly well with familiar patterns like return, catch, throw, and for-loops. There shouldn't be two parallel systems that you have to keep straight in your head at all times"
> 
> It seems that ES7 may make this simpler in the future. But we should not specify for ES7; its definition and eventual deployment are way too far down the line.
> 
> 
> 
> 
>> 
>> Here's the article that's helped me the most wrt Promises (fwiw):
>> http://pouchdb.com/2015/05/18/we-have-a-problem-with-promises.html
>> 
>> Maybe that helps a bit. :)
>> 
>> On Wed, Oct 7, 2015 at 7:52 AM, Ivan Herman <ivan@w3.org> wrote:
>> Well… I tried to understand how generators and promises work together. It is a bit like the blind leading the blind, in the sense that I do not have a really, really comfortable feeling about Promises in complex situations; as for generators, I am familiar with them in Python, but the ES6 version is more complex. I have gone through some of the examples and texts around, and my pattern comes from [1]. Based on it, here is a structure that *may* work with the current interface:
>> 
>> function runSearch( params, generator ) {
>>     var iterator = generator(params), ret;
>>     (function iterate(val){
>>         // This is where the control goes back to the "runSearch" part
>>         // and the match result is sent back
>>         ret = iterator.next(val);
>>         if( !ret.done ) {
>>             // ret.value is the Promise returned by FindText.search();
>>             // the 'then' part runs when the next match is found, and
>>             // iterate() sends the whole match object back to the
>>             // 'yield' in the search generator below
>>             ret.value.then( function(match) {
>>                 iterate(match);
>>             });
>>         }
>>     })();
>> }
>> 
>> //====
>> 
>> var params = { /* ...findtext params... */ };
>> runSearch( params, function *search(params) {
>>     var find_text = new FindText(params);
>>     var match;
>>     do {
>>         // Note that the result of the yield is what the corresponding 'next' sends,
>>         // ie, it will be the match object (assumed to carry the match in .result).
>>         // This is also where the async part kicks in, because .search() returns a Promise.
>>         match = yield find_text.search();
>>         if( match.result ) {
>>             // do something with match.result
>>         }
>>     } while( match.result );
>> });
>> 
>> 
>> I have no idea whether this makes sense, i.e., whether it works. But maybe more importantly: I still think it is very complicated and requires a thorough understanding of complex things, i.e., I am not sure that it would be the right level of abstraction for the API. And it requires ES6, although I agree that it may be acceptable for the API to rely on ES6.
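To check whether the pattern works, here is a self-contained ES6 sketch with a hypothetical `MockFindText` standing in for FindText: its `search()` returns a Promise for the next `{ start, end }` match or for `null`, rather than the `match.result` shape assumed above. `runSearch()` here is a generic runner in the same spirit as the one in [1], not code from the draft: it sends each resolved value back into the generator with `next()`, forwards rejections with `throw()`, and returns a Promise for the generator's return value.

```javascript
// Hypothetical mock: search() resolves (after a timer) to the next match or null.
function MockFindText(text, needle) {
  this.text = text;
  this.needle = needle;
  this.cursor = 0;
}
MockFindText.prototype.search = function () {
  var self = this;
  return new Promise(function (resolve) {
    setTimeout(function () {
      var index = self.text.indexOf(self.needle, self.cursor);
      if (index === -1) {
        resolve(null);
        return;
      }
      self.cursor = index + self.needle.length;
      resolve({ start: index, end: self.cursor });
    }, 0);
  });
};

// Generic runner: drives the generator, resuming it with each resolved value.
function runSearch(finder, generator) {
  return new Promise(function (resolve, reject) {
    var iterator = generator(finder);
    function advance(method, arg) {
      var step;
      try {
        step = iterator[method](arg);
      } catch (error) {
        reject(error);  // the generator threw and did not catch it
        return;
      }
      if (step.done) {
        resolve(step.value);
        return;
      }
      Promise.resolve(step.value).then(
        function (value) { advance("next", value); },
        function (error) { advance("throw", error); }
      );
    }
    advance("next", undefined);
  });
}

runSearch(new MockFindText("rage, rage, rage", "rage"), function* (rf) {
  var starts = [];
  var match;
  // Each yield pauses the generator until the runner has the next match.
  while ((match = yield rf.search()) !== null) {
    starts.push(match.start);  // each match can be handled as it arrives
  }
  return starts;
}).then(function (starts) {
  console.log(starts);  // [ 0, 6, 12 ]
});
```

Inside the generator the loop reads almost like synchronous code; the cost is that a runner like this has to live somewhere, either in the API or in user code.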
>> 
>> Just an idea: what about hiding all this from the end user? Isn't it possible to say that the result of search() is actually an iterator in the ES6 sense, and it is up to the API implementation to hide all the async complexity? Or should we keep to one single searchAll() that would return an iterator and stop there?
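A minimal sketch of the iterator idea, assuming the underlying search is synchronous (a plain string stands in for the DOM): the hypothetical `findAll()` returns an ES6 iterable whose generator computes each match lazily, only when the caller asks for the next one. A synchronous iterator can hide asynchrony only if the search itself is synchronous; if each step had to wait on a Promise, the iterator would have to yield Promises and the caller would be handling them again.

```javascript
// Hypothetical: expose matches as an ES6 iterable usable with for...of.
function findAll(text, needle) {
  if (needle === "") {
    throw new Error("needle must be non-empty");
  }
  return {
    // A fresh generator per iteration, so the iterable can be reused.
    [Symbol.iterator]: function* () {
      var index = text.indexOf(needle);
      while (index !== -1) {
        yield { start: index, end: index + needle.length };
        index = text.indexOf(needle, index + needle.length);
      }
    }
  };
}

for (var match of findAll("rage, rage, rage", "rage")) {
  console.log(match.start);  // logs 0, then 6, then 12
}
```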
>> 
>> Ivan
>> 
>> [1] http://davidwalsh.name/async-generators
>> 
>> 
>> 
>>> On 06 Oct 2015, at 17:17 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>>> 
>>> Could ES6 generators be employed here?
>>> http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions
>>> 
>>> It currently has to be polyfilled, but perhaps the future is not far off. ;)
>>> http://kangax.github.io/compat-table/es6/#generators
>>> 
>>> That could get you something like:
>>> ```
>>> var rf = new FindText({ text: "Rage, rage" });
>>> var result = rf.search();
>>> var next_result = rf.next();
>>> ```
>>> 
>>> Which seems to be what one would expect (vs. a promise-based thing).
>>> 
>>> `searchAll()` could return a Promise for the purpose of asynchronous code and avoiding callbacks.
>>> 
>>> Great start, though, Doug, regardless!
>>> 
>>> 
>>> 
>>> On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
>>> Ah yes! This recursive construction to stack up promises is indeed the solution I saw (it may have been in one of Jake Archibald's blog posts) and had already forgotten; I always have to work it out again and again:-)
>>> 
>>> You made me realize that using search() like that gives a false impression of a performance gain without actually providing one, right? As you say, in fact all the promises resolve successfully only once the last search() has also been executed; i.e., performance-wise, we do not really gain anything compared to a searchAll(). Would that mean that we should not use search() at all?
>>> 
>>> Thanks
>>> 
>>> Ivan
>>> 
>>> 
>>> 
>>>> On 06 Oct 2015, at 16:54 , Bill Hunt <bill@opengovfoundation.org> wrote:
>>>> 
>>>> Hi Ivan,
>>>> 
>>>> Those are actually the precise concerns I brought up to Doug yesterday, and I agree that searchAll() is a fine solution. I also proposed that the function could take a "limit" parameter, to only get N results instead of all. This makes promises much easier. Here's the body of my original message that illustrates the point in detail.
>>>> 
>>>> Cheers,
>>>> -Bill
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Here's Example 1, as-is, with promises, to get all until it can't find any more results:
>>>> 
>>>> var results = [];
>>>> var recurseSearch = function(rf, results) {
>>>>     // Standard Promises have no resolve()/reject() methods on the
>>>>     // instance, so capture them from the executor.
>>>>     var resolveAll, rejectAll;
>>>>     var allDonePromise = new Promise(function(resolve, reject) {
>>>>         resolveAll = resolve;
>>>>         rejectAll = reject;
>>>>     });
>>>> 
>>>>     var searchPromise = rf.search();
>>>>     searchPromise.then(
>>>>         function(matchData) {
>>>>             if(matchData) {
>>>>                 results.push(matchData);
>>>>                 // Found results, so continue searching.
>>>> 
>>>>                 // Aggregate our new promise into our collection of promises,
>>>>                 // together with this step's search promise.
>>>>                 // * Note 1
>>>>                 resolveAll(Promise.all([searchPromise, recurseSearch(rf, results)])
>>>>                     .then(function() { return results; }));
>>>>             }
>>>>             else {
>>>>                 resolveAll(results);
>>>>             }
>>>>         },
>>>>         function(error) {
>>>>             rejectAll(new Error('There was a problem getting results'));
>>>>         }
>>>>     );
>>>> 
>>>>     return allDonePromise;
>>>> };
>>>> 
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> recurseSearch(rf, results).then(function(results) {
>>>>     console.log(results);
>>>> });
>>>> 
>>>> 
>>>> * Note 1
>>>> Our promise collection looks odd here.  You've got a promise object that looks like a lopsided tree:
>>>> 
>>>> [ Promise 1,
>>>>     [ Promise 2,
>>>>         [Promise 3,
>>>>             [Promise 4,
>>>>                 etc...
>>>>             ]
>>>>         ]
>>>>     ]
>>>> ]
>>>> 
>>>> Which will eventually resolve itself.  Not exactly performant, or readable.
>>>> 
>>>> ...
>>>> 
>>>> The problem, briefly, is that you end up with recursion when you try to find all:
>>>> 
>>>> Search 1 ->  (returns S1.promise)
>>>>  Search 2 -> (appends S2.promise to S1.promise)
>>>>   Search 3 -> (appends S3.promise to S1.promise and S2.promise)
>>>>    done, resolve S1.promise && S2.promise && S3.promise altogether.
>>>> 
>>>> You cannot simply chain promises here in the normal fashion (.then().then().then() etc.) because we do not know how many promises we'll end up with in the end. We have no idea how deep the thread goes; we must simply wait for the last one to return the whole stack of promises. Effectively, the *first* promise is not resolved until the *last* search is done.
>>>> 
>>>> Instead, in each step we must return a promise, which is added to the chain of promises to be resolved all at once. This is kind of messy. It can also lead users to make basic mistakes such as this one (the Promise.all method takes an array of promises and returns a single new promise that resolves when all of them are done):
>>>> 
>>>> var promise = Promise.all([
>>>>  rf.search(),
>>>>  rf.search(),
>>>>  rf.search()
>>>> ]).then(function( results ) {
>>>>  console.log(results);
>>>> });
>>>> 
>>>> Where they will think they're getting the first three results, when in fact they will receive three copies of the first result, because they happen simultaneously.
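Whether this happens depends on how an implementation tracks its position between calls. A minimal sketch, assuming a hypothetical `RacyFindText` mock that reads its cursor when `search()` is called but advances it only when the asynchronous lookup completes, reproduces the three-copies outcome: all three calls read cursor 0 before any of them has moved it.

```javascript
// Hypothetical mock: the start position is captured at call time, but the
// cursor is only advanced once the (simulated) asynchronous lookup is done.
function RacyFindText(text, needle) {
  this.text = text;
  this.needle = needle;
  this.cursor = 0;
}
RacyFindText.prototype.search = function () {
  var self = this;
  var from = this.cursor;  // read now...
  return new Promise(function (resolve) {
    setTimeout(function () {
      var index = self.text.indexOf(self.needle, from);
      if (index === -1) {
        resolve(null);
        return;
      }
      self.cursor = index + self.needle.length;  // ...advanced later
      resolve({ start: index, end: self.cursor });
    }, 0);
  });
};

// The three searches run concurrently, so each one starts from offset 0.
var racy = new RacyFindText("rage, rage, rage", "rage");
Promise.all([racy.search(), racy.search(), racy.search()]).then(function (results) {
  console.log(results.map(function (m) { return m.start; }));  // [ 0, 0, 0 ]
});
```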
>>>> 
>>>> 
>>>> The simple solution is to have a searchAll() method that returns a promise that gets all results. A great addition to this is to provide a limit argument, which only finds the first N results and then returns. Those three options (find one, find all, find N) should account for the majority of use cases nicely, and will provide a single familiar interface for users. Given that, Example 1 becomes much nicer:
>>>> 
>>>> 
>>>> Without promises, get the third (original example):
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var result = rf.search(); // result is 1st instance of string
>>>>     result = rf.search(); // result is 2nd instance of string
>>>>     result = rf.search(); // result is 3rd instance of string, the target instance
>>>> 
>>>> get all:
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var results = [];
>>>> var result;
>>>> while( (result = rf.search()) ) {
>>>>  results.push(result);
>>>> }
>>>> 
>>>> get 3:
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var results = [];
>>>> results.push( rf.search() ); // result is 1st instance of string
>>>> results.push( rf.search() ); // result is 2nd instance of string
>>>> results.push( rf.search() ); // result is 3rd instance of string
>>>> 
>>>> 
>>>> 
>>>> With promises and searchAll, get the third:
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll(3);
>>>> promise.then( function( results ) {
>>>>  console.log( results[2] );
>>>> } );
>>>> 
>>>> get all:
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll();
>>>> promise.then( function(results) {
>>>>  console.log(results);
>>>> });
>>>> 
>>>> get 3:
>>>> 
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var promise = rf.searchAll(3);
>>>> promise.then( function( results ) {
>>>>  console.log( results );
>>>> } );
>>>> 
>>>> Much cleaner than my previous example, obviously!  Here's a good description of promises that shows how they should be used, and covers the philosophy a bit better than most tutorials:
>>>> 
>>>> https://blog.domenic.me/youre-missing-the-point-of-promises/
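A minimal sketch of the proposed `limit` argument, written as a free function over a promise-returning `search()` rather than as the `rf.searchAll(3)` method shown above: it calls `search()` again from each resolution callback until no match comes back or `limit` matches have been collected. `MockFindText` is a hypothetical stand-in whose `search()` resolves to the next `{ start, end }` match or to `null`; omitting `limit` collects every match.

```javascript
// Hypothetical mock: search() resolves (after a timer) to the next match or null.
function MockFindText(text, needle) {
  this.text = text;
  this.needle = needle;
  this.cursor = 0;
}
MockFindText.prototype.search = function () {
  var self = this;
  return new Promise(function (resolve) {
    setTimeout(function () {
      var index = self.text.indexOf(self.needle, self.cursor);
      if (index === -1) {
        resolve(null);
        return;
      }
      self.cursor = index + self.needle.length;
      resolve({ start: index, end: self.cursor });
    }, 0);
  });
};

// Resolves with up to `limit` matches (all of them if limit is undefined).
function searchAll(finder, limit) {
  return new Promise(function (resolve, reject) {
    var results = [];
    function next() {
      if (limit !== undefined && results.length >= limit) {
        resolve(results);  // stop early: no further search() calls
        return;
      }
      finder.search().then(function (match) {
        if (match) {
          results.push(match);
          next();
        } else {
          resolve(results);
        }
      }, reject);
    }
    next();
  });
}

searchAll(new MockFindText("rage, rage, rage", "rage"), 2).then(function (results) {
  console.log(results.length);  // 2
});
```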
>>>> 
>>>> 
>>>> Bill Hunt
>>>> Senior Developer
>>>> OpenGov Foundation
>>>> http://opengovfoundation.org/
>>>> 
>>>> Ph: 20-BILL-HUNT
>>>>        202 455 4868
>>>> bill@opengovfoundation.org
>>>> On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>>> Hey Doug,
>>>>> 
>>>>> After a first read, I have two questions/comments.
>>>>> 
>>>>> - (This is minor:) the idea of using an edit distance for suffix/prefix is great. However: the way you specify the (maximal) edit distance is through a number, i.e., the number of editing steps. Shouldn't this edit distance limit be expressed (or at least be expressible as an alternative) as a percentage of the edit distance over the size of the suffix/prefix? I mean: if the suffix is 4 characters long, then an edit distance of 3 is significant, whereas the same distance is insignificant if the suffix is 100 characters long. Would a percentage be a good alternative?
>>>>> 
>>>>> - (This may be major, but may simply be a result of my own ignorance:) I have read about, and actually used in a simple setting, Promises, but they still twist my mind, I must admit. One thing that seems to be fairly complex when using Promises is creating cycles with them, primarily when the number of steps in the cycle is not known in advance. On the other hand, using the search() method in the current spec would require exactly that: you iterate through the search results. Maybe there is an easy way to express that with promises which I simply do not know, but if this really is complex, then what this tells me is that searchAll() might become the method of choice (and one could then run a traditional cycle on the results). There are, obviously, performance issues, though.
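A minimal sketch of the percentage-based limit suggested in the first point above, assuming plain Levenshtein distance over UTF-16 code units (the draft's normalization and i18n options are not modeled). `levenshtein()` and `withinRelativeDistance()` are hypothetical helpers, not part of the FindText draft; the second compares the edit distance to the length of the expected prefix/suffix instead of using an absolute number of edits.

```javascript
// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a, b) {
  var previous = [];
  for (var j = 0; j <= b.length; j++) {
    previous.push(j);  // distance from "" to b.slice(0, j)
  }
  for (var i = 1; i <= a.length; i++) {
    var current = [i];  // distance from a.slice(0, i) to ""
    for (var k = 1; k <= b.length; k++) {
      var cost = a[i - 1] === b[k - 1] ? 0 : 1;
      current.push(Math.min(
        previous[k] + 1,        // delete a[i - 1]
        current[k - 1] + 1,     // insert b[k - 1]
        previous[k - 1] + cost  // substitute (or keep) a[i - 1]
      ));
    }
    previous = current;
  }
  return previous[b.length];
}

// True when the edit distance is at most maxRatio of the expected length.
function withinRelativeDistance(expected, actual, maxRatio) {
  if (expected.length === 0) {
    return actual.length === 0;
  }
  return levenshtein(expected, actual) / expected.length <= maxRatio;
}

console.log(withinRelativeDistance("dark", "dusk", 0.25));  // false (2 edits / 4 chars)
```

With `maxRatio` set to 0.25, two edits against a 4-character suffix are rejected, while three edits against a 100-character suffix (a ratio of 0.03) would be accepted.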
>>>>> 
>>>>> B.t.w., I believe that the example:
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var result = rf.search(); // result is 1st instance of string
>>>>>    result = rf.search(); // result is 2nd instance of string
>>>>>    result = rf.search(); // result is 3rd instance of string, the target instance
>>>>> 
>>>>> would not work, exactly for this reason. Each rf.search() returns a Promise, i.e., one has to use a rf.search().then(function(result){…}) pattern for each entry, and it is not clear in my mind how the iteration materializes in the code.
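For a fixed, known number of steps, the iteration does materialize as a chain of `.then()` calls, each returning the next `search()` Promise. A minimal sketch, assuming a hypothetical `MockFindText` whose `search()` resolves to the next `{ start, end }` match or to `null`, and assuming at least three matches exist:

```javascript
// Hypothetical mock: search() resolves (after a timer) to the next match or null.
function MockFindText(text, needle) {
  this.text = text;
  this.needle = needle;
  this.cursor = 0;
}
MockFindText.prototype.search = function () {
  var self = this;
  return new Promise(function (resolve) {
    setTimeout(function () {
      var index = self.text.indexOf(self.needle, self.cursor);
      if (index === -1) {
        resolve(null);
        return;
      }
      self.cursor = index + self.needle.length;
      resolve({ start: index, end: self.cursor });
    }, 0);
  });
};

var rf = new MockFindText("rage, rage, rage", "rage");
// Each search() starts only after the previous Promise has resolved.
var thirdMatch = rf.search()
  .then(function () { return rf.search(); })  // 1st match ignored
  .then(function () { return rf.search(); })  // 2nd match ignored
  .then(function (third) {
    console.log(third.start);  // 12, the target instance
    return third;
  });
```

The chain works only because the count (three) is known when the code is written; when the number of matches is unknown, the next `search()` has to be issued from inside a callback, in a loop.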
>>>>> 
>>>>> Apologies if I am completely wrong in terms of these Promises...
>>>>> 
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Ivan
>>>>> 
>>>>> 
>>>>>> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>>>>>> 
>>>>>> Hi, folks–
>>>>>> 
>>>>>> This weekend, I made substantial changes to the FindText API [1] (formerly called the RangeFinder API).
>>>>>> 
>>>>>> I improved the internationalization aspects and options, based on feedback from the I18n WG and from their updated CharMod spec [2] (Character Model for the World Wide Web: String Matching and Searching… which seems tailor-made for us!).
>>>>>> 
>>>>>> I also fleshed out the algorithm for search (though it still needs lots of work), which was one of two critical changes needed before FPWD.
>>>>>> 
>>>>>> The remaining critical change is for me to update the examples, which is important because those will shape many people's first impressions of the spec (because examples are easy to read and understand). This is my plan for the rest of the day. This involves describing the workflow in terms of Promises, which I'm sad to admit I've never used in running code before.
>>>>>> 
>>>>>> Luckily, I have two meetings set up for this afternoon with folks to help me with that:
>>>>>> 
>>>>>> * Chris Birk and Bill Hunt, from OpenGov Foundation
>>>>>> * Alexander Schmidtz, from jQuery
>>>>>> 
>>>>>> These guys are very familiar with Promises, and so my examples and API design will have at least a bit of vetting and validation before pushing FPWD. There will always be room for improvements, but we should be ready to go by tomorrow.
>>>>>> 
>>>>>> 
>>>>>> I welcome feedback from any of you on this spec!
>>>>>> 
>>>>>> 
>>>>>> [1] http://w3c.github.io/findtext/
>>>>>> [2] http://w3c.github.io/charmod-norm/
>>>>>> 
>>>>>> Regards–
>>>>>> –Doug
>>>>>> 
>>>>> 
>>>>> 
>>>>> ----
>>>>> Ivan Herman, W3C
>>>>> Digital Publishing Lead
>>>>> Home: http://www.w3.org/People/Ivan/
>>>>> mobile: +31-641044153
>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Received on Monday, 12 October 2015 08:57:49 UTC