Re: FindText API Updated Editor's Draft from Ivan Herman on 2015-10-07 (public-annotation@w3.org from October 2015)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 7 Oct 2015 13:52:41 +0200
To: Benjamin Young <bigbluehat@hypothes.is>
Cc: Bill Hunt <bill@opengovfoundation.org>, Doug Schepers <schepers@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <A5EEC3D2-94F2-4275-8D4E-02C771CC37E4@w3.org>
Well… I tried to understand how generators and promises work together. It is a bit like the blind leading the blind, in the sense that I do not feel really comfortable with Promises in complex situations; as for generators, I am familiar with them in Python, but the ES6 version is more complex. I have gone through some of the examples and texts around, and based on the patterns I found in [1], here is a structure that *may* work with the current interface:

function runSearch( params, generator ) {
    var iterator = generator( params ), ret;
    (function iterate( val ) {
        // This is where control goes back to "runSearch":
        // the match result is sent back into the generator
        ret = iterator.next( val );
        if( !ret.done ) {
            // ret.value is the Promise returned by FindText.search();
            // the 'then' part runs when the next match is found,
            // and iterate() passes the match back to the 'search' generator below
            ret.value.then( function( match ) {
                iterate( match );
            });
        }
    })();
}

//====

var params = { /* ...FindText params... */ };
runSearch( params, function* search( params ) {
    var find_text = new FindText( params ), match;
    do {
        // Note that the value of the yield is what the corresponding next() sends,
        // ie, it will be the match result.
        // This is also where the async part kicks in, because .search() returns a Promise.
        match = yield find_text.search();
        if( match.result ) {
            // do something with match.result
        }
    } while( match.result );
});


I have no idea whether this makes sense, ie, whether it actually works. But maybe more importantly: I still think it is very complicated and requires a thorough understanding of complex things, ie, I am not sure that it would be the right level of abstraction for the API. And it only works with ES6, although I agree that it may be acceptable for the API to rely on ES6.
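One way to check whether the pattern above works is to run it against a fake search. The sketch below is a slightly fuller variant of the same runner: it takes only the generator function, returns a Promise for the generator's return value, and forwards rejections into the generator with `iterator.throw()`. `MockFindText` is a hypothetical stand-in whose `search()` resolves to `{ result }`, with `result` set to `null` when there are no more matches; the real FindText result shape is not settled in this thread.

```javascript
// Hypothetical stand-in for FindText (not the draft's API): search() resolves
// to { result }, where result is the next match or null once there are no more.
function MockFindText(matches) {
  this.matches = matches;
  this.index = 0;
}
MockFindText.prototype.search = function () {
  var self = this;
  return new Promise(function (resolve) {
    // setTimeout simulates an asynchronous search
    setTimeout(function () {
      var result = self.index < self.matches.length ? self.matches[self.index++] : null;
      resolve({ result: result });
    }, 0);
  });
};

// Drive a generator that yields Promises. Each fulfilled value is sent back
// with next(), each rejection with throw(); the returned Promise resolves
// with the generator's return value.
function runSearch(generatorFn) {
  return new Promise(function (resolve, reject) {
    var iterator = generatorFn();
    function step(method, value) {
      var ret;
      try {
        ret = iterator[method](value); // resume the generator
      } catch (err) {
        reject(err); // the generator threw (or did not catch a rejection)
        return;
      }
      if (ret.done) {
        resolve(ret.value); // generator finished: hand back its return value
        return;
      }
      Promise.resolve(ret.value).then(
        function (val) { step('next', val); },
        function (err) { step('throw', err); }
      );
    }
    step('next', undefined);
  });
}

var collected = runSearch(function* () {
  var findText = new MockFindText(['Rage, rage #1', 'Rage, rage #2']);
  var results = [];
  var match;
  // Each yield suspends the generator until the search Promise settles
  while ((match = yield findText.search()).result) {
    results.push(match.result);
  }
  return results;
});

collected.then(function (results) {
  console.log(results); // [ 'Rage, rage #1', 'Rage, rage #2' ]
});
```

The generator body reads like a synchronous loop; all the Promise handling lives in the runner, which is exactly the part an end user would have to understand (or copy) if the API exposed only a Promise-returning search().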

Just an idea: what about hiding all this from the end user? Isn't it possible to say that the result of search() is actually an iterator in the ES6 sense, and that it is up to the API implementation to hide all the async complexity? Or should we stick to a single searchAll() that would return an iterator, and stop there?
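The idea of search() handing back an iterator that hides the async part corresponds fairly closely to async iteration, which was only added to JavaScript later, in ES2018 (it is not available in plain ES6). A minimal sketch, assuming a hypothetical `makeFindText()` finder whose `search()` resolves to the next match or `null`: the async generator keeps the Promise plumbing inside, and the caller consumes matches with `for await...of`.

```javascript
// Hypothetical finder (not the draft's API): search() resolves to the next
// match, or null when there are no more.
function makeFindText(matches) {
  let index = 0;
  return {
    search() {
      return Promise.resolve(index < matches.length ? matches[index++] : null);
    }
  };
}

// The async generator hides the repeated search() calls behind an async iterator.
async function* iterateMatches(findText) {
  let match;
  while ((match = await findText.search()) !== null) {
    yield match;
  }
}

// The consumer just loops; stopping early with break also closes the generator.
async function collectMatches(findText, limit) {
  const results = [];
  for await (const match of iterateMatches(findText)) {
    results.push(match);
    if (results.length === limit) break;
  }
  return results;
}

const twoMatches = collectMatches(makeFindText(['a', 'b', 'c']), 2);
twoMatches.then((results) => console.log(results)); // [ 'a', 'b' ]
```

The caller still has to be asynchronous (an `async` function or a `.then()`), so this reduces the complexity rather than removing it.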

Ivan

[1] http://davidwalsh.name/async-generators


> On 06 Oct 2015, at 17:17 , Benjamin Young <bigbluehat@hypothes.is> wrote:
> 
> Could ES6 generators be employed here?
> http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions
> 
> It currently has to be polyfilled, but perhaps the future is not far off. ;)
> http://kangax.github.io/compat-table/es6/#generators
> 
> That could get you something like:
> ```
> var rf = new FindText({ text: "Rage, rage" });
> var result = rf.search();
> var next_result = rf.next();
> ```
> 
> Which seems to be what one would expect (vs. a promise-based thing).
> 
> `searchAll()` could return a Promise for the purpose of asynchronous code and avoiding callbacks.
> 
> Great start, though, Doug, regardless!
> 
> 
> 
> On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
> Ah yes! This recursive construction for stacking up promises is indeed the solution I had seen (it may have been in one of Jake Archibald's blog posts) and had already forgotten; I always have to work it out again and again :-)
> 
> You made me realize that using search() like that gives a false impression of a performance gain without actually providing one, right? As you say, in fact none of the promises resolves until the last search() has also been executed; ie, performance-wise, we do not really gain anything compared to a searchAll(). Would it mean that we should not use search() at all?
> 
> Thanks
> 
> Ivan
> 
> 
> 
>> On 06 Oct 2015, at 16:54 , Bill Hunt <bill@opengovfoundation.org> wrote:
>> 
>> Hi Ivan,
>> 
>> Those are actually the precise concerns I brought up to Doug yesterday, and I agree that searchAll() is a fine solution. I also proposed that the function could take a "limit" parameter, to get only N results instead of all. This makes promises much easier. Here's the body of my original message, which illustrates the point in detail.
>> 
>> Cheers,
>> -Bill
>> 
>> 
>> 
>> 
>> Here's Example 1, as-is, with promises, to get all until it can't find any more results:
>> 
>> var recurseSearch = function(rf, results) {
>>     results = results || [];
>> 
>>     // An ES6 Promise needs an executor function; resolve/reject are only
>>     // available inside it (a Promise object has no .resolve() method).
>>     return new Promise(function(resolve, reject) {
>>         rf.search().then(
>>             function(matchData) {
>>                 if(matchData) {
>>                     results.push(matchData);
>>                     // Found results, so continue searching.
>> 
>>                     // Resolve this promise with the promise for the next
>>                     // search: it now waits for that one to settle.
>>                     // * Note 1
>>                     resolve(recurseSearch(rf, results));
>>                 }
>>                 else {
>>                     // No more matches: hand back everything collected.
>>                     resolve(results);
>>                 }
>>             },
>>             function(error) {
>>                 reject('There was a problem getting results');
>>             }
>>         );
>>     });
>> };
>> 
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> recurseSearch(rf).then(function(results) {
>>  console.log(results);
>> });
>> 
>> 
>> * Note 1
>> Our promise collection looks odd here.  You've got a promise object that looks like a lopsided tree:
>> 
>> [ Promise 1,
>>     [ Promise 2,
>>         [Promise 3,
>>             [Promise 4,
>>                 etc...
>>             ]
>>         ]
>>     ]
>> ]
>> 
>> Which will eventually resolve itself.  Not exactly performant, or readable.
>> 
>> ...
>> 
>> The problem, briefly, is that you end up with recursion when you try to find all:
>> 
>> Search 1 ->  (returns S1.promise)
>>  Search 2 -> (appends S2.promise to S1.promise)
>>   Search 3 -> (appends S3.promise to S1.promise and S2.promise)
>>    done, resolve S1.promise && S2.promise && S3.promise altogether.
>> 
>> You cannot simply chain promises here in the normal fashion (.then().then().then() etc.) because we do not know how many promises we'll end up with. We have no idea how deep the thread goes; we must simply wait for the last one to return the whole stack of promises. Effectively, the *first* promise is not resolved until the *last* search is done.
>> 
>> Instead, at each step we must return a promise, which is added to the chain of promises to be resolved all at once. This is kind of messy. It can also lead users to make basic mistakes such as this one (the Promise.all method collects other promises into a single new promise that resolves when all are done):
>> 
>> // Mistake: all three searches are started at the same time
>> var promise = Promise.all([
>>  rf.search(),
>>  rf.search(),
>>  rf.search()
>> ]).then(function( results ) {
>>  console.log(results);
>> });
>> 
>> Where they will think they're getting the first three results, when in fact they will receive three copies of the first result, because they happen simultaneously.
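To really get the first three matches, each search() must start only after the previous one has settled, which plain `.then()` chaining does when the count is fixed. A minimal sketch, assuming search() resolves to the next match; `FakeFindText` is a hypothetical stand-in used only to make the snippet runnable:

```javascript
// Hypothetical stand-in: search() resolves to the next match, or null when done.
function FakeFindText(matches) {
  this.matches = matches;
  this.index = 0;
}
FakeFindText.prototype.search = function () {
  var next = this.index < this.matches.length ? this.matches[this.index++] : null;
  return Promise.resolve(next);
};

var rf = new FakeFindText(['first', 'second', 'third', 'fourth']);
var found = [];

// Each callback returns the next search() Promise, so the searches run in sequence.
var firstThree = rf.search()
  .then(function (match) { found.push(match); return rf.search(); })
  .then(function (match) { found.push(match); return rf.search(); })
  .then(function (match) { found.push(match); return found; });

firstThree.then(function (results) {
  console.log(results); // [ 'first', 'second', 'third' ]
});
```

This only works because the number of steps is written out in advance; for an unknown number of matches the chain has to be built recursively, which is the messiness described in this message.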
>> 
>> 
>> The simple solution is to have a searchAll() method that returns a promise for all results. A great addition to this is a limit argument, which finds only the first N results and then returns. Those three options (find one, find all, find N) should account for the majority of use cases nicely, and will provide a single familiar interface for users. Given that, Example 1 becomes much nicer:
>> 
>> 
>> Without promises, get the third (original example):
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var result = rf.search(); // result is 1st instance of string
>>     result = rf.search(); // result is 2nd instance of string
>>     result = rf.search(); // result is 3rd instance of string, the target instance
>> 
>> get all:
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var results = [];
>> var result;
>> while( (result = rf.search()) ) {
>>  results.push(result);
>> }
>> 
>> get 3:
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var results = [];
>> results.push( rf.search() ); // result is 1st instance of string
>> results.push( rf.search() ); // result is 2nd instance of string
>> results.push( rf.search() ); // result is 3rd instance of string
>> 
>> 
>> 
>> With promises and searchAll, get the third:
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var promise = rf.searchAll(3);
>> promise.then( function( results ) {
>>  console.log( results[2] );
>> } );
>> 
>> get all:
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var promise = rf.searchAll();
>> promise.then( function(results) {
>>  console.log(results);
>> });
>> 
>> get 3:
>> 
>> var rf = new FindText({ text: "Rage, rage" });
>> var promise = rf.searchAll(3);
>> promise.then( function( results ) {
>>  console.log( results );
>> } );
>> 
>> Much cleaner than my previous example, obviously!  Here's a good description of promises that shows how they should be used, and covers the philosophy a bit better than most tutorials:
>> 
>> https://blog.domenic.me/youre-missing-the-point-of-promises/
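One way searchAll(limit) could be layered on top of a promise-returning search(), as a minimal sketch: it assumes search() resolves to the next match or to a falsy value when there are none left, and it writes `searchAll` as a free function taking the finder as its first argument, a hypothetical shape rather than the draft's method. Returning the next Promise from inside `.then()` keeps a single chain, so no Promise.all tree is needed even though the number of steps is not known in advance.

```javascript
// Hypothetical finder: search() resolves to the next match, or null when done.
function makeFinder(matches) {
  var index = 0;
  return {
    search: function () {
      return Promise.resolve(index < matches.length ? matches[index++] : null);
    }
  };
}

// Resolve with up to `limit` matches (all of them if limit is omitted).
function searchAll(finder, limit) {
  var results = [];
  function next() {
    if (limit !== undefined && results.length >= limit) {
      return Promise.resolve(results); // got the N matches we asked for
    }
    return finder.search().then(function (match) {
      if (!match) {
        return results; // no more matches: stop early
      }
      results.push(match);
      return next(); // returning a Promise continues the chain
    });
  }
  return next();
}

searchAll(makeFinder(['m1', 'm2', 'm3', 'm4']), 3).then(function (results) {
  console.log(results[2]); // m3
});
searchAll(makeFinder(['m1', 'm2'])).then(function (results) {
  console.log(results.length); // 2
});
```

The caller sees one Promise for the whole array, matching the "find all" and "find N" cases in the examples above.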
>> 
>> 
>> Bill Hunt
>> Senior Developer
>> OpenGov Foundation
>> http://opengovfoundation.org/
>> 
>> Ph: 20-BILL-HUNT
>>        202 455 4868
>> bill@opengovfoundation.org
>> On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> Hey Doug,
>>> 
>>> After a first read, I have two questions/comments.
>>> 
>>> - (This is minor:) the idea of using an edit distance for suffix/prefix is great. However, the way you specify the (maximal) edit distance is through a number, ie, the number of editing steps. Shouldn't this edit distance limit be expressed (or at least be expressible alternatively) as a percentage, ie, the edit distance over the size of the suffix/prefix? I mean: if the suffix is 4 characters long, then an edit distance of 3 is significant, whereas the same distance is insignificant if the suffix is 100 characters long. Would a percentage be a good alternative?
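To make the percentage idea concrete, here is a minimal sketch that computes a plain Levenshtein distance and accepts a candidate only if that distance is at most a given fraction of the expected prefix/suffix length. The names `editDistance`, `withinRelativeDistance`, and `maxRatio` are hypothetical and not part of the draft, whose own edit-distance definition may differ.

```javascript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions and substitutions each cost 1).
function editDistance(a, b) {
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev[j] = j;
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,         // deletion
                         curr[j - 1] + 1,     // insertion
                         prev[j - 1] + cost); // substitution
    }
    prev = curr;
  }
  return prev[b.length];
}

// Accept a candidate if its distance is at most maxRatio * expected.length.
function withinRelativeDistance(expected, candidate, maxRatio) {
  return editDistance(expected, candidate) <= maxRatio * expected.length;
}

console.log(editDistance('abcd', 'wxyd'));                 // 3
console.log(withinRelativeDistance('abcd', 'wxyd', 0.25)); // false (3 > 1)
const longSuffix = 'x'.repeat(100);
const nearlySame = 'yyy' + 'x'.repeat(97);
console.log(withinRelativeDistance(longSuffix, nearlySame, 0.25)); // true (3 <= 25)
```

With the same 25% limit, 3 edits reject a 4-character suffix but easily pass a 100-character one, which is the asymmetry a fixed absolute limit cannot express.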
>>> 
>>> - (This may be major, but may simply be a result of my own ignorance:) I have read about Promises, and actually used them in a simple setting, but they still twist my mind, I must admit. One thing that seems fairly complex with Promises is building loops, primarily when the number of steps in the loop is not known in advance. On the other hand, using the search() method in the current spec would require exactly that: you iterate through the search results one by one. Maybe there is an easy way to express that with promises that I simply do not know, but if it really is complex, then what this tells me is that searchAll() might become the method of choice (and one could then run a traditional loop on the results). There are, obviously, performance issues, though.
>>> 
>>> B.t.w., I believe that the example:
>>> 
>>> var rf = new FindText({ text: "Rage, rage" });
>>> var result = rf.search(); // result is 1st instance of string
>>>    result = rf.search(); // result is 2nd instance of string
>>>    result = rf.search(); // result is 3rd instance of string, the target instance
>>> 
>>> would not work, exactly for this reason. Each rf.search() returns a Promise, ie, one has to use an rf.search().then(function(){…}) pattern for each entry, and it is not clear in my mind how the iteration materializes in the code.
>>> 
>>> Apologies if I am completely wrong in terms of these Promises...
>>> 
>>> 
>>> Cheers
>>> 
>>> Ivan
>>> 
>>> 
>>>> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>>>> 
>>>> Hi, folks–
>>>> 
>>>> This weekend, I made substantial changes to the FindText API [1] (formerly called the RangeFinder API).
>>>> 
>>>> I improved the internationalization aspects and options, based on feedback from the I18n WG and from their updated CharMod spec [2] (Character Model for the World Wide Web: String Matching and Searching… which seems tailor-made for us!).
>>>> 
>>>> I also fleshed out the algorithm for search (though it still needs lots of work), which was one of two critical changes needed before FPWD.
>>>> 
>>>> The remaining critical change is for me to update the examples, which is important because those will shape many people's first impressions of the spec (because examples are easy to read and understand). This is my plan for the rest of the day. This involves describing the workflow in terms of Promises, which I'm sad to admit I've never used in running code before.
>>>> 
>>>> Luckily, I have two meetings set up for this afternoon with folks to help me with that:
>>>> 
>>>> * Chris Birk and Bill Hunt, from OpenGov Foundation
>>>> * Alexander Schmidtz, from jQuery
>>>> 
>>>> These guys are very familiar with Promises, and so my examples and API design will have at least a bit of vetting and validation before pushing FPWD. There will always be room for improvements, but we should be ready to go by tomorrow.
>>>> 
>>>> 
>>>> I welcome feedback from any of you on this spec!
>>>> 
>>>> 
>>>> [1] http://w3c.github.io/findtext/
>>>> [2] http://w3c.github.io/charmod-norm/
>>>> 
>>>> Regards–
>>>> –Doug
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 


Received on Wednesday, 7 October 2015 11:52:53 UTC