Re: FindText API Updated Editor's Draft from Doug Schepers on 2015-10-06 (public-annotation@w3.org from October 2015)

From: Doug Schepers <schepers@w3.org>
Date: Tue, 6 Oct 2015 11:36:37 -0400
To: Benjamin Young <bigbluehat@hypothes.is>, Ivan Herman <ivan@w3.org>
Cc: Bill Hunt <bill@opengovfoundation.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <5613EA85.4040406@w3.org>
Hi, Benjamin–

On 10/6/15 11:17 AM, Benjamin Young wrote:
> Could ES6 generators be employed here?
> http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions
>
> It currently has to be polyfilled, but perhaps the future is not far off. ;)
> http://kangax.github.io/compat-table/es6/#generators
>
> That could get you something like:
> ```
> var rf = new FindText({ text: "Rage, rage" });
> var result = rf.search();
> var next_result = rf.next();
> ```
>
> Which seems to be what one would expect (vs. a promise-based thing).

Alex Schmidtz had very similar feedback earlier today. I asked him to 
file an issue on GitHub, so we can talk about it there.

I think this is very reasonable, and I'd expect others to agree. I don't 
want to hold up FPWD to resolve this, but assuming we can get consensus 
on this quickly (and the timing works out), I'd expect something like 
this to be in the next drafts of the spec.
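
To make the generator idea concrete, here is a minimal sketch of what 
lazy, one-at-a-time matching could look like. Everything in it is an 
assumption for illustration: findMatches is a hypothetical stand-in 
(not part of the FindText draft), it scans a plain string rather than 
a document, and the shape of each match object is made up.

```javascript
// Hypothetical stand-in for a generator-based search; not part of the
// FindText draft. It scans a plain string instead of a document.
function* findMatches(haystack, needle) {
    if (needle === "") {
        return; // an empty needle would otherwise loop forever
    }
    var from = 0;
    var index;
    while ((index = haystack.indexOf(needle, from)) !== -1) {
        // Hand back one match, then pause until the caller asks for more.
        yield { start: index, end: index + needle.length };
        from = index + needle.length;
    }
}

var poem = "Rage, rage against the dying of the light. Rage, rage.";
var matches = findMatches(poem, "Rage, rage");

var first = matches.next();  // { value: <1st match>, done: false }
var second = matches.next(); // { value: <2nd match>, done: false }
var third = matches.next();  // no match left, so done is true
```

Each call to next() does only as much work as is needed to find one 
more match, which is the behavior one would expect from repeated 
calls to search().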


> `searchAll()` could return a Promise for the purpose of asynchronous
> code and avoiding callbacks.

Yeah.
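
As a rough sketch of that (with the optional limit Bill suggested), 
assuming search() resolves with null once nothing is left, which the 
draft may well decide differently. SketchFinder is a hypothetical 
stand-in for FindText that searches a plain string; none of these 
names or match shapes come from the spec.

```javascript
// Hypothetical stand-in for the draft's FindText object; it searches a
// plain string, and the shape of each match is an assumption.
function SketchFinder(haystack, needle) {
    this.haystack = haystack;
    this.needle = needle;
    this.from = 0;
}

// Resolves with the next match, or with null when there are no more.
SketchFinder.prototype.search = function () {
    var index = this.needle === ""
        ? -1
        : this.haystack.indexOf(this.needle, this.from);
    if (index === -1) {
        return Promise.resolve(null);
    }
    this.from = index + this.needle.length;
    return Promise.resolve({ start: index, end: this.from });
};

// Resolves with an array of up to `limit` matches (all remaining ones
// if `limit` is omitted). Each search starts only after the previous
// one has settled, so results arrive in document order.
SketchFinder.prototype.searchAll = function (limit) {
    var finder = this;
    var results = [];
    var max = limit === undefined ? Infinity : limit;

    function step() {
        if (results.length >= max) {
            return Promise.resolve(results);
        }
        return finder.search().then(function (match) {
            if (match === null) {
                return results;
            }
            results.push(match);
            return step();
        });
    }
    return step();
};

var rf = new SketchFinder("Rage, rage. Rage, rage. Rage, rage.", "Rage, rage");
var firstTwo = rf.searchAll(2).then(function (results) {
    console.log(results);
    return results;
});
```

The caller sees a single promise and a flat array; the recursion that 
sequences the individual searches stays hidden inside searchAll().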


> Great start, though, Doug, regardless!

Thanks!


This has probably been the hardest spec I've ever worked on, and it was 
well outside my comfort zone, so it feels good to even get this far. 
Even though the spec is still in shoddy shape, enough of it is now 
fleshed out that the incremental changes are much easier to do now. Even 
more dramatic changes (like the one I suggest at the end of the spec) 
should be relatively painless, since we've got a foundation to talk about.

Regards–
–Doug


> On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
>
>     Ah yes! This recursive construction to stack up promises is the
>     solution I had indeed seen (it may have been in one of Jake
>     Archibald's blog posts) and had already forgotten; I have to work
>     it out again every time :-)
>
>     You made me realize that using search() like that gives a false
>     impression of a performance gain, right? As you say, the promises
>     can all resolve only once the last search() has been executed;
>     i.e., performance-wise, we do not really gain anything compared to
>     searchAll(). Would that mean that we should not use search() at
>     all?
>
>     Thanks
>
>     Ivan
>
>
>
>>     On 06 Oct 2015, at 16:54, Bill Hunt <bill@opengovfoundation.org> wrote:
>>
>>     Hi Ivan,
>>
>>     Those are precisely the concerns I brought up with Doug
>>     yesterday, and I agree that searchAll() is a fine solution.  I also
>>     proposed that the function could take a "limit" parameter, to
>>     get only N results instead of all.  This makes promises much easier.
>>     Here's the body of my original message, which illustrates the point
>>     in detail.
>>
>>     Cheers,
>>     -Bill
>>
>>
>>
>>
>>     Here's Example 1, as-is, with promises, to get all until it can't
>>     find any more results:
>>
>>     var results = [];
>>     var recurseSearch = function(rf, results) {
>>         // Wrap the whole search in one promise that settles only
>>         // when the recursion bottoms out.
>>         return new Promise(function(resolve, reject) {
>>             rf.search().then(
>>                 function(matchData) {
>>                     if (matchData) {
>>                         results.push(matchData);
>>                         // Found a result, so continue searching.
>>
>>                         // Resolve with the promise for the next
>>                         // search, nesting it inside this one.
>>                         // * Note 1
>>                         resolve(recurseSearch(rf, results));
>>                     }
>>                     else {
>>                         // No more matches: hand back everything
>>                         // collected so far.
>>                         resolve(results);
>>                     }
>>                 },
>>                 function(error) {
>>                     reject(new Error('There was a problem getting results'));
>>                 }
>>             );
>>         });
>>     };
>>
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     recurseSearch(rf, results).then(function(results) {
>>         console.log(results);
>>     });
>>
>>
>>     * Note 1
>>     Our promise collection looks odd here.  You've got a promise
>>     object that looks like a lopsided tree:
>>
>>     [ Promise 1,
>>         [ Promise 2,
>>             [Promise 3,
>>                 [Promise 4,
>>                     etc...
>>                 ]
>>             ]
>>         ]
>>     ]
>>
>>     Which will eventually resolve itself.  Not exactly performant, or
>>     readable.
>>
>>     ...
>>
>>     The problem, briefly, is that you end up with recursion when you
>>     try to find all:
>>
>>     Search 1 ->  (returns S1.promise)
>>     Search 2 -> (appends S2.promise to S1.promise)
>>     Search 3 -> (appends S3.promise to S1.promise and S2.promise)
>>     done, resolve S1.promise && S2.promise && S3.promise altogether.
>>
>>     You cannot simply chain promises here in the normal fashion
>>     (.then().then().then() etc.) because we do not know how many
>>     promises we'll end up with. We have no idea how deep the chain
>>     goes; we must simply wait for the last one to return the whole
>>     stack of promises.  That is, effectively, the *first* promise is
>>     not resolved until the *last* search is done.
>>
>>     Instead, in each step we must return a promise, which is added to
>>     the chain of promises to be resolved all at once.  This is kind of
>>     messy.  It can also lead users to make basic mistakes such as
>>     this one (the Promise.all method collects other promises into a
>>     single new promise that resolves when all are done):
>>
>>     var promise = Promise.all([
>>         rf.search(),
>>         rf.search(),
>>         rf.search()
>>     ]).then(function( results ) {
>>         console.log(results);
>>     });
>>
>>     Where they will think they're getting the first three results,
>>     when in fact they will receive three copies of the first result,
>>     because they happen simultaneously.
>>
>>
>>     The simple solution is to have a searchAll() method that returns a
>>     promise for all results.  A great addition to this is to
>>     provide a limit argument, which finds only the first N results and
>>     then returns.  Those three options (find one, find all, find N)
>>     should account for the majority of use cases nicely, and will
>>     provide a single familiar interface for users.  Given that,
>>     Example 1 becomes much nicer:
>>
>>
>>     Without promises, get the third (original example):
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var result = rf.search(); // result is 1st instance of string
>>         result = rf.search(); // result is 2nd instance of string
>>         result = rf.search(); // result is 3rd instance of string,
>>                               // the target instance
>>
>>     get all:
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var results = [];
>>     var result;
>>     // Keep searching until search() returns a falsy value.
>>     while ( (result = rf.search()) ) {
>>         results.push(result);
>>     }
>>
>>     get 3:
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var results = [];
>>     results.push( rf.search() ); // result is 1st instance of string
>>     results.push( rf.search() ); // result is 2nd instance of string
>>     results.push( rf.search() ); // result is 3rd instance of string
>>
>>
>>
>>     With promises and searchAll, get the third:
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var promise = rf.searchAll(3);
>>     promise.then( function( results ) {
>>     console.log( results[2] );
>>     } );
>>
>>     get all:
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var promise = rf.searchAll();
>>     promise.then( function(results) {
>>     console.log(results);
>>     });
>>
>>     get 3:
>>
>>     var rf = new FindText({ text: "Rage, rage" });
>>     var promise = rf.searchAll(3);
>>     promise.then( function( results ) {
>>     console.log( results );
>>     } );
>>
>>     Much cleaner than my previous example, obviously!  Here's a good
>>     description of promises that shows how they should be used, and
>>     covers the philosophy a bit better than most tutorials:
>>
>>     https://blog.domenic.me/youre-missing-the-point-of-promises/
>>
>>
>>     Bill Hunt
>>     Senior Developer
>>     OpenGov Foundation
>>     http://opengovfoundation.org/
>>
>>     Ph: 20-BILL-HUNT
>>     202 455 4868
>>     bill@opengovfoundation.org
>>
>>     On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>>>     Hey Doug,
>>>
>>>     After a first read, I have two questions/comments.
>>>
>>>     - (This is minor:) the idea of using an edit distance for
>>>     suffix/prefix is great. However, you specify the (maximal) edit
>>>     distance as a number, i.e., the number of editing steps.
>>>     Shouldn't this limit be expressed (or at least alternatively
>>>     expressed) as a percentage of the edit distance over the size
>>>     of the suffix/prefix? I mean: if the suffix is 4 characters
>>>     long, then an edit distance of 3 is significant, whereas the
>>>     same distance is insignificant if the suffix is 100 characters
>>>     long. Would a percentage be a good alternative?
>>>
>>>     - (This may be major, but may simply be a result of my own
>>>     ignorance:) I have read about, and actually used in a simple
>>>     setting, Promises, but they still twist my mind, I must admit.
>>>     One thing that seems to be fairly complex with Promises is
>>>     creating loops with them, primarily when the number of
>>>     iterations is unknown in advance. On the other hand, using the
>>>     search() method in the current spec would require exactly
>>>     that: you iterate through the search results. Maybe there is
>>>     an easy way to express that with promises which I simply do
>>>     not know, but if this really is complex, then what this tells
>>>     me is that searchAll() might become the method of choice (and
>>>     one could then run a traditional loop over the results). There
>>>     are, obviously, performance issues, though.
>>>
>>>     B.t.w., I believe that the example:
>>>
>>>     var rf = new FindText({ text: "Rage, rage" });
>>>     var result = rf.search(); // result is 1st instance of string
>>>        result = rf.search(); // result is 2nd instance of string
>>>        result = rf.search(); // result is 3rd instance of string,
>>>                              // the target instance
>>>
>>>     would not work, exactly for this reason. Each rf.search() returns
>>>     a Promise, i.e., one has to use a rf.search().then(function(){…})
>>>     pattern for each entry, and it is not clear in my mind how the
>>>     iteration materializes in the code.
>>>
>>>     Apologies if I am completely wrong in terms of these Promises...
>>>
>>>
>>>     Cheers
>>>
>>>     Ivan
>>>
>>>
>>>>     On 05 Oct 2015, at 21:03, Doug Schepers <schepers@w3.org> wrote:
>>>>
>>>>     Hi, folks–
>>>>
>>>>     This weekend, I made substantial changes to the FindText API [1]
>>>>     (formerly called the RangeFinder API).
>>>>
>>>>     I improved the internationalization aspects and options, based
>>>>     on feedback from the I18n WG and from their updated CharMod spec
>>>>     [2] (Character Model for the World Wide Web: String Matching and
>>>>     Searching… which seems tailor-made for us!).
>>>>
>>>>     I also fleshed out the algorithm for search (though it still
>>>>     needs lots of work), which was one of two critical changes
>>>>     needed before FPWD.
>>>>
>>>>     The remaining critical change is for me to update the examples,
>>>>     which is important because those will shape many people's first
>>>>     impressions of the spec (because examples are easy to read and
>>>>     understand). This is my plan for the rest of the day. This
>>>>     involves describing the workflow in terms of Promises, which I'm
>>>>     sad to admit I've never used in running code before.
>>>>
>>>>     Luckily, I have two meetings set up for this afternoon with
>>>>     folks to help me with that:
>>>>
>>>>     * Chris Birk and Bill Hunt, from OpenGov Foundation
>>>>     * Alexander Schmidtz, from jQuery
>>>>
>>>>     These guys are very familiar with Promises, and so my examples
>>>>     and API design will have at least a bit of vetting and
>>>>     validation before pushing FPWD. There will always be room for
>>>>     improvements, but we should be ready to go by tomorrow.
>>>>
>>>>
>>>>     I welcome feedback from any of you on this spec!
>>>>
>>>>
>>>>     [1] http://w3c.github.io/findtext/
>>>>     [2] http://w3c.github.io/charmod-norm/
>>>>
>>>>     Regards–
>>>>     –Doug
>>>>
>>>
>>>
>>
>
>
>     ----
>     Ivan Herman, W3C
>     Digital Publishing Lead
>     Home: http://www.w3.org/People/Ivan/
>     mobile: +31-641044153
>     ORCID ID: http://orcid.org/0000-0003-0782-2704
>
Received on Tuesday, 6 October 2015 15:36:42 UTC