Re: FindText API Updated Editor's Draft from Benjamin Young on 2015-10-06 (public-annotation@w3.org from October 2015)

From: Benjamin Young <bigbluehat@hypothes.is>
Date: Tue, 6 Oct 2015 11:17:38 -0400
To: Ivan Herman <ivan@w3.org>
Cc: Bill Hunt <bill@opengovfoundation.org>, Doug Schepers <schepers@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CAE3H5FKtzj=TsUw0Zx1DnR_MjE3y8JfJ1CxEoy-jRYtQ3WN0fw@mail.gmail.com>

Could ES6 generators be employed here?
http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions

It currently has to be polyfilled, but perhaps the future is not far off. ;)
http://kangax.github.io/compat-table/es6/#generators

That could get you something like:
```
var rf = new FindText({ text: "Rage, rage" });
var result = rf.search();
var next_result = rf.next();
```

Which seems to be what one would expect (vs. a promise-based thing).

`searchAll()` could return a Promise for the purpose of asynchronous code
and avoiding callbacks.
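
A minimal sketch of the generator idea, assuming nothing about the FindText draft itself: `findMatches` is a made-up helper that searches a plain string with `indexOf`, standing in for whatever the real search would do. It only shows how a generator hands out one match per `next()` call.

```javascript
// Hypothetical helper, not part of the FindText draft: lazily yields the
// start offset of each occurrence of `needle` in `haystack`.
function* findMatches(haystack, needle) {
  if (needle.length === 0) {
    return; // avoid an endless stream of empty matches
  }
  var from = 0;
  while (true) {
    var index = haystack.indexOf(needle, from);
    if (index === -1) {
      return; // no further matches: the generator is done
    }
    yield index;
    from = index + needle.length;
  }
}

var text = "Rage, rage against the dying of the light. Rage, rage!";
var matches = findMatches(text, "Rage, rage");

var first = matches.next();  // first.value is 0, first.done is false
var second = matches.next(); // offset of the second occurrence
var third = matches.next();  // third.done is true: nothing left to find
```

Each `next()` call resumes the search where the previous one stopped, so the caller decides how many results to pull. This only covers the synchronous case; it says nothing about how a promise-returning search() would fit in.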

Great start, though, Doug, regardless!



On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:

> Ah yes! This recursive construction to stack up promises is indeed the
> solution I saw (it may have been in one of Jake Archibald's blog posts) and
> had already forgotten; I have to work it out again every time :-)
>
> You made me realize that using search() like that gives a fake impression
> of performance gain without being one, right? As you say, in fact all the
> promises can return with success when the last search() has also been
> executed; ie, performance wise, we do not really gain anything compared to
> a searchAll(). Would it mean that we should not use search() at all?
>
> Thanks
>
> Ivan
>
>
>
> On 06 Oct 2015, at 16:54 , Bill Hunt <bill@opengovfoundation.org> wrote:
>
> Hi Ivan,
>
> Those are actually the precise concerns I brought up to Doug yesterday,
> and I agree that searchAll() is a fine solution.  I also proposed that the
> function could take a "limit" parameter, to only get N results instead of
> all.  This makes promises much easier.  Here's the body of my original
> message that illustrates the point in detail.
>
> Cheers,
> -Bill
>
>
>
>
> Here's Example 1, as-is, with promises, to get all until it can't find any
> more results:
>
> var results = [];
> var recurseSearch = function(rf, results) {
>     // The promise we hand back stays pending until the last search is done.
>     var allDonePromise = new Promise(function(resolve, reject) {
>         var searchPromise = rf.search();
>         searchPromise.then(
>             function(matchData) {
>                 if (matchData) {
>                     results.push(matchData);
>                     // Found a result, so continue searching.
>
>                     // Resolve our promise with the promise for the next
>                     // search, so it stays pending until that one settles.
>                     // * Note 1
>                     resolve(recurseSearch(rf, results));
>                 }
>                 else {
>                     // No more matches: hand back everything collected.
>                     resolve(results);
>                 }
>             },
>             function(error) {
>                 reject('There was a problem getting results');
>             }
>         );
>     });
>
>     return allDonePromise;
> };
>
>
> var rf = new FindText({ text: "Rage, rage" });
> recurseSearch(rf, results).then(function(results) {
>     console.log(results);
> });
>
>
> * Note 1
> Our promise collection looks odd here.  You've got a promise object that
> looks like a lopsided tree:
>
> [ Promise 1,
>     [ Promise 2,
>         [Promise 3,
>             [Promise 4,
>                 etc...
>             ]
>         ]
>     ]
> ]
>
> Which will eventually resolve itself.  Not exactly performant, or
> readable.
>
> ...
>
> The problem, briefly, is that you end up with recursion when you try to
> find all:
>
> Search 1 ->  (returns S1.promise)
> Search 2 -> (appends S2.promise to S1.promise)
> Search 3 -> (appends S3.promise to S1.promise and S2.promise)
> done, resolve S1.promise && S2.promise && S3.promise altogether.
>
> You cannot simply chain promises here in the normal fashion
> (.then().then().then() etc) because we do not know how many promises we'll
> end up with in the end. We have no idea how deep the thread goes; we must
> simply wait for the last one to return the whole stack of promises.  That
> is, effectively, the *first* promise is not resolved until the *last* search
> is done.
>
> Instead, in each step we must return a promise, which is added to the
> chain of promises to be resolved all at once.   This is kind of messy.  This
> also can lead users to make basic mistakes such as this one (the
> Promise.all method collects other promises into a single new promise that
> resolves when all are done) :
>
> var promise = Promise.all([
>     rf.search(),
>     rf.search(),
>     rf.search()
> ]).then(function( results ) {
>     console.log(results);
> });
>
> Where they will think they're getting the first three results, when in
> fact they will receive three copies of the first result, because they
> happen simultaneously.
>
>
> The simple solution is to have a searchAll() method that returns a promise
> that gets all results.  A great addition to this is to provide a limit
> argument, which only finds the first N results and then returns.  Those
> three options (find one, find all, find N) should account for the majority
> of use cases nicely, and will provide a single familiar interface for
> users.  Given that, Example 1 becomes much nicer:
>
>
> Without promises, get the third (original example):
>
> var rf = new FindText({ text: "Rage, rage" });
> var result = rf.search(); // result is 1st instance of string
>     result = rf.search(); // result is 2nd instance of string
>     result = rf.search(); // result is 3rd instance of string, the target instance
>
> get all:
>
> var rf = new FindText({ text: "Rage, rage" });
> var results = [];
> var result;
> while ( (result = rf.search()) ) {
>     results.push(result);
> }
>
> get 3:
>
> var rf = new FindText({ text: "Rage, rage" });
> var results = [];
> results.push( rf.search() ); // result is 1st instance of string
> results.push( rf.search() ); // result is 2nd instance of string
> results.push( rf.search() ); // result is 3rd instance of string
>
>
>
> With promises and searchAll, get the third:
>
> var rf = new FindText({ text: "Rage, rage" });
> var promise = rf.searchAll(3);
> promise.then( function( results ) {
> console.log( results[2] );
> } );
>
> get all:
>
> var rf = new FindText({ text: "Rage, rage" });
> var promise = rf.searchAll();
> promise.then( function(results) {
> console.log(results);
> });
>
> get 3:
>
> var rf = new FindText({ text: "Rage, rage" });
> var promise = rf.searchAll(3);
> promise.then( function( results ) {
> console.log( results );
> } );
>
> Much cleaner than my previous example, obviously!  Here's a good
> description of promises that shows how they should be used, and covers the
> philosophy a bit better than most tutorials:
>
> https://blog.domenic.me/youre-missing-the-point-of-promises/
>
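A rough sketch of how a searchAll() with a limit might be layered on a promise-returning search(). Everything here is an assumption for illustration: `StubFinder` is a made-up stand-in that searches a plain string, and this `searchAll(finder, limit)` is a free function rather than the method discussed above. It is not the draft's algorithm.

```javascript
// Stand-in for the draft's finder so the sketch can run: search() returns
// a promise for the next match offset, or for null when nothing is left.
function StubFinder(haystack, needle) {
  this.haystack = haystack;
  this.needle = needle;
  this.from = 0;
}
StubFinder.prototype.search = function () {
  if (this.needle.length === 0) {
    return Promise.resolve(null); // treat an empty needle as "no match"
  }
  var index = this.haystack.indexOf(this.needle, this.from);
  if (index === -1) {
    return Promise.resolve(null);
  }
  this.from = index + this.needle.length;
  return Promise.resolve(index);
};

// Hypothetical searchAll: runs the searches strictly one after another
// and stops when matches run out or `limit` results have been collected.
function searchAll(finder, limit) {
  var results = [];
  function step() {
    if (limit !== undefined && results.length >= limit) {
      return Promise.resolve(results);
    }
    return finder.search().then(function (match) {
      if (match === null) {
        return results;
      }
      results.push(match);
      return step(); // start the next search only after this one is done
    });
  }
  return step();
}

var sample = "Rage, rage. Rage, rage. Rage, rage. Rage, rage.";
var firstThree = searchAll(new StubFinder(sample, "Rage, rage"), 3);
var everything = searchAll(new StubFinder(sample, "Rage, rage"));

firstThree.then(function (results) {
  console.log(results.length);
});
```

The recursion is still there, but it is hidden inside searchAll(), so the caller only ever sees a single promise.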
>
> Bill Hunt
> Senior Developer
> OpenGov Foundation
> http://opengovfoundation.org/
>
> Ph: 20-BILL-HUNT
>        202 455 4868
> bill@opengovfoundation.org
>
> On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>
> Hey Doug,
>
> After a first read, I have two questions/comments.
>
> - (This is minor:) the idea of using an edit distance for suffix/prefix is
> great. However, the way you specify the (maximal) edit distance is through
> a number, ie, the number of editing steps. Shouldn't this edit distance
> limit be expressed (or at least alternatively expressed) as a percentage:
> the edit distance over the size of the suffix/prefix? I
> mean: if the suffix is 4 characters long, then an edit distance of 3 is
> significant, whereas the same distance is insignificant if the suffix is
> 100 characters long. Would a percentage be a good alternative?
>
> - (This may be major, but may simply be a result of my own ignorance:) I
> have read about, and actually used in a simple setting, Promises, but they
> still twist my mind, I must admit. One thing that seems to be fairly
> complex when using Promises is when one has to create cycles using them,
> primarily when the number of steps in the cycle is unknown in advance. On
> the other hand, using the search() method in the current spec would require
> exactly that: you iterate over the search results one by one. Maybe
> there is an easy way to express that with promises which I
> simply do not know, but if this really is complex then what this tells me
> is that the searchAll() might become the method of choice (and one could
> then run a traditional cycle on the results). There are, obviously,
> performance issues, though.
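
On the percentage question, a small sketch of what a relative limit could look like. The Levenshtein routine is the standard dynamic-programming one; `withinRatio` and its `maxRatio` parameter are made-up names for illustration and do not appear in the draft.

```javascript
// Classic Levenshtein distance: the number of single-character
// insertions, deletions, and substitutions needed to turn `a` into `b`.
function editDistance(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) {
    prev[j] = j;
  }
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a[i - 1] === b[k - 1] ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical relative limit: accept a candidate when the distance is at
// most `maxRatio` of the expected string's length.
function withinRatio(expected, candidate, maxRatio) {
  if (expected.length === 0) {
    return candidate.length === 0;
  }
  return editDistance(expected, candidate) / expected.length <= maxRatio;
}

var shortOk = withinRatio("rage", "rave", 0.25);  // 1 edit in 4 characters
var shortBad = withinRatio("rage", "ruin", 0.25); // 3 edits in 4 characters
```

With a ratio, the same three edits that rule out a 4-character suffix would be tolerated in a 100-character one, which is the asymmetry described above.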
>
> B.t.w., I believe that the example:
>
> var rf = new FindText({ text: "Rage, rage" });
> var result = rf.search(); // result is 1st instance of string
>    result = rf.search(); // result is 2nd instance of string
>    result = rf.search(); // result is 3rd instance of string, the target instance
>
> would not work, exactly for this reason. Each rf.search() returns a
> Promise, ie, one has to use a rf.search().then(function{…}) pattern for
> each entry, and it is not clear in my mind how the iteration materializes
> in the code.
>
> Apologies if I am completely wrong in terms of these Promises...
>
>
> Cheers
>
> Ivan
>
>
> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>
> Hi, folks–
>
> This weekend, I made substantial changes to the FindText API [1] (formerly
> called the RangeFinder API).
>
> I improved the internationalization aspects and options, based on feedback
> from the I18n WG and from their updated CharMod spec [2] (Character Model for
> the World Wide Web: String Matching and Searching… which seems tailor-made
> for us!).
>
> I also fleshed out the algorithm for search (though it still needs lots of
> work), which was one of two critical changes needed before FPWD.
>
> The remaining critical change is for me to update the examples, which is
> important because those will shape many people's first impressions of the
> spec (because examples are easy to read and understand). This is my plan
> for the rest of the day. This involves describing the workflow in terms of
> Promises, which I'm sad to admit I've never used in running code before.
>
> Luckily, I have two meetings set up for this afternoon with folks to help
> me with that:
>
> * Chris Birk and Bill Hunt, from OpenGov Foundation
> * Alexander Schmitz, from jQuery
>
> These guys are very familiar with Promises, and so my examples and API
> design will have at least a bit of vetting and validation before pushing
> FPWD. There will always be room for improvements, but we should be ready to
> go by tomorrow.
>
>
> I welcome feedback from any of you on this spec!
>
>
> [1] http://w3c.github.io/findtext/
> [2] http://w3c.github.io/charmod-norm/
>
> Regards–
> –Doug
>
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 6 October 2015 15:18:10 UTC