Re: FindText API Updated Editor's Draft from Bill Hunt on 2015-10-06 (public-annotation@w3.org from October 2015)

From: Bill Hunt <bill@opengovfoundation.org>
Date: Tue, 6 Oct 2015 10:54:35 -0400
To: Ivan Herman <ivan@w3.org>
Cc: Doug Schepers <schepers@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <7A031BE6-353F-490B-A5D2-D9A2D7432E02@opengovfoundation.org>
Hi Ivan, 

Those are precisely the concerns I brought up to Doug yesterday, and I agree that searchAll() is a fine solution.  I also proposed that the function could take a "limit" parameter, to get only N results instead of all.  This makes promises much easier to work with.  Here's the body of my original message, which illustrates the point in detail.

Cheers,
-Bill




Here's Example 1, as-is, with promises, to get all until it can't find any more results:

var recurseSearch = function(rf, results) {
    // Start with an empty list on the first call.
    results = results || [];

    // A Promise needs an executor function; resolve/reject settle it later.
    return new Promise(function(resolve, reject) {
        var searchPromise = rf.search();
        searchPromise.then(
            function(matchData) {
                if(matchData) {
                    results.push(matchData);
                    // Found results, so continue searching.

                    // Chain the promise for the next search into this one,
                    // so this promise settles only when the rest of the chain does.
                    // * Note 1
                    resolve(recurseSearch(rf, results));
                }
                else {
                    // No more matches: hand back everything collected so far.
                    resolve(results);
                }
            },
            function(error) {
                reject(new Error('There was a problem getting results'));
            }
        );
    });
};


var rf = new FindText({ text: "Rage, rage" });
recurseSearch(rf).then(function(results) {
	console.log(results);
});


* Note 1
Our promise collection looks odd here.  You've got a promise object that looks like a lopsided tree: 

[ Promise 1, 
    [ Promise 2,
        [Promise 3,
            [Promise 4,
                etc...
            ]
        ]
    ]
]

This will eventually resolve itself, but it's not exactly performant or readable.

...

The problem, briefly, is that you end up with recursion when you try to find all:

Search 1 ->  (returns S1.promise)
	Search 2 -> (appends S2.promise to S1.promise)
		Search 3 -> (appends S3.promise to S1.promise and S2.promise) 
			done, resolve S1.promise && S2.promise && S3.promise altogether.

You cannot simply chain promises here in the normal fashion (.then().then().then() etc.) because we do not know how many promises we'll end up with. We have no idea how deep the thread goes; we must simply wait for the last one to return the whole stack of promises.  That is, effectively, the *first* promise is not resolved until the *last* search is done.

Instead, in each step we must return a promise, which is added to the chain of promises to be resolved all at once.  This is kind of messy.  It can also lead users to make basic mistakes such as this one (the Promise.all method collects other promises into a single new promise that resolves when all are done):

// Promise.all takes a single array (iterable) of promises.
var promise = Promise.all([
	rf.search(),
	rf.search(),
	rf.search()
]).then(function( results ) {
	console.log(results);
});

Here they will think they're getting the first three results, when in fact they will receive three copies of the first result, because the searches happen simultaneously.
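
For comparison, getting the first three results with search() alone means each call has to wait for the previous one to finish. Here is a minimal sketch of that, assuming search() returns a promise for the next match (or null when there are none left). StubFinder is a hypothetical stand-in I made up so the snippet runs on its own; it is not the FindText implementation, and it just reports string offsets:

```javascript
// Hypothetical stand-in for FindText: each search() call returns a promise
// for the offset of the next match in `source`, or null when none remain.
function StubFinder(source, text) {
    this.source = source;
    this.text = text;
    this.position = 0;
}
StubFinder.prototype.search = function() {
    var index = this.source.indexOf(this.text, this.position);
    if (index === -1) {
        return Promise.resolve(null);
    }
    // Remember where to resume so the next call finds the next match.
    this.position = index + this.text.length;
    return Promise.resolve(index);
};

// Run three searches strictly one after another, collecting each result.
function firstThree(rf) {
    var results = [];
    function collect(matchData) {
        results.push(matchData);
    }
    return rf.search().then(collect)
        .then(function() { return rf.search(); }).then(collect)
        .then(function() { return rf.search(); }).then(collect)
        .then(function() { return results; });
}

var rf = new StubFinder("rage, rage, rage", "rage");
var firstThreePromise = firstThree(rf);
firstThreePromise.then(function(results) {
    console.log(results);
});
```

This works, but only because the number of searches is hard-coded; it does not help when you want every result.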


The simple solution is to have a searchAll() method that returns a promise for all results.  A great addition to this is a limit argument, which finds only the first N results and then returns.  Those three options (find one, find all, find N) should account for the majority of use cases nicely, and will provide a single familiar interface for users.  Given that, Example 1 becomes much nicer:


Without promises, get the third (original example):

var rf = new FindText({ text: "Rage, rage" });
var result = rf.search(); // result is 1st instance of string
    result = rf.search(); // result is 2nd instance of string
    result = rf.search(); // result is 3rd instance of string, the target instance

get all:

var rf = new FindText({ text: "Rage, rage" });
var results = [];
var result;
// Declare result first; a var declaration is not allowed inside the while condition.
while( (result = rf.search()) ) {
	results.push(result);
}

get 3:

var rf = new FindText({ text: "Rage, rage" });
var results = [];
results.push( rf.search() ); // result is 1st instance of string
results.push( rf.search() ); // result is 2nd instance of string
results.push( rf.search() ); // result is 3rd instance of string



With promises and searchAll, get the third:

var rf = new FindText({ text: "Rage, rage" });
var promise = rf.searchAll(3);
promise.then( function( results ) {
	console.log( results[2] );
} );

get all:

var rf = new FindText({ text: "Rage, rage" });
var promise = rf.searchAll();
promise.then( function(results) {
	console.log(results);
});

get 3:

var rf = new FindText({ text: "Rage, rage" });
var promise = rf.searchAll(3);
promise.then( function( results ) {
	console.log( results );
} );

Much cleaner than my previous example, obviously!  Here's a good description of promises that shows how they should be used, and covers the philosophy a bit better than most tutorials:

https://blog.domenic.me/youre-missing-the-point-of-promises/
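
To make the proposal a bit more concrete, here is a rough sketch of how searchAll() with an optional limit could be layered on top of search(). This is only an illustration under my own assumptions: StubFinder is a hypothetical stand-in that reports string offsets, search() is assumed to return a promise that resolves to null when there are no more matches, and none of this is taken from the draft spec:

```javascript
// Hypothetical stand-in for FindText: each search() call returns a promise
// for the offset of the next match in `source`, or null when none remain.
function StubFinder(source, text) {
    this.source = source;
    this.text = text;
    this.position = 0;
}
StubFinder.prototype.search = function() {
    var index = this.source.indexOf(this.text, this.position);
    if (index === -1) {
        return Promise.resolve(null);
    }
    this.position = index + this.text.length;
    return Promise.resolve(index);
};

// Sketch of searchAll(limit): keep calling search() until it yields null
// or `limit` results have been collected. Omit `limit` to get everything.
StubFinder.prototype.searchAll = function(limit) {
    var self = this;
    var results = [];
    function next() {
        if (limit !== undefined && results.length >= limit) {
            return Promise.resolve(results);
        }
        return self.search().then(function(matchData) {
            if (matchData === null) {
                return results;
            }
            results.push(matchData);
            // The recursion stays inside the library, not in user code.
            return next();
        });
    }
    return next();
};

var source = "rage, rage, rage, rage";

// Find N: only the first two matches.
var firstTwo = new StubFinder(source, "rage").searchAll(2);
firstTwo.then(function(results) {
    console.log(results);
});

// Find all: no limit given.
var allMatches = new StubFinder(source, "rage").searchAll();
allMatches.then(function(results) {
    console.log(results);
});
```

The recursion from my first example is still there, but it is written once inside searchAll() and callers only ever see a single promise.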


Bill Hunt
Senior Developer
OpenGov Foundation
http://opengovfoundation.org/

Ph: 20-BILL-HUNT
       202 455 4868
bill@opengovfoundation.org

On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:

> Hey Doug,
> 
> After a first read, I have two questions/comments.
> 
> - (This is minor:) the idea of using an edit distance for suffix/prefix is great. However: the way you specify the (maximal) edit distance is through a number, ie, the number of editing steps. However, shouldn't this edit distance limit be expressed (or at least alternatively express) through a percentage of the editing distance over the size of the suffix/prefix? I mean: if the suffix is 4 characters long, then an edit distance of 3 is significant, whereas the same distance is insignificant if the suffix is 100 characters long. Would a percentage be a good alternative?
> 
> - (This may be major, but may simply be a result of my own ignorance:) I have read about, and actually used in a simple setting, Promises, but they still twist my mind, I must admit. One thing that seems to be fairly complex when using Promises is when one has to create cycles using them, primarily when the number of steps in the cycle is unknown in advance. On the other hand, using the search() method in the current spec would require exactly that: you do some sort of an iterative go through the search results. Maybe there is an easy way to express that with promises which I simply do not know, but if this really is complex then what this tells me is that the searchAll() might become the method of choice (and one could then run a traditional cycle on the results). There are, obviously, performance issues, though.
> 
> B.t.w., I believe that the example:
> 
> var rf = new FindText({ text: "Rage, rage" });
> var result = rf.search(); // result is 1st instance of string
>    result = rf.search(); // result is 2nd instance of string
>    result = rf.search(); // result is 3rd instance of string, the target instance
> 
> would not work, exactly for this reason. Each rf.search() returns a Promise, ie, one has to use a rf.search().when(function{…}) pattern for each entry, and it is not clear in my mind how the iteration materializes in the code.
> 
> Apologies if I am completely wrong in terms of these Promises...
> 
> 
> Cheers
> 
> Ivan
> 
> 
>> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>> 
>> Hi, folks–
>> 
>> This weekend, I made substantial changes to the FindText API [1] (formerly called the RangeFinder API).
>> 
>> I improved the internationalization aspects and options, based on feedback from the I18n WG and from their updated CharMod spec (Character Model for the World Wide Web: String Matching and Searching… which seems tailor-made for us!).
>> 
>> I also fleshed out the algorithm for search (though it still needs lots of work), which was one of two critical changes needed before FPWD.
>> 
>> The remaining critical change is for me to update the examples, which is important because those will shape many people's first impressions of the spec (because examples are easy to read and understand). This is my plan for the rest of the day. This involves describing the workflow in terms of Promises, which I'm sad to admit I've never used in running before.
>> 
>> Luckily, I have two meetings set up for this afternoon with folks to help me with that:
>> 
>> * Chris Birk and Bill Hunt, from OpenGov Foundation
>> * Alexander Schmidtz, from jQuery
>> 
>> These guys are very familiar with Promises, and so my examples and API design will have at least a bit of vetting and validation before pushing FPWD. There will always be room for improvements, but we should be ready to go by tomorrow.
>> 
>> 
>> I welcome feedback from any of you on this spec!
>> 
>> 
>> [1] http://w3c.github.io/findtext/
>> [2] http://w3c.github.io/charmod-norm/
>> 
>> Regards–
>> –Doug
>> 
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 
> 
> 
>
Received on Tuesday, 6 October 2015 14:55:08 UTC