Re: FindText API Updated Editor's Draft

Sorry to have been out of the loop for a few days; I'll try to touch only on the highlights here:

1) Randall's proposed code is a good way to handle this. The calling code doesn't have access to the underlying promises here, but that's not strictly necessary - a promise is a promise, as they say. 

2) Ivan, I still feel very strongly that a searchAll() is necessary.  This is a very common use case, as common as searching for one item, and forcing users to re-implement that boilerplate code over and over is less than ideal. It means that wrapper libraries will always be used, which is what I think we've been trying to avoid in the web JavaScript world for years (since the "browser wars").
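
To make that boilerplate concrete, here is a rough sketch of the wrapper people would end up writing if only a promise-returning search() existed. The makeFakeFinder stand-in, the "resolve with null when exhausted" convention, and the optional limit argument are my assumptions for illustration, not anything taken from the draft:

```javascript
// Hypothetical stand-in for a FindText instance: search() returns a promise
// for the next match, or for null once there are no more matches.
function makeFakeFinder(text, needle) {
  var from = 0;
  return {
    search: function () {
      var at = text.indexOf(needle, from);
      if (at === -1) { return Promise.resolve(null); }
      from = at + needle.length;
      return Promise.resolve({ index: at, text: needle });
    }
  };
}

// The wrapper: call search() one step at a time until it yields null
// or the optional limit is reached, then resolve with everything collected.
function collectAll(finder, limit) {
  var results = [];
  function step() {
    if (limit !== undefined && results.length >= limit) {
      return Promise.resolve(results);
    }
    return finder.search().then(function (match) {
      if (!match) { return results; }
      results.push(match);
      return step(); // wait for this match before asking for the next one
    });
  }
  return step();
}
```

Calling collectAll(finder) gives a promise for every match, and collectAll(finder, 3) for the first three. It is short, but it is exactly the kind of thing I'd rather not see copied into every project.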

Moreover, the find one / find many pairing is almost universal in the other get-data interfaces that users will be familiar with.  E.g., the Waterline ORM used by Sails has findOne and find (all/many), as do most ORMs.
https://github.com/balderdashy/waterline#user-content-query-methods

And of course for any REST interface, users are able to reach the index for a list, or a particular resource by identifier; it's a bit of a stretch, but the most common interface I can think of.

Regular expressions are probably the closest use case for this interface, so users would probably expect something similar to the native regexp interface, where String.match with the g flag returns every match at once:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
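
For comparison, a quick sketch of that native behaviour on a plain string (not a DOM tree). findAllWithIndex is a hypothetical helper of mine, added only to show how match positions could be recovered as well:

```javascript
var text = "Rage, rage against the dying of the light";

// With the g flag, match() returns every match in one array.
var all = text.match(/rage/gi);

// Without the g flag, match() stops at the first match.
var first = text.match(/rage/i);

// Hypothetical helper: collect each match together with its position.
function findAllWithIndex(text, pattern) {
  // exec() only advances through the string when the g flag is set.
  var flags = pattern.flags.indexOf("g") === -1 ? pattern.flags + "g" : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var found = [], m;
  while ((m = re.exec(text)) !== null) {
    // Step past empty matches so the loop always terminates.
    if (m[0] === "") { re.lastIndex += 1; }
    found.push({ index: m.index, text: m[0] });
  }
  return found;
}

var positioned = findAllWithIndex(text, /rage/i);
```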

On the other hand, iterators for getting data have all but disappeared from modern JavaScript. It's purely anecdotal, but I can't think of a time I've used an iterator interface in the last few years.  You certainly see it in things like WordPress's loop, but I think it'd be a bit strange to encounter one in JavaScript.  Again, where they do exist (as with database interfaces), they've been hidden by thick layers of wrapper libraries.

Last, if the calling code has access to textDistance, I assume it will be very common to instead return all results to the end user ordered from best to worst match, which can't be done easily as-is.  Again, you'd need a wrapper to iterate over all the results and aggregate them, then sort that list.
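
Once all the results are in hand, the sort step itself could be as small as the sketch below. I'm assuming each match exposes a numeric textDistance where lower means a closer match; that shape is a guess on my part, not something the draft specifies, and the sample data is made up:

```javascript
// Assumption: every match has a numeric textDistance, 0 meaning an exact hit.
function rankByDistance(matches) {
  // Sort a copy so the caller's array stays in document order.
  return matches.slice().sort(function (a, b) {
    return a.textDistance - b.textDistance;
  });
}

// Made-up sample data standing in for the aggregated results.
var inDocumentOrder = [
  { text: "Rage; rage", textDistance: 1 },
  { text: "Rags, rags", textDistance: 2 },
  { text: "Rage, rage", textDistance: 0 }
];

var bestFirst = rankByDistance(inDocumentOrder);
```

The sort is trivial; the hard part is getting the complete list in the first place, which is the argument for searchAll().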

3) It might be nice (as I think was proposed earlier) to allow users to use either promises or synchronous code, as the Waterline example above does, and leave it to the end user to choose which they want - but that again complicates matters and can lead to confusion.  Of the options, I'd lean to the side of promises-only, again for large documents' sake.

4) Doug, I'm not sure if I've read the document closely enough, so I apologize if this is an incorrect reading, but as it stands this API is only for text-matching, correct?  So users would be unable to do any sort of pattern-matching (i.e., regular expressions)?  That seems like a useful thing to have; the very first plugin I added to Chrome was a find-in-page regex replacement for the default search.

I apologize if I've missed any nuance here; I've tried to catch up on the week or so I've missed.

Cheers,
-Bill



Bill Hunt
Senior Developer
OpenGov Foundation
http://opengovfoundation.org/


On Oct 12, 2015, at 5:09 AM, Ivan Herman <ivan@w3.org> wrote:

>> 
>> On 08 Oct 2015, at 14:50 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>> 
>> On Thu, Oct 8, 2015 at 5:53 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> On 07 Oct 2015, at 21:43 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>>> 
>>> Yeah...I don't think I'd go so far as to use them together...though one certainly could. I'm afraid the article you linked to tangled the wires for me a bit…
>> 
>> Heh... That is good for me, it shows that I am in good company:-)
>> 
>> As for the pattern below: as I said, it is not clear it works in the first place. What I did understand, though, is that the combination with generators/iterators does make the cycles with promises a bit more readable and also more efficient: as we saw in Bill's code, without ES6, in fact, we have to execute all the promises before really doing anything (even if that may not be visible in the code), ie, we do not gain anything efficiency-wise. When using generators, a yield would explicitly relinquish control, ie, there is indeed more parallelism.
>> 
>> However. Here is a thought but with a huge caveat. The caveat is that I am an old man, ie, old-skool, who grew up with antiquated languages like Python and, God forbid, C, so I did not drink of the Kool-Aid of asynchrony everywhere that seems to permeate the usage of Javascript (sorry if I sound cynical). With that out of the way, here is my question: do we really need the FindText interface to be async? 
>> 
>> After all, the reason we have async methods/functions for things like AJAX is because, indeed, these operations are slow compared to the rest, so some level of asynchrony is necessary in a browser. However… isn't it correct that the typical usage of FindText is to search through an already existing DOM tree in the, say, browser, ie, when the full DOM is already in memory? Because if so, then using async (promises or callbacks) sounds like an unnecessary complication for the user. 
>> 
>> So… please, somebody convince me that simply having a search() and a searchAll(), returning a match or an array of matches, respectively, is not enough… That we have to impose on the users a thorough understanding of promises, generators, etc, thereby reducing our user base significantly due to complexity?
>> 
>> Good point, Ivan. One thing (related) that we need to be careful of is not designing this API for the polyfills, but for the browser. The polyfill will (likely) be the slower one, and in a large enough document an asynchronous API may make sense, but...to your point...an asynchronous API may not be what one would expect from a `.search()` or even `.searchAll()` (though that seems a bit more reasonable).
>> 
>> Here's a current use case (of sorts). In Chrome, when you "find text" highlights appear throughout the document (however large or complex), you're taken to the first one, you can "page" (next/prev) through the rest of them, and they all appear as lines in the scrollbar. Accessing that seems to be what we'd be enabling here.
>> 
>> Chrome does the "find text" process as you type (highlighting within the page and the scroll bar with each letter you input). I've no idea if their code is asynchronous. I'm guessing that it is (at some level), so that it doesn't block user input.
>> 
>> If we imagine re-implementing that experience in a cross-browser JS UI built on top of this FindText API, then I think we start to get close to what we'd need/want from the underlying FindText API with regards to (a)synchronicity.
>> 
>> I could see it either way, but likely asynchronous will be better for avoiding blocking the user while the search is done (/me ignores Web Workers for the moment ;) ).
> 
> The question is how much blocking that really means; remember that the search happens on the DOM tree, ie, after parsing and URI access, ie, in memory. 
> 
> Obviously, browser vendors should tell us. As purely anecdotal evidence, I tried to get some experience: I opened Tolstoy's War and Peace as an ePUB in the iBooks reader on my iPad Mini 3 and tried a simple search for the name of a character that I know is all over the place in the novel. It is difficult to see how fast that really was, because the user interface showed the first set of results on the screen, as many as it could display, with an extra button for the next batch. But getting there was instantaneous. War and Peace, when printed, is several thousand pages long, ie, I believe it qualifies as a long text…
> 
> It may well be that the iPad reader does it asynchronously behind the scenes. And that is actually fine. If what we do is to define search() as an iterator, and we do not mention asynchrony, then we leave implementations to optimize to their heart's content, but it remains simple for the user.
> 
> (I admit my mantra is: use async only if it is really really absolutely necessary, otherwise keep away from that stuff!)
> 
> Cheers
> 
> Ivan
> 
> 
>> 
>> Thoughts?
>> 
>> 
>> My apologies if this is all just stupid grumbling...
>> 
>> Ivan
>> 
>> P.S. A nice quote from the blog that you referred to (thanks for it!):
>> 
>> "That being said, promises aren't perfect. It's true that they're better than callbacks, but that's a lot like saying that a punch in the gut is better than a kick in the teeth. Sure, one is preferable to the other, but if you had a choice, you'd probably avoid them both.
>> 
>> While superior to callbacks, promises are still difficult to understand and error-prone[…]. Novices and experts alike will frequently mess this stuff up, and really, it's not their fault. The problem is that promises, while similar to the patterns we use in synchronous code, are a decent substitute but not quite the same.
>> 
>> In truth, you shouldn't have to learn a bunch of arcane rules and new APIs to do things that, in the synchronous world, you can do perfectly well with familiar patterns like return, catch, throw, and for-loops. There shouldn't be two parallel systems that you have to keep straight in your head at all times"
>> 
>> It seems that ES7 may make this simpler in future. But we should not specify for ES7; its definition and eventual deployment is way too far down the line.
>> 
>> 
>> 
>> 
>>> 
>>> Here's the article that's helped me the most wrt Promises (fwiw):
>>> http://pouchdb.com/2015/05/18/we-have-a-problem-with-promises.html
>>> 
>>> Maybe that helps a bit. :)
>>> 
>>> On Wed, Oct 7, 2015 at 7:52 AM, Ivan Herman <ivan@w3.org> wrote:
>>> Well… I tried to understand how generators and promises work together. It is a bit like a lame leading the blind, in the sense that I do not have a really really comfortable feeling about Promises in complex situations; as for generators, I am familiar with them in Python, but the ES6 version is more complex. I have gone through some of the examples and texts around; my pattern comes from [1]. Based on the patterns I found in [1], here is a structure that *may* work with the current interface:
>>> 
>>> function runSearch( params, generator ) {
>>>     var iterator = generator(params), ret;
>>>     (function iterate(val) {
>>>         // Resume the generator: val becomes the value of its pending 'yield',
>>>         // ie, the match is sent back into the 'search' part below
>>>         ret = iterator.next(val);
>>>         if( !ret.done ) {
>>>             // ret.value is the Promise returned by FindText.search();
>>>             // the 'then' part runs when the next match is found
>>>             // and hands the whole match back to the generator
>>>             ret.value.then( function(match) {
>>>                 iterate(match);
>>>             });
>>>         }
>>>     })();
>>> }
>>> 
>>> //====
>>> 
>>> params = { ..findtext params...};
>>> runSearch( params, function *search() {
>>>     var match;
>>>     do {
>>>         // Note that the result of the yield is what the corresponding 'next' sends
>>>         // ie, it will be the match.
>>>         // This is also where the async part kicks in, because the .search(), returning a Promise, leads to it.
>>>         match = yield find_text.search();
>>>         if( match.result ) {
>>>             // do something with match.result
>>>         }
>>>     } while( match.result );
>>> });
>>> 
>>> 
>>> I have no idea whether this makes sense, ie, whether that works. But maybe more importantly: I still think it is very complicated, requires a thorough understanding of complex things, ie, I am not sure that it would be the right level of abstraction for the API. And it works only with ES6, although I agree that it may be acceptable for the API to rely on ES6.
>>> 
>>> Just an idea: what about hiding all this from the end user? Isn't it possible to say that the result of search() is actually an iterator in the ES6 sense, and it is up to the API implementation to hide all the async complexity? Or should we keep to one single searchAll() that would return an iterator and stop there?
>>> 
>>> Ivan
>>> 
>>> [1] http://davidwalsh.name/async-generators
>>> 
>>> 
>>> 
>>>> On 06 Oct 2015, at 17:17 , Benjamin Young <bigbluehat@hypothes.is> wrote:
>>>> 
>>>> Could ES6 generators be employed here?
>>>> http://www.ecma-international.org/ecma-262/6.0/#sec-generator-function-definitions
>>>> 
>>>> It currently has to be polyfilled, but perhaps the future is not far off. ;)
>>>> http://kangax.github.io/compat-table/es6/#generators
>>>> 
>>>> That could get you something like:
>>>> ```
>>>> var rf = new FindText({ text: "Rage, rage" });
>>>> var result = rf.search()
>>>> var next_result = rf.next();
>>>> ```
>>>> 
>>>> Which seems to be what one would expect (vs. a promise-based thing).
>>>> 
>>>> `searchAll()` could return a Promise for the purpose of asynchronous code and avoiding callbacks.
>>>> 
>>>> Great start, though, Doug, regardless!
>>>> 
>>>> 
>>>> 
>>>> On Tue, Oct 6, 2015 at 11:04 AM, Ivan Herman <ivan@w3.org> wrote:
>>>> Ah yes! This recursive construction to stack up promises is the solution I indeed saw (it may have been in one of the blogs of Jake Archibald) and I already forgot; and it always makes me understand it again and again:-)
>>>> 
>>>> You made me realize that using search() like that gives a fake impression of performance gain without being one, right? As you say, in fact all the promises can return with success when the last search() has also been executed; ie, performance wise, we do not really gain anything compared to a searchAll(). Would it mean that we should not use search() at all?
>>>> 
>>>> Thanks
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>> 
>>>>> On 06 Oct 2015, at 16:54 , Bill Hunt <bill@opengovfoundation.org> wrote:
>>>>> 
>>>>> Hi Ivan, 
>>>>> 
>>>>> Those are actually the precise concerns I brought up to Doug yesterday, and agree that searchAll() is a fine solution.  I also proposed that the function could take a "limit" parameter, to only get N results instead of all.  This makes promises much easier.  Here's the body of my original message that illustrates the point in detail.
>>>>> 
>>>>> Cheers,
>>>>> -Bill
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Here's Example 1, as-is, with promises, to get all until it can't find any more results:
>>>>> 
>>>>> var results = [];
>>>>> var recurseSearch = function(rf, results) {
>>>>>     // Start a fresh collection when the caller does not pass one in.
>>>>>     results = results || [];
>>>>> 
>>>>>     // A Promise needs an executor; resolve/reject are only available inside it.
>>>>>     var allDonePromise = new Promise(function(resolve, reject) {
>>>>>         var searchPromise = rf.search();
>>>>>         searchPromise.then(
>>>>>             function(matchData) {
>>>>>                 if(matchData) {
>>>>>                     results.push(matchData);
>>>>>                     // Found results, so continue searching.
>>>>> 
>>>>>                     // Resolving with the next search's promise nests it inside this one,
>>>>>                     // so this promise settles only when the deepest search is done.
>>>>>                     // * Note 1
>>>>>                     resolve(recurseSearch(rf, results));
>>>>>                 }
>>>>>                 else {
>>>>>                     // No more matches: hand back everything collected so far.
>>>>>                     resolve(results);
>>>>>                 }
>>>>>             },
>>>>>             function(error) {
>>>>>                 reject(new Error('There was a problem getting results'));
>>>>>             }
>>>>>         );
>>>>>     });
>>>>> 
>>>>>     return allDonePromise;
>>>>> };
>>>>> 
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> recurseSearch(rf).then(function(results) {
>>>>> 	console.log(results);
>>>>> });
>>>>> 
>>>>> 
>>>>> * Note 1
>>>>> Our promise collection looks odd here.  You've got a promise object that looks like a lopsided tree: 
>>>>> 
>>>>> [ Promise 1, 
>>>>>     [ Promise 2,
>>>>>         [Promise 3,
>>>>>             [Promise 4,
>>>>>                 etc...
>>>>>             ]
>>>>>         ]
>>>>>     ]
>>>>> ]
>>>>> 
>>>>> Which will eventually resolve itself.  Not exactly performant, or readable.  
>>>>> 
>>>>> ...
>>>>> 
>>>>> The problem, briefly, is that you end up with recursion when you try to find all:
>>>>> 
>>>>> Search 1 ->  (returns S1.promise)
>>>>> 	Search 2 -> (appends S2.promise to S1.promise)
>>>>> 		Search 3 -> (appends S3.promise to S1.promise and S2.promise) 
>>>>> 			done, resolve S1.promise && S2.promise && S3.promise altogether.
>>>>> 
>>>>> You cannot simply chain promises here in the normal fashion (.then().then().then() etc) because we do not know how many promises we'll end up with in the end. We have no idea how deep the thread goes; we must simply wait for the last one to return the whole stack of promises.  That is, effectively the *first* promise is not resolved until the *last* search is done.
>>>>> 
>>>>> Instead, in each step we must return a promise, which is added to the chain of promises to be resolved all at once.   This is kind of messy.  This also can lead users to make basic mistakes such as this one (the Promise.all method collects other promises into a single new promise that resolves when all are done):
>>>>> 
>>>>> // Promise.all takes a single array (iterable) of promises.
>>>>> var promise = Promise.all([
>>>>> 	rf.search(),
>>>>> 	rf.search(),
>>>>> 	rf.search()
>>>>> ]).then(function( results ) {
>>>>> 	console.log(results);
>>>>> });
>>>>> 
>>>>> Where they will think they're getting the first three results, when in fact they will receive three copies of the first result, because they happen simultaneously.
>>>>> 
>>>>> 
>>>>> The simple solution is to have a searchAll() method that returns a promise that gets all results.  A great addition to this is to provide a limit argument, which only finds the first N results and then returns.  Those three options (find one, find all, find N) should account for the majority of use cases nicely, and will provide a single familiar interface for users.  Given that, Example 1 becomes much nicer:
>>>>> 
>>>>> 
>>>>> Without promises, get the third (original example):
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var result = rf.search(); // result is 1st instance of string
>>>>>     result = rf.search(); // result is 2nd instance of string
>>>>>     result = rf.search(); // result is 3rd instance of string, the target instance
>>>>> 
>>>>> get all:
>>>>> 
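>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var results = [];
>>>>> var result;
>>>>> // A var declaration is not allowed inside the while condition, so declare it first.
>>>>> while( (result = rf.search()) ) {
>>>>> 	results.push(result);
>>>>> }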
>>>>> 
>>>>> get 3:
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var results = [];
>>>>> results.push( rf.search() ); // result is 1st instance of string
>>>>> results.push( rf.search() ); // result is 2nd instance of string
>>>>> results.push( rf.search() ); // result is 3rd instance of string
>>>>> 
>>>>> 
>>>>> 
>>>>> With promises and searchAll, get the third:
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var promise = rf.searchAll(3);
>>>>> promise.then( function( results ) {
>>>>> 	console.log( results[2] );
>>>>> } );
>>>>> 
>>>>> get all:
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var promise = rf.searchAll();
>>>>> promise.then( function(results) {
>>>>> 	console.log(results);
>>>>> });
>>>>> 
>>>>> get 3:
>>>>> 
>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>> var promise = rf.searchAll(3);
>>>>> promise.then( function( results ) {
>>>>> 	console.log( results );
>>>>> } );
>>>>> 
>>>>> Much cleaner than my previous example, obviously!  Here's a good description of promises that shows how they should be used, and covers the philosophy a bit better than most tutorials:
>>>>> 
>>>>> https://blog.domenic.me/youre-missing-the-point-of-promises/
>>>>> 
>>>>> 
>>>>> Bill Hunt
>>>>> Senior Developer
>>>>> OpenGov Foundation
>>>>> http://opengovfoundation.org/
>>>>> 
>>>>> Ph: 20-BILL-HUNT
>>>>>        202 455 4868
>>>>> bill@opengovfoundation.org
>>>>> 
>>>>> On Oct 6, 2015, at 10:47 AM, Ivan Herman <ivan@w3.org> wrote:
>>>>> 
>>>>>> Hey Doug,
>>>>>> 
>>>>>> After a first read, I have two questions/comments.
>>>>>> 
>>>>>> - (This is minor:) the idea of using an edit distance for suffix/prefix is great. However: the way you specify the (maximal) edit distance is through a number, ie, the number of editing steps. Shouldn't this edit distance limit be expressed (or at least alternatively expressed) as a percentage of the editing distance over the size of the suffix/prefix? I mean: if the suffix is 4 characters long, then an edit distance of 3 is significant, whereas the same distance is insignificant if the suffix is 100 characters long. Would a percentage be a good alternative?
>>>>>> 
>>>>>> - (This may be major, but may simply be a result of my own ignorance:) I have read about, and actually used in a simple setting, Promises, but they still twist my mind, I must admit. One thing that seems to be fairly complex when using Promises is when one has to create cycles using them, primarily when the number of steps in the cycle is unknown in advance. On the other hand, using the search() method in the current spec would require exactly that: you do some sort of an iterative go through the search results. Maybe there is an easy way to express that with promises which I simply do not know, but if this really is complex then what this tells me is that the searchAll() might become the method of choice (and one could then run a traditional cycle on the results). There are, obviously, performance issues, though.
>>>>>> 
>>>>>> B.t.w., I believe that the example:
>>>>>> 
>>>>>> var rf = new FindText({ text: "Rage, rage" });
>>>>>> var result = rf.search(); // result is 1st instance of string
>>>>>>    result = rf.search(); // result is 2nd instance of string
>>>>>>    result = rf.search(); // result is 3rd instance of string, the target instance
>>>>>> 
>>>>>> would not work, exactly for this reason. Each rf.search() returns a Promise, ie, one has to use a rf.search().then(function(){…}) pattern for each entry, and it is not clear in my mind how the iteration materializes in the code.
>>>>>> 
>>>>>> Apologies if I am completely wrong in terms of these Promises...
>>>>>> 
>>>>>> 
>>>>>> Cheers
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>> 
>>>>>>> On 05 Oct 2015, at 21:03 , Doug Schepers <schepers@w3.org> wrote:
>>>>>>> 
>>>>>>> Hi, folks–
>>>>>>> 
>>>>>>> This weekend, I made substantial changes to the FindText API [1] (formerly called the RangeFinder API).
>>>>>>> 
>>>>>>> I improved the internationalization aspects and options, based on feedback from the I18n WG and from their updated CharMod spec (Character Model for the World Wide Web: String Matching and Searching… which seems tailor-made for us!).
>>>>>>> 
>>>>>>> I also fleshed out the algorithm for search (though it still needs lots of work), which was one of two critical changes needed before FPWD.
>>>>>>> 
>>>>>>> The remaining critical change is for me to update the examples, which is important because those will shape many people's first impressions of the spec (because examples are easy to read and understand). This is my plan for the rest of the day. This involves describing the workflow in terms of Promises, which I'm sad to admit I've never used in running code before.
>>>>>>> 
>>>>>>> Luckily, I have two meetings set up for this afternoon with folks to help me with that:
>>>>>>> 
>>>>>>> * Chris Birk and Bill Hunt, from OpenGov Foundation
>>>>>>> * Alexander Schmidtz, from jQuery
>>>>>>> 
>>>>>>> These guys are very familiar with Promises, and so my examples and API design will have at least a bit of vetting and validation before pushing FPWD. There will always be room for improvements, but we should be ready to go by tomorrow.
>>>>>>> 
>>>>>>> 
>>>>>>> I welcome feedback from any of you on this spec!
>>>>>>> 
>>>>>>> 
>>>>>>> [1] http://w3c.github.io/findtext/
>>>>>>> [2] http://w3c.github.io/charmod-norm/
>>>>>>> 
>>>>>>> Regards–
>>>>>>> –Doug
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> Ivan Herman, W3C
>>>>>> Digital Publishing Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 

Received on Monday, 12 October 2015 12:22:42 UTC