- From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
- Date: Fri, 1 Nov 2019 15:45:56 +0100
- To: W3C AIKR CG <public-aikr@w3.org>
I stumbled upon an interesting problem while working on vnstore (formerly fstore): how to represent several theories produced by an algorithm in the context of a versioned, branchable database (like git).

Consider, for instance, a gazetteer-based entity-resolution system as described in the following question:

  https://stackoverflow.com/q/52046394/140837

Here is the code:

    words = 'new york is the big apple'.split()

    def spans(lst):
        """Yield every way to cut lst into contiguous, non-empty spans."""
        if len(lst) == 0:
            yield None
        for index in range(1, len(lst)):
            for span in spans(lst[index:]):
                if span is not None:
                    yield [lst[0:index]] + span
        yield [lst]

    knowledgebase = [
        ['new', 'york'],
        ['big', 'apple'],
    ]

    out = []
    scores = []
    for span in spans(words):
        # one point per span element that exactly matches a known entity
        score = 0
        for candidate in span:
            for entity in knowledgebase:
                if candidate == entity:
                    score += 1
        out.append(span)
        scores.append(score)

    # ascending order: the best candidates are printed last
    leaderboard = sorted(zip(out, scores), key=lambda x: x[1])
    for winner in leaderboard:
        print(winner[1], ' ~ ', winner[0])

The above (naive?) algorithm guesses multiple probable ways to link a sentence to the knowledge base. A deterministic scoring heuristic filters out many alternatives, but it still leaves, for example, the following two, which both get the top score of 2:

    [['new', 'york'], ['is'], ['the'], ['big', 'apple']]
    [['new', 'york'], ['is', 'the'], ['big', 'apple']]

Those are two possible ways to link the input sentence "new york is the big apple". What I want to show is an example where a deterministic algorithm cannot come up with a single result and must keep "theories" around downstream, eliminating zero or more of them with another algorithm or with knowledge acquired later.

In the versioned nstore (vnstore), one can represent theories using branches (as in git) OR using an abstraction on top of the nstore. Representing theories in the vnstore will require access to the history and branch information, along with some data to tie together a set of theories that are related to a given problem.
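A minimal sketch of the "theories on top of the nstore" option, assuming a toy in-memory tuple store. TupleStore, segmentations, add_theory, theories, spans_of and eliminate are hypothetical names for illustration, not the nstore/vnstore API, and the rule that "is the" is not a unit is an invented piece of later knowledge. Every top-scoring segmentation (a compact variant of spans above that returns tuples so they can be stored) is kept as a theory tied to the problem, every read is scoped by theory id so theories do not leak into each other, and the later knowledge eliminates one theory.

```python
import itertools

# Hypothetical in-memory stand-in for an nstore: a flat set of tuples.
# Illustration only, NOT the real nstore/vnstore API.
class TupleStore:
    def __init__(self):
        self.tuples = set()

    def add(self, *items):
        self.tuples.add(items)

    def remove(self, *items):
        self.tuples.discard(items)

    def query(self, *pattern):
        """Yield stored tuples matching pattern; None acts as a wildcard."""
        for tup in list(self.tuples):  # copy, so callers may remove while iterating
            if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)
            ):
                yield tup

KNOWLEDGEBASE = {('new', 'york'), ('big', 'apple')}

def segmentations(words):
    """Yield every way to cut `words` into contiguous, non-empty spans."""
    if not words:
        yield []
        return
    for index in range(1, len(words) + 1):
        for rest in segmentations(words[index:]):
            yield [tuple(words[:index])] + rest

def score(segmentation):
    # One point per span that exactly matches a known entity.
    return sum(span in KNOWLEDGEBASE for span in segmentation)

_theory_ids = itertools.count(1)

def add_theory(store, problem, segmentation):
    """Record one alternative linking as a theory tied to `problem`."""
    theory = 'theory-%d' % next(_theory_ids)
    store.add(theory, 'theory-of', problem)
    for position, span in enumerate(segmentation):
        store.add(theory, 'span', position, span)
    return theory

def theories(store, problem):
    return sorted(t for t, _, _ in store.query(None, 'theory-of', problem))

def spans_of(store, theory):
    # Reads are always scoped by the theory id, so one theory cannot leak into another.
    rows = store.query(theory, 'span', None, None)
    return [span for _, _, _, span in sorted(rows, key=lambda row: row[2])]

def eliminate(store, theory):
    for row in store.query(theory, None, None):
        store.remove(*row)
    for row in store.query(theory, None, None, None):
        store.remove(*row)

store = TupleStore()
problem = 'new york is the big apple'
candidates = list(segmentations(problem.split()))
best = max(score(c) for c in candidates)
for candidate in candidates:
    if score(candidate) == best:  # keep every tie as a separate theory
        add_theory(store, problem, candidate)

# Later, hypothetical new knowledge says ('is', 'the') is not a unit.
for theory in theories(store, problem):
    if ('is', 'the') in spans_of(store, theory):
        eliminate(store, theory)

for theory in theories(store, problem):
    print(theory, spans_of(store, theory))
```

The extra bookkeeping (the theory-of tuples and the theory id in front of every fact) is exactly the "some data to tie together a set of theories" part; the vnstore alternative would instead give each candidate its own branch.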
Theories on top of the nstore, on the other hand, will require only "some data to tie together a set of theories that are related to a given problem", but will also require extra care to make sure one theory does not leak into another. Using the nstore approach means there is yet another structure on top of the nstore, the structure of alternative theories, which is very similar to the vnstore. It gives more freedom, but it also leads to a more complex system.

It seems to me that the vnstore already addresses the idea of "alternative theories": in git, branches are alternative versions of a piece of software. But it also seems that reusing the vnstore abstraction for theories made by algorithms will lead to more complex code.

What do you think? How do you handle alternative theories in your work?

--
Amirouche ~ https://hyper.dev
Received on Friday, 1 November 2019 14:46:10 UTC