Re: query state evidence? (was Re: Word sense discrimination)

A well-defined subset of ISO8601 is used in “chunks” as a convenient date/time format that is mapped into its constituent properties.
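
For example (purely an illustrative sketch, not the normative mapping; the property names below are my own), a timestamp could be decomposed into its constituent properties along these lines:

# Illustrative sketch only: decompose a restricted ISO8601 timestamp
# (e.g. "2021-08-25T08:43:31+00:00") into named properties, roughly as
# a chunk's properties might hold them. The property names are
# assumptions, not taken from the chunks specification.
from datetime import datetime

def iso8601_to_properties(text):
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return {
        "year": dt.year, "month": dt.month, "day": dt.day,
        "hour": dt.hour, "minute": dt.minute, "second": dt.second,
        "offset": dt.utcoffset(),
    }

print(iso8601_to_properties("2021-08-25T08:43:31Z"))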

If you want to model provenance, data quality, etc., you can use chunk models as appropriate. This also relates to the role of @context for expressing statements about statements, e.g. “John said Mary is at work”; see:

https://github.com/w3c/cogai/blob/master/chunks-and-rules.md#statements-about-statements 
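
Purely as an informal illustration of the idea (the linked section gives the actual notation), the inner statement becomes an object in its own right that the outer statement refers to:

# Informal sketch only: "John said Mary is at work", with the inner
# claim reified as an object that the outer statement points at.
# These dict shapes and names are illustrative, not the chunks
# @context syntax defined in the linked specification.
claim = {"@id": "c1", "type": "at", "person": "Mary", "place": "work"}
report = {"type": "says", "speaker": "John", "claim": "c1"}
graph = {"c1": claim}

# the reported claim can be retrieved without asserting it as a fact
assert graph[report["claim"]]["person"] == "Mary"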


> On 24 Aug 2021, at 15:48, Timothy Holborn <timothy.holborn@gmail.com> wrote:
> 
> I noted the defined statement about ISO8601...
> 
> How (when applicable) does the emerging standard seek to consider 'query state evidence', meaning: different sources (depended upon for query outcomes) having some sort of 'state' evidence / checksum, and common compliance with ISO8601 (date/time/timezone)?
> 
> ie: laws change... provenance (and insights) evolve.  (ie: https://twitter.com/BasilMarte/status/1188098436861743104 occurred in 2019, prior to the toilet-paper shortages of the 2020 pandemic). 
> 
> Timothy Holborn. 
> 
> On Tue, 24 Aug 2021 at 22:22, Timothy Holborn <timothy.holborn@gmail.com> wrote:
> Hi Dave (/list)
> 
> I started working on an email response, but it's turning into a sort of paper expanding upon my introduction to this group given the topic, so I will post it separately; it's still a work in progress.
> 
> 
> On Tue, 24 Aug 2021 at 00:52, Dave Raggett <dsr@w3.org> wrote:
> I hope you have had a pleasant summer.  I’ve been doing some background reading, and am getting closer to starting some further implementation experiments. The aim is to develop a good enough solution for transforming natural language into graph representations of the meaning. By good enough, I mean good enough to enable work on using natural language in experiments on human-like reasoning and learning.
> 
> Natural language understanding can be broken down into sub-tasks, such as part-of-speech tagging, phrase structure analysis, and semantic and pragmatic analysis. The difficulties mainly occur with the semantic processing. Many words can have multiple meanings, but humans effortlessly understand which meaning is intended in any given case. Semantic processing is also needed to figure out prepositional attachments, and to determine what pronouns, and other kinds of noun phrases, are referring to.
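> 
> A skeletal sketch of that decomposition (the stage names and dummy results are illustrative placeholders, not anything proposed in the specification):
> 
> # Illustrative pipeline skeleton for the sub-tasks listed above; each
> # stage is a placeholder rather than an actual implementation.
> def tag_parts_of_speech(tokens):
>     return [(t, "NN") for t in tokens]   # part-of-speech tagging
> 
> def parse_phrases(tagged):
>     return [tagged]                      # phrase structure analysis
> 
> def analyse_semantics(phrases):
>     # word senses, prepositional attachments, pronoun/noun phrase reference
>     return {"senses": {}, "attachments": [], "referents": {}}
> 
> def analyse_pragmatics(meaning, context):
>     return meaning                       # speaker intent, dialogue state
> 
> def understand(tokens, context=None):
>     tagged = tag_parts_of_speech(tokens)
>     return analyse_pragmatics(analyse_semantics(parse_phrases(tagged)), context)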
> 
> I'm led to believe some of the inferencing aspects link to Semiotics ( https://en.wikipedia.org/wiki/Semiotics ) 
> 
> Back in 2001, I had a minor involvement in cataloguing Digibetas and making them searchable as phonetically transcribed MPEG-2 files in a database.  From memory, https://en.wikipedia.org/wiki/Nuance_Communications was the leader in phonetic analysis at that time.  
> 
> Early last decade I learned of https://www.mico-project.eu/ and with that SPARQL-MM, which I hoped could provide an open-standards-based methodology / tooling / reference platform. I am not presently aware of more advanced works done since. 
> 
> QUESTION: How may the outcome support 'freedom of thought', and how does that relate to the W3C patent-pool-related mandates / membership interests, etc.? 
> 
> Two decades ago, work on word sense disambiguation focused on n-gram statistics for word collocations. More recently, artificial neural networks have proved to be very effective at unsupervised learning of statistical language models for predicting what text is likely to follow on from a given text passage. Unfortunately, marvellous as this is, it isn’t transferable to tasks such as word sense disambiguation and measuring semantic consistency for deciding on prepositional attachments etc.
> 
> I am therefore still looking for practical ways to exploit natural language corpora to determine word senses in context. The intended sense of a word is correlated to the words with which it appears in any given utterance. The accompanying words vary in their specificity for discriminating particular word senses. However, strongly discriminating words may be found several words away from the word in question. A simple n-gram model would require an impractical amount of memory to capture such dependencies.  We therefore need a way to learn which words/features to pay attention to, and what can be safely forgotten as a means to limit the demand on memory.
> 
> I rather like the 1995 paper by David Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”. This assumes that words have one sense per discourse and one sense per collocation, and exploits this in an iterative bootstrapping procedure. Other papers exploit linguistic resources like WordNet. I am now hoping to experiment with Yarowsky’s ideas using loose parsing for longer range dependencies, together with heuristics for discarding collocation data with weak discrimination.
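> 
> As a rough sketch, the bootstrapping loop could look something like this (the feature extraction, seeds and threshold are placeholders, not the parameters or decision-list ranking from the 1995 paper, and the one-sense-per-discourse constraint is omitted):
> 
> # Rough sketch of Yarowsky-style bootstrapping for one ambiguous word.
> from collections import defaultdict
> 
> def features(context_words):
>     # e.g. nearby words; loose parsing could add longer-range dependencies
>     return set(context_words)
> 
> def bootstrap(instances, seeds, threshold=0.7, iterations=10):
>     # instances: list of context word lists for the ambiguous word
>     # seeds: a few strongly indicative collocations mapped to sense labels
>     labels = {}
>     for _ in range(iterations):
>         # label instances using the current collocation -> sense rules
>         for i, ctx in enumerate(instances):
>             hits = [seeds[f] for f in features(ctx) if f in seeds]
>             if hits:
>                 labels[i] = max(set(hits), key=hits.count)
>         # learn new rules from the labelled instances
>         counts = defaultdict(lambda: defaultdict(int))
>         for i, sense in labels.items():
>             for f in features(instances[i]):
>                 counts[f][sense] += 1
>         for f, per_sense in counts.items():
>             sense, n = max(per_sense.items(), key=lambda kv: kv[1])
>             if n / sum(per_sense.values()) >= threshold:
>                 seeds[f] = sense   # keep strongly discriminating collocations
>             # weakly discriminating collocations are simply not retained
>     return labels, seeds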
> 
> I’ve downloaded free samples of large corpora from www.corpusdata.org as a basis for experimentation.
> 
> perhaps creating some sort of GitHub file or solution that provides references to an array of open resources could be useful? 
> 
>  
> Each word is given with its lemma and part of speech, e.g. "announced", "announce", "vvn". This will enable me to apply shift-reduce parsing to build phrase structures as an input to computing collocations. Further work would address the potential for utilising prior knowledge, e.g. from WordNet, and how to compute measures of semantic consistency for resolving noun phrases and attachment of prepositions as verb arguments.  An open question is whether this can be done effectively without resorting to artificial neural networks.
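> 
> For instance, reading such a token stream might look like this (assuming one token per line with tab-separated word, lemma and tag fields; the actual corpusdata.org layout may differ):
> 
> # Sketch of reading a POS-tagged corpus sample, assuming tab-separated
> # lines such as "announced<TAB>announce<TAB>vvn". The real file layout
> # from corpusdata.org may differ from this assumption.
> from collections import namedtuple
> 
> Token = namedtuple("Token", "word lemma tag")
> 
> def read_tokens(path):
>     with open(path, encoding="utf-8") as f:
>         for line in f:
>             fields = line.rstrip("\n").split("\t")
>             if len(fields) >= 3:
>                 yield Token(fields[0], fields[1], fields[2])
> 
> # tokens = list(read_tokens("corpus_sample.txt"))  # hypothetical file name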
> 
> Anyone interested in helping?
> 
> Yes, but 'not ready yet' (personally)...  noting I do not have all the necessary skills to support the underlying scope of works without help / cooperative collaboration with others, etc. 
> 
> Also - isn't this sort of stuff computationally intensive?  How can / are experiments (to be) funded?  Is there a schema for how projects are defined by the scope that is incorporated and the aspects that are set aside?
> 
> Q: How, and what, in the proposed specification supports temporal considerations?  Including but not limited to cases where an inference has a dependency upon an API and/or 3rd-party query service...  
> 
> Therein - inferences based on 'half truths' (in simple language) are likely to be different to inferences / determinations (causality) linked to having a better means to form opinions; like mindfulness / consciousness and related facets, this isn't necessarily about results that are bad for purposes intended by others, but sometimes that is the case.  The underlying concept links to the idea of 'the status of the observer': https://www.youtube.com/watch?v=ZYPjXz1MVv0&list=PLCbmz0VSZ_voTpRK9-o5RksERak4kOL40&index=4&t=5s - my much longer writing (I've been working on it for a couple of hours so far, for the express purpose of this group's work) will go into considerations / deliberations in more detail; suffice to say for now, IMO, it's quite complicated stuff...  
> 
> I see here: https://github.com/w3c/cogai/blob/master/demos/decision-tree/rules.chk a series of considerations about a 'way of thinking' for a particular illustrated underlying concept.  It seems obvious to consider that some such examples are based on physics or similar (ie: gravity, amongst others), whilst others may be more subjective (ie: linked to religious / worship-related / spiritual beliefs, or medical procedures, including but not limited to OSCEs: https://en.wikipedia.org/wiki/Objective_structured_clinical_examination ).  Does the present scope of works have a concept of 'libraries' or 'sources' or similar?  The sci-fi example would be Neo uploading knowledge ( https://www.youtube.com/watch?v=w_8NsPQBdV0 ); the more pragmatic example would be virus signature libraries uploaded (or downloaded, depending on how you think about it) into anti-virus programs...  
> 
> Part of the underlying thought is about 'computational load', which will likely have an impact (various implications) on how solutions can be deployed (how well they may be 'democratised', or similar). 
> 
> Also: what consideration has been given to storing resources on DLTs (ie: blockchains, DHTs, cryptographically signed (tamper-evident), decentralised resources)? 
> 
> Timothy Holborn.
> 
> 
> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
> W3C Data Activity Lead & W3C champion for the Web of things 
> 
> 
> 
> 

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things 

Received on Wednesday, 25 August 2021 08:43:31 UTC