query state evidence? (was Re: Word sense discrimination) from Timothy Holborn on 2021-08-24 (public-cogai@w3.org from August 2021)

From: Timothy Holborn <timothy.holborn@gmail.com>
Date: Wed, 25 Aug 2021 00:48:36 +1000
To: Dave Raggett <dsr@w3.org>
Cc: public-cogai <public-cogai@w3.org>
Message-ID: <CAM1Sok0A8hX3+kVystbfNw3eY9r2dr=FYKJKAod8ebCPFNjbvg@mail.gmail.com>
I noted the defined statement about ISO8601...

How (where applicable) does the emerging standard take account of 'query
state evidence'? By that I mean the different sources (depended upon for
query outcomes) each having some sort of 'state' evidence / checksum, and
common compliance with ISO 8601 (date/time/timezone).

i.e. laws change, and provenance (and insight) evolves.  (e.g.
https://twitter.com/BasilMarte/status/1188098436861743104 occurred in 2019,
prior to the toilet-paper shortages of the 2020 pandemic.)
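
To make the question concrete, here is a minimal sketch of what 'state
evidence' for one source might look like: a checksum of the bytes that were
read, plus a timezone-aware ISO 8601 timestamp. The names SourceState,
capture_state and has_changed, and the "example:statute" identifier, are
hypothetical illustrations and are not taken from any cogai specification.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceState:
    """Evidence of what one source looked like when a query used it."""
    source: str        # hypothetical identifier, e.g. a URL or dataset name
    sha256: str        # checksum of the exact bytes that were read
    retrieved_at: str  # ISO 8601 timestamp with an explicit UTC offset


def capture_state(source: str, content: bytes,
                  when: Optional[datetime] = None) -> SourceState:
    """Record a checksum and timestamp for the content a query depended on."""
    when = when or datetime.now(timezone.utc)
    if when.tzinfo is None:
        # A naive timestamp is ambiguous, so refuse it rather than guess a zone.
        raise ValueError("timestamp must be timezone-aware")
    return SourceState(source, hashlib.sha256(content).hexdigest(),
                       when.isoformat())


def has_changed(evidence: SourceState, current_content: bytes) -> bool:
    """True if the source no longer matches what the earlier query saw."""
    return hashlib.sha256(current_content).hexdigest() != evidence.sha256


evidence = capture_state(
    "example:statute", b"version 1 of the law",
    datetime(2019, 10, 26, 12, 0, tzinfo=timezone.utc))
print(evidence.retrieved_at)
print(has_changed(evidence, b"version 2 of the law"))
```

An inference could carry a list of such records, so that a later reader can
tell whether the law (or other source) it depended on has since changed.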

Timothy Holborn.

On Tue, 24 Aug 2021 at 22:22, Timothy Holborn <timothy.holborn@gmail.com>
wrote:

> Hi Dave (/list)
>
> I started working on an email response, but it's turning into a sort of
> paper expanding upon my introduction to this group given the topic, so I
> will post it separately; it's still a work in progress.
>
>
> On Tue, 24 Aug 2021 at 00:52, Dave Raggett <dsr@w3.org> wrote:
>
>> I hope you have had a pleasant summer.  I’ve been doing some background
>> reading, and am getting closer to starting some further implementation
>> experiments. The aim is to develop a good enough solution for transforming
>> natural language into graph representations of the meaning. By good enough,
>> I mean good enough to enable work on using natural language in experiments
>> on human-like reasoning and learning.
>>
>> Natural language understanding can be broken down into sub-tasks, such
>> as, part of speech tagging, phrase structure analysis, semantic and
>> pragmatic analysis. The difficulties mainly occur with the semantic
>> processing. Many words can have multiple meanings, but humans effortlessly
>> understand which meaning is intended in any given case. Semantic processing
>> is also needed to figure out prepositional attachments, and determine what
>> pronouns, and other kinds of noun phrases, are referring to.
>>
>
> I'm led to believe some of the inferencing aspects link to Semiotics (
> https://en.wikipedia.org/wiki/Semiotics )
>
> Back in 2001, I had a minor involvement in cataloguing Digibeta tapes and
> making them searchable as phonetically transcribed MPEG2 files in a
> database.  From memory, https://en.wikipedia.org/wiki/Nuance_Communications
> was the leader in phonetic analysis at that time.
>
> Early last decade I learned of https://www.mico-project.eu/ and, with it,
> SPARQL-MM, which I hoped could provide an open-standards-based methodology /
> tooling / reference platform. I am not presently aware of more advanced
> work done since.
>
> QUESTION: How might the outcome support 'freedom of thought', and how does
> that relate to the W3C patent-pool related mandates / membership interests,
> etc.?
>
>>
>> Two decades ago work on word sense disambiguation focused on n-gram
>> statistics for word collocations. More recently, artificial neural networks
>> have proved to be very effective at unsupervised learning of statistical
>> language models for predicting what text is likely to follow on from a
>> given text passage. Unfortunately, marvellous as this is, it isn’t
>> transferrable to tasks such as word sense disambiguation and measuring
>> semantic consistency for deciding on prepositional attachments etc.
>>
>> I am therefore still looking for practical ways to exploit natural
>> language corpora to determine word senses in context. The intended sense of
>> a word is correlated to the words with which it appears in any given
>> utterance. The accompanying words vary in their specificity for
>> discriminating particular word senses. However, strongly discriminating
>> words may be found several words away from the word in question. A simple
>> n-gram model would require an impractical amount of memory to capture such
>> dependencies.  We therefore need a way to learn which words/features to pay
>> attention to, and what can be safely forgotten as a means to limit the
>> demand on memory.
>>
>> I rather like the 1995 paper by David Yarowsky, “Unsupervised Word Sense
>> Disambiguation Rivaling Supervised Methods”. This assumes that words have
>> one sense per discourse and one sense per collocation, and exploits this in
>> an iterative bootstrapping procedure. Other papers exploit linguistic
>> resources like WordNet. I am now hoping to experiment with Yarowsky’s ideas
>> using loose parsing for longer range dependencies, together with heuristics
>> for discarding collocation data with weak discrimination.
>>
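
As a rough illustration of the bootstrapping idea described above, the
following is a minimal sketch in the spirit of Yarowsky's decision lists, not
a reproduction of the paper's algorithm. It uses only the one-sense-per-
collocation assumption (one sense per discourse is omitted), a smoothed
log-likelihood ratio as the discrimination score, and a threshold that
discards weakly discriminating collocates. The function name, parameters, seed
words and toy contexts for the word "plant" are all assumptions made for the
example.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(contexts, seeds, threshold=1.0, smoothing=0.1,
                        max_rounds=10):
    """Bootstrap sense labels for contexts of one ambiguous word.

    contexts: list of sets of words seen near the target word
    seeds:    dict mapping a few collocates to one of exactly two senses
    Returns (rules, labels); rules maps collocate -> (sense, score).
    """
    sense_a, sense_b = sorted(set(seeds.values()))
    rules = {w: (s, math.inf) for w, s in seeds.items()}
    labels = [None] * len(contexts)
    for _ in range(max_rounds):
        # Label each context using its single strongest matching rule.
        new_labels = []
        for ctx in contexts:
            matches = [rules[w] for w in ctx if w in rules]
            new_labels.append(
                max(matches, key=lambda r: r[1])[0] if matches else None)
        if new_labels == labels:
            break  # converged: no label changed in this round
        labels = new_labels
        # Count how often each collocate appears with each assigned sense.
        counts = defaultdict(Counter)
        for ctx, sense in zip(contexts, labels):
            if sense is not None:
                for w in ctx:
                    counts[w][sense] += 1
        # Rebuild the rules, keeping only strongly discriminating collocates.
        rules = {w: (s, math.inf) for w, s in seeds.items()}
        for w, c in counts.items():
            if w in seeds:
                continue
            llr = math.log((c[sense_a] + smoothing) / (c[sense_b] + smoothing))
            if abs(llr) >= threshold:
                rules[w] = (sense_a if llr > 0 else sense_b, abs(llr))
    return rules, labels


contexts = [
    {"life", "species"},
    {"manufacturing", "car"},
    {"species", "animal"},
    {"car", "workers"},
    {"animal", "grow"},
]
seeds = {"life": "living", "manufacturing": "factory"}
rules, labels = train_decision_list(contexts, seeds)
print(labels)
```

Each round labels a few more contexts, which in turn yields new collocates
("species", then "animal") that label further contexts. Raising the threshold
trades coverage for a smaller, more reliable rule set.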
>> I’ve downloaded free samples of large corpora from www.corpusdata.org as
>> a basis for experimentation.
>>
>
> Perhaps creating some sort of GitHub file or solution that provides
> references to an array of open resources could be useful?
>
>
>
>> Each word is given with its lemma and part of speech, e.g. "announced",
>> "announce", "vvn". This will enable me to apply shift-reduce parsing to
>> build phrase structures as an input to computing collocations. Further work
>> would address the potential for utilising prior knowledge, e.g. from
>> WordNet, and how to compute measures of semantic consistency for resolving
>> noun phrases and attachment of prepositions as verb arguments.  An open
>> question is whether this can be done effectively without resorting to
>> artificial neural networks.
>>
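
For readers unfamiliar with shift-reduce parsing over (word, lemma, tag)
triples, here is a toy sketch. The four grammar rules and the tags "at", "jj",
"nn1" and "ii" are assumptions chosen for the example (only "vvn" appears in
the message above), and the greedy reduce strategy is a simplification rather
than the approach planned for the experiments.

```python
# Each rule maps a right-hand side (top of the stack) to a phrase label.
# This tiny grammar is a hypothetical illustration only.
RULES = [
    (("at", "NBAR"), "NP"),    # article + nominal      -> noun phrase
    (("jj", "NBAR"), "NBAR"),  # adjective + nominal    -> nominal
    (("nn1",), "NBAR"),        # singular noun          -> nominal
    (("ii", "NP"), "PP"),      # preposition + NP       -> prepositional phrase
]


def parse(tokens):
    """Greedy shift-reduce parse of (word, lemma, tag) triples.

    Leaves are (tag, lemma); phrases are (label, [children]).
    Returns the final stack, i.e. a sequence of partial structures.
    """
    stack = []
    for _word, lemma, tag in tokens:
        stack.append((tag, lemma))  # shift the next token onto the stack
        reduced = True
        while reduced:              # reduce for as long as any rule matches
            reduced = False
            for rhs, lhs in RULES:
                n = len(rhs)
                top = tuple(node[0] for node in stack[-n:])
                if len(stack) >= n and top == rhs:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((lhs, children))
                    reduced = True
                    break
    return stack


tokens = [
    ("announced", "announce", "vvn"),
    ("the", "the", "at"),
    ("new", "new", "jj"),
    ("plan", "plan", "nn1"),
    ("in", "in", "ii"),
    ("the", "the", "at"),
    ("report", "report", "nn1"),
]
print([node[0] for node in parse(tokens)])
```

The parser deliberately leaves the prepositional phrase unattached: whether
"in the report" modifies "announced" or "plan" is the kind of decision that
needs the semantic consistency measures discussed above.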
>> Anyone interested in helping?
>>
>
> Yes, but 'not ready yet' (personally), noting that I do not have all the
> necessary skills to support the underlying scope of works without help /
> cooperative collaboration with others, etc.
>
> Also, isn't this sort of work computationally intensive?  How are
> experiments funded, or how can they be?  Is there a schema for defining
> projects by the scope that is incorporated and the aspects set aside?
>
> Q: What in the proposed specification supports temporal considerations,
> and how?  This includes, but is not limited to, cases where an inference
> depends upon an API and/or a third-party query service...
>
> Therein, inferences based on 'half truths' (in simple language) are
> likely to differ from inferences / determinations (causality) made with
> a better means of forming opinions. Like mindfulness / consciousness
> and related facets, this isn't necessarily about results that are bad for
> purposes intended by others, but sometimes that is the case.  The
> underlying concept links to the idea of 'the status of the observer':
> https://www.youtube.com/watch?v=ZYPjXz1MVv0&list=PLCbmz0VSZ_voTpRK9-o5RksERak4kOL40&index=4&t=5s
> My much longer writing (I have been working on it for a couple of hours so
> far, for the express purpose of this group work) will go into the
> considerations / deliberations in more detail. Suffice to say for now that,
> IMO, it's quite complicated stuff...
>
> I see here:
> https://github.com/w3c/cogai/blob/master/demos/decision-tree/rules.chk a
> series of considerations about a 'way of thinking' for a particular
> illustrated underlying concept.  It seems obvious that some such examples
> are based on physics or similar (e.g. gravity, amongst others), while
> others may be more subjective (e.g. linked to religious / worship-related /
> spiritual beliefs, or to medical procedures, including but not limited to
> OSCEs (
> https://en.wikipedia.org/wiki/Objective_structured_clinical_examination
> )).  Does the present scope of works have a concept of 'libraries' or
> 'sources' or similar?   The sci-fi example would be Neo uploading knowledge
> ( https://www.youtube.com/watch?v=w_8NsPQBdV0 ); the more pragmatic
> example would be virus signature libraries uploaded (or downloaded,
> depending on how you think about it) into anti-virus programs...
>
> part of the underlying thought is about 'computational load' which will
> likely have an impact (various implications) on how solutions can be
> deployed (how well they may be 'democratised', or similar).
>
> Also, what consideration has been given to storing resources on DLTs (i.e.
> blockchains, DHTs, cryptographically signed (tamper-evident), decentralised
> resources)?
>
> Timothy Holborn.
>
>
>> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
>> W3C Data Activity Lead & W3C champion for the Web of things
>>
>>
>>
>>
>>
Received on Tuesday, 24 August 2021 14:49:25 UTC