- From: Timothy Holborn <timothy.holborn@gmail.com>
- Date: Wed, 25 Aug 2021 19:31:44 +1000
- To: Dave Raggett <dsr@w3.org>
- Cc: public-cogai <public-cogai@w3.org>
- Message-ID: <CAM1Sok0eie+TG4otxx7R7HZWkbkyRn=GUvbep3SSuky+LRmu8g@mail.gmail.com>
Cheers...

On Wed, 25 Aug 2021, 6:43 pm Dave Raggett, <dsr@w3.org> wrote:

> A well-defined subset of ISO 8601 is used in “chunks” as a convenient
> date/time format that is mapped into its constituent properties.
>
> If you want to model provenance, data quality, etc., you can use chunk
> models as appropriate. This also relates to the role of @context for
> expressing statements about statements, e.g. “John said Mary is at work”;
> see:
>
> https://github.com/w3c/cogai/blob/master/chunks-and-rules.md#statements-about-statements
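As a rough illustration of the date/time mapping Dave mentions, here is a
minimal Python sketch. It is not the chunks implementation, and the property
names are illustrative assumptions rather than the spec's vocabulary; it just
shows what "mapped into its constituent properties" could look like for a
restricted ISO 8601 string.

```python
# Minimal sketch (not the chunks implementation): map a restricted
# ISO 8601 timestamp into name/value properties. Property names are
# illustrative assumptions, not the spec's vocabulary.
from datetime import datetime

def iso8601_to_properties(text: str) -> dict:
    """Parse a restricted ISO 8601 string into constituent properties."""
    dt = datetime.fromisoformat(text)  # handles e.g. 2021-08-25T19:31:44+10:00
    props = {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
    }
    if dt.tzinfo is not None:
        # express the timezone as an offset from UTC, in hours
        props["utcOffset"] = dt.utcoffset().total_seconds() / 3600
    return props

print(iso8601_to_properties("2021-08-25T19:31:44+10:00"))
# {'year': 2021, 'month': 8, 'day': 25, 'hour': 19, 'minute': 31,
#  'second': 44, 'utcOffset': 10.0}
```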
> On 24 Aug 2021, at 15:48, Timothy Holborn <timothy.holborn@gmail.com> wrote:
>
> I noted the defined statement about ISO 8601...
>
> How (when applicable) does the emerging standard seek to consider 'query
> state evidence', meaning: different sources (depended upon for query
> outcomes) having some sort of 'state' evidence / check-sum, and common
> compliance with ISO 8601 (date/time/timezone)?
>
> i.e. laws change... provenance (and insights) evolves. (e.g.
> https://twitter.com/BasilMarte/status/1188098436861743104 occurred in
> 2019, prior to the toilet-paper shortages of the 2020 pandemic.)
>
> Timothy Holborn.
>
> On Tue, 24 Aug 2021 at 22:22, Timothy Holborn <timothy.holborn@gmail.com> wrote:
>
>> Hi Dave (/list),
>>
>> I started working on an email response, but it's turning into a sort of
>> paper expanding upon my introduction to this group given the topic, so I
>> will post it separately; it's still a work in progress.
>>
>> On Tue, 24 Aug 2021 at 00:52, Dave Raggett <dsr@w3.org> wrote:
>>
>>> I hope you have had a pleasant summer. I've been doing some background
>>> reading, and am getting closer to starting some further implementation
>>> experiments. The aim is to develop a good-enough solution for
>>> transforming natural language into graph representations of the meaning.
>>> By good enough, I mean good enough to enable work on using natural
>>> language in experiments on human-like reasoning and learning.
>>>
>>> Natural language understanding can be broken down into sub-tasks, such
>>> as part-of-speech tagging, phrase-structure analysis, and semantic and
>>> pragmatic analysis. The difficulties mainly occur with the semantic
>>> processing. Many words can have multiple meanings, but humans
>>> effortlessly understand which meaning is intended in any given case.
>>> Semantic processing is also needed to figure out prepositional
>>> attachments, and to determine what pronouns, and other kinds of noun
>>> phrases, are referring to.
>>
>> I'm led to believe some of the inferencing aspects link to semiotics
>> (https://en.wikipedia.org/wiki/Semiotics).
>>
>> Back in 2001, I had a minor involvement in cataloguing digibetas and
>> making them searchable as phonetically transcribed MPEG-2 files in a
>> database. From memory,
>> https://en.wikipedia.org/wiki/Nuance_Communications was the leader in
>> phonetic analysis at that time.
>>
>> Early last decade I learned of https://www.mico-project.eu/ and, with
>> that, SPARQL-MM, which I hoped could provide an open-standards-based
>> methodology / tooling / reference platform. I am not presently aware of
>> more advanced work done since.
>>
>> QUESTION: How may the outcome support 'freedom of thought', and how does
>> that relate to the W3C patent-pool-related mandates / membership
>> interests, etc.?
>>
>>> Two decades ago, work on word sense disambiguation focused on n-gram
>>> statistics for word collocations. More recently, artificial neural
>>> networks have proved to be very effective at unsupervised learning of
>>> statistical language models for predicting what text is likely to
>>> follow on from a given text passage. Unfortunately, marvellous as this
>>> is, it isn't transferrable to tasks such as word sense disambiguation
>>> and measuring semantic consistency for deciding on prepositional
>>> attachments, etc.
>>>
>>> I am therefore still looking for practical ways to exploit natural
>>> language corpora to determine word senses in context. The intended
>>> sense of a word is correlated with the words with which it appears in
>>> any given utterance. The accompanying words vary in their specificity
>>> for discriminating particular word senses. However, strongly
>>> discriminating words may be found several words away from the word in
>>> question. A simple n-gram model would require an impractical amount of
>>> memory to capture such dependencies. We therefore need a way to learn
>>> which words/features to pay attention to, and what can safely be
>>> forgotten, as a means to limit the demand on memory.
>>>
>>> I rather like the 1995 paper by David Yarowsky, “Unsupervised Word
>>> Sense Disambiguation Rivaling Supervised Methods”. This assumes that
>>> words have one sense per discourse and one sense per collocation, and
>>> exploits this in an iterative bootstrapping procedure. Other papers
>>> exploit linguistic resources like WordNet. I am now hoping to
>>> experiment with Yarowsky's ideas, using loose parsing for longer-range
>>> dependencies, together with heuristics for discarding collocation data
>>> with weak discrimination.
>>>
>>> I've downloaded free samples of large corpora from www.corpusdata.org
>>> as a basis for experimentation.
>>
>> Perhaps creating some sort of GitHub file or solution that provides
>> references to an array of open resources could be useful?
>>
>>> Each word is given with its lemma and part of speech, e.g. "announced",
>>> "announce", "vvn". This will enable me to apply shift-reduce parsing to
>>> build phrase structures as an input to computing collocations. Further
>>> work would address the potential for utilising prior knowledge, e.g.
>>> from WordNet, and how to compute measures of semantic consistency for
>>> resolving noun phrases and attachment of prepositions as verb
>>> arguments. An open question is whether this can be done effectively
>>> without resorting to artificial neural networks.
>>>
>>> Anyone interested in helping?
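As a concrete toy illustration of the corpus format and collocation
statistics Dave describes above, here is a small Python sketch in the spirit
of Yarowsky (1995). It is a sketch only: the tab-separated word/lemma/PoS
column order follows Dave's example but is an assumption about the file
layout, and the seed-file names in the usage notes are hypothetical.

```python
# Toy sketch (not Dave's planned implementation) of collocation statistics
# for word sense discrimination in the spirit of Yarowsky (1995).
# Assumes tab-separated "word<TAB>lemma<TAB>pos" lines, one token per line;
# the column order follows Dave's example but is an assumption.
from collections import Counter
from math import log

WINDOW = 4  # how many tokens either side of the target to inspect

def load_tokens(path):
    """Yield (word, lemma, pos) triples from a corpus file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:
                yield tuple(parts)

def collocations(tokens, target_lemma, window=WINDOW):
    """Count lemmas appearing within +/-window of each target occurrence."""
    tokens = list(tokens)
    counts = Counter()
    for i, (_, lemma, _) in enumerate(tokens):
        if lemma == target_lemma:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j][1]] += 1
    return counts

def discrimination(counts_a, counts_b, lemma, smoothing=0.5):
    """Smoothed log-likelihood ratio of a collocate for sense A vs sense B."""
    a = counts_a[lemma] + smoothing
    b = counts_b[lemma] + smoothing
    return log(a / b)

# Usage sketch with hypothetical seed files: rank collocates of "plant" by
# how strongly they discriminate two seed-labelled senses, keeping the
# strongest ones as a Yarowsky-style decision list.
# sense_a = collocations(load_tokens("factory_seed.txt"), "plant")
# sense_b = collocations(load_tokens("botany_seed.txt"), "plant")
# decision_list = sorted(
#     set(sense_a) | set(sense_b),
#     key=lambda w: abs(discrimination(sense_a, sense_b, w)),
#     reverse=True,
# )
```

Yarowsky's iteration would then relabel unlabelled occurrences using the
decision list, recompute the counts, and repeat until the labels stabilise;
heuristics for discarding weakly discriminating collocates keep the table
small, which speaks to the memory concern raised above.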
>> Yes, but 'not ready yet' (personally)... noting I do not have all the
>> necessary skills to support the underlying scope of works without help /
>> cooperative collaboration with others, etc.
>>
>> Also, isn't this sort of stuff computationally intensive? How can
>> experiments be funded? Is there a schema for how projects are defined,
>> covering the scope that is incorporated and the aspects set aside?
>>
>> Q: How and what in the proposed specification supports temporal
>> considerations? Including, but not limited to, cases where an inference
>> has a dependency upon an API and/or third-party query service...
>>
>> Therein, inferences based on 'half-truths' (in simple language) are
>> likely to be different to inferences / determinations (causality) linked
>> to having a better means to form opinions, like mindfulness /
>> consciousness and related facets. This isn't necessarily about results
>> that are bad for purposes intended by others, but sometimes that is the
>> case. The underlying concept links to the idea of 'the status of the
>> observer':
>> https://www.youtube.com/watch?v=ZYPjXz1MVv0&list=PLCbmz0VSZ_voTpRK9-o5RksERak4kOL40&index=4&t=5s
>> My much longer piece of writing (I've been working on it for a couple of
>> hours so far, for the express purpose of this group's work) will go into
>> the considerations / deliberations in more detail; suffice to say for
>> now, IMO, it's quite complicated stuff...
>>
>> I see here:
>> https://github.com/w3c/cogai/blob/master/demos/decision-tree/rules.chk
>> a series of considerations about a 'way of thinking' for a particular
>> illustrated underlying concept. It seems obvious to consider that some
>> such examples are based on physics or similar (i.e. gravity, amongst
>> others); others may be more subjective (i.e. linked to religious /
>> worship-related / spiritual beliefs, or medical procedures, including
>> but not limited to OSCEs:
>> https://en.wikipedia.org/wiki/Objective_structured_clinical_examination ).
>> Does the present scope of works have a concept of 'libraries' or
>> 'sources' or similar? The sci-fi example would be Neo uploading
>> knowledge: https://www.youtube.com/watch?v=w_8NsPQBdV0 ; the more
>> pragmatic example would be virus-signature libraries uploaded (or
>> downloaded, depending on how you think about it) into anti-virus
>> programs...
>>
>> Part of the underlying thought is about 'computational load', which will
>> likely have an impact (various implications) on how solutions can be
>> deployed (how well they may be 'democratised', or similar).
>>
>> Also: what consideration has been given to storing resources on DLTs
>> (i.e. blockchains, DHTs, cryptographically signed (tamper-evident),
>> decentralised resources)? (A sketch of this idea follows below, after
>> the signatures.)
>>
>> Timothy Holborn.
>>
>>> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
>>> W3C Data Activity Lead & W3C champion for the Web of things
>
> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
> W3C Data Activity Lead & W3C champion for the Web of things
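Picking up the 'state evidence / check-sum' and tamper-evidence questions in
this thread, here is a minimal Python sketch of recording what state a
queried source was in when an inference depended on it. The record fields,
the helper name, and the example URL are assumptions for illustration, not
anything defined by the chunks work.

```python
# Minimal sketch of 'query state evidence': record a cryptographic digest
# of a depended-upon source's response, with an ISO 8601 retrieval
# timestamp, so later inferences can show what state they relied upon.
# Field names, helper name, and the example URL are illustrative assumptions.
import hashlib
import json
from datetime import datetime, timezone

def evidence_record(source_url: str, response_body: bytes) -> dict:
    """Build a tamper-evident record for one query result."""
    return {
        "source": source_url,
        "retrievedAt": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(response_body).hexdigest(),
    }

# Usage sketch: attach the record to whatever inference consumed the data.
# Anchoring the digest on a DLT or in a signed log is a separate deployment
# choice; the digest alone already makes later tampering detectable.
record = evidence_record("https://example.org/laws/2021", b"...response bytes...")
print(json.dumps(record, indent=2))
```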
Received on Wednesday, 25 August 2021 09:32:09 UTC