Re: Knowledge graphs from StratML from Paul Alagna on 2020-03-18 (public-aikr@w3.org from March 2020)

From: Paul Alagna <pjalagna@gmail.com>
Date: Wed, 18 Mar 2020 03:10:29 -0400
To: paoladimaio10@googlemail.com
Cc: W3C AIKR CG <public-aikr@w3.org>
Message-Id: <FEFFE4F3-F2ED-457F-ABB5-AF0635CF3920@gmail.com>
Paola, AIKR;

    Paola asked me to explain. Here goes:

0- please state why do you think this is your task, 
-- There is a fundamental difference between WHAT is presented to an ANN (Artificial Neural Network) and HOW it is presented. 
One could take it as premature to wonder about the HOW of things while we are still formulating WHAT those items are.
But in my world, feedback from the HOW group often influences the WHAT group. On a personal level, I saw a fire burning in the back room and started to attack it.

1- what are the expected outcomes of what you propose to do
-- XML cannot be fed consistently into an ANN. The data moves around from entry to entry. Meta-names are conflicting, confusing and inconsistent. RDF piles (double directed graphs) disambiguate the metadata, or at least point to where the XML has not provided needed information.
RDF also provides a consistent pathway for creating the fact-reaction records needed to feed an ANN. 

1a- , and what benefits purpose
-- The first benefit is to understand how we get from point A to point B (I'm a HOW person), or at least to understand the barriers and mitigate them.

2- , and how does it fit into the overall plan/mission of the group.

from AIKR wiki: the AIKR Proposed outcomes:

p1- A comprehensive list of open access resources in both AI and KR (useful to teaching and research)
-- the data sets and algorithms we develop will also become resources.
 
p2- A set of metadata derived from these resources
-- explainable, relatable, consistent and comprehensive. IMHO we don't get there without explaining Storage or Retrieval.

Also from AIKR wiki:
Derive a set of metadata/tags that help tag/identify/retrieve/reason at least in part (first phase)
--- esp. retrieve, which implies consistent storage.

p3- A concept map of the domain

p4- A natural language vocabulary to represent various aspects of AI
-- indeed.

p5- One or more encoding/implementations/ machine language version of the vocabulary, such as ChatBot Natural Language Understanding & Natural Language Generation

p6- Methods for KR management, especially Natural Language Learning / Semantic Memory
-- Semantic Memory, which in my mind implies the syntactic relationships (the grammatical arrangement of words* to form sentences) amongst the data.



3- Which one of your proposed goals does this activity fit into (or if compelling, we could add additional goal)

the AIKR goal:
The overall goal/mission of this community group is to explore the requirements, best practices and implementation options for the conceptualization and specification of domain knowledge in AI.
-- Implementation options. Again, one could take it as premature to wonder about implementation at this point of the journey, but data has to live somewhere and be explainable, relatable, consistent and accountable. Usage has to be explainable.



Also from AIKR wiki:
We plan to place particular emphasis on the identification and the representation of AI facets and various aspects (technology, legislation, ethics etc) with the purpose to facilitate knowledge exchange and reuse.

1- facilitate knowledge exchange and reuse.
-- this is my main goal.

2- Refine and expand the core metadata set with additional elements and attributes to become more comprehensive (second phase)
--- Here is some great feedback from the HOW side of life: you say comprehensive, but consistency needs to be addressed simultaneously.

3- One or more encoding/implementations/ machine language version of the vocabulary, such as, ChatBot Natural Language Understanding & Natural Language Generation
--- i.e., an understanding of the vocabulary.

4- Methods for KR management, especially Natural Language Learning / Semantic Memory
--- Semantic Memory, which in my mind implies recording (storage) of the syntactic relationships (the grammatical arrangement of words* to form sentences) amongst the data.

PDM - thanks for the opportunity to expound on what I have said and done. All of this is open to discussion, even critical discussion. I like being challenged, and the overall problem of creating an ethical AI is all new territory.

Thanks Again
Paul Alagna


> On Mar 15, 2020, at 9:11 PM, Paola Di Maio <paola.dimaio@gmail.com> wrote:
> 
> Thanks Paul
> scrolled briefly through your long email, before I attempt to study it
> please state why do you think this is your task, what are the expected outcomes of what you propose to do, and what benefits purpose, and how does it fit into the overall plan/mission of the group. Which one of your proposed goals does this activity fit into
> (or if compelling, we could add additional goal)
> thanks!
> PDM
> 
> On Mon, Mar 16, 2020 at 8:45 AM Paul Alagna <pjalagna@gmail.com> wrote:
> All;
>     My task as I see it is to create AI entries from a StratML XML report. 
>     So you know, I started playing. My attack is as follows:
> 1- convert the hierarchical XML into an RDF pile.
> 
> 2- convert the RDF pile into a strongly serialized fact-reaction data set
> 
> 3- feed the ANN with the data set
> 
> currently I'm stuck here at 1...
> 1- convert the hierarchical XML into an RDF pile:   
>     I am running on the assumption that a strongly serialized fact-reaction data set, like that derived from an RDF graph, is the most concise way to feed an AI neural net. I wrote a Python program to create an RDF pile. Some problems there. The structure seems to be as predicted: the meta names are focused on the needs of the schema, and the meta items are related according to the schema's intention. But there is a lot of ambiguity amongst the element names. My first attempt at disambiguation was to use an address marker. The hierarchy implies an address marker (where in the descending parse did we pick this element up). Using that address along with the name disambiguates it.
>     This is part of the report:
> 
> ..StrategicPlan@1..Name@2=>((Use Cases for the StratML Standard))
> 
> ..StrategicPlan@1..Description@3=>((None))
> 
> ..StrategicPlan@1..OtherInformation@4=>((None))
>         Description@3 is different from Description@10 in ..StrategicPlan@1..StrategicPlanCore@5..Organization@6..Description@10=>((None))
>         In order to mesh graphs or transfer graphs via a PI network the addresses need to remain constant, and they will change when fields are added to the XML. Some sort of GUID is needed, and I wonder if that consistency was the intent of the identifier element. But it's not used uniformly.
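
A minimal sketch of the address-marker idea, for discussion only. It is not the program mentioned above; it assumes the address is a single counter incremented in document order, which is what the report lines suggest. The function name addressed_paths and the SAMPLE_XML fragment are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the report lines quoted above.
SAMPLE_XML = """<StrategicPlan>
  <Name>Use Cases for the StratML Standard</Name>
  <Description/>
  <OtherInformation/>
  <StrategicPlanCore>
    <Organization>
      <Name/>
      <Acronym/>
      <Identifier/>
      <Description/>
    </Organization>
  </StrategicPlanCore>
</StrategicPlan>"""


def addressed_paths(xml_text):
    """Return one '..Name@N' trail per leaf element.

    Assumption: N is a global counter assigned in document order
    during the descending parse.
    """
    counter = 0
    lines = []

    def walk(element, trail):
        nonlocal counter
        counter += 1
        # Drop any '{namespace}' prefix so only the bare name is shown.
        name = element.tag.split("}")[-1]
        trail = trail + [f"{name}@{counter}"]
        children = list(element)
        if not children:
            text = (element.text or "").strip() or "None"
            lines.append(".." + "..".join(trail) + f"=>(({text}))")
        for child in children:
            walk(child, trail)

    walk(ET.fromstring(xml_text), [])
    return lines


for line in addressed_paths(SAMPLE_XML):
    print(line)
```

The sketch also shows the weakness described above: inserting one new element near the top of the XML renumbers every address after it.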
> 
> In dissecting a StratML example, to replicate it into an RDF graph, I found that the chain of meta tags that appeared to have the same syntactical provenance did not reflect the logical relationships (peerage) of the elements.
> Some elements were peers and some were not. Neither the hierarchy nor the provenance provided enough information to discern which items were peer related (i.e., of the same chain) and which items were not.
> for instance:
> [StrategicPlan=>StrategicPlanCore=>Goal] appears more than 700 times.
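
One way to measure how often a name-only trail repeats is to count trails during the parse. This is a sketch under assumptions: trail_counts and the three-goal sample are hypothetical, and the real report would need its own file loaded in place of SAMPLE_XML.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical fragment; a real StratML file would be read from disk instead.
SAMPLE_XML = """<StrategicPlan>
  <StrategicPlanCore>
    <Goal/>
    <Goal/>
    <Goal/>
  </StrategicPlanCore>
</StrategicPlan>"""


def trail_counts(xml_text):
    """Count how many elements share each name-only trail (meta trail)."""
    counts = Counter()

    def walk(element, trail):
        # Drop any '{namespace}' prefix so only the bare name is used.
        trail = trail + (element.tag.split("}")[-1],)
        counts["=>".join(trail)] += 1
        for child in element:
            walk(child, trail)

    walk(ET.fromstring(xml_text), ())
    return counts


counts = trail_counts(SAMPLE_XML)
# Any trail with a count above 1 cannot be told apart by names alone.
for trail, n in counts.items():
    if n > 1:
        print(trail, n)
```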
> 
> As a data purist this upsets me to no end.
> 
> my discussion is as follows.
> 
> key:
> "leads to" is tokenized as =>
> 
> equality set of chains:
> given 2 chains
> 1=>2=>3 ; 1=>2=>3
> OR these 2:
> customer=>order; customer=>order
> These chains are considered equal if and only if each named node in each chain is [equal in every way]*1 to its counterpart on the other chain.
> 
> [equal in every way]*1: given a profile of an element (attributes, values, formats, position in syntax, etc.), two elements are equal if they have the same profile. In data science we call this profile the element's signature.
> 
> peer chain:
> given 2 chains
> 1=>2=>3=>4 ; 1=>2=>3=>5
> 4 and 5 are peers if and only if nodes 1,2,3 form an equality set of chains.
> 
> there are exceptions 
> for example:
> customer=>order=>lineItem=>part=corn;
> customer=>order=>lineItem=>part=rice;
> the parts(corn and rice) are peers if and only if 
> the customer in both chains is the same customer AND
> the order in both chains is the same.
> Oddly, the lineItem does not have to be the same (and would not, within the same order, be the same).
> In data science, elements like "lineItem" are considered constructs to sequence or differentiate elements under a common "key". They offer no further business intelligence. For instance, that corn is at lineItem 1 offers no additional business knowledge about the order or the customer.
> 
> Homonym chains:
> given 2 chains
> 1=>2=>3=>4 ; 1=>2=>3=>5
> If any of the preceding nodes (1 or 2 or 3) does not equal its counterpart (the equally named node), then nodes 4 and 5 reside in different chains, confused by the homonym.
> 
> there are NO exceptions 
> customer=>order=>lineItem=>part=corn;
> customer=>order=>lineItem=>part=rice;
> if the customer is different then both chains are different.
> if the order is different then both chains are different.
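
The peer and homonym definitions can be sketched as a small classifier. This is an illustration under assumptions, not a settled design: the Node type, the single signature string standing in for the full element profile, and the construct flag are all hypothetical names.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str                # meta name, e.g. "customer"
    signature: str           # stand-in for the element's full profile
    construct: bool = False  # True for sequencing-only elements such as lineItem


def classify(chain_a, chain_b):
    """Classify the last nodes of two chains as 'peer', 'homonym' or 'unrelated'."""
    prefix_a, prefix_b = chain_a[:-1], chain_b[:-1]
    # Different meta trails: the chains are not even confusable by name.
    if [n.name for n in prefix_a] != [n.name for n in prefix_b]:
        return "unrelated"
    for a, b in zip(prefix_a, prefix_b):
        # Constructs only sequence their children, so they need not match.
        if a.construct and b.construct:
            continue
        if a != b:
            return "homonym"
    return "peer"


corn = [Node("customer", "John344"), Node("order", "12/12/2020"),
        Node("lineItem", "1", construct=True), Node("part", "corn")]
rice_same_order = [Node("customer", "John344"), Node("order", "12/12/2020"),
                   Node("lineItem", "2", construct=True), Node("part", "rice")]
rice_other_customer = [Node("customer", "Bill"), Node("order", "12/12/2020"),
                       Node("lineItem", "1", construct=True), Node("part", "rice")]

print(classify(corn, rice_same_order))      # same customer and order
print(classify(corn, rice_other_customer))  # customer differs
```

Note that the classifier needs the signatures. With names alone, both comparisons look identical.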
> 
> 
> meta-blocks:
> fragments of chains can be peers or independent.
> take for example the fragment [lineItem=>part]
> Under an equivalent chain fragment [customer=>order], all succeeding fragments (in accordance with the syntax) are peers.
> 
> so i conclude that
> Given 2 sequences by name alone (meta trails), one cannot differentiate peer chains from homonym chains. Because it is the meta trails alone that provide the syntax [the grammatical format], that format (in this case XML) has to provide a means to disambiguate peer chains from homonym chains. I will further state that these are business decisions to be made.
> 
> My solution is to have, in all cases, the element's KRI* follow the element meta name during parsing*2.
> customer-John344-order-12/12/2020-*-lineItem-1-part-part1=corn;
> customer-Bill-order-12/12/2020-*-lineItem-1-part-part10=rice;
> 
> the "-*-" signifies that element lineItem is a construct.
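
A short sketch of how such a trail could be assembled during parsing. The function name kri_trail and the (name, KRI, is_construct) tuple layout are assumptions made for illustration.

```python
def kri_trail(nodes, value):
    """Build a trail where each meta name is followed by its KRI.

    nodes is a list of (name, kri, is_construct) tuples; a construct
    element is preceded by the '*' marker.
    """
    parts = []
    for name, kri, is_construct in nodes:
        if is_construct:
            parts.append("*")
        parts.extend([name, kri])
    return "-".join(parts) + "=" + value


trail = kri_trail(
    [("customer", "John344", False),
     ("order", "12/12/2020", False),
     ("lineItem", "1", True),
     ("part", "part1", False)],
    "corn",
)
print(trail)
```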
> 
> parsing*2 - this could be accomplished in several ways:
> 1) adding an attribute to the element <customer KRI='John344'> OR
> 2) following the element name with its KRI
> <customer>
>     <KRI>John344</KRI>
>     <order><KRI>12/12/2020</KRI></order>
> </customer>
> 
> KRI*: the Knowledge Reference Identifier should be a unique identifier that points to a profile of this node.
> The data purist in me has always thought that the attributes of an element ARE its signature. So I prefer solution 1.
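
A parser could accept either of the two encodings. The sketch below is an assumption-laden illustration: kri_of is a hypothetical helper, and KRI is the proposed attribute and element name from this message, not part of StratML today.

```python
import xml.etree.ElementTree as ET


def kri_of(element):
    """Return the element's KRI from the attribute (option 1),
    else from a direct <KRI> child (option 2), else None."""
    kri = element.get("KRI")
    if kri is None:
        kri = element.findtext("KRI")
    return kri


as_attribute = ET.fromstring("<customer KRI='John344'/>")
as_child = ET.fromstring(
    "<customer><KRI>John344</KRI>"
    "<order><KRI>12/12/2020</KRI></order></customer>"
)

print(kri_of(as_attribute))
print(kri_of(as_child))
print(kri_of(as_child.find("order")))
```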
> 
>         Somewhere is a <profile ID='John344'> … that defines this element like no other.
>         What I believe that means is that goal[1] is not enough. It would be enough for a single RDF graph but not for a transfer or meshing of graphs.
> 
>                 If we added actions to RDF like “equate” then we could combine elements from one graph to another: “name@3” equates to “organization@9000”.
>         One could then create a super language (or, as Carl Mattocks calls it, an OWL Lite-like language) for meshing or transfer (where StratML names become the super language).
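
To make the "equate" idea concrete, here is a rough sketch using plain Python tuples as triples. Everything in it is hypothetical: the mesh function, the predicate names and the values are invented for illustration. In standard OWL the closest existing notion is owl:sameAs, though whether that fits the intent here is an open question.

```python
def mesh(graph_a, graph_b, equates):
    """Union two sets of (subject, predicate, object) triples.

    equates maps a node name in graph_b to the node name in graph_a
    that it should be treated as.
    """
    merged = set(graph_a)
    for s, p, o in graph_b:
        merged.add((equates.get(s, s), p, equates.get(o, o)))
    return merged


# Hypothetical triples; only the two node names come from the text above.
graph_a = {("name@3", "hasValue", "ExampleOrg")}
graph_b = {("organization@9000", "hasGoal", "goal@9001")}

merged = mesh(graph_a, graph_b, {"organization@9000": "name@3"})
for triple in sorted(merged):
    print(triple)
```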
> 
> So - I need help here, guys. Is this an XML question with an XML solution? Though, I think an XML solution would not be complete. The KRI and the signature profile need a business solution (i.e., StratML extensions) and perhaps even AIKR extensions.
> 
> thoughts?
> 
> Paul Alagna
> 
> 

Thanks
PAUL ALAGNA
PJAlagna@Gmail.com
732-322-5641
Received on Wednesday, 18 March 2020 07:10:52 UTC