Re: Knowledge graphs from StratML from Paul Alagna on 2020-03-16 (public-aikr@w3.org from March 2020)

From: Paul Alagna <pjalagna@gmail.com>
Date: Sun, 15 Mar 2020 20:45:13 -0400
To: W3C AIKR CG <public-aikr@w3.org>
Message-Id: <937CF7B9-4E1E-493E-B2FD-A20830187A0C@gmail.com>
All;
    My task, as I see it, is to create AI entries from a StratML XML report.
    So you know, I started playing. My attack is as follows:
1- convert the hierarchical XML into an RDF pile.

2- convert the RDF pile into a strongly serialized fact-reaction data set
    
3- feed the ANN with the data set
 
Currently I'm stuck here at 1...
1- convert the hierarchical XML into an RDF pile:
    I am running on the assumption that a strongly serialized fact-reaction data set, like one derived from an RDF graph, is the most concise way to feed an AI neural net. I wrote a Python program to create an RDF pile. There are some problems there. The structure seems to be as predicted: the meta names are focused on the needs of the schema, and the meta items are related according to the schema's intention. But there is a lot of ambiguity among the element names. My first attempt at disambiguation was to use an address marker. The hierarchy implies an address marker (where in the descending parse we picked this element up). Using that address along with the name disambiguates it.
    This is part of the report:

..StrategicPlan@1..Name@2=>((Use Cases for the StratML Standard))

..StrategicPlan@1..Description@3=>((None))

..StrategicPlan@1..OtherInformation@4=>((None))
 Description@3 above is a different element from Description@10 in
..StrategicPlan@1..StrategicPlanCore@5..Organization@6..Description@10=>((None))
 even though both carry the name Description.
 In order to mesh graphs or transfer graphs via a PI network, the addresses need to remain constant, and they will change whenever fields are added to the XML. Some sort of GUID is needed, and I wonder if that consistency was the intent of the identifier element. But it's not used uniformly.
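
A minimal Python sketch of the address-marker idea, to make the fragility concrete. It is not the program mentioned above. It assumes the marker is simply a counter over elements in document order and that only leaf elements are reported; the function name address_lines and the trimmed sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

def address_lines(xml_text):
    """Return one '..Name@N' trail per leaf element.

    Assumption: N counts elements in document (descending parse) order.
    """
    root = ET.fromstring(xml_text)
    counter = 0
    lines = []

    def walk(elem, trail):
        nonlocal counter
        counter += 1
        name = elem.tag.split("}")[-1]  # drop an XML namespace, if present
        trail = f"{trail}..{name}@{counter}"
        children = list(elem)
        if not children:
            text = (elem.text or "").strip() or None
            lines.append(f"{trail}=>(({text}))")
        for child in children:
            walk(child, trail)

    walk(root, "")
    return lines

# Hypothetical, heavily trimmed stand-ins for a StratML document.
before = ("<StrategicPlan><Name>Use Cases for the StratML Standard</Name>"
          "<Description/></StrategicPlan>")
after = ("<StrategicPlan><Identifier>abc</Identifier>"
         "<Name>Use Cases for the StratML Standard</Name>"
         "<Description/></StrategicPlan>")

for line in address_lines(before):
    print(line)
for line in address_lines(after):
    print(line)  # one added field and every later address has shifted
```

In the second run Name moves from @2 to @3, which is the reason a position-based address cannot serve as a stable identifier across versions of a document.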
 
In dissecting a StratML example to replicate it into an RDF graph, I found that the chain of meta tags that appeared to have the same syntactical provenance did not reflect the logical relationships (peerage) of the elements.
Some elements were peers and some were not. Neither the hierarchy nor the provenance provided enough information to discern which items were peer related (i.e. of the same chain) and which were not.
For instance:
[StrategicPlan=>StrategicPlanCore=>Goal] appears more than 700 times.

As a data purist, this upsets me to no end.

My discussion is as follows.

Key:
"leads to" is tokenized as =>

equality set of chains:
given 2 chains
1=>2=>3 ; 1=>2=>3
OR these 2:
customer=>order; customer=>order
these chains are considered equal if and only if each named node in each chain is [equal in every way]*1 to its counterpart on the other chain.

[equal in every way]*1: given a profile of an element (attributes, values, formats, position in syntax, etc.), two elements are equal if they have the same profile. In data science we call this profile the element's signature.
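
The equality test can be sketched in a few lines of Python. This is only an illustration: the Node class, the shape of the signature (a tuple of attribute/value pairs) and the sample values are assumptions, not part of any StratML tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    signature: tuple = ()  # hypothetical profile: (attribute, value) pairs

def chains_equal(chain_a, chain_b):
    """Two chains are equal iff every node equals its counterpart in every way."""
    return len(chain_a) == len(chain_b) and all(
        a == b for a, b in zip(chain_a, chain_b)
    )

john = Node("customer", (("id", "John344"),))
bill = Node("customer", (("id", "Bill"),))
order = Node("order", (("date", "12/12/2020"),))

print(chains_equal([john, order], [john, order]))  # same names, same signatures
print(chains_equal([john, order], [bill, order]))  # same names, different signatures
```

The second pair shares the meta trail customer=>order but fails the test, because the comparison is on the whole signature and not on the name alone.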

peer chain:
given 2 chains
1=>2=>3=>4 ; 1=>2=>3=>5
4 and 5 are peers if and only if nodes 1,2,3 form an equality set of chains.

there are exceptions 
for example:
customer=>order=>lineItem=>part=corn;
customer=>order=>lineItem=>part=rice;
the parts(corn and rice) are peers if and only if 
the customer in both chains is the same customer AND
the order in both chains is the same.
Oddly the lineItem does not have to be the same (and would not, in the same order, be the same).
In data science, elements like "lineItem" are considered constructs to sequence or differentiate elements under a common "key". They offer no further business intelligence. For instance, that corn is at lineItem 1 offers no additional business knowledge about the order or the customer.

Homonym chains:
given 2 chains
1=>2=>3=>4 ; 1=>2=>3=>5
if any of the preceding nodes (1 or 2 or 3) does not equal its counterpart (its equally named node), then nodes 4 and 5 reside in different chains, confused by the homonym.

there are NO exceptions 
customer=>order=>lineItem=>part=corn;
customer=>order=>lineItem=>part=rice;
if the customer is different then both chains are different.
if the order is different then both chains are different.
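
Putting the peer and homonym rules together, a rough Python sketch could look like the following. It assumes each node is a (name, identity) pair in which the identity stands in for the full signature, and that the set of construct names is known in advance; classify and CONSTRUCTS are hypothetical names.

```python
CONSTRUCTS = {"lineItem"}  # assumption: names that only sequence items under a key

def classify(chain_a, chain_b):
    """Classify the last nodes of two chains as 'peer' or 'homonym'.

    Each chain is a list of (name, identity) pairs. Only the nodes before
    the last one are compared, and constructs are allowed to differ.
    """
    prefix_a, prefix_b = chain_a[:-1], chain_b[:-1]
    if [n for n, _ in prefix_a] != [n for n, _ in prefix_b]:
        raise ValueError("the chains do not share a meta trail")
    for (name, id_a), (_, id_b) in zip(prefix_a, prefix_b):
        if name not in CONSTRUCTS and id_a != id_b:
            return "homonym"
    return "peer"

corn = [("customer", "John344"), ("order", "12/12/2020"), ("lineItem", "1"), ("part", "corn")]
rice = [("customer", "John344"), ("order", "12/12/2020"), ("lineItem", "2"), ("part", "rice")]
other = [("customer", "Bill"), ("order", "12/12/2020"), ("lineItem", "1"), ("part", "rice")]

print(classify(corn, rice))   # same customer and order, different lineItem
print(classify(corn, other))  # different customer
```

Note that the function needs the identities to decide anything. Given the names alone, all three chains look identical.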


meta-blocks:
Fragments of chains can be peers or independent.
Take for example the fragment [lineItem=>part].
Under an equal chain fragment [customer=>order], the succeeding fragments (in accordance with the syntax) are peers of one another.

So I conclude that:
Given two sequences by name alone (meta trails), one cannot differentiate peer chains from homonym chains. Because it is the meta trails alone that provide the syntax [the grammatical format], that format (in this case XML) has to provide a means to disambiguate peer chains from homonym chains. I will further state that these are business decisions to be made.

My solution is to have, in all cases, the element's KRI* follow the element meta name during parsing*2.
customer-John344-order-12/12/2020-*-lineItem-1-part-part1=corn;
customer-Bill-order-12/12/2020-*-lineItem-1-part-part10=rice;

the "-*-" signifies that element lineItem is a construct.
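
As a rough illustration of how such trails could be assembled and compared, here is a small Python sketch. The helper names kri_trail and key_of are hypothetical, and joining with "-" assumes that no KRI itself contains "-*-".

```python
def kri_trail(nodes, value):
    """Build a trail in which every meta name is followed by its KRI.

    nodes: list of (name, kri, is_construct); a construct is preceded by '*'.
    """
    parts = []
    for name, kri, is_construct in nodes:
        if is_construct:
            parts.append("*")
        parts.extend([name, kri])
    return "-".join(parts) + f"={value}"

def key_of(trail):
    """The part before the '-*-' marker, which must match for two parts to be peers."""
    return trail.split("-*-")[0]

corn = kri_trail(
    [("customer", "John344", False), ("order", "12/12/2020", False),
     ("lineItem", "1", True), ("part", "part1", False)],
    "corn",
)
rice = kri_trail(
    [("customer", "Bill", False), ("order", "12/12/2020", False),
     ("lineItem", "1", True), ("part", "part10", False)],
    "rice",
)

print(corn)
print(key_of(corn) == key_of(rice))  # different customers, so not peers
```

With the KRIs in the trail, the peer or homonym question reduces to a string comparison of the keys.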

parsing*2 - this could be accomplished in several ways:
1) adding an attribute to the element <customer KRI='John344'> OR
2) following the element name with its KRI
<customer>
    <KRI>John344</KRI>
    <order><KRI>12/12/2020</KRI>
    
KRI*: the Knowledge Reference Identifier should be a unique identifier that points to a profile of this node.  
The data purist in me has always thought that the attributes of an element ARE its signature. So I prefer solution 1.
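
To show what solution 1 might look like to a parser, here is a hedged Python sketch using the standard library's xml.etree.ElementTree. The sample document is invented, and the construct="true" flag is an extra assumption of this sketch (the proposal above does not say how a construct would be marked in the XML).

```python
import xml.etree.ElementTree as ET

# Hypothetical document using option 1 (KRI carried as an attribute).
# The construct="true" flag is an assumption made only for this sketch.
DOC = """
<customer KRI="John344">
  <order KRI="12/12/2020">
    <lineItem KRI="1" construct="true"><part KRI="part1">corn</part></lineItem>
    <lineItem KRI="2" construct="true"><part KRI="part10">rice</part></lineItem>
  </order>
</customer>
"""

def kri_trails(xml_text):
    """Return one KRI trail per leaf element, in document order."""
    trails = []

    def walk(elem, parts):
        if elem.get("construct") == "true":
            parts = parts + ["*"]
        parts = parts + [elem.tag, elem.get("KRI", "?")]
        children = list(elem)
        if not children:
            trails.append("-".join(parts) + "=" + (elem.text or "").strip())
        for child in children:
            walk(child, parts)

    walk(ET.fromstring(xml_text), [])
    return trails

for trail in kri_trails(DOC):
    print(trail)
```

Unlike a position-based address, these trails do not change when unrelated fields are added elsewhere in the document.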

 Somewhere there is a <profile ID=‘John344’> … that defines this element like no other.
 What I believe that means is that goal[1] is not enough. It would be enough for a single RDF graph, but not for a transfer or meshing of graphs.
 
  If we added actions like “equate” to RDF, then we could combine elements from one graph with another: “name@3” equates to “organization@9000”.
 One could then create a super language (or, as Carl Mattocks calls it, an Owl-light like language) for meshing or transfer (where StratML names become the super language).
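
A toy Python sketch of what an “equate” action could do when meshing two graphs. Graphs are plain sets of (subject, predicate, object) tuples here rather than real RDF, and the predicates and values are invented for illustration. In OWL, owl:sameAs plays a comparable role for asserting that two identifiers denote the same thing.

```python
def mesh(graph_a, graph_b, equate):
    """Union two triple sets after renaming graph_b nodes via the equate map.

    equate: {node in graph_b: node it equates to in graph_a} (hypothetical action).
    """
    def canon(term):
        return equate.get(term, term)

    renamed = {(canon(s), p, canon(o)) for s, p, o in graph_b}
    return set(graph_a) | renamed

# Invented sample triples; the predicates are placeholders.
graph_a = {("name@3", "hasValue", "Example Org")}
graph_b = {("organization@9000", "hasGoal", "goal@9001")}

merged = mesh(graph_a, graph_b, {"organization@9000": "name@3"})
for triple in sorted(merged):
    print(triple)
```

After meshing, both facts hang off the single node name@3. The sketch still leaves open who decides that the two nodes equate, and that is the business question raised above.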

So, I need help here, guys. Is this an XML question with an XML solution? I think an XML solution alone would not be complete, though: the KRI and the signature profile need a business solution (i.e. StratML extensions) and perhaps even AIKR extensions.

Thoughts?

Paul Alagna
Received on Monday, 16 March 2020 00:45:29 UTC