Re: What is a Knowledge Graph? CORRECTION from Patrick J Hayes on 2019-06-17 (semantic-web@w3.org from June 2019)

From: Patrick J Hayes <phayes@ihmc.us>
Date: Mon, 17 Jun 2019 03:24:25 -0700
To: Chris Harding <chris@lacibus.net>
CC: semantic-web <semantic-web@w3.org>, Paola Di Maio <paoladimaio10@gmail.com>, xyzscy <1047571207@qq.com>
Message-ID: <BEE9FE09-8864-45EB-AD07-AE0B9DCAB680@ihmc.us>
> On Jun 16, 2019, at 9:08 AM, Chris Harding <chris@lacibus.net> wrote:
> 
> Thanks, Pat, Bradwell, Simon, Dieter, Martynas, Marco, and Dave for your remarks!
> 
> I admit to using some terms rather loosely in my mail. Part of the difficulty that we have is that we don't yet have a well-understood theory to support use of words like "concept" and "meaning".

Well, we do have an exact and well-understood theory of how sentences of a formal notation express facts about a world; it’s called Tarskian semantics, or model theory. This may be only part of what is meant by “meaning”, but it is the part that underlies the processes of inference that you refer to below. 
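
To make the model-theoretic idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class name Interpretation, its two tables, and the ex: names are all made up for this example and come from no specification or tool. An interpretation maps each name to a thing in some world, and maps each thing used as a property to the set of pairs it holds between; a triple is true when the pair denoted by its subject and object is in the extension of what its predicate denotes.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class Interpretation:
    """A toy interpretation: names denote things, properties have extensions.

    Hypothetical illustration only; not taken from any specification text.
    """

    def __init__(self,
                 denotation: Dict[str, str],
                 extension: Dict[str, Set[Tuple[str, str]]]) -> None:
        self.denotation = denotation  # name -> thing in the domain
        self.extension = extension    # property thing -> set of (thing, thing)

    def satisfies(self, triple: Triple) -> bool:
        """True if the triple is true in this interpretation."""
        s, p, o = triple
        if not all(name in self.denotation for name in (s, p, o)):
            return False  # a name with no denotation cannot make a true triple
        pair = (self.denotation[s], self.denotation[o])
        return pair in self.extension.get(self.denotation[p], set())

    def is_model_of(self, triples: Iterable[Triple]) -> bool:
        """True if every triple in the collection is true here."""
        return all(self.satisfies(t) for t in triples)


# An assumed toy world in which Paris really is the capital of France.
world = Interpretation(
    denotation={
        "ex:Paris": "the city of Paris",
        "ex:France": "the country France",
        "ex:capitalOf": "the capital-of relation",
    },
    extension={
        "the capital-of relation": {("the city of Paris", "the country France")},
    },
)

sentences = {("ex:Paris", "ex:capitalOf", "ex:France")}
print(world.is_model_of(sentences))
print(world.satisfies(("ex:France", "ex:capitalOf", "ex:Paris")))
```

Note that the node labelled ex:Paris denotes a city here, not a "concept"; that is all the formal semantics requires.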
> 
> We are actually on more solid ground with "graph". Graph theory is established in mathematics. I don't claim to be an expert in that theory, but am trying to use the term "graph" in accordance with it, and am happy to defer to anyone who is an expert and feels that my use of the term is incorrect.

The use of ‘graph’ being discussed here seems to have its root in the RDF notion of ‘RDF graph’, and that is (perhaps unfortunately) /not/ the sense used in graph theory, because RDF graphs have labels on the edges as well as the nodes.  (The nearest thing in graph theory to an RDF graph is a "labelled, directed multigraph permitting loops", but that still does not allow for labels on the edges.) Basically, forget graph theory, it just confuses things. 
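
A small sketch of why the graph-theory picture is awkward, assuming we treat an "RDF graph" as nothing more than a Python set of (subject, predicate, object) tuples. The helper name labels_used_as_both and the ex: name are made up for illustration. It simply finds labels that occur both in an edge position and in a node position, which a conventional node-and-edge drawing has no natural way to show.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Example data only: the second and third triples talk *about* properties,
# so a label that sits on an "edge" in one triple is a "node" in another.
triples: Set[Triple] = {
    ("ex:Heron", "rdfs:subClassOf", "ex:WadingBird"),
    ("rdfs:subClassOf", "rdf:type", "rdf:Property"),
    ("rdf:type", "rdf:type", "rdf:Property"),
}


def labels_used_as_both(graph: Set[Triple]) -> Set[str]:
    """Return labels that appear both as a predicate and as a subject/object."""
    node_labels = {s for s, _, _ in graph} | {o for _, _, o in graph}
    edge_labels = {p for _, p, _ in graph}
    return node_labels & edge_labels


print(sorted(labels_used_as_both(triples)))
```

In the last triple the same label is both a node and the arc leaving it, within a single triple.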
> 
> Let me try to express my thought without using "concept" or "meaning". There is a practice, currently growing in popularity, of creating a graph from a set of data (often including elements expressed in human language), and using that graph to derive another set of data, again often including language elements. The derived set of data is often intended to influence human decisions, perhaps by being presented as an analysis, perhaps by being presented as an explicit recommendation. A knowledge graph is a graph used in this way. 

I’m not sure that any particular /use/ is part of the idea. 
> 
> For example, I just visited amazon.co.uk and saw, "We think you'll enjoy 'Becoming' by Michelle Obama". This may have been produced using a knowledge graph derived from data about my previous visits to the website. At any rate, this is the kind of thing that proponents of knowledge graphs claim to be able to do.

It is one of them, but there are many others. Inference - using one graph to derive another - is a pretty universal mechanism. Here is one very simple example of using inference: in the Imagesnippets tool (http://www.imagesnippets.com) we mark up images with content expressed using RDF. If one labels an image of a bird as a “heron”, and then later tries to retrieve images of “wading birds”, that image will be retrieved because Wikipedia’s ‘knowledge graph’ “knows” that herons are wading birds. Such class-subclass reasoning is very basic, very simple to implement, and very useful. 
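
For anyone who wants to see how little is involved, here is a rough sketch of that kind of class-subclass retrieval in Python. It is not how Imagesnippets is actually implemented; the predicate ex:depicts, the image names, and both function names are assumptions made up for this example. The idea is just to follow subclass links upward from whatever class an image was labelled with.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
DEPICTS = "ex:depicts"  # hypothetical predicate linking an image to a class


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """All classes reachable from cls by following subclass links (cls included)."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and s == current and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def images_depicting(cls: str, triples: Set[Triple]) -> Set[str]:
    """Images labelled with cls or with any (transitive) subclass of cls."""
    return {
        s for s, p, o in triples
        if p == DEPICTS and cls in superclasses(o, triples)
    }


# Background knowledge plus two image annotations (all example data).
triples: Set[Triple] = {
    ("ex:Heron", SUBCLASS, "ex:WadingBird"),
    ("ex:WadingBird", SUBCLASS, "ex:Bird"),
    ("ex:Sparrow", SUBCLASS, "ex:Bird"),
    ("ex:image1", DEPICTS, "ex:Heron"),
    ("ex:image2", DEPICTS, "ex:Sparrow"),
}

print(images_depicting("ex:WadingBird", triples))
print(sorted(images_depicting("ex:Bird", triples)))
```

The image was only ever labelled as a heron; the query for wading birds finds it because the subclass triple is present in the same collection.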
> 
> I don't believe there is a single accepted way of creating a knowledge graph or of deriving data from a knowledge graph. Rather, there is a toolkit from which users of knowledge graphs ("knowledge engineers"?) can select the appropriate tools for particular purposes. Taking nouns as nodes and verbs as edges is certainly one approach. It can be made more sophisticated by using thesauri to create other edges between nouns. Another approach that I think is used is to ignore the distinction between nouns and verbs and put an edge between two word nodes if they occur in the same sentence.

I would not put that in the same category, myself. 

> Nodes and edges can be given attributes. For example, an edge between two words might have an attribute that indicates the context in which they are considered similar. 
> 
> These graphs can be represented using triples, and I believe that this is the way the current tools often do represent them.

Yes, though it would be more accurate to say that these are /in fact/ collections of triples, given the name “graph” for largely irrelevant historical reasons. The things that computers manipulate are simply large corpora of triples (or quads) organized for fast retrieval and search. 
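
As a rough illustration of "triples organized for fast retrieval", here is a toy in-memory index in Python. The class TripleIndex and its methods are made up for this sketch and do not correspond to any real triple store; real systems use far more elaborate (and usually quad-based) layouts. Each triple is filed under its subject, its predicate, and its object, so a pattern with some positions left open can be answered by intersecting the relevant buckets.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleIndex:
    """A toy triple collection with one hash index per position."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.by_subject: Dict[str, Set[Triple]] = defaultdict(set)
        self.by_predicate: Dict[str, Set[Triple]] = defaultdict(set)
        self.by_object: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        """Store the triple and file it under each of its three names."""
        t = (s, p, o)
        self.triples.add(t)
        self.by_subject[s].add(t)
        self.by_predicate[p].add(t)
        self.by_object[o].add(t)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return triples matching the pattern; None means 'anything'."""
        buckets = []
        if s is not None:
            buckets.append(self.by_subject.get(s, set()))
        if p is not None:
            buckets.append(self.by_predicate.get(p, set()))
        if o is not None:
            buckets.append(self.by_object.get(o, set()))
        if not buckets:
            return set(self.triples)
        return set.intersection(*buckets)


store = TripleIndex()
store.add("ex:Paris", "rdf:type", "ex:City")
store.add("ex:Paris", "ex:capitalOf", "ex:France")
store.add("ex:Lyon", "rdf:type", "ex:City")

print(sorted(store.match(p="rdf:type", o="ex:City")))
print(sorted(store.match(s="ex:Paris")))
```

Nothing here is a node or an edge; the "connectivity" is just the same name turning up in more than one bucket.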

> The tools may include some processing based on the propositional or predicate calculus. Perhaps they are less sophisticated in this respect than the semantic nets of the 70s. They may also include processing that uses methods such as statistical analysis and trained neural networks, and I stand to be corrected but don't believe these were commonly used in the earlier work.

Indeed they were not around to be used back in the 1970s. However, there has not yet been much application of machine deep learning to triple-stores, either, AFAIK. 

> 
> So it looks as though these tools do, in Dave's words, blend symbolic and statistical approaches, and they can also use machine learning.

That is a consummation devoutly to be wished, but AFAIK, it has not really happened yet to any great degree. But I would love to be corrected.

Pat


> Whether  "knowledge graph" is the paradigm that he is looking for, I'm not sure.
> 
> Patrick J Hayes wrote:
>> The idea of representing, or at least displaying, knowledge as a graphical diagram (rather than as, say, a set of sentences) has a very old history. In its modern sense it goes back at least to 1885 (C S Peirce “existential graph”) and can probably be traced into medieval writings and earlier. (The Torah version of Genesis refers to a "tree of knowledge".) It has been re-invented or rediscovered many times since, and seems to blossom in public (or at least academic) discussions with a periodicity of roughly 40 years. 
>> 
>> The appeal of this idea seems to lie, in part, in the way that it makes vivid the insight that all knowledge is ‘connected’, in a way that thinking of knowledge as made up of separate sentences or ‘propositions’ fails to acknowledge. In the 1970-1980 revival (surrounding the term ‘semantic net’) there was also a widespread notion that graphs or networks were more inherently ‘graphical’ or ‘diagrammatic’ in nature (as contrasted with the purely ’symbolic’ nature of sentences), so an echo of the left/right-brain idea was added to the intellectual soup. Fortunately, this has now been largely forgotten. 
>> 
>> None of this makes any actual sense, of course, since the connectivity of the graph or network happens entirely through the co-occurrence of names in the various sentences that make up the parts of the ‘graph’. So a set of sentences (in RDF and current systems, very simple sentences comprising a single triple) is inherently ‘connected’ via the fact that sentences use the same names as other sentences. But as this is the only kind of connection that the graph/network notations can encode, the graph ’structure’ does not add anything at all to the expressiveness of the notation. It is simply a decorative way to write a bunch of sentences on a surface. 
>> 
>> The RDF standard acknowledged this insight by /defining/ an “RDF graph” to simply be a set of RDF triples, thereby keeping the ‘diagram’ terminology while allowing implementations to happily ignore it and use whatever storage and display techniques they like for large ‘graphs’ (typically, hash stores using quads). So “graph” now, in the post-RDF usage (which includes the term ‘knowledge graph’, which simply refers to Google’s way of using RDF without having to strictly conform to the RDF specifications) has come full circle: it no longer means a graphical diagram or a network, but simply serves as a handy word for a chunk of structured knowledge, represented as triples. It basically restricts the form of sentences, nothing more. 
>> 
>> Hope this helps.
>> 
>> Pat
>> 
>> 
>> 
>>> On Jun 15, 2019, at 11:06 AM, Bradwell (US), Prachant <prachant.bradwell@boeing.com <mailto:prachant.bradwell@boeing.com>> wrote:
>>> 
>>> Through this conversation, it seems to me that the term “graph” is a confusion point. Might there be a better term to explain this to the layman? 
>>> 
>>> It is entirely possible that I need a history lesson on this too :)
>>> 
>>> Sent from my iPhone
>>> 
>>> On Jun 15, 2019, at 11:02 AM, Patrick J Hayes <phayes@ihmc.us <mailto:phayes@ihmc.us>> wrote:
>>> 
>>>> Chris, a few remarks.
>>>> 
>>>> 1. Although obviously a node-edge-node is a triple, so any (directed, labeled) graph can be treated as a set of triples, not all sets of triples can be drawn as a graphical diagram. RDF graphs (= sets of RDF triples, by definition) for example can have the same label used as both a node and arc label, possibly even in the same triple. I would suggest treating the word “graph” here as a handy way to describe triple-sets and leave it at that. 
>>>> 
>>>> 2. Being ‘thought of as’ something can hardly be used as a definition. I can think of a pile of grey rags as an elephant, but that doesn't make it actually be anything. 
>>>> 
>>>> 3. To speak of ‘concepts' and 'nodes representing' them is getting very blurry indeed with semantics, to the point where one loses meaning altogether. Most nodes in most K. graphs do what names in Krep notations usually do: they /denote/ /things/ (‘entities’ if you like). After working in the semantics area for most of my career, I have no clear idea what ‘concepts’ are, but if the concept of, say, Paris is anything other than a city, then a node with the label “Paris”, intended to name the capital of France, does NOT represent a concept. 
>>>> 
>>>> 4. The notion of higher-dimensional triple is new. (Did you mean ‘higher-order’?) And can you illustrate this technique of real-valued vectors to encode them? 
>>>> 
>>>> 5. The semantic nets of the 1970s were, almost universally, /much/ more expressive than knowledge graphs or RDF, or any of the other ‘graph’-like modern notations. They typically had ways of encoding quantifier scopes, disjunction, negation and sometimes such things as modal operators. The grandfather of them all, C.S.Peirce’s ‘existential graphs’ had the full expressivity of first-order logic in 1885 (implemented as ‘conceptual graphs’ by John Sowa about 90 years later http://www.jfsowa.com/cg/cgonto.htm). It has been downhill from there. 
>>>> 
>>>> Pat Hayes
>>>> 
>>>>> On Jun 14, 2019, at 1:31 PM, Chris Harding <chris@lacibus.net <mailto:chris@lacibus.net>> wrote:
>>>>> 
>>>>> Hi, Paola -
>>>>> 
>>>>> Interesting question! I think that graphs relate particularly to triples because node-edge-node can be represented as a triple, so a collection of triples describes a graph. 
>>>>> 
>>>>> So "a collection of triples to which someone attaches meaning" doesn't quite capture it. Maybe "a collection of triples to which someone attaches meaning and which is thought of as a graph, with the nodes representing concepts and the edges representing meaningful connections between them" would come closer?
>>>>> 
>>>>> Higher-dimension tuples can come in as embedded vectors - tuples of real numbers that can be associated with nodes or edges of the knowledge graph to convey attribute values. There appear to be various techniques for producing these, including AI. I think it is these techniques that take us beyond "the good old semantic nets of the 70ies" - although scale is important too. 
>>>>> 
>>>>> Paola Di Maio wrote:
>>>>>> Chris
>>>>>> A KG can also be any n-tuple, can't it?
>>>>>> 
>>>>>> On Thu, Jun 13, 2019 at 6:21 PM Chris Harding <chris@lacibus.net <mailto:chris@lacibus.net>> wrote:
>>>>>> I should have said that it is a collection of triples to which someone attaches meaning. The triples might or might not be in a triple store.
>>>>>> 
>>>>>> Chris Harding wrote:
>>>>>>> What is a knowledge graph? 
>>>>>>> 
>>>>>>> I looked it up in Wikipedia, and the definition seemed to be "What Google does". Reading a bit more widely, I came to the conclusion that it is a triple store to which someone attaches meaning. (Of course, this is most, if not all, triple stores.) What is interesting is the impressive amount of theory and practice, associated with the "knowledge graph" label, for using AI and other techniques to obtain transformations or measurements of the triple stores that add to the meaning that people attach to them.
>>>>>>> 
>>>>>>> I found these articles helpful:
>>>>>>> http://ceur-ws.org/Vol-2322/dsi4-6.pdf <http://ceur-ws.org/Vol-2322/dsi4-6.pdf>
>>>>>>> https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526 <https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526>
>>>>>>> https://content.iospress.com/articles/data-science/ds007 <https://content.iospress.com/articles/data-science/ds007>
>>>>>>> 
>>>>>>> xyzscy wrote:
>>>>>>>> Thank you for your response. I think the term KG was spread by Google, though I don't know how Google implements it. I used to think the semantic network was the key technology of KG, but Google has never stated that.
>>>>>>>>> On Jun 13, 2019, at 2:46 PM, Paola Di Maio <paola.dimaio@gmail.com <mailto:paola.dimaio@gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>> Thank you for asking this, 
>>>>>>>>> 
>>>>>>>>> I'll leave the experts to reply to scalability and other questions
>>>>>>>>> 
>>>>>>>>> In general, much depends on the language one uses, which in turn
>>>>>>>>> depends on the domain (which planet you come from)
>>>>>>>>> 
>>>>>>>>> When I first studied knowledge engineering, the expression knowledge graph
>>>>>>>>> was not in use at all. I was doing an MSc and studied the body of knowledge
>>>>>>>>> from ESPRIT project (some folks on this list worked on it)
>>>>>>>>> https://pdfs.semanticscholar.org/193e/b66909b0c87d5dbcdbd6b20d78ed93fc95a7.pdf <https://pdfs.semanticscholar.org/193e/b66909b0c87d5dbcdbd6b20d78ed93fc95a7.pdf>  
>>>>>>>>> 
>>>>>>>>> I'd be curious to learn when the term knowledge graph came into use and who coined it 
>>>>>>>>> 
>>>>>>>>> I then heard it in relation to the SW and this list, and always tried to figure out what exactly 
>>>>>>>>> a KG is (in relation to the wider Knowledge Representation domain I was studying)
>>>>>>>>> 
>>>>>>>>> Knowledge graphs are a type of knowledge representation, and they can be visualized
>>>>>>>>> graphically, or represented using algebra (again, depends on what planet you are on)
>>>>>>>>> Engineers tend to use diagrams, others tend to use algebra
>>>>>>>>> 
>>>>>>>>> But more importantly, they enable machine readability, querying, and computational manipulation of complex (combined) data sets, assuming knowledge is some kind of data in context, as some say.
>>>>>>>>> I don't use the term knowledge graph much either.  Let's see if the KG folks can offer more info
>>>>>>>>> 
>>>>>>>>> PDM
>>>>>>>>> Knowledge Graph Representation
>>>>>>>>> Knowledge graphs provide a unified format for representing knowledge about relationships between entities. A knowledge graph is a collection of triples, with each triple (h,t,r) denoting the fact that relation r exists between head entity h and tail entity t. http://ceur-ws.org/Vol-2322/dsi4-6.pdf <http://ceur-ws.org/Vol-2322/dsi4-6.pdf> 
>>>>>>>>> 
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 13, 2019 at 1:40 PM 我 <1047571207@qq.com <mailto:1047571207@qq.com>> wrote:
>>>>>>>>> Dear all:
>>>>>>>>> 
>>>>>>>>> When I first encountered knowledge graphs, I was very confused. Unlike other AI theory, a knowledge graph is not a pattern recognition algorithm that gives some "output" for some "input" (such as a classification algorithm), but rather a programming language (such as OWL or RDF) and a database (such as Neo4j). So in my opinion, the knowledge graph is more a problem of engineering than of mathematical theory.
>>>>>>>>> 
>>>>>>>>> Then I realized that, unlike a pattern recognition algorithm, the knowledge graph is aimed at letting computers all over the world communicate with each other in a common language, and I have a question: is scalability the key property of a knowledge graph?
>>>>>>>>> 
>>>>>>>>> There are many knowledge vaults written in different languages (such as OWL or RDF), but it is always hard to merge them, and there is no standard knowledge vault on which we can do advanced development. So is it necessary to open a scalable, standard knowledge vault so that everyone can keep extending and improving it, just like the Linux kernel or Wikipedia? What kind of knowledge should the standard knowledge vault contain so that it can be universal? I imagine that the standard knowledge vault is an original, and all other applications copy the original, so that they can all communicate with the same common sense; for example, when one application declares "night", all the other applications will know it's dark.
>>>>>>>>> 
>>>>>>>>> As far as I know, the knowledge graph is implemented as a query service, but is it possible to implement it as a programming language, just like C++ or Java? In this way, the computer could directly understand natural language, humans could communicate with computers in natural language, and one computer could communicate with another in natural language.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Regards
>>>>>>> 
>>>>>>> Chris
>>>>>>> ++++
>>>>>>> 
>>>>>>> Chief Executive, Lacibus <https://lacibus.net/> Ltd
>>>>>>> chris@lacibus.net <mailto:chris@lacibus.net>
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
> 
>
Received on Monday, 17 June 2019 10:25:08 UTC