- From: Amirouche <amirouche@hyper.dev>
- Date: Fri, 19 Nov 2021 10:57:48 +0000
- To: "public-aikr@w3.org" <public-aikr@w3.org>
# Knowledge Graphs

html: https://kgbook.org/#access

github: https://github.com/Knowledge-Graphs-Book/HTML-Book/

## Abstract

This book provides a comprehensive and accessible introduction to knowledge graphs, which have recently garnered notable attention from both industry and academia. Knowledge graphs are founded on the principle of applying a graph-based abstraction to data, and are now broadly deployed in scenarios that require integrating and extracting value from multiple, diverse sources of data at large scale.

The book defines knowledge graphs and provides a high-level overview of how they are used. It presents and contrasts popular graph models that are commonly used to represent data as graphs, and the languages by which they can be queried, before describing how the resulting data graph can be enhanced with notions of schema, identity, and context. The book discusses how ontologies and rules can be used to encode knowledge as well as how inductive techniques (based on statistics, graph analytics, machine learning, etc.) can be used to encode and extract knowledge. It covers techniques for the creation, enrichment, assessment, and refinement of knowledge graphs and surveys recent open and enterprise knowledge graphs and the industries or applications within which they have been most widely adopted. The book closes by discussing the current limitations and future directions along which knowledge graphs are likely to evolve.

This book is aimed at students, researchers, and practitioners who wish to learn more about knowledge graphs and how they facilitate extracting value from diverse data at large scale. To make the book accessible for newcomers, running examples and graphical notation are used throughout. Formal definitions and extensive references are also provided for those who opt to delve more deeply into specific topics.

## Table of Contents

- Preface
- Acknowledgements

1. Introduction
2. Data Graphs
   2.1. Models
   2.2. Querying
3. Schema, Identity, Context
   3.1. Schema
   3.2. Identity
   3.3. Context
4. Deductive Knowledge
   4.1. Ontologies
   4.2. Reasoning
5. Inductive Knowledge
   5.1. Graph Analytics
   5.2. Knowledge Graph Embeddings
   5.3. Graph Neural Networks
   5.4. Symbolic Learning
6. Creation and Enrichment
   6.1. Human Collaboration
   6.2. Text Sources
   6.3. Markup Sources
   6.4. Structured Sources
   6.5. Schema/Ontology Creation
7. Quality Assessment
   7.1. Accuracy
   7.2. Coverage
   7.3. Coherency
   7.4. Succinctness
   7.5. Other Quality Dimensions
8. Refinement
   8.1. Completion
   8.2. Correction
   8.3. Other Refinement Tasks
9. Publication
   9.1. Best Practices
   9.2. Access Protocols
   9.3. Usage Control
10. Knowledge Graphs in Practice
    10.1. Open Knowledge Graphs
    10.2. Enterprise Knowledge Graphs
11. Summary and Conclusion

- Bibliography

A. Background
   A.1. Historical Perspective
   A.2. “Knowledge Graphs”: Pre 2012
   A.3. “Knowledge Graphs”: 2012 Onwards

## Summary and Conclusion

We have provided a comprehensive introduction to knowledge graphs, which have been receiving more and more attention in recent years. Under the definition of a knowledge graph as a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities, we have discussed models by which data can be structured as graphs; representations of schema, identity and context; techniques for leveraging deductive and inductive knowledge; methods for the creation, enrichment, quality assessment and refinement of knowledge graphs; principles and standards for publishing knowledge graphs; and finally, we have discussed the adoption of both open and enterprise knowledge graphs in the real world. In this final chapter, we provide some concluding remarks, and further offer some insights on potential future directions for research on knowledge graphs.

Concluding remarks.
Knowledge graphs have garnered significant attention not only from diverse organisations and industries, but also diverse research communities. This attention is due, in no small part, to the ubiquitous nature of the problem that knowledge graphs address: integrating and extracting value from diverse sources of data at large scale, be it in the context of a particular organisation, community, or more general collections of human knowledge.

The key insight of knowledge graphs is that graphs provide a simple, flexible, intuitive and yet powerful abstraction for representing and integrating diverse data at large scale. This insight is far from new (see Appendix A), but rather has finally come of age with the advent of knowledge graphs. Graphs have long been used to represent data and knowledge in areas such as Graph Algorithms and Theory, Graph Databases, Information Extraction, Knowledge Representation, Machine Learning, the Semantic Web, and more besides. The advances in these areas can now be unified and applied for knowledge graphs.

Thus, the decision to model data as a graph opens up a “tool-box” of languages, techniques and systems – stemming from diverse areas – that can be deployed in order to integrate and extract value from data at large scale, as follows:

- A variety of graph query languages are now available that (unlike other NoSQL alternatives) are fully-featured, supporting not only the relational algebra, but also novel features such as navigational queries that can match paths of arbitrary length. A broad selection of graph databases and user interfaces supporting these query languages are now also available.
- Though graphs do not depend on a detailed (relational-like) schema to represent data, various notions of graph schemata have been proposed in order to validate, summarise and define the semantics of graphs.
- Contextual frameworks for graphs can be used to represent and reason about the scope of truth of knowledge in the graph – relating to the time, space, provenance, confidence level, etc., for which something is held true – including various alternatives for reification, annotated graph frameworks, etc.
- Deductive forms of reasoning can be enabled over graphs using ontologies and/or rules, which can not only encode a machine-readable consensus about the meaning of the graph, but also provide automated access to implicit knowledge entailed by a graph through materialisation or query rewriting.
- Graph algorithms, such as centrality measures, community detection, clustering, etc., can be applied on the data to gain insights about influential entities or edges, close-knit sub-graphs of entities, and more besides, with graph parallel frameworks capable of applying such algorithms at large scale.
- Recent and continual advances in knowledge graph embeddings and graph neural networks have now opened up new possibilities for applying machine learning natively over graphs in the context of diverse tasks, including classification, question answering, recommendations, and more besides.
- Rule and axiom mining techniques allow for extracting formal, declarative hypotheses from a knowledge graph that encode high-level patterns and can be applied to derive new knowledge in a deductive, explainable manner.
- Graph-based information extraction can be applied to extract and/or enrich a knowledge graph from legacy sources of text and semi-structured data, while graph-based mapping languages facilitate integrating diverse sources of legacy structured data into the knowledge graph.
- Tools, techniques and methodologies for ontology engineering and ontology learning can further guide the – potentially collaborative – creation of an ontology for the knowledge graph, encoding a consensus about its semantics, and enabling access to implicit knowledge through deductive reasoning.
- Quality dimensions and metrics for knowledge graphs allow for systematically assessing the readiness of the knowledge graph for its envisaged applications, in both a qualitative and quantitative manner, where a variety of tools and frameworks are available to help perform such assessments.
- Knowledge graphs that have been integrated from diverse sources are likely to be incomplete, or to encode incorrect data, where techniques and tools for knowledge graph refinement facilitate the automated completion and correction of knowledge graphs, thus improving their overall quality and usefulness.
- For the purposes of publishing open knowledge graphs, principles & best practices and access protocols, as well as techniques for linking, licensing, access & usage control, encryption and anonymisation, can be leveraged to maximise their potential impact on society in an ethical way.

As we have discussed in Chapter 10, the various components of this “knowledge graph tool-box” can already be found deployed in practice, having been applied – to varying degrees – in the context of numerous open and enterprise knowledge graphs. As adoption of knowledge graphs continues, work will also continue on improving and combining these tools, as well as on developing novel tools that help to better integrate and extract value from diverse sources of data at large scale.

Future directions. Research on knowledge graphs involves a confluence of techniques from different research areas with the common objective of maximising the knowledge – and thus value – that can be distilled from diverse sources at large scale using a graph-based data abstraction [Hogan, 2020a].
In the intersection of data graphs and deductive knowledge, we emphasise emerging topics such as formal semantics for property graphs, with languages that can take into account the meaning of labels and property–value pairs on nodes and edges [Krötzsch et al., 2018]; and reasoning and querying over contextual data, in order to derive conclusions and results valid in a particular setting [Serafini and Homola, 2012, Zimmermann et al., 2012, Schuetz et al., 2021]. In the intersection of data graphs and inductive knowledge, we highlight topics such as similarity-based query relaxation, which allows approximate answers to exact queries to be found based on numerical representations (e.g., embeddings) [Wang et al., 2018]; shape induction, in order to learn and formalise inherent patterns in the knowledge graph as constraints [Mihindukulasooriya et al., 2018]; and contextual knowledge graph embeddings that provide numeric representations of nodes and edges that vary with time, place, etc. [Kazemi et al., 2019]. In the intersection of deductive and inductive knowledge, we mention the topics of entailment-aware knowledge graph embeddings [Guo et al., 2016, Demeester et al., 2016], which incorporate rules and/or ontologies when computing plausibility; expressive graph neural networks proven capable of complex classification analogous to expressive ontology languages [Barceló et al., 2020]; as well as further advances on rule and axiom mining, allowing symbolic, deductive representations to be extracted from knowledge graphs [Galárraga et al., 2015, Bühmann et al., 2016]. Further challenges arise when considering the creation, enrichment, refinement and publication of knowledge graphs, which call for further work on topics such as automated quality assessment (and repair), distantly-supervised extraction frameworks, efficient access protocols, and anonymisation, to name but a few.
Aside from specific topics, more general challenges for knowledge graphs include scalability, particularly for deductive and inductive reasoning; quality, not only in terms of data, but also the models induced from knowledge graphs; diversity, such as managing contextual or multi-modal data; dynamicity, considering temporal or streaming data; and finally usability, which is key to increasing adoption. Though techniques are continuously being proposed to address these challenges, they are unlikely to ever be completely “solved”; rather they serve as dimensions along which knowledge graphs, and their techniques, tools, etc., will continue to mature.

Given the availability of open knowledge graphs whose quality continues to improve, as well as the growing adoption of enterprise knowledge graphs in various industries, future research on knowledge graphs has the potential to foster key advancements in broad aspects of society. Here we have highlighted just some examples of future research directions of importance to this pursuit.
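
Two items from the tool-box listed in the concluding remarks, navigational queries that match paths of arbitrary length and materialisation of implicit knowledge from rules, can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the book: the toy triples, the predicate names (`flight`, `type`, `subclass_of`), the helper functions `reachable` and `materialise`, and the two hard-coded rules. A real deployment would more likely use a graph query language and a dedicated reasoner.

```python
from collections import deque

# Hypothetical toy data graph, stored as (subject, predicate, object) triples.
TRIPLES = {
    ("Santiago", "flight", "Arica"),
    ("Arica", "flight", "Iquique"),
    ("Iquique", "bus", "Calama"),
    ("Arica", "type", "City"),
    ("City", "subclass_of", "Place"),
    ("Place", "subclass_of", "Thing"),
}


def reachable(triples, start, predicate):
    """Navigational query: nodes reachable from `start` by following
    one or more edges labelled `predicate` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def materialise(triples):
    """Forward-chain two illustrative rules until nothing new is derived:
    (x type c), (c subclass_of d)        => (x type d)
    (c subclass_of d), (d subclass_of e) => (c subclass_of e)
    """
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            if p not in ("type", "subclass_of"):
                continue
            for s2, p2, o2 in closure:
                if p2 == "subclass_of" and s2 == o:
                    derived.add((s, p, o2))
        derived -= closure
        if not derived:
            return closure
        closure |= derived


if __name__ == "__main__":
    print(sorted(reachable(TRIPLES, "Santiago", "flight")))
    print(sorted(materialise(TRIPLES) - TRIPLES))
```

In this sketch the path query follows only `flight` edges, so the `bus` edge to Calama is not traversed, and the materialised triples are the ones entailed by the two rules but absent from the input. Query rewriting, the alternative mentioned alongside materialisation, would instead expand the query at run time and leave the stored graph unchanged.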
Received on Friday, 19 November 2021 10:58:08 UTC