Hello. This is Paola Di Maio, presenting this work in my capacity as chair of the AIKRCG, which stands for the Artificial Intelligence Knowledge Representation Community Group at W3C, where a large part of this work is being done and shared. And Jan Chin, thank you for presenting the paper at the conference in person in Barcelona. This is a pre-recorded talk. And this is me: I'm the voice, and this is my face.

The contents of the paper are outlined here. The main topic is knowledge representation, with a focus on knowledge representation learning and the development of a vocabulary, which serves as a metadata set, a type of subject-indexing metadata.

The background is AI, which is enveloping everything and moving very fast. AI is giving us unprecedented capabilities, in a field with a lot of open issues, uncertainties and risk factors. AI is fundamentally rewriting history.
Today we search for facts, and AI is like a filter for everything that we know. It is also wiping our individual memory: we are forgetting to remember things, because now we just search. This happened a little bit with search engines, with mighty Google, already 25 years ago. But at the beginning Google was indexing fairly accurately. Now, because of the data explosion, enormous volumes of data are being produced every day, much of which is noise, and systems, search engines, AI systems cannot distinguish noise from signal. So we have a problem: we are asking questions to AI about reality and its aspects, and AI is doing an excellent job of bringing things up.
But at the same time it is presenting results which have inherent bias, which is possibly contributing to distortions, and that is a serious concern. AI is learning from humans, ingesting intelligence, becoming autonomous, and building itself. So we really don't know what AI is becoming, and this is one of the factors that motivated this work in the first place: we want to understand what AI is becoming and how it is doing things, but how can we do that? We went to knowledge representation. Knowledge representation has been considered part of AI: the explicit representation of facts and rules, and the logic within them, was leveraged by AI for reasoning. In the age of machine learning, however, KR has become less relevant, to the point that people are saying that knowledge representation is not relevant to what we are doing today with neural networks.
Disagreeing with that basic argument was the starting point for this work, seven years ago. And now we are seeing that people are figuring it out. After we have hammered on this, written a lot of papers and given quite a lot of talks about it, people are starting to look back at knowledge representation and saying, "No, we do need knowledge representation for a number of things, even in machine learning."

Nonetheless, since the beginning of the knowledge representation field, it has not been well understood or well defined in practice. It has been used in a narrow way, by selecting a distinct knowledge representation technique to achieve specific results in the construction of intelligent systems, for a number of years. But as a field it has been challenging to define, because it is very vast and it is not just one thing. It has been challenging.
And there are papers dated 20, 30 years ago, I don't have the citations in front of me, which were already very clearly identifying these challenges at the time. Today this challenge remains: knowledge representation as a field is still not defined in practice. It is becoming relevant to machine learning again, yet we still do not know exactly how to define it. The work was started with this in mind: we want to be able to say what knowledge representation is, and how it can help us solve the challenges and open issues that machine learning is facing today. And we have been very busy since. The challenge for me has been to track the leading edge of where all of this is going.
What I am presenting to you today is a little bit of the state of the art, the work as it stands today. We started by trying to figure out knowledge representation as a domain. And what became compelling more recently is that certain mission-critical KR concepts that ensure the reliability of systems were completely missing from AI standards. In particular, truth preservation, which is a core KR concept, was noted as absent, at the time of writing, in all of the AI standards. Now you are going to ask me: how did you figure out the standards? There is an initiative by the Turing Institute called the AI Standards Hub, and it is searchable.
So, I don't know, the Turing Institute has been much criticized for a number of things, but praise to them for building a searchable hub that allows you to query all AI standards by keyword. And at the time, truth preservation was not in any of them. So: alarm. Then, of course, there was some double-checking, by opening each standard individually and parsing it, to make sure the search engine was not simply broken or missing things. I was confident enough to make the assertion that the concept of truth preservation was absent in AI standards at the time of writing, throughout '25. So the lack of certain critical KR concepts, such as truth preservation, in AI standards can be considered a risk of AI failure.
And if this is true, then these AI standards that are being developed may not be fit for purpose, unless they are integrated with core knowledge representation concepts. So there are a number of risks of AI without KR: opacity, lack of transparency, inconsistency, which lead to increased systemic risk and possibly systemic aberration, which is another big topic; if you are interested, you should be able to find the talk I recently gave on it. A number of papers and publications were written leading up to this work, if you are interested in the background.

The scope of the work presented here is a map of the knowledge domain called Artificial Intelligence Knowledge Representation, with a focus on knowledge representation learning. The scope of the work is to identify a domain vocabulary, as a list.
It is not in scope to build a full taxonomy or ontology at this time, although I am sure that with the right tools and resources we could do that. And it is not in scope to explain everything about AI, KR or metadata. So, for those of you who do not know, or do not have the time to brush up on it: knowledge representation can be considered a process or a method for encoding information in machine-readable format, to enable a machine to learn and act intelligently. That is one possible definition. And it uses diverse methods and tools to do this. In general, we are talking about knowledge representation here, and I need to emphasize that knowledge representation learning is derived from KR in general. KR supports reasoning, it is vital for explainability, and it helps to decode the hidden layers. That is a new role for KR; I have a slide to explain this better later.
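To make "encoding information in machine-readable format to enable reasoning" concrete, here is a minimal sketch of classic symbolic KR: a handful of facts and if-then rules, with naive forward chaining deriving new facts. The predicates and names are invented for illustration and are not the formalism used in this work.

```python
# Facts are (predicate, subject) pairs; rules say "if pre(X) then post(X)".
facts = {("bird", "tweety")}
rules = [
    ("bird", "animal"),      # every bird is an animal
    ("animal", "organism"),  # every animal is an organism
]

def forward_chain(facts, rules):
    # Repeatedly apply every rule to every known fact until nothing new
    # can be derived (a fixed point) -- the simplest form of KR reasoning.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, post in rules:
            for pred, subj in list(derived):
                if pred == pre and (post, subj) not in derived:
                    derived.add((post, subj))
                    changed = True
    return derived

kb = forward_chain(facts, rules)
```

Running the chain derives both intermediate and transitive conclusions from a single explicit fact, which is exactly the leverage the talk attributes to explicit representation.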
Knowledge representation learning can be defined as a set of methods to encode symbolic knowledge into continuous vector spaces, so that AI systems can reason and make predictions more effectively. In traditional AI, symbolic knowledge was encoded in rule-based systems, frame-based systems or knowledge-based systems, so to speak. But here, knowledge representation learning can help to translate that symbolic knowledge, the rules and the logic, into continuous vector spaces, into machine learning constructs, so to speak. And it is important for a number of reasons, which I will not enumerate here; for me, the work that I am doing connects knowledge representation learning and the more symbolic knowledge representation. So we can see that knowledge representation learning sits somewhere in the middle.
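As an illustration of "encoding symbolic knowledge into continuous vector spaces", here is a toy, TransE-style sketch built on the translation principle head + relation ≈ tail. The entities, dimensionality and training loop are invented for the example; this is a sketch of the general KRL idea, not the model used in this work.

```python
import math
import random

random.seed(0)
DIM = 16

def vec():
    # small random starting vector for an entity or relation
    return [random.uniform(-0.1, 0.1) for _ in range(DIM)]

entities = {e: vec() for e in ["Paris", "France", "Tokyo", "Japan"]}
relations = {"capital_of": vec()}

def score(h, r, t):
    # TransE-style plausibility: smaller ||h + r - t|| means more plausible
    return math.sqrt(sum(
        (entities[h][i] + relations[r][i] - entities[t][i]) ** 2
        for i in range(DIM)))

def train_step(h, r, t, lr=0.05):
    # nudge embeddings so that h + r moves toward t
    # (gradient descent on the squared translation error)
    for i in range(DIM):
        g = entities[h][i] + relations[r][i] - entities[t][i]
        entities[h][i] -= lr * g
        relations[r][i] -= lr * g
        entities[t][i] += lr * g

for _ in range(500):
    train_step("Paris", "capital_of", "France")
    train_step("Tokyo", "capital_of", "Japan")
```

After training, the true triples score better (lower distance) than corrupted ones, which is how such embeddings support prediction over symbolic facts.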
That is very interesting. So we are now mapping the wider knowledge representation domain to see where KRL fits in, and KRL fits in here. This is what we are looking at today. The bigger picture is an attempt to define the knowledge representation domain as a whole. I say AI KR because knowledge representation as a field also exists outside AI; it can be used in a number of other fields, including legal design. There is a beautiful map of how knowledge representation relates to a number of fields, not just systems and not just computer science. But here we are talking about KR for AI, and we have started defining it in terms of subdomains, or subcategories, starting from the upper foundation, the existential level: what does AIKR consist of? So we relate the knowledge representation concepts to a top-level ontology, using standard formalisms.
And then here we are looking at a number of domains. We are saying: whatever AI is going to do, it is going to have an upper level, a foundational or top-level ontology, an existential level, that defines the highest abstraction. And it is going to have a domain, an application domain (oops, there is a duplicate there). And reliability engineering has come in, because one of the biggest AI risks is the lack of reliability: generative AI in particular, which is very smart, is not replicable. From a systems reliability point of view, that is a problem. So I am defining knowledge representation in terms of reliability engineering elsewhere; today I am presenting this very briefly. So why are we doing this? It is to provide an index for communication and learning of the domain. It can obviously support auditable, robust applications, and it enables metadata-driven discovery and interoperability. I must say that the word metadata, which is the keyword of interest for this conference, is here: we are going to use the vocabulary as a metadata set for the subject-matter domain of knowledge representation. So this is the subject-matter domain, the vocabulary is going to be used as metadata, and here it is listed as one of the uses. And it will be very interesting to see how we can build automated monitoring: how we can use it for evaluations of LLMs in this respect.

Methods: how do we do it? We identify subdomains and pertinent topics. We identify core resources for each topic, for each subdomain; I am referring to these bubbles, the subdomains. And then we extract key terms and concepts from each resource.
We go around and round, and we do it a little bit ad hoc, a bit creatively, so to speak. We extract concepts and terms and we try to clean them up: keep the relevant ones and remove the duplicates. And then we refine them via evaluations. So this is a general method for constructing a core vocabulary.

This is the slide I was referring to earlier: traditionally, knowledge representation is used to encode logic and semantics in old-fashioned AI, but in machine learning today we can use it to decode hidden layers. That is, I think, the most interesting aspect of the relevance of KR to machine learning today. And knowledge representation learning sits somewhere here, together with neuro-symbolic AI. Another big topic.
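The pipeline just described (collect resources per subdomain, extract terms, deduplicate, keep the relevant ones) could be sketched roughly as below. The toy corpus, stopword list and frequency threshold are all illustrative assumptions, not the group's actual corpus or criteria, and real runs would need phrase extraction rather than single words.

```python
import re
from collections import Counter

# toy corpus standing in for abstracts of KRL resources (invented text)
corpus = [
    "Knowledge graph embedding maps entities and relations into vector spaces.",
    "Translation-based embedding models such as TransE score triples by distance.",
    "Quaternion embedding represents relations in hypercomplex vector spaces.",
]

STOPWORDS = {"and", "into", "such", "as", "by", "in", "the", "a"}

def extract_terms(docs, min_count=2):
    # tokenize and lowercase, drop stopwords, then count occurrences;
    # the Counter deduplicates, and the frequency threshold is a crude
    # stand-in for "keep the relevant ones"
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return sorted(t for t, c in counts.items() if c >= min_count)

vocabulary = extract_terms(corpus)
```

Terms that recur across resources ("embedding", "vector") survive the filter, while one-off words are left for the later refinement-by-evaluation step the talk mentions.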
So, in the spectrum of knowledge organization systems, this work stands here, but it is the basis for whatever further development, more structured, higher-order development, is going to follow. We started with symbolic logic, from old-fashioned AI, and we are arriving at the metadata set. We know all about metadata, but there are different types of metadata, and we are looking at metadata for subject indexing. So the vocabulary presented here can be used as a metadata set for subject indexing of the domain of knowledge representation learning. That is the idea.

This is a little focus on truth maintenance systems, which were already mentioned at the beginning. A truth maintenance system was originally a symbolic AI mechanism for consistency. It tracks dependencies between beliefs and facts, and it revises beliefs when conflicts arise.
And it is useful for hybrid symbolic machine learning systems. So, basically, it starts as a truth maintenance system rooted in the original symbolic AI, but it is useful today in machine learning. We cannot do without truth maintenance systems, so to speak, even in machine learning today, because they enable the tracking of dependencies. Nonetheless, as a concept it was missing. It supports consistency and updates, and it helps ensure explainability.

So, finally, the vocabulary. It is a flat list; definitions will be added later. This is just a list of words, and it is a starting benchmark for the definition of this domain. At the moment it can be reached here; it should be viewable, but not editable. It is about 100 terms.
People ask me what the inclusion criteria are. Everything that seemed to be a core concept in KRL was included, looking at a corpus. You ask me how many papers: honestly, I do not remember, I would have to look it up. But certainly there was a very useful page on GitHub that hosted a number of key papers from key conferences, and it curated them all, painstakingly. So it is very important to credit that research effort. At the moment, we are just taking the terms out of the corpus and compiling a list.

So, how do we say whether this vocabulary is good or not? We pick a few papers, not randomly, but based on what we are looking at. We look at new papers and we ask: is the core concept in this paper in the vocabulary or not? We were looking at quaternions, for example, and we figured out that there is a new paper on quaternions.
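The coverage check just described could be sketched as follows: given the key terms of a new paper, report which are missing from the vocabulary so they can be added. The vocabulary and paper terms here are illustrative stand-ins, not the actual lists.

```python
# illustrative subset of a flat KRL vocabulary (not the real 100-term list)
vocabulary = {"knowledge graph", "embedding", "transe", "ontology", "triple"}

def coverage_gaps(paper_terms, vocab):
    # return the paper's core terms that the vocabulary does not yet cover
    return sorted(t for t in paper_terms if t.lower() not in vocab)

# core terms picked out of a hypothetical new paper
paper_terms = ["embedding", "quaternion", "knowledge graph"]

missing = coverage_gaps(paper_terms, vocabulary)  # quaternion is not covered
vocabulary |= set(missing)                        # so it gets added
```

Each new paper either confirms the vocabulary's coverage or surfaces a gap, which is exactly the evaluation loop the talk describes.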
531 00:21:48,340 --> 00:21:50,790 And I wondered, "Is quaternion in 532 00:21:50,790 --> 00:21:53,560 our vocabulary?" And it wasn't. 533 00:21:53,560 --> 00:21:56,230 It was missing. So, we've added 534 00:21:56,230 --> 00:21:56,660 it. 535 00:21:57,380 --> 00:21:58,220 This is how the 536 00:21:58,220 --> 00:22:00,550 evaluation is currently done. 537 00:22:00,550 --> 00:22:03,390 So, what is a quaternion? Quaternions 538 00:22:03,390 --> 00:22:04,340 are... 539 00:22:04,340 --> 00:22:05,230 They're embeddings for 540 00:22:05,230 --> 00:22:06,380 knowledge representation 541 00:22:06,380 --> 00:22:07,440 learning. So, they're core 542 00:22:07,440 --> 00:22:08,580 concepts. They should be in the 543 00:22:08,580 --> 00:22:09,300 vocabulary. 544 00:22:10,340 --> 00:22:11,210 And then, you can study the 545 00:22:11,210 --> 00:22:12,340 whole thing 546 00:22:14,820 --> 00:22:15,220 and look at the examples of the 547 00:22:15,220 --> 00:22:19,140 various models that use quaternions. 548 00:22:19,780 --> 00:22:21,190 They say, "Represent entities 549 00:22:21,190 --> 00:22:22,710 and relations in a 550 00:22:22,710 --> 00:22:24,680 hypercomplex space." It's a 551 00:22:24,680 --> 00:22:25,940 hypercomplex space, 552 00:22:26,900 --> 00:22:27,850 to model complex relational 553 00:22:27,850 --> 00:22:30,100 patterns. Knowledge graphs. 554 00:22:30,100 --> 00:22:31,990 Goodness me. You wouldn't 555 00:22:31,990 --> 00:22:33,380 want to miss out on that. 556 00:22:33,380 --> 00:22:35,970 So, we look at the papers. 557 00:22:35,970 --> 00:22:39,150 So, this 558 00:22:39,150 --> 00:22:41,430 is how the evaluation is done 559 00:22:41,430 --> 00:22:42,900 at the moment: 560 00:22:42,900 --> 00:22:45,600 finding the papers, checking 561 00:22:45,600 --> 00:22:47,550 that the core terms and 562 00:22:47,550 --> 00:22:50,240 concepts in the paper are in the 563 00:22:50,240 --> 00:22:51,820 vocab.
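The quaternion idea mentioned here, representing entities and relations in a hypercomplex space, can be sketched in the style of models such as QuatE. The Hamilton product below is the standard one, but the dimensions, random initialisation, and plain inner-product scoring are illustrative simplifications, not the talk's own formulation.

```python
# Sketch of quaternion knowledge-graph embeddings in the style of QuatE:
# entities and relations are quaternion-valued vectors (w, x, y, z); the
# relation rotates the head via the Hamilton product, and the score is an
# inner product with the tail. Sizes and values here are toy choices.
import numpy as np

def hamilton(q, r):
    # Hamilton product of two quaternion vectors, components on axis 0
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def normalize(q):
    # unit quaternions, so relations act as rotations in hypercomplex space
    return q / np.linalg.norm(q, axis=0, keepdims=True)

rng = np.random.default_rng(0)
dim = 4  # embedding dimensions per quaternion component
head = rng.normal(size=(4, dim))
rel = normalize(rng.normal(size=(4, dim)))
tail = rng.normal(size=(4, dim))

# higher score = more plausible (head, relation, tail) triple
score = float(np.sum(hamilton(head, rel) * tail))
print("score:", round(score, 3))
```

The hypercomplex rotation is what lets such models capture the complex relational patterns (symmetry, antisymmetry, inversion) the quoted papers refer to.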
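The evaluation loop just described, find a paper and check that its core terms are in the vocab, amounts to a simple coverage check. The term sets below are illustrative stand-ins, except that "quaternion" really was a miss the talk mentions.

```python
# Minimal sketch of the coverage check described in the talk: for each new
# paper, test whether its core concepts already appear in the vocabulary,
# and record the misses as candidate additions. Terms are illustrative.

vocabulary = {"embedding", "knowledge graph", "translation-based model"}

paper_core_concepts = {"embedding", "quaternion", "knowledge graph"}

missing = sorted(paper_core_concepts - vocabulary)
coverage = 1 - len(missing) / len(paper_core_concepts)

print(missing)             # ['quaternion'] -> candidate addition
print(round(coverage, 2))  # 0.67
```

Each miss is then reviewed manually and, if it is genuinely a core KRL concept, added to the vocabulary, which is exactly what happened with quaternion.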
So, I'm running out of 564 00:22:51,820 --> 00:22:52,420 time. 565 00:22:52,420 --> 00:22:54,100 We're also doing evaluations 566 00:22:54,100 --> 00:22:55,970 with use cases, looking at 567 00:22:55,970 --> 00:22:58,890 specific use cases where 568 00:22:58,890 --> 00:22:59,860 knowledge 569 00:22:59,860 --> 00:23:01,990 representation learning is used 570 00:23:01,990 --> 00:23:04,050 and picking terms from there. 571 00:23:04,050 --> 00:23:07,070 So, from this work, a number of 572 00:23:07,070 --> 00:23:08,180 categories 573 00:23:08,900 --> 00:23:10,570 are emerging. So, we can analyze 574 00:23:10,570 --> 00:23:12,830 and create additional 575 00:23:12,830 --> 00:23:15,090 layers of abstraction from the 576 00:23:15,090 --> 00:23:16,260 vocabulary. 577 00:23:17,780 --> 00:23:21,170 So, for example, so far, we 578 00:23:21,170 --> 00:23:24,600 have identified a number of 579 00:23:24,600 --> 00:23:27,790 categories in KRL: translation-based, 580 00:23:27,790 --> 00:23:29,390 bilinear, deep neural, 581 00:23:29,390 --> 00:23:31,760 geometric, temporal, which 582 00:23:31,760 --> 00:23:34,500 could be used as further 583 00:23:34,500 --> 00:23:37,460 structure for the vocabulary 584 00:23:37,460 --> 00:23:39,380 in a future iteration. 585 00:23:40,340 --> 00:23:42,470 So, so far, we can say that the 586 00:23:42,470 --> 00:23:44,110 vocabulary is very useful 587 00:23:44,110 --> 00:23:46,190 because it just tells us what 588 00:23:46,190 --> 00:23:47,460 KRL consists of. 589 00:23:48,420 --> 00:23:50,060 It starts indexing the topic, 590 00:23:50,060 --> 00:23:53,550 the domain. At the same time, 591 00:23:53,550 --> 00:23:57,850 it's far from being complete. 592 00:23:57,850 --> 00:23:59,070 And it's probably even a little 593 00:23:59,070 --> 00:24:01,160 bit dirty, a little bit 594 00:24:01,160 --> 00:24:02,020 noisy. 595 00:24:03,300 --> 00:24:07,680 So, there are terms in there 596 00:24:07,680 --> 00:24:12,550 which may not be purely KRL.
597 00:24:12,550 --> 00:24:14,620 Knowledge representation, or KR? 598 00:24:14,620 --> 00:24:18,210 Could be. We need to decide 599 00:24:18,210 --> 00:24:20,730 which ones we're going to leave in 600 00:24:20,730 --> 00:24:23,060 and which we're going to delete. 601 00:24:24,980 --> 00:24:27,100 Then, of course, this has been 602 00:24:27,100 --> 00:24:29,060 done very coarsely, a little 603 00:24:29,060 --> 00:24:31,770 bit of experimental work. Definitions 604 00:24:31,770 --> 00:24:33,060 are still not done. 605 00:24:33,940 --> 00:24:34,990 Further refinement is needed, and 606 00:24:34,990 --> 00:24:37,240 the evaluation is continuing. So, 607 00:24:37,240 --> 00:24:39,050 we're going to continue with 608 00:24:39,050 --> 00:24:40,260 the evaluation. 609 00:24:40,260 --> 00:24:43,060 We're going to expand, refine, 610 00:24:44,820 --> 00:24:46,640 and develop the definitions. We'll 611 00:24:46,640 --> 00:24:48,930 create further 612 00:24:48,930 --> 00:24:50,850 abstractions, further layers of 613 00:24:50,850 --> 00:24:51,780 structure. 614 00:24:52,420 --> 00:24:54,490 We're going to contribute to 615 00:24:54,490 --> 00:24:57,160 standards development, we hope. 616 00:24:57,160 --> 00:24:59,260 And maybe build an agent to do 617 00:24:59,260 --> 00:25:00,260 this work. 618 00:25:00,260 --> 00:25:02,720 So, wouldn't it be nice if 619 00:25:02,720 --> 00:25:04,990 someone could help us 620 00:25:04,990 --> 00:25:07,180 build the AI for doing this? 621 00:25:07,180 --> 00:25:09,540 And this is an open call. 622 00:25:10,580 --> 00:25:11,480 I should have some flashing 623 00:25:11,480 --> 00:25:13,220 lights on this line. 624 00:25:14,100 --> 00:25:17,500 We're going to talk about 625 00:25:17,500 --> 00:25:21,050 AI for the AI. And this is the 626 00:25:21,050 --> 00:25:24,330 most important and dynamic 627 00:25:24,330 --> 00:25:27,300 aspect of this field, 628 00:25:27,300 --> 00:25:29,100 the leading edge.
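The further layers of structure mentioned here could start from the categories the talk identified (translation-based, bilinear, deep neural, geometric, temporal). A possible sketch is below; the category names come from the talk, but grouping these particular well-known KRL models under them is my illustrative assumption.

```python
# Sketch of using the emerging categories as an abstraction layer over the
# flat vocabulary. Category names are from the talk; the model names filed
# under each are illustrative examples, not an official classification.
categories = {
    "translation-based": ["TransE", "TransH"],
    "bilinear": ["DistMult", "ComplEx"],
    "deep neural": ["ConvE"],
    "geometric": ["RotatE", "QuatE"],
    "temporal": ["TTransE"],
}

def category_of(term):
    # reverse lookup: which abstraction layer does a vocabulary term sit in?
    for cat, terms in categories.items():
        if term in terms:
            return cat
    return None

print(category_of("QuatE"))  # geometric
```

Even this small layer turns the flat word list into the beginning of a structured subject index, which is the direction a future iteration of the vocabulary could take.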
A standardized 629 00:25:29,100 --> 00:25:31,350 vocabulary supports explainability 630 00:25:31,350 --> 00:25:32,740 and human learning. 631 00:25:33,300 --> 00:25:34,660 It is necessary to develop 632 00:25:34,660 --> 00:25:36,740 subject matter metadata. 633 00:25:37,300 --> 00:25:39,300 It bridges the gap between symbolic 634 00:25:39,300 --> 00:25:41,710 and statistical AI. It contributes to 635 00:25:41,710 --> 00:25:43,860 safe and auditable AI systems. 636 00:25:43,860 --> 00:25:45,750 This is our super bottom line. 637 00:25:45,750 --> 00:25:47,300 So, thank you so much. 638 00:25:48,900 --> 00:25:50,470 You can check out the vocab. 639 00:25:50,470 --> 00:25:53,920 You can join by searching. Search 640 00:25:53,920 --> 00:25:56,260 for this and join. 641 00:25:56,900 --> 00:25:59,640 And you're very welcome to 642 00:25:59,640 --> 00:26:02,800 shoot some questions here or 643 00:26:02,800 --> 00:26:04,980 wherever you like. 644 00:26:05,460 --> 00:26:09,180 So, get in touch. Thank you. 645 00:26:09,180 --> 00:26:10,100 Bye.