Hello. This is Paola Di Maio, presenting this work in my capacity as chair of the AIKRCG, which stands for the Artificial Intelligence Knowledge Representation Community Group at W3C, where a large part of this work is being done and shared. And Jan Chin, thank you for presenting the paper at the conference in person in Barcelona. This is a pre-recorded talk. And this is me: I'm the voice, and this is my face.

The contents of the paper are outlined here. The main topic is knowledge representation, with a focus on knowledge representation learning and the development of a vocabulary, which serves as a metadata set, a type of subject-indexing metadata.

The background is AI, which is enveloping everything and moving very fast. AI is giving us unprecedented capabilities, in a field with a lot of open issues, uncertainties and risk factors. AI is fundamentally rewriting history.
Today we search for facts, and AI is like a filter for everything that we know. It is also wiping our individual memory: we are forgetting to remember things, because now we just search. This happened a little bit with search engines, with mighty Google, already 25 years ago. But at the beginning Google was indexing fairly accurately. Now, because of the data explosion, enormous volumes of data are being produced every day, much of which is noise, and systems, search engines, AI systems cannot distinguish noise from signal. So we have a problem: we are asking questions to AI about reality and its aspects, and AI is doing an excellent job of bringing things up.
But at the same time it is presenting results which have inherent bias, which is possibly contributing to distortions, and that is a serious concern. AI is learning from humans, ingesting intelligence, becoming autonomous, and building itself. So we really don't know what AI is becoming, and this is one of the factors that motivated this work in the first place: we want to understand what AI is becoming and how it is doing things, but how can we do that? We went to knowledge representation. Knowledge representation has been considered part of AI: the explicit representation of facts and rules, and the logic within them, was leveraged by AI for reasoning. In the age of machine learning, however, KR has become less relevant, to the point that people are saying that knowledge representation is not relevant to what we are doing today with neural networks.
Disagreeing with that basic argument was the starting point for this work, seven years ago. And now we are seeing that people are figuring it out. After we have hammered on this, written a lot of papers and given quite a lot of talks about it, people are starting to look back at knowledge representation and saying, "No, we do need knowledge representation for a number of things, even in machine learning."

Nonetheless, since the beginning of the knowledge representation field, it has not been well understood or well defined in practice. It has been used in a narrow way, by selecting a distinct knowledge representation technique to achieve specific results in the construction of intelligent systems, for a number of years. But as a field it has been challenging to define, because it is very vast and it is not just one thing. It has been challenging.
And there are papers dated 20, 30 years ago, I don't have the citations in front of me, which were already very clearly identifying these challenges at the time. Today this challenge remains: knowledge representation as a field is still not defined in practice. It is becoming relevant to machine learning again, yet we still do not know exactly how to define it. The work was started with this in mind: we want to be able to say what knowledge representation is, and how it can help us solve the challenges and open issues that machine learning is facing today. And we have been very busy since. The challenge for me has been to track the leading edge of where all of this is going.
What I am presenting to you today is a little bit of the state of the art, the work as it stands today. We started by trying to figure out knowledge representation as a domain. And what became compelling more recently is that certain mission-critical KR concepts that ensure the reliability of systems were completely missing from AI standards. In particular, truth preservation, which is a core KR concept, was noted as absent, at the time of writing, in all of the AI standards. Now you are going to ask me: how did you figure out the standards? There is an initiative by the Turing Institute called the AI Standards Hub, and it is searchable.
So, I don't know, the Turing Institute has been much criticized for a number of things, but praise to them for building a searchable hub that allows you to query all AI standards by keyword. And at the time, truth preservation was not in any of them. So: alarm. Then, of course, there was some double-checking, by opening each standard individually and parsing it, to make sure the search engine was not simply broken or missing things. I was confident enough to make the assertion that the concept of truth preservation was absent in AI standards at the time of writing, throughout '25. So the lack of certain critical KR concepts, such as truth preservation, in AI standards can be considered a risk of AI failure.
And if this is true, then these AI standards that are being developed may not be fit for purpose, unless they are integrated with core knowledge representation concepts. So there are a number of risks of AI without KR: opacity, lack of transparency, inconsistency, which lead to increased systemic risk and possibly systemic aberration, which is another big topic; if you are interested, you should be able to find the talk I recently gave on it. A number of papers and publications were written leading up to this work, if you are interested in the background.

The scope of the work presented here is a map of the knowledge domain called Artificial Intelligence Knowledge Representation, with a focus on knowledge representation learning. The scope of the work is to identify a domain vocabulary, as a list.
It is not in scope to build a full taxonomy or ontology at this time, although I am sure that with the right tools and resources we could do that. And it is not in scope to explain everything about AI, KR or metadata. So, for those of you who do not know, or do not have the time to brush up on it: knowledge representation can be considered a process or a method for encoding information in machine-readable format, to enable a machine to learn and act intelligently. That is one possible definition. And it uses diverse methods and tools to do this. In general, we are talking about knowledge representation here, and I need to emphasize that knowledge representation learning is derived from KR in general. KR supports reasoning, it is vital for explainability, and it helps to decode the hidden layers. That is a new role for KR; I have a slide to explain this better later.
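To make "encoding information in machine-readable format to enable reasoning" concrete, here is a minimal sketch of classic symbolic KR: a handful of facts and if-then rules, with naive forward chaining deriving new facts. The predicates and names are invented for illustration and are not the formalism used in this work.

```python
# Facts are (predicate, subject) pairs; rules say "if pre(X) then post(X)".
facts = {("bird", "tweety")}
rules = [
    ("bird", "animal"),      # every bird is an animal
    ("animal", "organism"),  # every animal is an organism
]

def forward_chain(facts, rules):
    # Repeatedly apply every rule to every known fact until nothing new
    # can be derived (a fixed point) -- the simplest form of KR reasoning.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, post in rules:
            for pred, subj in list(derived):
                if pred == pre and (post, subj) not in derived:
                    derived.add((post, subj))
                    changed = True
    return derived

kb = forward_chain(facts, rules)
```

Running the chain derives both intermediate and transitive conclusions from a single explicit fact, which is exactly the leverage the talk attributes to explicit representation.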
Knowledge representation learning can be defined as a set of methods to encode symbolic knowledge into continuous vector spaces, so that AI systems can reason and make predictions more effectively. In traditional AI, symbolic knowledge was encoded in rule-based systems, frame-based systems or knowledge-based systems, so to speak. But here, knowledge representation learning can help to translate that symbolic knowledge, the rules and the logic, into continuous vector spaces, into machine learning constructs, so to speak. And it is important for a number of reasons, which I will not enumerate here; for me, the work that I am doing connects knowledge representation learning and the more symbolic knowledge representation. So we can see that knowledge representation learning sits somewhere in the middle.
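As an illustration of "encoding symbolic knowledge into continuous vector spaces", here is a toy, TransE-style sketch built on the translation principle head + relation ≈ tail. The entities, dimensionality and training loop are invented for the example; this is a sketch of the general KRL idea, not the model used in this work.

```python
import math
import random

random.seed(0)
DIM = 16

def vec():
    # small random starting vector for an entity or relation
    return [random.uniform(-0.1, 0.1) for _ in range(DIM)]

entities = {e: vec() for e in ["Paris", "France", "Tokyo", "Japan"]}
relations = {"capital_of": vec()}

def score(h, r, t):
    # TransE-style plausibility: smaller ||h + r - t|| means more plausible
    return math.sqrt(sum(
        (entities[h][i] + relations[r][i] - entities[t][i]) ** 2
        for i in range(DIM)))

def train_step(h, r, t, lr=0.05):
    # nudge embeddings so that h + r moves toward t
    # (gradient descent on the squared translation error)
    for i in range(DIM):
        g = entities[h][i] + relations[r][i] - entities[t][i]
        entities[h][i] -= lr * g
        relations[r][i] -= lr * g
        entities[t][i] += lr * g

for _ in range(500):
    train_step("Paris", "capital_of", "France")
    train_step("Tokyo", "capital_of", "Japan")
```

After training, the true triples score better (lower distance) than corrupted ones, which is how such embeddings support prediction over symbolic facts.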
That is very interesting. So we are now mapping the wider knowledge representation domain to see where KRL fits in, and KRL fits in here. This is what we are looking at today. The bigger picture is an attempt to define the knowledge representation domain as a whole. I say AI KR because knowledge representation as a field also exists outside AI; it can be used in a number of other fields, including legal design. There is a beautiful map of how knowledge representation relates to a number of fields, not just systems and not just computer science. But here we are talking about KR for AI, and we have started defining it in terms of subdomains, or subcategories, starting from the upper foundation, the existential level: what does AIKR consist of? So we relate the knowledge representation concepts to a top-level ontology, using standard formalisms.
And then here we are looking at a number of domains. We are saying: whatever AI is going to do, it is going to have an upper level, a foundational or top-level ontology, an existential level, that defines the highest abstraction. And it is going to have a domain, an application domain (oops, there is a duplicate there). And reliability engineering has come in, because one of the biggest AI risks is the lack of reliability: generative AI in particular, which is very smart, is not replicable. From a systems reliability point of view, that is a problem. So I am defining knowledge representation in terms of reliability engineering elsewhere; today I am presenting this very briefly. So why are we doing this? It is to provide an index for communication and learning of the domain. It can obviously support auditable, robust applications, and it enables metadata-driven discovery and interoperability. I must say that the word metadata, which is the keyword of interest for this conference, is here: we are going to use the vocabulary as a metadata set for the subject-matter domain of knowledge representation. So this is the subject-matter domain, the vocabulary is going to be used as metadata, and here it is listed as one of the uses. And it will be very interesting to see how we can build automated monitoring: how we can use it for evaluations of LLMs in this respect.

Methods: how do we do it? We identify subdomains and pertinent topics. We identify core resources for each topic, for each subdomain; I am referring to these bubbles, the subdomains. And then we extract key terms and concepts from each resource.
We go around and round, and we do it a little bit ad hoc, a bit creatively, so to speak. We extract concepts and terms and we try to clean them up: keep the relevant ones and remove the duplicates. And then we refine them via evaluations. So this is a general method for constructing a core vocabulary.

This is the slide I was referring to earlier: traditionally, knowledge representation is used to encode logic and semantics in old-fashioned AI, but in machine learning today we can use it to decode hidden layers. That is, I think, the most interesting aspect of the relevance of KR to machine learning today. And knowledge representation learning sits somewhere here, together with neuro-symbolic AI. Another big topic.
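The pipeline just described (collect resources per subdomain, extract terms, deduplicate, keep the relevant ones) could be sketched roughly as below. The toy corpus, stopword list and frequency threshold are all illustrative assumptions, not the group's actual corpus or criteria, and real runs would need phrase extraction rather than single words.

```python
import re
from collections import Counter

# toy corpus standing in for abstracts of KRL resources (invented text)
corpus = [
    "Knowledge graph embedding maps entities and relations into vector spaces.",
    "Translation-based embedding models such as TransE score triples by distance.",
    "Quaternion embedding represents relations in hypercomplex vector spaces.",
]

STOPWORDS = {"and", "into", "such", "as", "by", "in", "the", "a"}

def extract_terms(docs, min_count=2):
    # tokenize and lowercase, drop stopwords, then count occurrences;
    # the Counter deduplicates, and the frequency threshold is a crude
    # stand-in for "keep the relevant ones"
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return sorted(t for t, c in counts.items() if c >= min_count)

vocabulary = extract_terms(corpus)
```

Terms that recur across resources ("embedding", "vector") survive the filter, while one-off words are left for the later refinement-by-evaluation step the talk mentions.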
So, in the spectrum of knowledge organization systems, this work stands here, but it is the basis for whatever further development, more structured, higher-order development, is going to follow. We started with symbolic logic, from old-fashioned AI, and we are arriving at the metadata set. We know all about metadata, but there are different types of metadata, and we are looking at metadata for subject indexing. So the vocabulary presented here can be used as a metadata set for subject indexing of the domain of knowledge representation learning. That is the idea.

This is a little focus on truth maintenance systems, which were already mentioned at the beginning. A truth maintenance system was originally a symbolic AI mechanism for consistency. It tracks dependencies between beliefs and facts, and it revises beliefs when conflicts arise.
And it is useful for hybrid symbolic machine learning systems. So, basically, it starts as a truth maintenance system rooted in the original symbolic AI, but it is useful today in machine learning. We cannot do without truth maintenance systems, so to speak, even in machine learning today, because they enable the tracking of dependencies. Nonetheless, as a concept it was missing. It supports consistency and updates, and it helps ensure explainability.

So, finally, the vocabulary. It is a flat list; definitions will be added later. This is just a list of words, and it is a starting benchmark for the definition of this domain. At the moment it can be reached here; it should be viewable, but not editable. It is about 100 terms.
People ask me what the inclusion criteria are. Everything that seemed to be a core concept in KRL was included, looking at a corpus. You ask me how many papers: honestly, I do not remember, I would have to look it up. But certainly there was a very useful page on GitHub that hosted a number of key papers from key conferences, and it curated them all, painstakingly. So it is very important to credit that research effort. At the moment, we are just taking the terms out of the corpus and compiling a list.

So, how do we say whether this vocabulary is good or not? We pick a few papers, not randomly, but based on what we are looking at. We look at new papers and we ask: is the core concept in this paper in the vocabulary or not? We were looking at quaternions, for example, and we figured out that there is a new paper on quaternions.
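The coverage check just described could be sketched as follows: given the key terms of a new paper, report which are missing from the vocabulary so they can be added. The vocabulary and paper terms here are illustrative stand-ins, not the actual lists.

```python
# illustrative subset of a flat KRL vocabulary (not the real 100-term list)
vocabulary = {"knowledge graph", "embedding", "transe", "ontology", "triple"}

def coverage_gaps(paper_terms, vocab):
    # return the paper's core terms that the vocabulary does not yet cover
    return sorted(t for t in paper_terms if t.lower() not in vocab)

# core terms picked out of a hypothetical new paper
paper_terms = ["embedding", "quaternion", "knowledge graph"]

missing = coverage_gaps(paper_terms, vocabulary)  # quaternion is not covered
vocabulary |= set(missing)                        # so it gets added
```

Each new paper either confirms the vocabulary's coverage or surfaces a gap, which is exactly the evaluation loop the talk describes.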
531 00:21:48,340 --> 00:21:50,790 And I wondered, "Is quaternion in 532 00:21:50,790 --> 00:21:53,560 our vocabulary?" And it wasn't. 533 00:21:53,560 --> 00:21:56,230 It was missing. So, we've added 534 00:21:56,230 --> 00:21:56,660 it. 535 00:21:57,380 --> 00:21:58,220 This is how the 536 00:21:58,220 --> 00:22:00,550 evaluation is currently done. 537 00:22:00,550 --> 00:22:03,390 So, what is a quaternion? Quaternions 538 00:22:03,390 --> 00:22:04,340 are... 539 00:22:04,340 --> 00:22:05,230 They're embeddings for 540 00:22:05,230 --> 00:22:06,380 knowledge representation 541 00:22:06,380 --> 00:22:07,440 learning. So, they're core 542 00:22:07,440 --> 00:22:08,580 concepts. They should be in the 543 00:22:08,580 --> 00:22:09,300 vocabulary. 544 00:22:10,340 --> 00:22:11,210 And then, you can study the 545 00:22:11,210 --> 00:22:12,340 whole thing 546 00:22:14,820 --> 00:22:15,220 and look at the examples of the 547 00:22:15,220 --> 00:22:19,140 various models that use quaternions. 548 00:22:19,780 --> 00:22:21,190 They say, "Represent entities 549 00:22:21,190 --> 00:22:22,710 and relations in a 550 00:22:22,710 --> 00:22:24,680 hypercomplex space." It's a 551 00:22:24,680 --> 00:22:25,940 hypercomplex space, 552 00:22:26,900 --> 00:22:27,850 to model complex relational 553 00:22:27,850 --> 00:22:30,100 patterns. Knowledge graphs. 554 00:22:30,100 --> 00:22:31,990 Goodness me. You wouldn't 555 00:22:31,990 --> 00:22:33,380 want to miss out on that. 556 00:22:33,380 --> 00:22:35,970 So, we look at the papers. 557 00:22:35,970 --> 00:22:39,150 So, this 558 00:22:39,150 --> 00:22:41,430 is how the evaluation is done 559 00:22:41,430 --> 00:22:42,900 at the moment: 560 00:22:42,900 --> 00:22:45,600 finding the papers, checking 561 00:22:45,600 --> 00:22:47,550 that the core terms and 562 00:22:47,550 --> 00:22:50,240 concepts in the paper are in the 563 00:22:50,240 --> 00:22:51,820 vocab.
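The quaternion idea mentioned here, representing entities and relations in a hypercomplex space, can be sketched in the style of models such as QuatE. The Hamilton product below is the standard one, but the dimensions, random initialisation, and plain inner-product scoring are illustrative simplifications, not the talk's own formulation.

```python
# Sketch of quaternion knowledge-graph embeddings in the style of QuatE:
# entities and relations are quaternion-valued vectors (w, x, y, z); the
# relation rotates the head via the Hamilton product, and the score is an
# inner product with the tail. Sizes and values here are toy choices.
import numpy as np

def hamilton(q, r):
    # Hamilton product of two quaternion vectors, components on axis 0
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def normalize(q):
    # unit quaternions, so relations act as rotations in hypercomplex space
    return q / np.linalg.norm(q, axis=0, keepdims=True)

rng = np.random.default_rng(0)
dim = 4  # embedding dimensions per quaternion component
head = rng.normal(size=(4, dim))
rel = normalize(rng.normal(size=(4, dim)))
tail = rng.normal(size=(4, dim))

# higher score = more plausible (head, relation, tail) triple
score = float(np.sum(hamilton(head, rel) * tail))
print("score:", round(score, 3))
```

The hypercomplex rotation is what lets such models capture the complex relational patterns (symmetry, antisymmetry, inversion) the quoted papers refer to.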
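The evaluation loop just described, find a paper and check that its core terms are in the vocab, amounts to a simple coverage check. The term sets below are illustrative stand-ins, except that "quaternion" really was a miss the talk mentions.

```python
# Minimal sketch of the coverage check described in the talk: for each new
# paper, test whether its core concepts already appear in the vocabulary,
# and record the misses as candidate additions. Terms are illustrative.

vocabulary = {"embedding", "knowledge graph", "translation-based model"}

paper_core_concepts = {"embedding", "quaternion", "knowledge graph"}

missing = sorted(paper_core_concepts - vocabulary)
coverage = 1 - len(missing) / len(paper_core_concepts)

print(missing)             # ['quaternion'] -> candidate addition
print(round(coverage, 2))  # 0.67
```

Each miss is then reviewed manually and, if it is genuinely a core KRL concept, added to the vocabulary, which is exactly what happened with quaternion.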
So, I'm running out of 564 00:22:51,820 --> 00:22:52,420 time. 565 00:22:52,420 --> 00:22:54,100 We're also doing evaluations 566 00:22:54,100 --> 00:22:55,970 with use cases, looking at 567 00:22:55,970 --> 00:22:58,890 specific use cases where 568 00:22:58,890 --> 00:22:59,860 knowledge 569 00:22:59,860 --> 00:23:01,990 representation learning is used 570 00:23:01,990 --> 00:23:04,050 and picking terms from there. 571 00:23:04,050 --> 00:23:07,070 So, from this work, a number of 572 00:23:07,070 --> 00:23:08,180 categories 573 00:23:08,900 --> 00:23:10,570 are emerging. So, we can analyze 574 00:23:10,570 --> 00:23:12,830 and create additional 575 00:23:12,830 --> 00:23:15,090 layers of abstraction from the 576 00:23:15,090 --> 00:23:16,260 vocabulary. 577 00:23:17,780 --> 00:23:21,170 So, for example, so far, we 578 00:23:21,170 --> 00:23:24,600 have identified a number of 579 00:23:24,600 --> 00:23:27,790 categories in KRL: translation-based, 580 00:23:27,790 --> 00:23:29,390 bilinear, deep neural, 581 00:23:29,390 --> 00:23:31,760 geometric, temporal, which 582 00:23:31,760 --> 00:23:34,500 could be used as further 583 00:23:34,500 --> 00:23:37,460 structure for the vocabulary 584 00:23:37,460 --> 00:23:39,380 in a future iteration. 585 00:23:40,340 --> 00:23:42,470 So, so far, we can say that the 586 00:23:42,470 --> 00:23:44,110 vocabulary is very useful 587 00:23:44,110 --> 00:23:46,190 because it just tells us what 588 00:23:46,190 --> 00:23:47,460 KRL consists of. 589 00:23:48,420 --> 00:23:50,060 It starts indexing the topic, 590 00:23:50,060 --> 00:23:53,550 the domain. At the same time, 591 00:23:53,550 --> 00:23:57,850 it's far from being complete. 592 00:23:57,850 --> 00:23:59,070 And it's probably even a little 593 00:23:59,070 --> 00:24:01,160 bit dirty, a little bit 594 00:24:01,160 --> 00:24:02,020 noisy. 595 00:24:03,300 --> 00:24:07,680 So, there are terms in there 596 00:24:07,680 --> 00:24:12,550 which may not be purely KRL.
597 00:24:12,550 --> 00:24:14,620 Knowledge representation, or KR? 598 00:24:14,620 --> 00:24:18,210 Could be. We need to decide 599 00:24:18,210 --> 00:24:20,730 which ones we're going to leave in 600 00:24:20,730 --> 00:24:23,060 and which we're going to delete. 601 00:24:24,980 --> 00:24:27,100 Then, of course, this has been 602 00:24:27,100 --> 00:24:29,060 done very coarsely, a little 603 00:24:29,060 --> 00:24:31,770 bit of experimental work. Definitions 604 00:24:31,770 --> 00:24:33,060 are still not done. 605 00:24:33,940 --> 00:24:34,990 Further refinement is needed, and 606 00:24:34,990 --> 00:24:37,240 the evaluation is continuing. So, 607 00:24:37,240 --> 00:24:39,050 we're going to continue with 608 00:24:39,050 --> 00:24:40,260 the evaluation. 609 00:24:40,260 --> 00:24:43,060 We're going to expand, refine, 610 00:24:44,820 --> 00:24:46,640 and develop the definitions. We'll 611 00:24:46,640 --> 00:24:48,930 create further 612 00:24:48,930 --> 00:24:50,850 abstractions, further layers of 613 00:24:50,850 --> 00:24:51,780 structure. 614 00:24:52,420 --> 00:24:54,490 We're going to contribute to 615 00:24:54,490 --> 00:24:57,160 standards development, we hope. 616 00:24:57,160 --> 00:24:59,260 And maybe build an agent to do 617 00:24:59,260 --> 00:25:00,260 this work. 618 00:25:00,260 --> 00:25:02,720 So, wouldn't it be nice if 619 00:25:02,720 --> 00:25:04,990 someone could help us 620 00:25:04,990 --> 00:25:07,180 build the AI for doing this? 621 00:25:07,180 --> 00:25:09,540 And this is an open call. 622 00:25:10,580 --> 00:25:11,480 I should have some flashing 623 00:25:11,480 --> 00:25:13,220 lights on this line. 624 00:25:14,100 --> 00:25:17,500 We're going to talk about 625 00:25:17,500 --> 00:25:21,050 AI for the AI. And this is the 626 00:25:21,050 --> 00:25:24,330 most important and dynamic 627 00:25:24,330 --> 00:25:27,300 aspect of this field, 628 00:25:27,300 --> 00:25:29,100 the leading edge.
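The further layers of structure mentioned here could start from the categories the talk identified (translation-based, bilinear, deep neural, geometric, temporal). A possible sketch is below; the category names come from the talk, but grouping these particular well-known KRL models under them is my illustrative assumption.

```python
# Sketch of using the emerging categories as an abstraction layer over the
# flat vocabulary. Category names are from the talk; the model names filed
# under each are illustrative examples, not an official classification.
categories = {
    "translation-based": ["TransE", "TransH"],
    "bilinear": ["DistMult", "ComplEx"],
    "deep neural": ["ConvE"],
    "geometric": ["RotatE", "QuatE"],
    "temporal": ["TTransE"],
}

def category_of(term):
    # reverse lookup: which abstraction layer does a vocabulary term sit in?
    for cat, terms in categories.items():
        if term in terms:
            return cat
    return None

print(category_of("QuatE"))  # geometric
```

Even this small layer turns the flat word list into the beginning of a structured subject index, which is the direction a future iteration of the vocabulary could take.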
A standardized 629 00:25:29,100 --> 00:25:31,350 vocabulary supports explainability 630 00:25:31,350 --> 00:25:32,740 and human learning. 631 00:25:33,300 --> 00:25:34,660 It is necessary to develop 632 00:25:34,660 --> 00:25:36,740 subject matter metadata. 633 00:25:37,300 --> 00:25:39,300 It bridges the gap between symbolic 634 00:25:39,300 --> 00:25:41,710 and statistical AI. It contributes to 635 00:25:41,710 --> 00:25:43,860 safe and auditable AI systems. 636 00:25:43,860 --> 00:25:45,750 This is our super bottom line. 637 00:25:45,750 --> 00:25:47,300 So, thank you so much. 638 00:25:48,900 --> 00:25:50,470 You can check out the vocab. 639 00:25:50,470 --> 00:25:53,920 You can join by searching. Search 640 00:25:53,920 --> 00:25:56,260 for this and join. 641 00:25:56,900 --> 00:25:59,640 And you're very welcome to 642 00:25:59,640 --> 00:26:02,800 shoot some questions here or 643 00:26:02,800 --> 00:26:04,980 wherever you like. 644 00:26:05,460 --> 00:26:09,180 So, get in touch. Thank you. 645 00:26:09,180 --> 00:26:10,100 Bye.