Hello. This is Paola Di Maio, presenting this work in my capacity as chair of the AIKR CG, the Artificial Intelligence Knowledge Representation Community Group at W3C, where a large part of this work is being done and shared. And Jan Chin, thank you for presenting the paper at the conference in person in Barcelona. This is a pre-recorded talk; this is me, this is my voice, this is my face. The contents of the paper are outlined here. The main topic is knowledge representation, with a focus on knowledge representation learning and the development of a vocabulary, which serves as a metadata set, a type of subject-index metadata. The background is AI, which is enveloping everything and moving very fast. AI is giving us unprecedented capabilities, but it is a field with a lot of open issues, uncertainties, and risk factors. AI is fundamentally rewriting history: today we search AI for facts, and it acts as a filter for everything we know. It is also wiping our individual memory; we are forgetting to remember things, because now we just search. This happened already, a little bit, with search engines and with mighty Google 25 years ago. At the beginning Google was indexing fairly accurately, but now, because of the data explosion, the volume of data being produced every day, much of which is noise, search engines and AI systems cannot distinguish noise from signal. So we have a problem: we are asking AI questions about reality, and AI is doing an excellent job of bringing things up, but at the same time it is presenting results with inherent bias, which is possibly contributing to distortions, and this is a serious concern. And AI is learning from humans, ingesting intelligence, becoming autonomous, and building itself. So we really don't know what AI is becoming.
And this is one of the factors that motivated this work in the first place: we want to understand what AI is becoming and how it is doing things, but how can we do that? So we went back to knowledge representation. Knowledge representation has long been considered part of AI: the explicit representation of facts, rules, and logic, which AI leveraged for reasoning. In the age of machine learning, however, KR has become less relevant, to the point that people were saying KR is simply not relevant to what we are doing today with neural networks. Disagreeing with that basic argument was the starting point for this work seven years ago. And now, after we have hammered away, written a lot of papers, and given quite a lot of talks about it, people are starting to look back at knowledge representation and say, "No, we do need knowledge representation for a number of things, even in machine learning." Nonetheless, since the beginning of the field's development, knowledge representation has not been well understood or well defined in practice. It has been used in a narrow way, selecting a specific knowledge representation technique to achieve specific results in the construction of intelligent systems. But as a field it has been challenging to define, because it is very vast and it is not just one thing. There are papers dating back 30, 40, even 50 years, I don't have the citations in front of me, that already clearly identified these challenges at the time, and the challenge remains today. So knowledge representation as a field is still not defined in practice, it is becoming relevant to machine learning again, and we still do not know exactly how to define it.
The work started with this in mind: we want to be able to say what knowledge representation is, and how it can help us solve the challenges and open issues that machine learning faces today. And we have been very busy since. The challenge for me has been to track the leading edge of where all of this is going, and what I am presenting today is the state of the work as it stands. So we started by trying to map knowledge representation as a domain. What became compelling more recently is that certain mission-critical KR concepts that ensure the reliability of systems were completely missing from AI standards. In particular, truth preservation, a core KR concept, was absent, at the time of writing, from all of the AI standards. Now you will ask me: how did you check the standards? There is an initiative by the Turing Institute called the AI Standards Hub, and it is searchable. The Turing Institute has been much criticized for a number of things, but praise to them for building a searchable hub that allows you to query all AI standards by keyword. At the time, truth preservation did not appear in any of them. So: alarm. Then, of course, there was some double-checking, opening each standard individually and parsing it, to make sure the search engine wasn't simply broken or missing things. So I was confident enough to make the assertion that the concept of truth preservation was absent from AI standards at the time of writing, throughout 2025.
So the lack of certain critical KR concepts in AI standards, such as truth preservation, can be considered a risk factor for AI failure. And if this is true, then the AI standards being developed may not be fit for purpose unless they are integrated with core knowledge representation concepts. There are a number of risks of AI without KR: opaqueness, lack of transparency, and inconsistency, which lead to increased systemic risk and possibly systemic aberration, which is another big topic; if you're interested, you should be able to find the talk I recently gave on it. A number of papers and publications were written leading up to this work, if you're interested in the background. The scope of the work presented here is a map of the knowledge domain called artificial intelligence knowledge representation, with a focus on knowledge representation learning. The scope is to identify a flat-list domain vocabulary. It is not in scope to build a full taxonomy or ontology at this time, although I'm sure that with the right tools and resources we could, and it is not in scope to explain everything about AI, KR, or metadata. For those of you who don't know, or don't have the time to brush up on it: knowledge representation can be considered a process or method for encoding information in machine-readable format, to enable a machine to learn and act intelligently. That is one possible definition, and it uses diverse methods and tools. I should emphasize that knowledge representation learning is derived from KR in general. It supports reasoning, it is vital for explainability, and it can help to decode the hidden layers. This is a new role for KR; I have a slide to explain this better later.
Knowledge representation learning can be defined as a set of methods to encode symbolic knowledge into continuous vector spaces, so that AI systems can reason and make predictions more effectively. In traditional AI, symbolic knowledge was encoded in rule-based, frame-based, or knowledge-based systems. Knowledge representation learning translates that symbolic knowledge, the rules and the logic, into continuous vector spaces, into machine learning constructs, so to speak. It is important for a number of reasons, which I won't enumerate here, but for me the interesting thing is that it connects the more symbolic knowledge representation with machine learning: knowledge representation learning sits somewhere in the middle, and that is very interesting. So we are now mapping the wider knowledge representation domain to see where KRL fits in, and KRL fits in here; this is what we are looking at today. This bigger picture is an attempt to define the knowledge representation domain as a whole. I say AI KR because knowledge representation as a field also exists outside AI; it can be used in a number of other fields, including legal design, and there is a beautiful map of how knowledge representation relates to many fields beyond intelligent systems and computer science. But here we are talking about KR for AI, and we have started defining it in terms of subdomains or subcategories, starting from the upper foundation, the existential level: what does AIKR consist of? We relate the knowledge representation concepts to a top-level ontology, using standard formalisms, and then we look at a number of domains. So we are saying: whatever AI is going to do, it is going to have an upper, foundational, or top-level ontology, an existential level, that defines the highest abstraction.
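To make the "symbolic facts into continuous vector spaces" idea concrete, here is a minimal sketch in the spirit of translation-based embedding models such as TransE. The toy facts, dimensionality, and learning rate are all invented for illustration; real KRL systems train over large knowledge graphs with negative sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical tiny knowledge base of symbolic facts (head, relation, tail).
triples = [("cat", "is_a", "animal"), ("dog", "is_a", "animal")]
entities = {e: rng.normal(size=dim) for t in triples for e in (t[0], t[2])}
relations = {t[1]: rng.normal(size=dim) for t in triples}

def transe_score(h, r, t):
    """TransE plausibility: a true fact should satisfy h + r ≈ t,
    so a smaller distance means a more plausible triple."""
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def train_step(h, r, t, lr=0.1):
    """One gradient-descent step nudging embeddings toward h + r ≈ t."""
    diff = entities[h] + relations[r] - entities[t]
    entities[h] -= lr * diff
    relations[r] -= lr * diff
    entities[t] += lr * diff

before = transe_score("cat", "is_a", "animal")
for _ in range(50):
    for h, r, t in triples:
        train_step(h, r, t)
after = transe_score("cat", "is_a", "animal")
print(before, after)  # the distance shrinks as the symbolic fact is absorbed
```

The point of the sketch is only the translation: a discrete, symbolic assertion becomes a geometric constraint among vectors, which is the bridge between old-style KR and machine learning that the talk describes.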
And it is going to have an application domain. Reliability engineering has also come in, because one of the biggest AI risks is the lack of reliability: generative AI in particular, which is very smart, is not replicable, and from a systems-reliability point of view that is a problem. So I am defining knowledge representation in terms of reliability engineering elsewhere, and today I am presenting this only very briefly. Why are we doing this? To provide an index for communication and learning of the domain. It can obviously support auditable, robust applications, and it enables metadata-driven discovery and interoperability. I must say that the word metadata, the keyword of interest for this conference, is here: we are going to use the vocabulary as a metadata set for the subject-matter domain of knowledge representation learning. The vocabulary will be used as metadata, and that is listed here as one of its uses. It will also be very interesting to see how we can build automated monitoring, and how we can use the vocabulary for evaluations of LLMs. Method: how do we do it? We identify subdomains and pertinent topics. We identify core resources for each topic, for each subdomain; I am referring to these bubbles, the subdomains. Then we extract key terms and concepts from each resource. We go around and around, a bit creatively, so to speak: we extract concepts and terms, clean them up, keep the relevant ones, and remove the duplicates. Then we refine them via evaluations. This is a general method for constructing a core vocabulary. And this is the slide I was referring to earlier: traditionally, knowledge representation was used to encode logic and semantics in old-fashioned AI, but in machine learning today we can use it to decode the hidden layers.
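The extract-clean-deduplicate step of the method could be sketched as follows. The mini-corpus, stopword list, and frequency threshold are invented stand-ins; the real work uses curated papers and human judgment rather than a simple document-frequency filter.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: stand-in abstracts of KRL papers.
corpus = [
    "Translation-based embedding models map entities into vector spaces.",
    "Knowledge graph embedding supports link prediction over entities.",
    "Vector spaces enable reasoning over knowledge graph triples.",
]
STOPWORDS = {"based", "over", "into", "the", "a", "of", "and"}

def extract_terms(docs, min_count=2):
    """Count candidate terms across the corpus, drop stopwords, and keep
    terms appearing in enough documents (a crude relevance filter)."""
    counts = Counter()
    for doc in docs:
        tokens = set(re.findall(r"[a-z-]+", doc.lower())) - STOPWORDS
        counts.update(tokens)  # one count per document => document frequency
    return sorted(t for t, c in counts.items() if c >= min_count)

vocab = extract_terms(corpus)
print(vocab)  # terms recurring across documents survive; one-offs are dropped
```

Deduplication here falls out of using sets and a single counter; the "keep the relevant ones" step is the part that, as the talk notes, still requires going around and around by hand.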
That, I think, is the most interesting aspect of KR's relevance to machine learning today. Knowledge representation learning sits somewhere here, together with neuro-symbolic AI, another big topic. Within the spectrum of knowledge organization systems, this work stands here, but it is the basis for whatever more structured, higher-order development follows. We started with symbolic logic from old-fashioned AI, and we are arriving at the metadata set. We know all about metadata, but there are different types of metadata, and we are looking at metadata for subject indexing: the vocabulary presented here can be used as a metadata set for subject indexing of the domain of knowledge representation learning. That's the idea. Here is a little focus on truth maintenance systems, already mentioned at the beginning. A truth maintenance system was originally a symbolic AI mechanism for consistency: it tracks dependencies between beliefs and facts, and it revises beliefs when conflicts arise. It is useful for hybrid symbolic-machine learning systems. So although it is rooted in the original symbolic AI truth maintenance systems, it is useful today in machine learning: we cannot do without truth maintenance systems, so to speak, even in machine learning, because they enable the tracking of dependencies, support consistent updates, and help ensure explainability. Nonetheless, the concept was missing from the standards. So, finally, the vocabulary. It is a flat list; definitions will be done later. This is just a list of words, and it is starting as a benchmark definition of the domain. It can be reached here; it should be viewable, though not editable. It is about 100 terms. People ask me what the inclusion criteria are.
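As a rough illustration of the dependency-tracking idea behind truth maintenance systems, here is a minimal sketch. The class, its method names, and the Tweety facts are invented for illustration; real TMS implementations (e.g. justification-based or assumption-based TMS) are considerably richer.

```python
class SimpleTMS:
    """Minimal truth-maintenance sketch: each belief records the beliefs
    that justify it, so retracting one belief propagates to its dependents."""

    def __init__(self):
        self.justifications = {}  # belief -> set of supporting beliefs
        self.believed = set()

    def add(self, belief, supports=()):
        self.justifications[belief] = set(supports)
        if all(s in self.believed for s in supports):
            self.believed.add(belief)

    def retract(self, belief):
        """Remove a belief and, transitively, everything it justified."""
        if belief not in self.believed:
            return
        self.believed.discard(belief)
        for b, deps in self.justifications.items():
            if belief in deps:
                self.retract(b)

tms = SimpleTMS()
tms.add("bird(tweety)")
tms.add("flies(tweety)", supports=["bird(tweety)"])
tms.add("penguin(tweety)")   # a new fact that conflicts with flying
tms.retract("bird(tweety)")  # revision: flies(tweety) loses its support
print(sorted(tms.believed))  # ['penguin(tweety)']
```

This is the mechanism the talk argues machine learning still needs: when one fact changes, everything that depended on it is revised, rather than silently left inconsistent.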
Everything that seemed to be a core concept in KRL was included, looking at a corpus. You ask me how many papers; honestly, I don't remember, I would have to look it up. But there was certainly a very useful page on GitHub that hosted a number of key papers from key conferences, and I went through them all, painstakingly, so the list is grounded in first-class research. At the moment we are just taking the terms out of the corpus and compiling a list. How do we say whether this vocabulary is good or not? We pick a few papers, not randomly but based on what we are looking at, new papers, and we ask: is the core concept in this paper in the vocabulary or not? For example, we found a new paper on quaternions, and I wondered: is "quaternion" in our vocabulary? It wasn't; it was missing. So we added it. This is how the evaluation is currently done. What is a quaternion in this context? Quaternions are used as embeddings for knowledge representation learning, so they are a core concept and should be in the vocabulary. You can then study the whole topic and look at the various models that use quaternions: they represent entities and relations in a hypercomplex space, to model complex relational patterns in knowledge graphs. You wouldn't want to miss that. So this is how the evaluation is done at the moment: finding the papers, and checking that the core terms and concepts in each paper are in the vocab. I'm running out of time. We are also doing evaluations with use cases: looking at specific use cases where knowledge representation learning is used and picking terms from there. From this work, a number of categories is emerging.
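The paper-by-paper coverage check described above could be sketched like this. The vocabulary subset and the paper's key terms are invented stand-ins, chosen to reproduce the quaternion example from the talk.

```python
# Sketch of the coverage evaluation: take the key terms of a new paper,
# check each against the vocabulary, and treat misses as candidate additions.
vocabulary = {"embedding", "knowledge graph", "translation-based", "bilinear"}

# Hypothetical key terms lifted from a new quaternion-embedding paper.
paper_key_terms = {"quaternion", "embedding", "knowledge graph"}

missing = sorted(paper_key_terms - vocabulary)
print(missing)  # ['quaternion'] -> a gap found by the evaluation

vocabulary |= set(missing)  # refine the vocabulary with the missing terms
print("quaternion" in vocabulary)  # True
```

Simple set difference is enough here because the vocabulary is still a flat list; once definitions and structure are added, the check would need matching on synonyms and variants as well.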
We can analyze the vocabulary and create additional layers of abstraction from it. So far, for example, we have identified a number of categories in KRL: translation-based, bilinear, deep neural, geometric, and temporal, which could be used as further structure for the vocabulary in a future iteration. So far we can say that the vocabulary is very useful, because it tells us what KRL consists of and starts indexing the domain. At the same time, it is far from complete, and it is probably even a little bit dirty, a little bit noisy: there are terms in there that may not be purely KRL, or even knowledge representation at all. We need to decide which to keep and which to delete. Of course, this has been done quite coarsely; it is experimental work. Definitions are still not done, further refinement is needed, and the evaluation continues. Next, we are going to continue the evaluation, expand and refine the list, develop unique definitions, create further abstractions and layers of structure, contribute to standards development, we hope, and maybe build an agent to do this work. Wouldn't it be nice if someone could help us build the AI for doing this? This is an open call; I should have some flashing lights on this line. This is the most important and dynamic aspect of this field, the leading edge. To sum up: a standardized vocabulary supports explainability and human learning; it is necessary for developing subject-matter metadata; it bridges the gap between symbolic and statistical AI; and it contributes to safe and auditable AI systems. That is our bottom line. Thank you so much. You can check out the vocab, and you can join: search for the AIKR Community Group at W3C and join. You are very welcome to send questions here or wherever you like. Get in touch. Thank you. Bye.
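As a hedged illustration of how those emerging categories could become a layer over the flat vocabulary, here is one possible shape for it, using well-known knowledge-graph-embedding model names as example terms; the mapping itself is an assumption about the future structure, not part of the current vocabulary.

```python
# Illustrative category layer over the flat KRL vocabulary; the model
# names are common examples of each category from the KGE literature.
categories = {
    "translation-based": {"TransE", "TransH"},
    "bilinear": {"RESCAL", "DistMult", "ComplEx"},
    "deep-neural": {"ConvE", "R-GCN"},
    "geometric": {"RotatE", "QuatE"},
    "temporal": {"TNTComplEx"},
}

def category_of(term):
    """Look a vocabulary term up in the category layer (None if unplaced)."""
    return next((c for c, terms in categories.items() if term in terms), None)

print(category_of("QuatE"))   # geometric
print(category_of("TransE"))  # translation-based
```

Even this thin layer already supports the subject-indexing use: a paper tagged with a term inherits the term's category, giving coarse metadata for free.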