[Use Case] JSI: Automatically generated rules

I tried to follow Chris' suggestion for a more general use case
which covers a class of systems.


Motivation: Creating rules and ontologies is a demanding knowledge
engineering task. Instead of, or in addition to, manual creation, one
should take advantage of the large amounts of unstructured and partially
structured data already available on the Web. Data mining and machine
learning techniques can be used to extract semantic annotations from such
data in the form of relations, ontologies and rules.

Representation: Unstructured data (text, pictures, video) is typically
represented by sparse vectors in a high-dimensional vector space.
Learning tasks range from clustering and classification to the
construction of semantic graphs and relational rules. Accordingly, the
results of learning require representations ranging from linear functions
over word weights in the vector space, through logical expressions over
attribute/value pairs, to rules with relations and variables.
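As a rough illustration of the first kind of representation, the sketch below builds a sparse, length-normalized term-frequency vector (only words that actually occur get an entry) and evaluates a linear function over word weights. The weights, bias, example sentence and the helper names `sparse_tf_vector` and `linear_score` are hypothetical, and naive whitespace tokenization is an assumption; a real system would learn the weights with a linear classifier and would likely use a richer weighting such as TF-IDF.

```python
import math
from collections import Counter


def sparse_tf_vector(text: str) -> dict[str, float]:
    """Map a text to a sparse, L2-normalized term-frequency vector.

    Only words that occur are stored, so the vector stays sparse even
    though the vocabulary (the dimensionality) may be huge.
    """
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def linear_score(vec: dict[str, float], weights: dict[str, float], bias: float = 0.0) -> float:
    """Linear function over word weights: bias + sum over w of weight[w] * vec[w].

    Words missing from `weights` contribute nothing (implicit weight 0).
    """
    return bias + sum(value * weights.get(word, 0.0) for word, value in vec.items())


# Hypothetical weights for a "business event" classifier; in practice these are learned.
WEIGHTS = {"merger": 1.5, "acquisition": 1.2, "weather": -2.0}
BIAS = -0.5

doc = sparse_tf_vector("Merger talks merger and acquisition expected")
score = linear_score(doc, WEIGHTS, BIAS)
print(f"score = {score:.3f} -> {'business event' if score > 0 else 'other'}")
```

With these made-up weights the example document scores positive. The dictionary-based vector stands in for a proper sparse-matrix type and is chosen only to keep the sketch dependency-free.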

Reasoning: Whatever the results of learning, it must be possible to
apply them to draw conclusions (e.g., for classification or prediction).
Further, background knowledge expressed as rules can be used during
learning to improve its results. For example, Allen's interval algebra
can be used to express temporal relations between days, weeks and
months; applied during learning, it increases the expressiveness of the
learned prediction rules.
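To make the temporal background-knowledge example more concrete, here is a small sketch of Allen's thirteen basic interval relations. Intervals are encoded, as an assumption, as `(start, end)` pairs of day numbers with `start < end`, and the function name `allen_relation` and the example events are hypothetical. A learner could use such a function to turn pairs of timestamped events into relational facts such as `before(price_drop, announcement)`, which may then appear in the bodies of learned prediction rules.

```python
def allen_relation(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Return which of Allen's 13 basic relations holds between intervals a and b.

    Intervals are (start, end) pairs with start < end, e.g. day numbers.
    """
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met_by"
    # From here on the two intervals share more than a single point.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started_by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished_by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining cases: partial overlap with distinct start and end points.
    return "overlaps" if s1 < s2 else "overlapped_by"


# Hypothetical calendar intervals (end exclusive): week 1 and the month of January.
week1 = (1, 8)
january = (1, 32)
print(allen_relation(week1, january))  # starts

# Hypothetical events turned into a relational fact usable in a learned rule.
events = {"price_drop": (10, 12), "announcement": (14, 15)}
fact = (allen_relation(events["price_drop"], events["announcement"]), "price_drop", "announcement")
print(fact)  # ('before', 'price_drop', 'announcement')
```

Checking the cases in this order (disjoint, touching, identical, shared endpoint, nesting, partial overlap) guarantees that exactly one of the thirteen relations is returned for any valid pair.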

Explicit representation: Rules have to be shared, reused, inspected,
adapted and modified. This is the main motivation for RIF, in contrast
to semantic web services. Further, rules should be interchangeable not
only between machines; they must also be readable by humans. This was
the main lesson learned from the success of rule-based systems.

Application scenarios:
- User profiling: monitoring and analysis of user profiles (during
online shopping) to learn association rules.
- Virtual organizations: structuring of organizations' competencies
(from web sources of a large pool of potential partners) to facilitate
dynamic creation of alliances for specific business opportunities.
- Online learning from stream data (e.g., business news) to predict
possible future events. Prediction rules change gradually over time and
must be inspected, manipulated and adapted.
- Cross media: different media (text, pictures, video) about the same
source can be aligned and used for annotation. The same sparse vector
representation is used, just over different primitives (words, textures,
visual patterns).
- Multilingual document translation: multilingual documents are aligned
and a language-independent representation is learned.
- Data compression: rules learned from large data sets can preserve
certain properties of the data, e.g., the ability to discriminate between
the values of one attribute based on the other attributes. Such rules are
not equivalent to the original data, but require considerably less space.
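For the user-profiling scenario, the two basic quantities of association-rule mining are support (the fraction of sessions containing an itemset) and confidence (how often the consequent appears when the antecedent does). The sketch below computes them by brute force for rules with a single item on each side. The shopping sessions, thresholds and helper names (`count`, `single_item_rules`) are hypothetical, and a real miner would use an algorithm such as Apriori or FP-growth to handle larger itemsets efficiently.

```python
from itertools import permutations


def count(itemset: frozenset, transactions: list[frozenset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def single_item_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules {a} -> {b} that meet both thresholds.

    Returns (a, b, support, confidence) tuples, where
    support = count({a, b}) / N and confidence = count({a, b}) / count({a}).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = count(frozenset({a, b}), transactions)
        support = both / n
        if support < min_support:
            continue
        # a occurs in at least one transaction, so this division is safe.
        confidence = both / count(frozenset({a}), transactions)
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical shopping sessions taken from user profiles.
sessions = [
    frozenset({"camera", "memory_card", "tripod"}),
    frozenset({"camera", "memory_card"}),
    frozenset({"camera", "bag"}),
    frozenset({"memory_card"}),
    frozenset({"camera", "memory_card", "bag"}),
]

for a, b, s, c in single_item_rules(sessions, min_support=0.4, min_confidence=0.7):
    print(f"{a} -> {b}  (support={s:.2f}, confidence={c:.2f})")
```

On this toy data the sketch reports `bag -> camera` (confidence 1.00) and `camera -> memory_card` and `memory_card -> camera` (confidence 0.75 each); `camera -> bag` reaches the support threshold but is rejected for its confidence of 0.50. Counting with integers and dividing only at the end keeps the confidences exact fractions of whole counts.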

Received on Friday, 9 December 2005 15:33:01 UTC