- From: Igor Mozetic <igor.mozetic@ijs.si>
- Date: Fri, 09 Dec 2005 16:32:09 +0100
- To: public-rif-wg@w3.org
I tried to follow Chris' suggestion for a more general use case which covers a class of systems.

Motivation: Creating rules and ontologies is a demanding knowledge engineering task. Instead of, or in addition to, manual creation, one should take advantage of the large amounts of unstructured and partially structured data already available on the Web. Data mining and machine learning techniques can be used to extract semantic annotations in the form of relations, ontologies and rules.

Representation: Unstructured data (text, pictures, video) is typically represented by sparse vectors in a high-dimensional vector space. Learning tasks range from clustering and classification to the construction of semantic graphs and relational rules. Accordingly, the results of learning require representations ranging from linear functions over word weights in the vector space, to logical expressions over attribute/value pairs, to rules with relations and variables.

Reasoning: Whatever the results of learning, it must be possible to apply them to draw conclusions (e.g., for classification or prediction). Further, during learning, background knowledge (expressed by rules) can be used to enhance the results. For example, Allen's interval algebra can be used to express temporal relations between days, weeks and months, and then be applied during learning to increase the expressiveness of the learned prediction rules.

Explicit representation: Rules have to be shared, re-used, inspected, adapted and modified. This is the main motivation for RIF, in contrast to semantic web services. Further, rules should be interchangeable not only between machines, but must be readable by humans as well. This was the main lesson learned from the success of rule-based systems.

Application scenarios:
- User profiling: monitoring and analysis of user profiles (during online shopping) to learn association rules.
- Virtual organizations: structuring of organizations' competencies (from web sources covering a large pool of potential partners) to facilitate the dynamic creation of alliances for specific business opportunities.
- Online learning from stream data (e.g., business news) to predict possible future events. Prediction rules change gradually over time and must be inspected, manipulated and adapted.
- Cross media: different media (text, pictures, video) about the same source can be aligned and used for annotation. The same sparse vector representation is used, just over different primitives (words, textures, visual patterns).
- Multilingual document translation: multilingual documents are aligned and a language-independent representation is learned.
- Data compression: from large data sets, rules can be learned which preserve some properties of the data, e.g., the ability to discriminate between the values of one attribute based on other attributes. The rules are not equivalent to the original data, but require considerably less space.
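
To make the remark about Allen's interval algebra under "Reasoning" a bit more concrete, here is a minimal sketch in Python of how the thirteen basic interval relations could be computed and offered to a learner as background knowledge. Everything in it is an assumption made for illustration: the `Interval` type, the `allen_relation` function, the half-open integer encoding of days, and the example weeks and month are hypothetical and not part of any RIF proposal.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical time interval over integer day numbers, half-open [start, end)."""
    start: int
    end: int


def allen_relation(a: Interval, b: Interval) -> str:
    """Name the basic Allen relation holding from a to b (assumes start < end)."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    # Only the three inverse cases remain; resolve them by swapping arguments.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[allen_relation(b, a)]


# Illustrative background facts: a day, two weeks and a 30-day month.
day4 = Interval(3, 4)
week1 = Interval(0, 7)
week2 = Interval(7, 14)
month = Interval(0, 30)

print(allen_relation(day4, week1))   # during
print(allen_relation(week1, week2))  # meets
print(allen_relation(week1, month))  # starts
```

A learner could then use such derived relations as extra predicates in the bodies of prediction rules, rather than working with raw timestamps only.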
Received on Friday, 9 December 2005 15:33:01 UTC