[UCR] Expansion on: RIF Core must accept RDF triples as data

At the last telecon I took an (unminuted) action to expand on the
design constraint:
http://www.w3.org/2005/rules/wg/wiki/The_RIF_Core_must_be_able_to_accept_RDF_triples_as_data

I'm starting this off by email rather than editing the wiki because
(a) I'm not convinced what was asked for should be part of the design
constraint (see separate msg), (b) I prefer to use email for
discussion and then reflect the results on the wiki and (c) that page
was created by Peter and I'm not comfortable just wading in and
filling it up with my own ramblings.

Questions to be addressed:
  - does this mean all RIF processors need an RDF/XML parser?
  - does this imply all RIF processors have to build in RDF semantics?
  - how does this relate to the use cases?

The core implication of this requirement on RIF is that it should be
possible to express patterns in rule bodies which match RDF
triples. There are multiple ways this minimal requirement could be
satisfied, each with different implications for the above
questions.  Amongst the obvious ones are:

(i) All (or some) RIF binary predicates could be interpreted as RDF
properties and be satisfied if matching triples are present in the
scope of the rule execution. This implies that RIF predicates are
identified by URIs; that the space of RIF literals includes RDF plain
literals (i.e. Unicode strings with optional xml:lang tags) and RDF
typed literals; and that RIF support a second order syntax so that
quantification over the RDF property symbol is possible.

(ii) There could be a distinguished RIF predicate rdf(s,p,o) for
matching over RDF triples [and/or a quad version rdf(s,p,o,g) for
testing presence of the spo triple in a graph identified by the URI
g]. This variant leads to less syntactically appealing RDF patterns
but avoids the need for second order syntax.

(iii) There could be a distinguished RIF predicate sparql(q, vi, g)
which issues a SPARQL query q, with distinguished variables vi,
against a graph identified by some URI g.
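
As a rough illustration of the difference between (i) and (ii), here
is a minimal sketch in Python rather than in any RIF syntax. The Var
class, the match function and the ex:/foaf: sample data are all
invented for this example. It treats rdf(s,p,o) as an ordinary ternary
predicate, so a variable in the property position needs no second
order machinery.

```python
from typing import Dict, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


class Var:
    """A rule variable such as ?x (hypothetical helper, not RIF syntax)."""

    def __init__(self, name: str) -> None:
        self.name = name


def match(pattern, triples: Set[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one variable binding per triple satisfying rdf(s, p, o)."""
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if isinstance(term, Var):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term.name, value) != value:
                    break
            elif term != value:
                break
        else:
            # No position failed, so the whole pattern matched this triple.
            yield binding


# Made-up example data, written as abbreviated names for readability.
DATA: Set[Triple] = {
    ("ex:dave", "foaf:knows", "ex:peter"),
    ("ex:dave", "foaf:name", '"Dave"'),
    ("ex:peter", "foaf:knows", "ex:dave"),
}

# Option (ii) style pattern: rdf(?x, foaf:knows, ex:peter)
knows_peter = list(match((Var("x"), "foaf:knows", "ex:peter"), DATA))

# Variable in the property position: rdf(ex:dave, ?p, ?o).
# Under option (i) the same query would quantify over the predicate symbol.
dave_props = sorted(b["p"] for b in match(("ex:dave", Var("p"), Var("o")), DATA))
```

Under option (i) the first pattern would instead be written as a
binary atom such as foaf:knows(?x, ex:peter), which reads better but
offers no first order way to write the second one.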

** Does this mean all RIF processors need an RDF/XML parser?

No, at least not necessarily.

At this stage it's not clear how we are actually going to define a RIF
processor and what the conformance criteria (if any [2]) will be. For
example, the question of how the data to be matched is made available
to the RIF processor might be left out of the RIF Core specification
altogether. That would make it possible to have RIF Core processors
which have some out of band mechanism for making the RDF triples
available. For example the RIF processor might operate over an RDBMS
and there might be some completely separate RDF to relational mapping
processor involved in the overall tool chain.

Alternatively, option (iii) would enable a RIF rule set to query any
web source which supports the SPARQL protocol, and a minimally
conforming RIF processor would only need to interpret the SPARQL query
results XML format [1].
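
To give a feel for how small that burden could be, the following
Python sketch reads a SPARQL results document using only the standard
library. The sample document and the parse_results name are invented
for illustration, and the namespace is the one used by the results
format in [1]; a real processor would also need to deal with datatypes,
blank nodes and boolean results.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL query results XML format, in ElementTree notation.
SR = "{http://www.w3.org/2005/sparql-results#}"

# Made-up sample response with a single solution.
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/><variable name="o"/></head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/dave</uri></binding>
      <binding name="o"><literal xml:lang="en">Dave</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return one dict per solution: variable name -> (kind, text)."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            value = binding[0]          # the uri, literal or bnode child
            kind = value.tag[len(SR):]  # strip the namespace prefix
            row[binding.get("name")] = (kind, value.text)
        rows.append(row)
    return rows


ROWS = parse_results(SAMPLE)
```

Each row could then be handed to the rule engine as a set of variable
bindings, with no RDF/XML parsing involved.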

Conversely we might decide that all RIF implementations should have the
ability to directly query an arbitrary RDF document on the web
(e.g. the rdf/4 predicate in option (ii) or CWM's log:includes
predicate). We might make support of that part of the minimal
conformance criteria in which case, yes, all RIF processors would need
an RDF/XML parser.

My personal preference is probably the latter; but so long as RIF can
express any RDF triple pattern in rule bodies then the question of how
any RDF data gets into the scope of the rule processor could be left
outside of the specs without necessarily failing this requirement.

** Does this imply all RIF processors have to build in RDF semantics?

No, not necessarily.

First, it is not strictly necessary since it is (more or less)
possible to express the RDF semantics as a rule set [3], given a
sufficiently expressive rule language.
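
A minimal sketch of what "RDF semantics as a rule set" might look
like, assuming a naive forward chainer over triples held as Python
tuples. It covers only four of the RDFS entailment rules from [3]
(rdfs5, rdfs7, rdfs9 and rdfs11) and ignores axiomatic triples,
literals and blank nodes, so it is an illustration of the idea rather
than a complete encoding. The function names and ex: data are made up.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def rdfs_step(triples):
    """Apply rdfs5, rdfs7, rdfs9 and rdfs11 once; return only new triples."""
    new = set()
    for (s1, p1, o1) in triples:
        for (s2, p2, o2) in triples:
            # rdfs11: (a subClassOf b), (b subClassOf c) -> (a subClassOf c)
            if p1 == SUBCLASS and p2 == SUBCLASS and o1 == s2:
                new.add((s1, SUBCLASS, o2))
            # rdfs5: (p subPropertyOf q), (q subPropertyOf r) -> (p subPropertyOf r)
            if p1 == SUBPROP and p2 == SUBPROP and o1 == s2:
                new.add((s1, SUBPROP, o2))
            # rdfs9: (c subClassOf d), (x type c) -> (x type d)
            if p1 == SUBCLASS and p2 == RDF_TYPE and o2 == s1:
                new.add((s2, RDF_TYPE, o1))
            # rdfs7: (p subPropertyOf q), (x p y) -> (x q y)
            if p1 == SUBPROP and p2 == s1:
                new.add((s2, o1, o2))
    return new - triples


def closure(triples):
    """Repeat rdfs_step until no new triples appear (a fixpoint)."""
    closed = set(triples)
    while True:
        new = rdfs_step(closed)
        if not new:
            return closed
        closed |= new


# Made-up example data.
FACTS = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:owns", SUBPROP, "ex:caresFor"),
    ("ex:dave", "ex:owns", "ex:rex"),
}
CLOSED = closure(FACTS)
```

With this data the closure adds four triples, among them that ex:rex
is an ex:Animal and that ex:dave ex:caresFor ex:rex.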

Second, there are several different levels of RDF semantics and
different applications require different levels. There's simple
entailment, RDF entailment, RDFS entailment, RDF datatype entailment
and extensional RDFS entailment. Furthermore, in practical
applications it is often desirable to omit some or all of the
axiomatic triples from RDFS entailment, especially the infinite set of
rdf:_n triples [4].

Third, if the complete RDFS semantics were always built-in then it
would not be possible to use RIF to publish proof theoretic semantics
for RDF-based vocabularies [5] other than those which include all of
RDFS (including the ugly container membership axioms). This might cause 
problems for processors which want to handle OWL/DL and RIF since OWL/DL 
is not a superset of RDFS.

My personal preference is that RIF rule sets should be able to include
metadata declaring what level of built-in RDF/RDFS semantics they
assume. A RIF processor would be free to meet the requirement by using
an appropriate (RIF) ruleset defining the required level or to
implement the semantics natively. However, this is not strictly 
necessary in order to satisfy this requirement.

** How does this relate to the use cases?

A number of submitted use cases explicitly mention the need to express
rules which match against RDF data. These include:

http://www.w3.org/2005/rules/wg/wiki/Internet_search%3A_combining_query_language%2C_rule_languages_and_scoped_negation
http://www.w3.org/2005/rules/wg/wiki/Rule-Based_Combined_Access_to_XML_and_RDF_Data
http://www.w3.org/2005/rules/wg/wiki/Managing_incomplete_information
http://www.w3.org/2005/rules/wg/wiki/RIF_RuleML_FOAF
http://www.w3.org/2005/rules/wg/wiki/Message_Transformation
http://www.w3.org/2005/rules/wg/wiki/Information_Integration
http://www.w3.org/2005/rules/wg/wiki/Publication_of_semantics_%28e.g._SKOS%2C_RDFS%29
http://www.w3.org/2005/rules/wg/wiki/Rich_Knowledge_Representation
http://www.w3.org/2005/rules/wg/wiki/Filling_the_holes_of_OWL

At least the first two of these include examples which test for the
presence of a triple in a document at a given URL; implementing those
would require an RDF/XML parser (or the source URLs might be
restricted to SPARQL servers). The concrete examples in those first
two are also instances where the triples are treated as simple facts
and don't look like they would require the RIF processor to implement
the full RDFS entailment semantics.

In the Message Transformation use case a separate RDF processor will
already have converted the message syntax to RDF triples so that the
RIF processor needs an API to access that parsed data, not a parser of
its own.

The last two use cases (the rich KR ones) will, one imagines, assume
that RDF semantics has been implemented. Since they build on OWL they
will assume the optional extensional RDFS semantics but might have to
omit any RDFS axioms which would push them into OWL/full.


I hope this at least partly addresses the questions raised last week. 
I'm happy to put some or all of this into the Wiki once any dust has 
settled.

Dave


[1] http://www.w3.org/TR/rdf-sparql-XMLres/

[2] It is perfectly possible to have a useful spec for RIF which defines
a syntax and semantics for the rule language(s) without it including a
notion of what it means to be a conformant RIF processor (cf. the RDF
specs). So questions of the form "does that mean a conformant RIF
processor MUST ..." are not necessarily meaningful.

[3] http://www.w3.org/TR/rdf-mt/#rules

[4] In Jena we arrived at the compromise that the default RDFS rules do
not include the rdf:_n axioms at all. A user who really wants to
work with RDF containers, despite our cautions, can turn those rules on 
but they get the solipsistic view that only the rdf:_i explicitly 
mentioned in the source data have corresponding axiomatic triples, thus 
keeping the RDFS closure of a finite triple set finite.
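
The idea in [4] can be sketched roughly as follows. This is not Jena's
actual code, only an illustration in Python of generating container
membership axioms for just those rdf:_i terms that occur in the input;
the container_axioms name and the sample triples are invented, and only
one of the RDFS axiomatic triple forms is produced.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Matches rdf:_1, rdf:_2, ... written as full URIs.
MEMBER = re.compile(re.escape(RDF_NS) + r"_[1-9][0-9]*\Z")


def container_axioms(triples):
    """Axiomatic triples only for the rdf:_i terms present in the data."""
    mentioned = {term for triple in triples for term in triple
                 if MEMBER.match(term)}
    return {(prop, RDF_NS + "type", RDFS_NS + "ContainerMembershipProperty")
            for prop in mentioned}


# Made-up source data using rdf:_1 and rdf:_3 but not rdf:_2.
SOURCE = {
    ("http://example.org/bag", RDF_NS + "_1", "http://example.org/a"),
    ("http://example.org/bag", RDF_NS + "_3", "http://example.org/c"),
}
AXIOMS = container_axioms(SOURCE)
```

Because the result is bounded by the number of distinct rdf:_i terms
in the source, a finite input always yields a finite set of axioms.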

[5]
http://www.w3.org/2005/rules/wg/wiki/Publication_of_semantics_%28e.g._SKOS%2C_RDFS%29

Received on Sunday, 23 April 2006 17:15:59 UTC