Implementing a syntax checker and/or OWL Tidy

For the record, here's a sketch of how to implement
an OWL Syntax Checker and/or OWL Tidy with the
graph as triples approach [1].

OWL Tidy takes an input RDF Graph and a level either Lite or DL
and outputs a related graph that is of that level.



Initialize
==========

Set the triple tables to be the OWL DL triple tables.
Set onFailure := Full

Assign to each node in the graph a set of possible
categories. Initially this set is the set of all
categories, except for the builtins, which
either map to their built-in category, or none
for built-ins like rdf:type that must match explicitly.
In these cases treat the builtin uriref itself as a category.

Set the iteration graph to be the input graph.
Set the continuation subgraph to be empty.
Set changed to false.


Iterate
=======
Iterate over the triples in the iteration graph;
for a more efficient classification, on the first cycle,
start with triples with predicate rdf:type.


For each triple tr, find the current possible
categories of subject (S), predicate (P) and object (O).
Look these up in the triple tables.

Reduction
=========
Set
  S' = { s | s in S, p in P, o in O,
             <s,p,o> in triple table }


Set
  P' = { p | s in S, p in P, o in O,
             <s,p,o> in triple table }


Set
  O' = { o | s in S, p in P, o in O,
             <s,p,o> in triple table }

If S', P' or O' is empty then:
// Failure
  if onFailure==Full
     then emit OWL Full
  else
     set onFailure := Full
     goto TypeChecks
  end if
end if

OWL Tidy may have some unsafe heuristics at this point to drop
or replace some of the triples involving s p and o.

If S != S' or P != P' or O != O' set changed = true.

Set the possible categories for the subject, predicate and object
of the triple to S', P' and O' respectively.

Continuation
============
If there exists s' in S', p' in P', o' in O' such that s' p' o'
is not in the triple tables then add tr to the continuation set.

End-iteration
=============
If changed = true then
  set iteration graph := continuation graph
  set continuation graph := empty
  set changed := false
  goto Iterate
else if onFailure == Full
  goto LiteChecks
else
  goto TypeChecks

LiteChecks
==========

set triple tables to be the lite tables
set onFailure := DL
set changed := false
set iteration graph := input graph
set continuation graph := empty
goto Iterate

Types
=====
If there are any non-builtin non-literal nodes that do not have an
explicit type triple
either return OWL Full (OWL Syntax Checker)
or
add additional explicit type triples, potentially requiring
further iteration(s) for OWL Tidy (see notes below).

Bnodes
======

For each blank node check the additional constraints are satisfied.
The constraint that unnamed individuals are not the object of
two or more triples is easy; the no cycles constraint can be addressed
like the occurs check in Prolog, or using a bottom up procedure by marking
all bnodes which are not the subject of a triple with an unmarked bnode as
object, which do not have owl:disjointWith or owl:equivalentClass as
predicate.
If this check fails, OWL Tidy may choose to name some bnodes with gensym
urirefs (unsound) or may divide a bnode into two bnodes (incomplete) such
that
every triple involving the original is replaced by a triple involving
one of its replacements.

If a bnode fails some of the additional structural checks then an OWL Syntax
Checker returns OWL Full, whereas OWL Tidy may use some specific heurisitics
see notes below.


Finish
======
An OWL Syntax checker has now confirmed that the input conforms to
the triple tables. OWL Tidy has done the same but with modifications
to the input.

An OWL Syntax checker can emit LIte
if onFailure = DL else it emits DL.

++++++++++++

Notes
=====
Syntax Checker
It is possible with appropriate record keeping to use both the OWL Lite
and OWL DL tables in the same run of the algorithm. A graph is then in
OWL DL if it needed to use any of the OWL DL catgeorizations or OWL DL
triples from the triple table.
The LiteChecks step is then redundant.

Tidy
A variety of heuristics can be used.
e.g.
  replacing certain DL syntactic structures with others that are in Lite,
and mean the same, may be desirable e.g.

restriction owl:equivalentClass classID .

can be replaced with

classID owl:equivalentClass restriction .

Ill formed description and restriction nodes with multiple parts can be
divided into multiple well-formed blank nodes, one for each part,
with owl:equivalentClass relating them.

Arbitrary blank nodes may be unsoundly given an ID.

Untyped nodes may be given an explicit type; sometimes this can be
done in a sound way, sometimes in an unambiguous but unsound fashion,
and sometimes in an ambiguous and unsound fashion.

e.g.

individualPropertyID rdfs:range owl:Thing .

can be made OWL Lite by soundly adding

individualPropertyID rdf:type owl:ObjectProperty .


<a> rdf:type owl:Thing .
<a> <b> "fffo".
<c> rdfs:subPropertyOf <b> .

can be made OWL Lite by adding

<b> rdf:type owl:DatatypeProperty .
<c> rdf:type owl:DatatypeProperty .

which is unambiguous but unsound.

And

<a> rdf:type owl:Thing .
<a> <b> "fffo".

can be made OWL Lite by adding either

<b> rdf:type owl:DatatypeProperty .

or


<b> rdf:type owl:AnnotationProperty .

Both of these are unsound.





Jeremy

[1]
http://www.w3.org/2001/sw/WebOnt/syntaxTF/prolog/out/intro.html

Received on Monday, 10 March 2003 06:49:17 UTC