Action xx: What is required in core to support RDF from Dave Reynolds on 2007-07-05 (public-rif-wg@w3.org from July 2007)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Thu, 05 Jul 2007 18:41:32 +0100
To: RIF <public-rif-wg@w3.org>
Message-ID: <468D2D4C.2000206@hplb.hpl.hp.com>
This is to satisfy my as-yet-unnumbered action from this week's telecon.

** Background

Jos has proposed a translation from RDF data to RIF Frames and 
associated means of handling RDF and RDFS semantics [1]. Whilst there 
are a few quibbles over minor details (see subsequent email threads), 
and it will need updating once the revised Core is stable, this seems 
like the right way to go.

Christian asked whether this has any implication for Core, specifically
whether there is anything that implementers will need to do as a result
of RDF compatibility.

** Summary

With the proposed approach there is no need to modify the RIF Core 
syntax or semantics to support RDF/RDFS. However, the full translation 
would ideally require a small number of simple datatypes and associated 
operators.

The rest of this note describes each of these but stops short of draft 
text. Given an "in principle" agreement on these, I'd be prepared to 
take a future action to draft some text.

Topics:
   Datatypes
     - XMLLiteral
     - text literal
     - rigid bNode
   Builtins
   Unrestricted frames
   Datasets


** Datatypes

RDF essentially has four[2] node types:
    - resources identified by RDF URI References
    - typed literal nodes
    - plain literals with optional language tags
    - blank nodes

Since non-literal RIF constants can be denoted by IRIs I believe the 
first case is covered with no modification to Core. This should be 
reviewed once the modified Core is stable.

Our range of literal types covers the main xsd types used in RDF in an 
RDF compatible way so no change is needed there.

However, RDF defines one extra literal type, rdf:XMLLiteral, whose
lexical form is any well-balanced canonical XML as defined in [3]. Jos'
translator calls out the need for a sort/datatype for this.

Mod #1: extend the set of datatypes in Core to include rdf:XMLLiteral. 
This would have no additional semantics and need have no associated 
builtin operations. From the point of view of a RIF translator only the 
lexical form (a string) need be handled. There is no requirement to
generate or manipulate a DOM tree, for example, though a RIF parser
should validate that the lexical form of the XMLLiteral is legal.

Of course, we could decide that XMLLiterals would be a useful thing in
RIF quite apart from RDF. One might want rules that explicitly process
XML document fragments and support XPath or other operations on the XML
fragments. I am NOT proposing this for Core in phase 1.

As well as typed literals, RDF supports the notion of plain literals.
Most plain literals are just strings and are defined to be semantically 
equivalent to typed literals of type xsd:string. However, it is also 
possible to associate a language tag with a plain literal, for example 
to support multi-lingual labels. Core currently does not have a datatype 
capable of representing such labels.

Mod #2: extend the set of datatypes in Core to include rif:text, a 
datatype whose value space is the set of pairs (text, langtag) where 
'text' is any Unicode string and 'langtag' is as defined in RFC-3066. 
The lexical form for this datatype could be "text"@langtag.
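
To make the shape of Mod #2 concrete, here is a minimal sketch of
reading the suggested "text"@langtag lexical form into a (text, langtag)
pair. The names RifText and parse_rif_text are hypothetical, as are the
choices to split on the last '"@' and to lower-case the tag; none of
this is proposed spec text.

```python
import re
from typing import NamedTuple


class RifText(NamedTuple):
    """A value of the proposed rif:text datatype: a (text, langtag) pair."""
    text: str
    langtag: str


# RFC 3066 shape: a primary subtag of 1-8 letters, then zero or more
# hyphen-separated subtags of 1-8 letters or digits.
_LANGTAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def parse_rif_text(lexical: str) -> RifText:
    """Parse '"text"@langtag' (hypothetical helper, no escape handling)."""
    # Assumption: the text part runs up to the LAST '"@' in the input.
    body, sep, lang = lexical.rpartition('"@')
    if not sep or not body.startswith('"') or not _LANGTAG.fullmatch(lang):
        raise ValueError(f"not a rif:text lexical form: {lexical!r}")
    # Assumption: tags are compared case-insensitively, so store lower case.
    return RifText(body[1:], lang.lower())
```

A translator would call parse_rif_text('"chat"@fr') to obtain the pair
("chat", "fr"); input with a missing or malformed tag is rejected rather
than silently treated as an xsd:string.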

Jos' translation proposal also introduces the sort rdfs:Literal for 
plain literals. My counter proposal on this was that plain literals 
without language tags should be mapped to xsd:string [4]. I currently 
believe the latter is sufficient and there is no requirement for a plain 
literal datatype.

Finally we come to bNodes. The translation proposal translates bNodes in
input data (and in rule conclusions) into generated symbols, so-called
"rigid bNodes". In the translation proposal these generated symbols are
IRIs with a particular lexical form to be defined. In order to avoid
having to reserve parts of IRI space and to facilitate useful builtins I 
would prefer to have one more datatype specifically to represent such 
rigid bNodes.

Mod #3: extend the set of datatypes in Core to include rif:bNode, a 
datatype whose value space is unbounded (just as rif:iri) and whose 
lexical form is an arbitrary string "id"^^rif:bNode.
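
As an illustration of how a translator might produce such rigid bNodes,
the sketch below maps each blank node label in one input graph to a
stable "id"^^rif:bNode constant. The class name BNodeAllocator, the
method for_label and the "b0", "b1", ... id scheme are all assumptions
made for the example; the proposal only says the id is an arbitrary
string.

```python
import itertools


class BNodeAllocator:
    """Hands out one rif:bNode constant per blank node label (sketch)."""

    def __init__(self, prefix: str = "b") -> None:
        self._prefix = prefix
        self._counter = itertools.count()
        self._by_label = {}

    def for_label(self, label: str) -> str:
        # The same source label must always yield the same rigid bNode,
        # otherwise triples sharing a blank node would be disconnected.
        if label not in self._by_label:
            ident = f"{self._prefix}{next(self._counter)}"
            self._by_label[label] = f'"{ident}"^^rif:bNode'
        return self._by_label[label]
```

One allocator per source graph keeps labels from different documents
apart, since blank node labels are only meaningful within the document
that uses them.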

** Builtins

As already mentioned [5], in order to be able to express rules over RDF
it would be useful to also have builtins which can recognize RDF
constructs and for those constructs to be compatible with SPARQL.

Specifically, the proposed builtins are SPARQL's operators:
     isIRI
     isBlank
     isLiteral
     lang
     datatype
     langMatches

and constructors for the new datatypes:
     text(Text, Lang)
     bNode(A, ... X)  - gensym a new value of type rif:bNode
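
To show roughly what the SPARQL-style operators would have to do, here
is a minimal sketch over a toy term model. The classes IRI, BNode and
Literal and the Python function names are hypothetical. The lang_matches
function follows my reading of SPARQL's langMatches (basic language
range filtering), and datatype rejects language-tagged literals, which
is an assumption about the intended behaviour rather than settled text.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    ident: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, None for plain literals
    lang: Optional[str] = None      # language tag, None if absent


def is_iri(term) -> bool:
    return isinstance(term, IRI)


def is_blank(term) -> bool:
    return isinstance(term, BNode)


def is_literal(term) -> bool:
    return isinstance(term, Literal)


def lang(term) -> str:
    """Language tag of a literal, or "" when it has none."""
    if not isinstance(term, Literal):
        raise TypeError("lang() is only defined on literals")
    return term.lang or ""


def datatype(term) -> IRI:
    """Datatype IRI; plain literals without a tag count as xsd:string."""
    if not isinstance(term, Literal) or term.lang:
        raise TypeError("datatype() needs a typed or untagged plain literal")
    return IRI(term.datatype or XSD_STRING)


def lang_matches(tag: str, lang_range: str) -> bool:
    """Case-insensitive prefix match on whole subtags; "*" = any tag."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")
```

For example, a rule selecting English labels would test
lang_matches(lang(label), "en"), which accepts both "en" and "en-GB"
but not an untagged literal.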

** Unrestricted frames

In order to express rules over RDF we very commonly need to quantify 
over subjects and often over predicates. This means that in the RIF 
frame machinery:
     s[p->o]
variables should be permitted in all of s, p and o positions.

That is currently supported by the Core and requires no modification.
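
To illustrate what "variables in all positions" asks of an engine, the
following sketch matches an s[p->o] pattern against a set of ground
frames held as (s, p, o) tuples. The names Var, unify and match are
hypothetical and the term representation (plain strings) is chosen only
to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Var:
    name: str


Frame = Tuple[object, object, object]  # (s, p, o) for s[p->o]


def unify(pattern: Frame, fact: Frame,
          bindings: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Extend bindings so pattern equals fact, or return None."""
    result = dict(bindings)
    for want, have in zip(pattern, fact):
        if isinstance(want, Var):
            # A repeated variable must bind to the same term each time.
            if result.setdefault(want.name, have) != have:
                return None
        elif want != have:
            return None
    return result


def match(pattern: Frame, facts: Iterable[Frame]) -> Iterator[Dict[str, object]]:
    """Yield one binding set per fact that the pattern matches."""
    for fact in facts:
        found = unify(pattern, fact, {})
        if found is not None:
            yield found
```

A pattern such as (Var("s"), Var("p"), Var("o")) then enumerates every
frame, which is exactly the case that a restriction on predicate
variables would rule out.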

However, I heard Gary, at least, say this may be a problem for him. So
there is the possibility that the WG might want to restrict the syntax
of frames to preclude this. If it did so, either it would have to give
up on useful RDF processing or the RDF translation would have to be
redesigned, for example, to map to a rif:triple(s,p,o) predicate.

** Datasets

In [6] Jos proposes being able to use RDF(S) graphs as background
knowledge for RIF rule sets, so that a rule set would be able to
indicate that one or more of its datasets followed the RDFDataModel.

This would imply that such a dataset had been derived from an RDF source
and converted to RIF facts via the tr algorithm.

We *could* say that a RIF compliant processor should be able to perform
this derivation process itself - parse an RDF/XML document and tr it. My
understanding is that we are *not* going to do that: the only document
format a Core-compliant processor is required to understand is RIF
documents, and the RIF specification will be neutral on whether it is
the RIF processor, the data source or a third party translation service
which converts any RDF data to RIF facts using the tr algorithm.

Dave

[1] http://lists.w3.org/Archives/Public/public-rif-wg/2007May/0077.html

[2] I'm skipping over one subtlety here. In RDF, a resource denoted by
"lll"^^ddd, where the string lll is not a legal lexical form of datatype
ddd, is an "unknown" resource which is outside the set of literals, not
a syntax error. Jos' translation handles this by translating ill-formed
literals to artificial IRIs.

[3] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral

[4] Jos pointed out another subtlety here: the lexical spaces are
slightly different. xsd:string excludes Unicode sequences like #fffe
which aren't allowed in XML, whereas the legal lexical space of plain
RDF literals is arbitrary Unicode sequences. However, since the
translator will normally be translating from an (RDF/)XML source, in
fact only valid xsd:strings will be expressible in any case. So limiting
the lexical space of RDF plain literals that can be processed by RIF to
just those expressible as xsd:strings seems entirely acceptable to me
and imposes no practical restriction.

[5] http://lists.w3.org/Archives/Public/public-rif-wg/2007Jun/0050.html

[6] http://www.w3.org/2005/rules/wg/wiki/Arch/RDF

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 5 July 2007 17:41:55 UTC