- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Thu, 05 Jul 2007 18:41:32 +0100
- To: RIF <public-rif-wg@w3.org>
This is to satisfy my as-yet-unnumbered action from this week's telecon.
** Background
Jos has proposed a translation from RDF data to RIF Frames and
associated means of handling RDF and RDFS semantics [1]. Whilst there
are a few quibbles over minor details (see subsequent email threads),
and it will need updating once the revised Core is stable, this seems
like the right way to go.
Christian asked whether this has any implication for Core, specifically
is there anything that implementers will need to do as a result of RDF
compatibility.
** Summary
With the proposed approach there is no need to modify the RIF Core
syntax or semantics to support RDF/RDFS. However, the full translation
would ideally require a small number of simple datatypes and associated
operators.
The rest of this note describes each of these but stops short of draft
text. Given an "in principle" agreement on these, I'd be prepared to
take a future action to draft some text.
Topics:
  Datatypes
    - XMLLiteral
    - text literal
    - rigid bNode
  Builtins
  Unrestricted frames
  Datasets
** Datatypes
RDF essentially has four[2] node types:
- resources identified by RDF URI References
- typed literal nodes
- plain literals with optional language tags
- blank nodes
Since non-literal RIF constants can be denoted by IRIs, I believe the
first case is covered with no modification to Core. This should be
reviewed once the modified Core is stable.
Our range of literal types covers the main xsd types used in RDF in an
RDF compatible way so no change is needed there.
However, RDF defines one extra literal type, rdf:XMLLiteral, whose
lexical form is any well-balanced canonical XML as defined in [3]. Jos'
translator calls out the need for a sort/datatype for this.
Mod #1: extend the set of datatypes in Core to include rdf:XMLLiteral.
This would have no additional semantics and need have no associated
builtin operations. From the point of view of a RIF translator only the
lexical form (a string) need be handled. There is no requirement to
generate or manipulate a DOM tree, for example, though a RIF parser
should check that the lexical form of the XMLLiteral is legal.
Of course, we could decide that XMLLiterals would be a useful thing in
RIF quite apart from RDF: one might want rules that explicitly process
XML document fragments and support XPath or other operations on them.
I am NOT proposing this for Core in phase 1.
As well as typed literals, RDF supports the notion of plain literals.
Most plain literals are just strings and are defined to be semantically
equivalent to typed literals of type xsd:string. However, it is also
possible to associate a language tag with a plain literal, for example
to support multi-lingual labels. Core currently does not have a datatype
capable of representing such labels.
Mod #2: extend the set of datatypes in Core to include rif:text, a
datatype whose value space is the set of pairs (text, langtag) where
'text' is any Unicode string and 'langtag' is as defined in RFC-3066.
The lexical form for this datatype could be "text"@langtag.
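
To make the shape of Mod #2 concrete, here is a minimal sketch in Python
of reading the suggested "text"@langtag lexical form into a
(text, langtag) pair. The names RifText and parse_rif_text are
hypothetical, escaping inside the quoted text is not handled, and
lower-casing the tag is an assumption made so that tags compare
case-insensitively.

```python
from typing import NamedTuple


class RifText(NamedTuple):
    """One value of the proposed rif:text datatype: a (text, langtag) pair."""
    text: str
    langtag: str


def parse_rif_text(lexical: str) -> RifText:
    """Parse a lexical form such as '"chat"@fr' (hypothetical helper)."""
    # Split on the LAST '"@' so that an '@' inside the text is left alone.
    body, sep, langtag = lexical.rpartition('"@')
    if not sep or not body.startswith('"') or not langtag:
        raise ValueError(f'not of the form "text"@langtag: {lexical!r}')
    # Assumption: normalise the tag to lower case for comparison.
    return RifText(text=body[1:], langtag=langtag.lower())
```

A translator would only need this pair; no language-specific processing
of the text itself is implied.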
Jos' translation proposal also introduces the sort rdfs:Literal for
plain literals. My counter proposal on this was that plain literals
without language tags should be mapped to xsd:string [4]. I currently
believe the latter is sufficient and there is no requirement for a plain
literal datatype.
Finally we come to bNodes. The translation proposal translates bNodes in
input data (and in rule conclusions) into generated symbols, so-called
"rigid bNodes". In the translation proposal these generated symbols are
IRIs with a particular lexical form to be defined. In order to avoid
having to reserve parts of IRI space and to facilitate useful builtins I
would prefer to have one more datatype specifically to represent such
rigid bNodes.
Mod #3: extend the set of datatypes in Core to include rif:bNode, a
datatype whose value space is unbounded (just as rif:iri) and whose
lexical form is an arbitrary string "id"^^rif:bNode.
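
As an illustration of Mod #3, the following Python sketch shows how a
translator might map the bNode labels of one input document to rigid
bNode constants. The class name BNodeTranslator and the "b1", "b2", ...
id scheme are hypothetical; the proposal only requires that the id be
an arbitrary string.

```python
import itertools
from typing import Dict


class BNodeTranslator:
    """Maps bNode labels from one input document to rigid bNode constants."""

    def __init__(self, prefix: str = "b") -> None:
        self._prefix = prefix              # hypothetical id scheme
        self._counter = itertools.count(1)
        self._by_label: Dict[str, str] = {}

    def rigid(self, label: str) -> str:
        # The same label within a document always yields the same symbol;
        # a label not seen before gets a freshly generated one.
        if label not in self._by_label:
            new_id = f"{self._prefix}{next(self._counter)}"
            self._by_label[label] = f'"{new_id}"^^rif:bNode'
        return self._by_label[label]
```

Because the generated symbols carry their own datatype, no part of IRI
space has to be reserved for them.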
** Builtins
As already mentioned [5], in order to be able to express rules over RDF
it would be useful to also have builtins which can recognize RDF
constructs and for those constructs to be compatible with SPARQL.
Specifically, the proposed builtins are SPARQL's operators:
isIRI
isBlank
isLiteral
lang
datatype
langMatches
and constructors for the new datatypes:
text(Text, Lang)
bNode(A, ... X) - gensym a new value of type rif:bNode
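
For readers less familiar with the SPARQL operators, here is a rough
Python sketch of how I understand them to behave over a simple term
model. The IRI, BNode and Literal classes are hypothetical stand-ins for
whatever a RIF implementation uses, and raising an error from datatype
for a language-tagged literal is an assumption. lang_matches follows the
basic prefix-matching scheme, with "*" matching any non-empty tag.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    id: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None   # None for a plain literal
    lang: Optional[str] = None       # only meaningful for plain literals


Term = Union[IRI, BNode, Literal]


def is_iri(term: Term) -> bool:
    return isinstance(term, IRI)


def is_blank(term: Term) -> bool:
    return isinstance(term, BNode)


def is_literal(term: Term) -> bool:
    return isinstance(term, Literal)


def lang(lit: Literal) -> str:
    # A literal without a language tag yields the empty string.
    return lit.lang or ""


def datatype(lit: Literal) -> str:
    if lit.lang:
        # Assumption: no datatype is defined for a language-tagged literal.
        raise ValueError("language-tagged literal has no datatype")
    # A plain literal without a tag is treated as xsd:string.
    return lit.datatype or XSD_STRING


def lang_matches(tag: str, lang_range: str) -> bool:
    if lang_range == "*":
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")
```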
** Unrestricted frames
In order to express rules over RDF we very commonly need to quantify
over subjects and often over predicates. This means that in the RIF
frame machinery:
s[p->o]
variables should be permitted in all of s, p and o positions.
That is currently supported by the Core and requires no modification.
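
To illustrate what unrestricted frames mean for an implementation, here
is a minimal Python sketch that matches a frame pattern s[p->o] against
ground facts, with variables allowed in any of the three positions. The
tuple representation, the "?name" variable convention and the function
match_frame are all assumptions made for the example, not part of any
RIF draft.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    # Assumed convention: variables are written "?name".
    return term.startswith("?")


def match_frame(pattern: Triple, facts: List[Triple]) -> List[Dict[str, str]]:
    """Return one variable binding per fact that the pattern matches."""
    results = []
    for fact in facts:
        binding: Dict[str, str] = {}
        for pat, val in zip(pattern, fact):
            if is_var(pat):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(pat, val) != val:
                    break
            elif pat != val:
                break
        else:
            results.append(binding)
    return results
```

For example, the pattern ("?s", "?p", "ex:a") quantifies over both the
subject and the predicate, which is exactly the case that restricted
frames would rule out.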
However, I heard Gary at least say this may be a problem for him. So
there is the possibility that the WG might want to restrict the syntax
of frames to preclude this. If it did so it would have to either give up
on useful RDF processing or the RDF translation would have to be
redesigned, for example, to map to a rif:triple(s,p,o) predicate.
** Datasets
In [6] Jos proposes being able to use RDF(S) graphs as background
knowledge for RIF rule sets, so that a rule set would be able to
indicate that one or more of its datasets followed the RDFDataModel.
This would imply that such a dataset had been derived from an RDF source
and converted to RIF facts via the tr algorithm.
We *could* say that a RIF compliant processor should be able to perform
this derivation process itself - parse an RDF/XML document and tr it. My
understanding is that we are *not* going to do that: the only
document format a Core-compliant processor is required to understand is
RIF documents, and the RIF specification will be neutral on whether it is
the RIF processor, the data source or a third party translation service
which converts any RDF data to RIF facts using the tr algorithm.
Dave
[1] http://lists.w3.org/Archives/Public/public-rif-wg/2007May/0077.html
[2] I'm skipping over one subtlety here. In RDF, a resource denoted by
"lll"^^ddd, where the string lll is not a legal lexical form of datatype
ddd, is an "unknown" resource which is outside the set of literals, not
a syntax error. Jos' translation handles this by translating ill-formed
literals to artificial IRIs.
[3] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral
[4] Jos pointed out another subtlety here: the lexical spaces are
slightly different. xsd:string excludes Unicode sequences like #fffe
which aren't allowed in XML, whereas the legal lexical space of plain
RDF literals is arbitrary Unicode sequences. However, since the
translator will normally be translating from an (RDF/)XML source, in
fact only valid xsd:strings will be expressible in any case. So limiting
the lexical space of RDF plain literals that can be processed by RIF to
be just those expressible as xsd:strings seems entirely acceptable to me
and imposes no practical restriction.
[5] http://lists.w3.org/Archives/Public/public-rif-wg/2007Jun/0050.html
[6] http://www.w3.org/2005/rules/wg/wiki/Arch/RDF
--
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 5 July 2007 17:41:55 UTC