- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Thu, 05 Jul 2007 18:41:32 +0100
- To: RIF <public-rif-wg@w3.org>
This is to satisfy my as-yet-unnumbered action from this week's telecon.

** Background

Jos has proposed a translation from RDF data to RIF Frames and associated means of handling RDF and RDFS semantics [1]. Whilst there are a few quibbles over minor details (see subsequent email threads), and it will need updating once the revised Core is stable, this seems like the right way to go.

Christian asked whether this has any implication for Core, specifically whether there is anything that implementers will need to do as a result of RDF compatibility.

** Summary

With the proposed approach there is no need to modify the RIF Core syntax or semantics to support RDF/RDFS. However, the full translation would ideally require a small number of simple datatypes and associated operators.

The rest of this note describes each of these but stops short of draft text. Given an "in principle" agreement on these, I'd be prepared to take a future action to draft some text.

Topics:
  Datatypes
    - XMLLiteral
    - text literal
    - rigid bNode
  Builtins
  Unrestricted frames
  Datasets

** Datatypes

RDF essentially has four [2] node types:
 - resources identified by RDF URI References
 - typed literal nodes
 - plain literals with optional language tags
 - blank nodes

Since non-literal RIF constants can be denoted by IRIs, I believe the first case is covered with no modification to Core. This should be reviewed once the modified Core is stable.

Our range of literal types covers the main xsd types used in RDF in an RDF-compatible way, so no change is needed there. However, RDF defines one extra literal type, rdf:XMLLiteral, whose lexical form is any well-balanced canonical XML as defined in [3]. Jos' translator calls out the need for a sort/datatype for this.

Mod #1: extend the set of datatypes in Core to include rdf:XMLLiteral.

This would have no additional semantics and need have no associated builtin operations. From the point of view of a RIF translator only the lexical form (a string) need be handled.
There is no requirement to generate or manipulate a DOM tree, for example, although a RIF parser should validate the lexical form of the XMLLiteral.

Of course, we could decide that XMLLiterals would be a useful thing in RIF quite apart from RDF: one might want rules that explicitly process XML document fragments and support XPath or other operations on the XML fragments. I am NOT proposing this for Core in phase 1.

As well as typed literals, RDF supports the notion of plain literals. Most plain literals are just strings and are defined to be semantically equivalent to typed literals of type xsd:string. However, it is also possible to associate a language tag with a plain literal, for example to support multi-lingual labels. Core currently does not have a datatype capable of representing such labels.

Mod #2: extend the set of datatypes in Core to include rif:text, a datatype whose value space is the set of pairs (text, langtag) where 'text' is any Unicode string and 'langtag' is as defined in RFC-3066. The lexical form for this datatype could be "text"@langtag.

Jos' translation proposal also introduces the sort rdfs:Literal for plain literals. My counter-proposal on this was that plain literals without language tags should be mapped to xsd:string [4]. I currently believe the latter is sufficient and there is no requirement for a plain literal datatype.

Finally we come to bNodes. The translation proposal translates bNodes in input data (and in rule conclusions) into generated symbols, so-called "rigid bNodes". In the translation proposal these generated symbols are IRIs with a particular lexical form to be defined. In order to avoid having to reserve parts of IRI space, and to facilitate useful builtins, I would prefer to have one more datatype specifically to represent such rigid bNodes.
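
To make the node-type mapping in this section concrete, here is a minimal sketch in Python of how a translator might choose a RIF constant for each kind of RDF node. It is only an illustration under stated assumptions: RdfNode and to_rif_constant are hypothetical names and are not part of Jos' tr algorithm, the concrete syntax shown for IRIs and typed literals is illustrative, and the bNode case anticipates the rif:bNode datatype of Mod #3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RdfNode:
    """Hypothetical in-memory form of an RDF node (not from any RDF library)."""
    kind: str                       # "uri", "literal" or "bnode"
    value: str                      # URI reference, lexical form or bNode label
    datatype: Optional[str] = None  # datatype URI for typed literals
    lang: Optional[str] = None      # language tag for plain literals


def to_rif_constant(node: RdfNode) -> str:
    """Return an illustrative lexical form for the RIF constant.

    Assumptions: no escaping of quotes is attempted, and ill-formed typed
    literals (footnote [2]) are not detected or handled here.
    """
    if node.kind == "uri":
        return f'"{node.value}"^^rif:iri'
    if node.kind == "bnode":
        # Rigid bNode: a generated symbol of the proposed rif:bNode datatype.
        return f'"{node.value}"^^rif:bNode'
    if node.kind == "literal":
        if node.lang is not None:
            # Plain literal with a language tag: proposed rif:text (Mod #2).
            return f'"{node.value}"@{node.lang}'
        if node.datatype is None:
            # Plain literal without a language tag: treated as xsd:string.
            return f'"{node.value}"^^xsd:string'
        # Typed literal, including rdf:XMLLiteral: keep the lexical form only.
        return f'"{node.value}"^^<{node.datatype}>'
    raise ValueError(f"unknown node kind: {node.kind}")
```

For example, to_rif_constant(RdfNode("literal", "chat", lang="fr")) yields "chat"@fr, while the same literal without a language tag is emitted as an xsd:string.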
Mod #3: extend the set of datatypes in Core to include rif:bNode, a datatype whose value space is unbounded (just as rif:iri) and whose lexical form is an arbitrary string "id"^^rif:bNode.

** Builtins

As already mentioned [5], in order to be able to express rules over RDF it would be useful to also have builtins which can recognize RDF constructs, and for those constructs to be compatible with SPARQL.

Specifically, the proposed builtins are SPARQL's operators:
   isIRI
   isBlank
   isLiteral
   lang
   datatype
   langMatches

and constructors for the new datatypes:
   text(Text, Lang)
   bNode(A, ... X)  - gensym a new value of type rif:bNode

** Unrestricted frames

In order to express rules over RDF we very commonly need to quantify over subjects and often over predicates. This means that in the RIF frame machinery:
   s[p->o]
variables should be permitted in all of the s, p and o positions.

That is currently supported by the Core and requires no modification. However, I heard Gary at least say this may be a problem for him. So there is the possibility that the WG might want to restrict the syntax of frames to preclude this. If it did so, it would either have to give up on useful RDF processing or the RDF translation would have to be redesigned, for example to map to a rif:triple(s,p,o) predicate.

** Datasets

In [6] Jos proposes being able to use RDF(S) graphs as background knowledge for RIF rule sets, so that a rule set would be able to indicate that one or more of its datasets followed the RDFDataModel. This would imply that such a dataset had been derived from an RDF source and converted to RIF facts via the tr algorithm.

We *could* say that a RIF-compliant processor should be able to perform this derivation process itself: parse an RDF/XML document and tr it. My understanding is that we are *not* going to do that.
Rather, the only document format a Core-compliant processor is required to understand is RIF documents, and the RIF specification will be neutral on whether it is the RIF processor, the data source or a third-party translation service which converts any RDF data to RIF facts using the tr algorithm.

Dave

[1] http://lists.w3.org/Archives/Public/public-rif-wg/2007May/0077.html

[2] I'm skipping over one subtlety here. In RDF, a resource denoted by "lll"^^ddd, where the string lll is not a legal lexical form of datatype ddd, is an "unknown" resource which is outside the set of literals, not a syntax error. Jos' translation handles this by translating ill-formed literals to artificial IRIs.

[3] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral

[4] Jos pointed out another subtlety here: the lexical spaces are slightly different. xsd:string excludes Unicode sequences like #fffe which aren't allowed in XML, whereas the legal lexical space of plain RDF literals is arbitrary Unicode sequences. However, since the translator will normally be translating from an (RDF/)XML source, in fact only valid xsd:strings will be expressible in any case. So limiting the lexical space of RDF plain literals that can be processed by RIF to just those expressible as xsd:strings seems entirely acceptable to me and imposes no practical restriction.

[5] http://lists.w3.org/Archives/Public/public-rif-wg/2007Jun/0050.html

[6] http://www.w3.org/2005/rules/wg/wiki/Arch/RDF

--
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 5 July 2007 17:41:55 UTC