- From: Boris Motik <boris.motik@comlab.ox.ac.uk>
- Date: Sat, 15 Dec 2007 19:46:14 -0000
- To: "'Web Ontology Language ((OWL)) Working Group WG'" <public-owl-wg@w3.org>
Hello,

At the F2F, there was a lengthy discussion about the typed vocabulary and rdf:type triples. I was asked to provide more detail about these problems, as well as an overview of the current design decisions, so here it is.

Typing in RDF parsing
---------------------

OWL 1.1 DL implementations (such as Protégé and DL-based reasoners) typically work at the level of the structural specification, so they often need to convert an RDF graph into the OWL 1.1 (DL) structural specification. Please note that I am not talking here about parsing RDF files into triples; rather, the "RDF parsing" problem I discuss in this e-mail is the problem of transforming a set of RDF triples into objects of the OWL 1.1 structural specification.

The main problem in RDF parsing is as follows. Assume that you encounter in an RDF graph G the following triples:

  (1) <a owl:someValuesFrom b>
  (2) <a owl:onProperty c>

The way you translate this into the structural specification depends on the types of b and c:

- if b is a class and c is an object property, then you should create an instance of ObjectSomeValuesFrom;
- if b is a data range and c is a data property, then you should create an instance of DataSomeValuesFrom;
- otherwise, G does not represent a valid OWL (1.1 or 1.0) DL ontology.

Note that these two triples by themselves do not specify the types of b and c. Hence, you need to find other triples in G to be able to process this fragment. There are several problems with this.

1. Streamed parsing
===================

OWL 1.1 DL applications would like to implement RDF parsing in *streaming mode*: as triples arrive, the parser should transform them into the structural syntax, without keeping them in memory first. This is desirable in order to keep memory consumption low: you don't need to store the triples *and* the structural objects in memory at the same time. If the typing triples for b and c arrive only after (1) and (2), however, the parser has to hold on to (1) and (2) until it learns the types.

2. RDF parsing and imports
==========================

Assume that, in the above example, the typing triples are included in some imported graph G' and not in G directly. This *hugely* complicates RDF parsing: when parsing G, one cannot work only with the triples from G, but needs to look at G' as well. Note that there is no requirement in OWL 1.1 that imports should not be cyclic. Hence, you can't really parse your files in sequence: you need to parse them "all at once".

Admittedly, this problem can be technically solved: you need to make two passes through G and *all imported ontologies*. In the first pass you accumulate the typing triples, and in the second pass you actually generate the objects. I see, however, quite a few problems with this.

a. This is inefficient: instead of going through each file only once, we now have to go through each file twice. It is unlikely that this will improve the image of Semantic Web tools w.r.t. performance.

b. The problem is complicated if an RDF ontology imports an OWL 1.1 DL ontology in some other format. Now you need coordination among several parsers for different formats.

c. All implementors at the F2F (i.e., Matthew Horridge, Michael Smith, and myself) unanimously agreed that doing this is a *major* pain. It is easy to dismiss one developer as a whiner; however, if three developers are complaining about this, we should probably take it seriously. Having an unnecessarily complex specification is likely to lead to bugs and hassle for the users.

Solutions
---------

I would now like to highlight possible solutions to these problems.

T1. Use typed vocabulary
========================

We might specify the types of entities directly in the triples. Hence, instead of (1) and (2), we could use triples (3) and (4):

  (3) <a owl:someValuesFrom b>
  (4) <a owl:onObjectProperty c>

Now it is clear that c is an object property, so we do not need the typing triple. This solution is employed in OWL 1.1, but only if punning is used.
Namely, if we are using c for both a data and an object property, then we cannot assign a single type to c. For entities having exactly one type, the OWL 1.1 specification does not use the typed vocabulary, for backwards compatibility reasons. Clearly, we should not immediately switch to the typed vocabulary, as this would wreak havoc on the existing ontologies and systems. However, going forward, we might want to keep the typed vocabulary and hope to deprecate the untyped triples in the future.

T2. Declare types in the document where an entity is used
=========================================================

Even if we stick with the typed vocabulary to allow for punning and in hope of a migration path, we need in OWL 1.1 a way to handle untyped statements such as (1) and (2). The developers of OWL 1.1 DL tools would be *quite happy* if we required the following: if an entity e is used in an axiom in some RDF graph G, then G must contain an explicit typing triple for e (regardless of the imported ontologies). This would allow us to parse each RDF graph by itself, without taking into account the imported RDF graphs.

I believe that this is actually compatible with OWL 1.0 DL. In particular, in the Semantics and Abstract Syntax document for OWL 1.0 (http://www.w3.org/TR/owl-semantics/mapping.html), Section 4.1 contains the following mapping:

  classID is mapped to

    classID rdf:type owl:Class .
    classID rdf:type rdfs:Class . [optional]

Thus, if some classID occurs in some ontology O, then the translation of O into an RDF graph must contain the typing triple for classID.

Now there is some confusion about what exactly this means. I always interpreted this as "if O is an ontology (not the imports closure, but just a single ontology), then its conversion into an RDF graph (a single graph which is actually a single file, regardless of the imports) must contain the typing triple". In other words, the translation is from one *ontology file* to one *RDF graph file*.
Other people (notably Alan Ruttenberg) interpreted this as "Yes, the graph G needs to contain the typing triple; however, this triple can be included in some of the imported graphs". I asked Ian about this, and he said that this point was actually not specified precisely by the specification and that both interpretations might be OK.

My proposal is to fix the OWL 1.0 specification to say that each typing triple should occur in the very RDF graph that is being parsed.

T3. Allow typing triples in the imported ontologies
===================================================

Alan Ruttenberg is advocating that we should not replicate typing triples in each RDF file that uses an entity, but should keep them in the RDF file where the entity is "declared". Well, the notion of "declared" is not quite clear, but OWL 1.1 already provides for declaration axioms, so we might say that this means "declared in the OWL 1.1 sense".

The reason why Alan advocates this solution is that he says there should be no propagation of information from the imported to the importing ontology. For example, you might have an ontology O' in which you have a property P which is declared as a data property. You import O' into O, and then you make some statements involving P. Then, you decide that P should be an annotation property: you can now just change the declaration in O'; since the typing triple for P is not repeated in O, everything works fine. If, in contrast, O were required to contain the typing triple for P as well, you'd need to change this triple too to make everything work.

I personally doubt that, apart from this rather simple scenario, things would "work just fine". If you change some entity in such a fundamental way that you change its type, you should probably go through all the ontologies that are using the changed entity and make sure that nothing broke.
Therefore, I believe that the overhead in RDF parsing and the general complications involved in this solution are just not worth it.

B. Put typing triples at the beginning of a file
================================================

This solution is orthogonal to T2 and T3; regardless of which solution we pick, we could additionally apply B.

In order to increase the chances that we can parse ontologies in streaming mode, we might include a note in the RDF serialization that implementations should preferably put the typing triples at the beginning of each document. Then, a clever implementation would usually have the typing information ready when it encounters triples of the form (1) and (2).

This is clearly just a hint; a complete OWL 1.1 implementation should allow for typing triples anywhere in the document. However, if the typing triples were indeed stored at the beginning of a document, then the implementation might selectively forget parts of the RDF graph as parsing proceeds.

Regards,

Boris
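
For illustration, the two-pass approach described under "RDF parsing and imports" can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not code from any OWL 1.1 tool: the Triple tuple, the function names collect_types and translate_restrictions, the prefixed-name strings, and the tuple representation of structural objects are all hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """Hypothetical in-memory triple; IRIs are plain prefixed-name strings."""
    s: str
    p: str
    o: str


# Assumed mapping from rdf:type objects to the entity kinds used in the message.
KIND_OF = {
    "owl:Class": "class",
    "owl:ObjectProperty": "object property",
    "owl:DatatypeProperty": "data property",
    "rdfs:Datatype": "data range",
}


def collect_types(graphs: Iterable[Iterable[Triple]]) -> Dict[str, Set[str]]:
    """First pass: accumulate typing triples from G and all imported graphs."""
    kinds: Dict[str, Set[str]] = {}
    for graph in graphs:
        for t in graph:
            if t.p == "rdf:type" and t.o in KIND_OF:
                kinds.setdefault(t.s, set()).add(KIND_OF[t.o])
    return kinds


def translate_restrictions(
    graph: List[Triple], kinds: Dict[str, Set[str]]
) -> List[Tuple[str, str, str]]:
    """Second pass: turn triple pairs like (1) and (2) into structural objects."""
    fillers = {t.s: t.o for t in graph if t.p == "owl:someValuesFrom"}
    props = {t.s: t.o for t in graph if t.p == "owl:onProperty"}
    objects = []
    for node, filler in fillers.items():
        if node not in props:
            continue  # incomplete restriction; ignored in this sketch
        prop = props[node]
        filler_kinds = kinds.get(filler, set())
        prop_kinds = kinds.get(prop, set())
        # Punned entities are not disambiguated here; the object case wins.
        if "class" in filler_kinds and "object property" in prop_kinds:
            objects.append(("ObjectSomeValuesFrom", prop, filler))
        elif "data range" in filler_kinds and "data property" in prop_kinds:
            objects.append(("DataSomeValuesFrom", prop, filler))
        else:
            raise ValueError(f"cannot type restriction on node {node!r}")
    return objects


# G holds triples (1) and (2); the typing triples live in an imported graph.
G = [Triple("a", "owl:someValuesFrom", "b"), Triple("a", "owl:onProperty", "c")]
G_imported = [
    Triple("b", "rdf:type", "owl:Class"),
    Triple("c", "rdf:type", "owl:ObjectProperty"),
]
result = translate_restrictions(G, collect_types([G, G_imported]))
```

Under proposal T2, the first pass would only need to read G itself: calling collect_types([G]) on the graph above finds no typing triples, so the second pass raises ValueError, which is the situation T2 would rule out by requiring the typing triples to appear in G.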
Received on Saturday, 15 December 2007 19:47:21 UTC