- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Wed, 30 Jan 2002 09:38:14 -0000
- To: <w3c-rdfcore-wg@w3.org>
[More on query also a substantial clarification of why I "can't live with" S-B. It encourages unsafe type processing within the application layer]. Lets look at the example query first: [[[ _:f <dc:Title> "10" . <mary> <age> "10" . Given a query: (?x <dc:Title> ?y) & (?z <age> ?y) existing applications will return: ?x = _:f, ?y = "10", ?z = <mary> ]]] RDF Query is of course, still an active research area, rather than one where there is any stable deployed code base. (There is deployed code, but it is in development). Hence, discussion about query semantics would perhaps be better placed on rdf-query, but ... Under S-B (the relevant idiom here) and RDF M&S, the only possible meaning of whether two literal nodes are equal is whether their labels are equal. Suppose RDF M&S and S-B are read untidily then each distinct triple with a literal as object has a distinct literal as object. There is no mechanism for indicating that two literals are the same or different except by their label. Since the query is asking us to compare two literal nodes, under S-B or RDF M&S there is only one possibility, compare their labels. Both the new (untidy) model theory and TDL suggest a second possibility, that of comparing the values associated with the literal nodes. Neither rule out the old possibility, they simply permit a new possibility. It is deceptive to suggest something is not backwardly compatible simply because it offers a better alternative, while allowing the old deprecated approach. === In terms of the syntactic transformation mapping TDL into S-P [1] the data in this example becomes: _:f <dc:Title> _:t . _:t <rdf:value> "10" . <mary> <age> _:a . _:a <rdf:value> "10" . (I use this transformation for explanatory purposes, I would not implement TDL like this). Now, both _:t and _:a can take possible values including < "10", "10" > (string reading) and < "10", 10 > (numeric reading). Hence the query as originally formulated is now false. However, if we wish to execute the query in a backwards compatibility mode, we would expect it to compare literal nodes using their string labels, rather than their values. Hence, the query, in this S-P style, would be equivalent to: [ (?x <dc:Title> ?t) & (?t <rdf:value> ?y) & (?z <age> ?a) & (?a <rdf:value> ?y) ] // checking literal label equality | [ (?x <dc:Title> ?y) & (?z <age> ?y) ] // checking non-literal node equality This could all happen in response to dispatching node equality method on the basis of the type of the nodes being compared, and on the presence or not of a backwards compatibility flag. ==== Now Patrick has argued that comparing on values is correct. Dan and Sergey argue that comparing on labels is correct (described by them using tidiness). TDL allows clarity about this distinction, and allows query researchers to explore both possibilities. ==== This framework allows me to illustrate an aspect of my "can't live" issue with S-B. S-B allows range constraints, in this example perhaps: <dc:title> <rdfs:range> <xsd:string.lex> . <age> <rdfs:range> <xsd:integer.lex> . I currently understand S-B as, within the RDF datatyping layer, insisting that "10" is a string. The two range constraints are used to: - constrain the set of possible strings - act as a hint to the application layer that: * type conversion is possible * type conversion is desirable. Thus given the database and the schema the application processing will correctly treat the title as a string, and the age as an integer. Good. Now, the query also operates in the application layer. This returns true. Thus in the application layer we have the following facts being the case: The film has the title "10". mary has age 10. The age of mary is the title of the film. i.e. that 10 is "10" There is a type clash here, and the combination is a logical error. Thus, S-B maintains a theoretical purity by pushing all typing problems into the application layer. Moreover it apparantly licenses the unwary application developer into during contradictory conclusions. So, S-B is seriously flawed in that it does not assist the application developer to avoid logical errors associated with datatyping. ===== Let's now explore this same error from the point of TDL. TDL requires a clarification of the query, are we speaking about the values, or are we speaking about the lexical forms. If we are talking about the values, then the query is only true if we have compatible type declarations for both literal nodes. Since we are not, we cannot conclude that The age of mary is the title of the film. hence we avoid the error. If on the other hand we are talking about the lexicalizations we have: The age of mary as a string is the title of the film as a string. i.e. 10 as a string is "10" as a string So TDL assists the application developer in being logically correct. ---- Brian, I would be very much obliged if you can condense this example to add to your summary list of concrete "can't live with" issues on the proposals. My title would be "S-B encourages logically errors in the application type processing." PS. Sergey attacked Patrick's list of "can't live with" issues as largely advocacy of TDL; I do tend to feel that an aspect of "can't live with" is comparative. Logical errors happen, no framework prevents all logical errors. Choosing a framework that is logically unsafe when an alternative safe framework is available is folly. However, modifications to S-B that assisted the application to make logically safe type inferences would be a substantial improvement that would address one of my major concerns. Jeremy [1]Jeremy Carroll: Re: Datatyping Summary http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0369.html
Received on Wednesday, 30 January 2002 04:38:52 UTC