- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Wed, 30 Jan 2002 09:42:01 +0200
- To: ext Libby Miller <Libby.Miller@bristol.ac.uk>, RDF Comments <www-rdf-comments@w3.org>
- CC: Jeremy Carroll <jjc@hplb.hpl.hp.com>, Brian McBride <bwm@hplb.hpl.hp.com>, ext Graham Klyne <Graham.Klyne@MIMEsweeper.com>
On 2002-01-29 21:45, "ext Libby Miller" <Libby.Miller@bristol.ac.uk> wrote:

Thanks very much for your comments and examples, Libby. I found them very useful in further clarifying the issue regarding how literals and datatyping interact in queries on the RDF graph.

Some comments/questions for you below...

> In my experience, usually you don't care what the type of a node is;
> sometimes you do, and then you can add the extra constraint.
>
> This doesn't work in TDL 'global idiom' when the datatyping is only
> mentioned in rdfs:range. If it was in the database somewhere you could
> have
>
> select ?x ?y ?z
> where
> (?x <dc:Title> ?y)
> (?z <age> ?y)
>
> and the constraint on ?y from the range would be implicit and would
> happen somewhere in the application code.

Hmmm.... Why not just include the range constraints in the query? After all, it's knowledge that's in the graph. E.g.

   select ?x ?y ?z ?r
   where
   (?x <dc:Title> ?y)
   (?z <age> ?y)
   (<dc:Title> <rdfs:range> ?r)
   (<age> <rdfs:range> ?r)

The range tests simply ensure that the two datatype contexts, for <age> and <dc:Title>, have some common intersection of type which would allow the literal to have a common value interpretation between them -- i.e. that ?y would be the same "thing" in both contexts.

This presumes, of course, that you are basing your query on the values and not simply the string representation of their lexical forms. Otherwise, why would you want an integer value and a string value to be considered the same thing? Are you then conducting queries on string labels in triples rather than the values they represent?

Of course, I would expect that a query API would be based on an abstraction of the "raw" RDF graph, which takes datatype context into account, so that a query such as above would not be based on string comparison of literals, but on comparison of TDL pairings (lexical form + datatype).
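To make the difference between the two kinds of comparison concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the `TDL` class, the `RANGES` table standing in for rdfs:range statements, the `xsd:` datatype labels, and the sample triples. It also simplifies "some common intersection of type" to "identical datatype", which is stricter than what is described above.

```python
from typing import NamedTuple

# Hypothetical datatype labels; the discussion does not name specific ones.
XSD_INTEGER = "xsd:integer"
XSD_STRING = "xsd:string"


class TDL(NamedTuple):
    """A pairing of lexical form and datatype (illustrative only)."""
    lexical: str
    datatype: str


# Stand-in for the rdfs:range statements in the graph (assumed data).
RANGES = {"dc:Title": XSD_STRING, "age": XSD_INTEGER}

# (subject, predicate, literal string); sample data invented for the sketch.
TRIPLES = [
    ("book1", "dc:Title", "10"),
    ("person1", "age", "10"),
    ("person2", "age", "010"),
]


def typed(predicate, literal):
    """Pair a literal with the datatype given by the predicate's range."""
    return TDL(literal, RANGES[predicate])


def to_value(tdl):
    """Map a lexical form to an application-internal value."""
    if tdl.datatype == XSD_INTEGER:
        return int(tdl.lexical)
    return tdl.lexical


def same_value(a, b):
    """Simplification: values match only within one identical datatype."""
    return a.datatype == b.datatype and to_value(a) == to_value(b)


def join_on_string(triples, p1, p2):
    """Bind ?y by comparing literal strings only."""
    return [
        (s1, o1, s2)
        for (s1, q1, o1) in triples if q1 == p1
        for (s2, q2, o2) in triples if q2 == p2 and o1 == o2
    ]


def join_on_value(triples, p1, p2):
    """Bind ?y by comparing TDL pairings instead of strings."""
    return [
        (s1, o1, s2)
        for (s1, q1, o1) in triples if q1 == p1
        for (s2, q2, o2) in triples
        if q2 == p2 and same_value(typed(p1, o1), typed(p2, o2))
    ]
```

With this sample data, `join_on_string(TRIPLES, "dc:Title", "age")` pairs book1 with person1 because both literals are the string "10", while `join_on_value` returns nothing for the same pair of properties. Conversely, joining "age" with itself by value matches person1 and person2 ("10" and "010"), which the string join misses.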
With such an API, the range constraints in the query would be unnecessary, since the query engine would be trying to bind TDLs to ?y and not literal strings -- and thus two different values would not "accidentally" be bound to the same query variable.

Cf. my example near the end of http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0365.html

> In that case I'd be against this idiom, because normally I don't rely on
> having the schema (or schema-like constraints) available when you make
> a query, and because I prefer to make explicit queries.

This sounds more like an argument over using local versus global idioms. This issue would still remain in S, using the S-A global idiom. Even though S has tidy literal nodes, you still need the range knowledge in order to interpret values expressed in the S-A idiom. The tidy literals do not themselves denote anything but strings, which may or may not be lexical forms in some datatype context.

Thus, you may think the results of your query, based only on the literal values, are correct, but they may in fact be misleading, as the literals have different interpretations and thus ?y would not correspond to the same value in each case, only to the same string.

> Also I think that I would probably put the emphasis the opposite
> way to the way Dan C suggests in the 'duh' argument - that is, in the
> absence of typing info I'd make a match. Maybe that would be wrong.

That's an interesting way to look at it. It's sort of like saying that, if you don't know what a literal's datatype context is, you can at least do string comparisons between literals. Though that only equates to reliable comparison of actual values if global uniqueness is imposed on literals from the application environment.

> The TDL 'local idiom' looks alright to me.
> I guess if there are many possible lexical representations of a given
> literal, then that might make querying more fiddly.
> At the moment tests like ?x > 5 are done by casting to Java datatypes,
> as Andy does with RDQL as well, so matching different lexical
> representations is avoided.

Exactly, and that's what one would be expected to do. Since we cannot ensure (a) that all lexical forms are canonical, nor (b) that canonical lexical forms share all of the properties of the values they denote (sort order, etc.), we must execute the mapping from lexical form to application-internalized value in order to compare most values.

The goal of datatyping in RDF, as I see it, is to make sure that the information needed to execute that mapping is explicit, consistent, and independent of application context.

Queries directly on the RDF graph (as opposed to some abstraction above the graph) which include literals will always have to take datatyping into account if consistently accurate results are to be obtained. Queries based solely on string comparison of literals will not be reliable in a context of syndication of arbitrary knowledge from many sources.

Cheers,

Patrick

--
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Wednesday, 30 January 2002 02:40:55 UTC