- From: Libby Miller <Libby.Miller@bristol.ac.uk>
- Date: Wed, 30 Jan 2002 11:51:29 +0000 (GMT)
- To: Patrick Stickler <patrick.stickler@nokia.com>
- cc: ext Libby Miller <Libby.Miller@bristol.ac.uk>, RDF Comments <www-rdf-comments@w3.org>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, Brian McBride <bwm@hplb.hpl.hp.com>, ext Graham Klyne <Graham.Klyne@MIMEsweeper.com>
On Wed, 30 Jan 2002, Patrick Stickler wrote:

> On 2002-01-29 21:45, "ext Libby Miller" <Libby.Miller@bristol.ac.uk> wrote:
>
> Thanks very much for your comments and examples, Libby. I found
> them very useful in further clarifying the issue regarding how
> literals and datatyping interact in queries on the RDF graph.

good, I'm glad

>
> Some comments/questions for you below...
>
>
> Hmmm.... Why not just include the range constraints in the
> query? After all, it's knowledge that's in the graph. E.g.
>
> select ?x ?y ?z ?r
> where
> (?x <dc:Title> ?y)
> (?z <age> ?y)
> (<dc:Title> <rdfs:range> ?r)
> (<age> <rdfs:range> ?r)

yeah, you could do that if I'd bothered to implement matching between
subqueries on non-variables, which I haven't. It also requires the
presence of a schema, which isn't my preference (the query would fail
without the schema).

>
> The range tests simply ensure that the two datatype
> contexts, for <age> and <dc:Title>, have some common
> intersection of type which would allow the literal to
> have a common value interpretation between them -- i.e.
> that ?y would be the same "thing" in both contexts.
>
> This presumes, of course, that you are basing your query
> on the values and not simply the string representation
> of their lexical forms. Otherwise, why would you want an
> integer value and a string value to be considered the
> same thing? Are you then conducting queries on string
> labels in triples rather than the values they represent?
>

yes, of course I do. In most cases there is no datatyping information
available about a literal, and even if there were, I work over the top
of the triple-matching-style API, so I wouldn't be able to get at that
information. Working over the API is a good thing, I reckon, because it
gives you some interoperability at the moment.
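To make the trade-off concrete, here is a minimal Python sketch of a naive triple-pattern matcher that, like a query layer sitting on a triple-matching API, sees every literal only as a string. It is not Libby's engine or RDQL; the triples, the `match`/`query` helpers, and the `xsd:` range values are all hypothetical. It shows the string-only join binding ?y across dc:Title and age, Patrick's range-constrained query rejecting that binding, and the same range-constrained query returning nothing once the schema triples are absent. Requiring an identical ?r is a simplification of Patrick's "common intersection of type".

```python
# Hypothetical data: triples are (subject, predicate, object) and every
# literal is just a string, as seen through a triple-matching API.
TRIPLES = [
    ("doc1", "dc:Title", "42"),
    ("person1", "age", "42"),
    ("dc:Title", "rdfs:range", "xsd:string"),
    ("age", "rdfs:range", "xsd:integer"),
]


def match(pattern, triples):
    """Yield a binding dict for each triple matching one pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            yield binding


def query(patterns, triples):
    """Naive left-to-right join of triple patterns on shared variables."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for partial in results:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(partial.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_results.append({**partial, **binding})
        results = next_results
    return results


STRING_ONLY = [("?x", "dc:Title", "?y"), ("?z", "age", "?y")]
WITH_RANGES = STRING_ONLY + [
    ("dc:Title", "rdfs:range", "?r"),
    ("age", "rdfs:range", "?r"),
]

print(query(STRING_ONLY, TRIPLES))      # ?y binds "42" in both contexts
print(query(WITH_RANGES, TRIPLES))      # [] : the ranges differ
print(query(WITH_RANGES, TRIPLES[:2]))  # [] : no schema, so the query fails
```

The last line illustrates Libby's objection: the range-constrained form is only as good as the schema that happens to be loaded.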
libby

> Of course, I would expect that a query API would be
> based on an abstraction of the "raw" RDF graph, which
> takes datatype context into account, so that a query
> such as above would not be based on string comparison
> of literals, but on comparison of TDL pairings
> (lexical form + datatype). In which case, the range
> constraints in the query would be unnecessary, since
> the query engine would be trying to bind TDLs to ?y
> and not literal strings -- and thus two different
> values would not "accidentally" be bound to the same
> query variable.
>
> Cf. my example near the end of
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0365.html
>
> > In that case I'd be against this idiom, because normally I don't rely on
> > having the schema (or schema-like constraints) available when you make
> > a query, and because I prefer to make explicit queries.
>
> This sounds more like an argument over using local versus global
> idioms. This issue would still remain in S, using the S-A global
> idiom. Even though S has tidy literal nodes, you still need the
> range knowledge in order to interpret values expressed in the S-A
> idiom. The tidy literals do not themselves denote anything but
> strings, which may or may not be lexical forms in some datatype
> context.
>
> Thus, you may think the results of your query, based only on
> the literal values, are correct, but they may in fact be misleading,
> as the literals have different interpretations and thus ?y
> would not correspond to the same value in each case, only
> to the same string.
>
> > Also I think that I would probably put the emphasis the opposite
> > way to the way Dan C suggests in the 'duh' argument - that is, in the
> > absence of typing info I'd make a match. Maybe that would be wrong.
>
> That's an interesting way to look at it. It's sort of like saying
> that, if you don't know what a literal's datatype context is, you
> can at least do string comparisons between literals.
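The contrast Patrick draws between binding literal strings and binding TDL pairings (lexical form + datatype) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RDF Core design: the `LEXICAL_TO_VALUE` table, the `tdl_value`/`tdl_equal`/`string_equal` helpers, and the sample pairings are hypothetical, and distinct datatypes are treated as disjoint value spaces, which ignores derived types.

```python
# Hypothetical lexical-to-value mappings for two XML Schema datatypes.
LEXICAL_TO_VALUE = {
    "xsd:string": str,
    "xsd:integer": int,
}


def tdl_value(tdl):
    """Map a TDL pairing (lexical form, datatype) to (datatype, value)."""
    lexical_form, datatype = tdl
    return datatype, LEXICAL_TO_VALUE[datatype](lexical_form)


def tdl_equal(a, b):
    """Compare two TDLs by the values they denote, not by their strings.

    Simplification (assumption): distinct datatypes are treated as disjoint
    value spaces, ignoring derivation such as xsd:integer from xsd:decimal.
    """
    return tdl_value(a) == tdl_value(b)


def string_equal(a, b):
    """What a string-only query engine effectively compares."""
    return a[0] == b[0]


title = ("42", "xsd:string")
age = ("42", "xsd:integer")
age_padded = ("042", "xsd:integer")  # non-canonical form of the same integer

print(string_equal(title, age), tdl_equal(title, age))            # True False
print(string_equal(age, age_padded), tdl_equal(age, age_padded))  # False True
```

String comparison both over-matches (a title and an age that merely share the characters "42") and under-matches (two spellings of the same integer), which is the gap the TDL approach is meant to close.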
>
> Though that only equates to reliable comparison of actual values if
> global uniqueness is imposed on literals from the application
> environment.
>
> > The TDL 'local idiom' looks alright to me.
> > I guess if there are many possible lexical representations of a given
> > literal, then that might make querying more fiddly. At the moment
> > tests like ?x > 5 are done by casting to Java datatypes, as Andy does
> > with RDQL as well, so matching different lexical representations is
> > avoided.
>
> Exactly, and that's what one would be expected to do. Since we
> cannot ensure that
>
> (a) all lexical forms are canonical, nor
>
> (b) that canonical lexical forms share all of the properties
> of the values they denote (sort order, etc.)
>
> therefore we must execute the mapping from lexical form to
> application-internalized value in order to compare most values.
>
> The goal of datatyping in RDF, as I see it, is to make sure that
> the information needed to execute that mapping is explicit, consistent,
> and independent of application context.
>
> Queries directly on the RDF graph (as opposed to some abstraction
> above the graph) which include literals will always have to take
> datatyping into account if consistently accurate results are to
> be obtained.
>
> Queries based solely on string comparison of literals will not
> be reliable in a context of syndication of arbitrary knowledge
> from many sources.
>
> Cheers,
>
> Patrick
>
> --
>
> Patrick Stickler              Phone: +358 50 483 9453
> Senior Research Scientist     Fax:   +358 7180 35409
> Nokia Research Center         Email: patrick.stickler@nokia.com
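Points (a) and (b) above, and the practice of evaluating ?x > 5 by casting to native types, can be illustrated with a small Python sketch. Libby and Andy describe casting to Java datatypes; Python's `int` stands in here purely for illustration, and the lexical forms and the `to_value`/`filter_gt_by_*` helpers are hypothetical.

```python
# Hypothetical values of ?x, all intended as xsd:integer lexical forms;
# "05" is a valid but non-canonical lexical form of the integer 5.
LEXICAL_FORMS = ["9", "10", "05", "5"]


def to_value(lexical_form):
    """Map an xsd:integer lexical form to a native int (the 'cast' step)."""
    return int(lexical_form)


def filter_gt_by_string(forms, threshold):
    """?x > threshold evaluated on the raw strings (lexicographic order)."""
    return [lex for lex in forms if lex > threshold]


def filter_gt_by_value(forms, threshold):
    """?x > threshold evaluated after mapping each lexical form to a value."""
    return [lex for lex in forms if to_value(lex) > threshold]


print(filter_gt_by_string(LEXICAL_FORMS, "5"))       # ['9']
print(filter_gt_by_value(LEXICAL_FORMS, 5))          # ['9', '10']
print(sorted(LEXICAL_FORMS))                         # ['05', '10', '5', '9']
print(sorted(LEXICAL_FORMS, key=to_value))           # ['05', '5', '9', '10']
print("05" == "5", to_value("05") == to_value("5"))  # False True
```

Even the canonical forms "5", "9", and "10" sort differently as strings than as integers, which is point (b); "05" versus "5" is point (a). Both disappear once the lexical-to-value mapping is applied before comparison.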
Received on Wednesday, 30 January 2002 06:53:06 UTC