Re: datatyping and query in RDF from Libby Miller on 2002-01-30 (www-rdf-comments@w3.org from January to March 2002)

From: Libby Miller <Libby.Miller@bristol.ac.uk>
Date: Wed, 30 Jan 2002 11:51:29 +0000 (GMT)
To: Patrick Stickler <patrick.stickler@nokia.com>
cc: ext Libby Miller <Libby.Miller@bristol.ac.uk>, RDF Comments <www-rdf-comments@w3.org>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, Brian McBride <bwm@hplb.hpl.hp.com>, ext Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Message-ID: <Pine.GSO.4.44.0201301121310.29373-100000@mail.ilrt.bris.ac.uk>
On Wed, 30 Jan 2002, Patrick Stickler wrote:

> On 2002-01-29 21:45, "ext Libby Miller" <Libby.Miller@bristol.ac.uk> wrote:
>
> Thanks very much for your comments and examples, Libby. I found
> them very useful in further clarifying the issue regarding how
> literals and datatyping interact in queries on the RDF graph.


good, I'm glad

>
> Some comments/questions for you below...
>
>
> Hmmm....  Why not just include the range constraints in the
> query? After all, it's knowledge that's in the graph. E.g.
>
> select ?x ?y ?z ?r
> where
> (?x <dc:Title> ?y)
> (?z <age> ?y)
> (<dc:Title> <rdfs:range> ?r)
> (<age> <rdfs:range> ?r)

yeah, you could do that if I'd bothered to implement matching between
subqueries on non-variables, which I haven't. It also requires the presence
of a schema, which isn't my preference (the query would fail without one).
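A minimal sketch, in Python, of the trade-off discussed above: a naive conjunctive matcher that compares literals as plain strings, in the spirit of querying over a triple-matching API. It is not Libby's code or RDQL; query, match_pattern and the ex:/xsd: terms are hypothetical. The shared ?r in Patrick's query only succeeds when both ranges are identical, which simplifies the "common intersection" test he describes.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Unify one (s, p, o) pattern with one triple, extending the existing bindings."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # query variable
            if term in result and result[term] != value:
                return None  # already bound to a different value
            result[term] = value
        elif term != value:  # constant that does not match
            return None
    return result


def query(graph: List[Triple], patterns: List[Triple]) -> List[Bindings]:
    """Conjunctive query: every pattern must match some triple under shared bindings."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        extended: List[Bindings] = []
        for bindings in solutions:
            for triple in graph:
                b = match_pattern(pattern, triple, bindings)
                if b is not None:
                    extended.append(b)
        solutions = extended
    return solutions


# Hypothetical data: the same string "42" used as a title and as an age.
data = [("ex:book1", "dc:Title", "42"), ("ex:person1", "age", "42")]
schema = [("dc:Title", "rdfs:range", "xsd:string"), ("age", "rdfs:range", "xsd:integer")]
patterns = [
    ("?x", "dc:Title", "?y"),
    ("?z", "age", "?y"),
    ("dc:Title", "rdfs:range", "?r"),
    ("age", "rdfs:range", "?r"),
]

print(len(query(data, patterns[:2])))       # 1: joined on the string alone
print(len(query(data + schema, patterns)))  # 0: the two ranges differ
print(len(query(data, patterns)))           # 0: no schema, range patterns never match
```

The last line shows the objection: once the range patterns are included, the query returns nothing at all when the schema triples are absent.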

>
> The range tests simply ensure that the two datatype
> contexts, for <age> and <dc:Title>, have some common
> intersection of type which would allow the literal to
> have a common value interpretation between them -- i.e.
> that ?y would be the same "thing" in both contexts.
>
> This presumes, of course, that you are basing your query
> on the values and not simply the string representation
> of their lexical forms. Otherwise, why would you want an
> integer value and a string value to be considered the
> same thing? Are you then conducting queries on string
> labels in triples rather than the values they represent?
>


yes, of course I do. In most cases there is no datatyping information
available about a literal, and even if there were, I work over the top
of the triple-matching style api, so I wouldn't be able to get at that
information. Working over the api is a good thing, I reckon, because it
gives you some interoperability at the moment.

libby


> Of course, I would expect that a query API would be
> based on an abstraction of the "raw" RDF graph, which
> takes datatype context into account, so that a query
> such as above would not be based on string comparison
> of literals, but on comparison of TDL pairings
> (lexical form + datatype). In which case, the range
> constraints in the query would be unnecessary, since
> the query engine would be trying to bind TDLs to ?y
> and not literal strings -- and thus two different
> values would not "accidentally" be bound to the same
> query variable.
>
> C.f. my example near the end of
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0365.html
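The difference between binding ?y to a literal string and binding it to a TDL pairing can be sketched as follows. This assumes a hypothetical TDL type holding a lexical form and a datatype, and evaluates only the two-pattern join (?x <dc:Title> ?y) (?z <age> ?y); it illustrates the idea rather than any actual query API.

```python
from typing import Callable, List, NamedTuple, Tuple


class TDL(NamedTuple):
    """Hypothetical typed data literal: a lexical form paired with a datatype."""
    lexical: str
    datatype: str


Triple = Tuple[str, str, TDL]

graph: List[Triple] = [
    ("ex:book1", "dc:Title", TDL("42", "xsd:string")),
    ("ex:person1", "age", TDL("42", "xsd:integer")),
]


def join_on_object(graph: List[Triple], p1: str, p2: str,
                   key: Callable[[TDL], object]) -> List[Tuple[str, str]]:
    """Evaluate (?x p1 ?y) (?z p2 ?y), deciding 'same ?y' by comparing key(object)."""
    return [
        (x, z)
        for x, q1, o1 in graph if q1 == p1
        for z, q2, o2 in graph if q2 == p2
        if key(o1) == key(o2)
    ]


# Binding ?y to the lexical string alone: title and age "accidentally" join.
print(join_on_object(graph, "dc:Title", "age", key=lambda o: o.lexical))
# [('ex:book1', 'ex:person1')]

# Binding ?y to the whole TDL: the datatypes differ, so there is no join.
print(join_on_object(graph, "dc:Title", "age", key=lambda o: o))
# []
```

Structural comparison of TDLs is still not value comparison: TDL("042", "xsd:integer") and TDL("42", "xsd:integer") would not join, although both denote the same integer. That is the lexical-form-to-value mapping issue raised later in the thread.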
>
> > In that case I'd be against this idiom, because normally I don't rely on
> > having the schema (or schema-like constraints) available when you make
> > a query, and because I prefer to make explicit queries.
>
> This sounds more like an argument over using local versus global
> idioms. This issue would still remain in S, using the S-A global
> idiom. Even though S has tidy literal nodes, you still need the
> range knowledge in order to interpret values expressed in the S-A
> idiom. The tidy literals do not themselves denote anything but
> strings, which may or may not be lexical forms in some datatype
> context.
>
> Thus, you may think the results of your query, based only on
> the literal values, are correct, but they may in fact be misleading
> as the literals have different interpretations and thus ?y
> would not correspond to the same value in each case, only
> to the same string.
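A sketch of the point about the S-A global idiom, assuming a single tidy literal node "42" that is the object of both a dc:Title triple and an age triple, and a hypothetical table of rdfs:range values. The node itself is just a string; a value only appears once the property's range supplies a datatype context.

```python
from typing import Callable, Dict

# Hypothetical schema knowledge in the S-A style: property -> rdfs:range datatype.
RANGES: Dict[str, str] = {"dc:Title": "xsd:string", "age": "xsd:integer"}

# Hypothetical lexical-to-value mappings for just these two datatypes.
MAPPINGS: Dict[str, Callable[[str], object]] = {"xsd:string": str, "xsd:integer": int}


def interpret(prop: str, literal: str) -> object:
    """Interpret a tidy literal node in the datatype context of the property pointing at it."""
    datatype = RANGES[prop]  # raises KeyError if no range is known: no interpretation
    return MAPPINGS[datatype](literal)


node = "42"  # one tidy literal node, object of both a dc:Title and an age triple
as_title = interpret("dc:Title", node)
as_age = interpret("age", node)

print(repr(as_title), repr(as_age))  # '42' 42
print(as_title == as_age)            # False: one node, two different values
```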
>
> > Also I think that I would probably put the emphasis the opposite
> > way to the way Dan C suggests in the 'duh' argument - that is, in the
> > absence of typing info I'd make a match. Maybe that would be wrong.
>
> That's an interesting way to look at it. It's sort of like saying
> that, if you don't know what a literal's datatype context is, you
> can at least do string comparisons between literals.
>
> But that only amounts to reliable comparison of actual values if
> global uniqueness is imposed on literals from the application
> environment.
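Libby's suggested default, matching when typing information is absent, can be sketched as a lenient comparison function. The function name, the datatype table, and the rule that different datatypes never match are all assumptions made for illustration (the rule ignores the "common intersection of type" case).

```python
from typing import Callable, Dict, Optional

# Hypothetical lexical-to-value mappings for the datatypes this sketch knows about.
MAPPINGS: Dict[str, Callable[[str], object]] = {"xsd:integer": int, "xsd:string": str}


def literals_match(lex1: str, dt1: Optional[str], lex2: str, dt2: Optional[str]) -> bool:
    """Lenient match: compare values when both datatypes are known, else compare strings."""
    if dt1 is None or dt2 is None:
        # No typing info for at least one side: fall back to string comparison.
        return lex1 == lex2
    if dt1 != dt2:
        # Assumption: treat different datatypes as disjoint value spaces.
        return False
    to_value = MAPPINGS[dt1]
    return to_value(lex1) == to_value(lex2)


print(literals_match("42", None, "42", "xsd:integer"))            # True
print(literals_match("042", "xsd:integer", "42", "xsd:integer"))  # True
print(literals_match("042", None, "42", "xsd:integer"))           # False
```

The last call shows Patrick's caveat: a string fallback misses values that are equal but written differently, so it is only reliable when uniqueness is imposed on literals in the application environment.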
>
> > The TDL 'local idiom' looks alright to me.
> > I guess if there are many possible lexical representations of a given
> > value, then that might make querying more fiddly. At the moment
> > tests like ?x > 5 are done by casting to Java datatypes, as Andy does
> > with RDQL as well, so matching different lexical representations is
> > avoided.
>
> Exactly, and that's what one would be expected to do. Since we
> cannot ensure
>
> (a) that all lexical forms are canonical, nor
>
> (b) that canonical lexical forms share all of the properties
>     of the values they denote (sort order, etc.),
>
> we must execute the mapping from lexical form to
> application-internalized value in order to compare most values.
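The need for the lexical-form-to-value mapping, which the thread describes as casting to Java datatypes, can be sketched in Python with a hypothetical registry keyed by datatype URI. Python's int() stands in for the real xsd:integer lexical mapping and accepts some strings XML Schema does not (for example "5_0"), so treat it as an approximation.

```python
from typing import Callable, Dict, List

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical registry keyed by datatype URI, so the mapping does not depend on
# which application happens to be reading the data.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {XSD_INTEGER: int}


def to_value(lexical: str, datatype: str) -> object:
    """Map a lexical form to an application-internal value via its datatype."""
    return LEXICAL_TO_VALUE[datatype](lexical)


# (a) Non-canonical forms: different strings, same value.
print("05" == "5", to_value("05", XSD_INTEGER) == to_value("5", XSD_INTEGER))  # False True

# (b) Canonical forms do not sort the way their values do.
lexicals: List[str] = ["9", "10", "100"]
print(sorted(lexicals))                                          # ['10', '100', '9']
print(sorted(lexicals, key=lambda s: to_value(s, XSD_INTEGER)))  # ['9', '10', '100']

# A filter like ?x > 5: compare values, not strings.
print([s for s in lexicals if to_value(s, XSD_INTEGER) > 5])  # ['9', '10', '100']
print([s for s in lexicals if s > "5"])                       # ['9']
```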
>
> The goal of datatyping in RDF, as I see it, is to make sure that
> the information needed to execute that mapping is explicit, consistent,
> and independent of application context.
>
> Queries directly on the RDF graph (as opposed to some abstraction
> above the graph) which include literals will always have to take
> datatyping into account if consistently accurate results are to
> be obtained.
>
> Queries based solely on string comparison of literals will not
> be reliable in a context of syndication of arbitrary knowledge
> from many sources.
>
> Cheers,
>
> Patrick
>
> --
>
> Patrick Stickler              Phone: +358 50 483 9453
> Senior Research Scientist     Fax:   +358 7180 35409
> Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Wednesday, 30 January 2002 06:53:06 UTC