- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Mon, 28 Jan 2002 12:51:54 +0200
- To: ext Sergey Melnik <melnik@db.stanford.edu>, RDF Core <w3c-rdfcore-wg@w3.org>
On 2002-01-25 19:22, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:

> A major problem with TDL [1] is that it requires RDF graphs to be untidy
> on literals. This requirement would break most RDF applications that I
> am aware of.

TDL itself doesn't, but I agree that the current TDL model theory (MT)
appears to.

> To illustrate this point, let me give the following two examples.
>
> Example 1: Querying and API access
> ----------------------------------
>
> Consider the following graph consisting of two RDF statements:
>
> _1 --dc:Title--> "The Origin of Species"
> _2 --my:book--> "The Origin of Species"
>
> Existing applications that assume RDF graphs to be tidy on literals can
> safely conclude that the two literals in the above graph are identical.
> In other words, the query
>
> (X --dc:Title--> Z) & (Y --my:book--> Z)
>
> will succeed and return a variable substitution:
>
> {X=_1, Y=_2, Z="The Origin of Species"}

But is your query based on nodes or labels? I would expect that
it is based on labels -- in which case, it is irrelevant whether
literals are tidy or untidy.

In fact, you could execute such queries even on a completely
untidy graph, since the scope of your query is a single triple
and tidiness only applies to sets of triples, not individual
triples.

I don't see how TDL, with either tidy or untidy literals, would
break your application.

> In contrast, if literals are considered untidy, such conclusion cannot
> be drawn safely without having access to the schemas that describe the
> properties dc:Title and my:book. In fact, if the schema information for
> dc:Title or my:book is missing, the two literals in the graph have to be
> considered distinct.

I'm sorry, but exactly what information is being provided by the
schema?

> In such a case, one or both literals would be
> "untyped", i.e. could potentially have a different interpretation, so
> that their equality does not hold in all valid interpretations.

That is true regardless.
If there is no typing expressed in the RDF graph,
then one must rely on the RDF-external application environment
to provide the interpretation.

> Consequently, the above query would (have to) fail and produce no
> answer.

I don't see that to be the case, if you are comparing labels
rather than nodes. If you are comparing nodes, then I would
wonder what the benefit of that is. If your graph is tidy,
then you can be assured that every string-unique literal is
the label of one and only one node -- but how does that have
anything to do with the semantics of your query?

> Similar issues arise for any kind of API access for RDF graphs. The
> objects or data structures that represent literals in a programming
> language cannot be safely compared without having type information
> attached.

I agree with you there.

> In other words, the literals would have to carry along the
> properties they are used with and/or the schema class(es) used as the
> range of such properties.
>
> That is, developers would have to make literals complex objects.

Ummm... well, you have three choices:

1. Define type locally.
2. Define type globally, by some schema.
3. Define type globally, by application environment.

You seem to be taking the third choice, and then demanding that
the interpretation imposed by the environment be supported by
all logical queries on an RDF graph. I don't find that the least
bit reasonable.

If you wish to reliably interchange knowledge between disparate
applications and environments, then all knowledge must be
explicitly defined, either globally or locally. Thus, while
choice 3 may work for a single, tightly controlled application
environment, it cannot be the basis for global interchange of
knowledge. What you assert some literal means for your
application may not be known by some other application which
attempts to interpret your knowledge (possibly unbeknownst to
you); therefore, typing external to RDF is inherently
non-portable.
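
To make the label-versus-node distinction concrete, here is a minimal
Python sketch of the query from Example 1. It is an illustration under
stated assumptions, not part of TDL or of any RDF API: the names
LiteralNode, build_graph, join, by_label and by_node are hypothetical,
and a graph is modelled as a plain list of (subject, predicate, object)
tuples.

```python
class LiteralNode:
    """A literal node; `label` is its string, identity is the object itself."""

    def __init__(self, label: str) -> None:
        self.label = label


def build_graph(tidy: bool):
    """Build the two-statement graph from Example 1.

    In a tidy graph both statements share one literal node; in an
    untidy graph each statement gets its own node with the same label.
    """
    title = "The Origin of Species"
    first = LiteralNode(title)
    second = first if tidy else LiteralNode(title)
    return [("_1", "dc:Title", first), ("_2", "my:book", second)]


def by_label(a: LiteralNode, b: LiteralNode) -> bool:
    """Two literals match when their labels are the same string."""
    return a.label == b.label


def by_node(a: LiteralNode, b: LiteralNode) -> bool:
    """Two literals match only when they are the very same graph node."""
    return a is b


def join(graph, same):
    """Answer (X --dc:Title--> Z) & (Y --my:book--> Z), matching Z with `same`."""
    return [
        (x, y, z1.label)
        for (x, p1, z1) in graph if p1 == "dc:Title"
        for (y, p2, z2) in graph if p2 == "my:book" and same(z1, z2)
    ]


tidy_graph = build_graph(tidy=True)
untidy_graph = build_graph(tidy=False)

for name, graph in (("tidy", tidy_graph), ("untidy", untidy_graph)):
    print(name, "by label:", join(graph, by_label))
    print(name, "by node: ", join(graph, by_node))
```

In this sketch the label-based join returns the same single
substitution on both graphs, while the node-identity join returns it
only on the tidy graph and comes back empty on the untidy one.
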
And, yes, local typing does produce complex objects. What else
would you expect?

> Example 2: Storage
> ------------------
>
> Currently, the storage backends for RDF graphs can benefit substantially
> from the fact that RDF graphs are tidy on literals.

But are current storage backends actually based on tidy literal
graphs?

That said, I fully agree that there is significant benefit to be
had in reduced graph real estate if literals are tidy. No argument
there. And, again, TDL has no problems with tidy literals.

> In other words, all
> literals with the same textual content can be replaced by the same
> integer ID, which is then stored as an element of an RDF statement in
> the database. This feature facilitates compact storage of RDF graphs and
> allows efficient query processing.

Wrong. Sorry. Nope. The *only* benefit is a compression of graph
nodes. If you have tidy literals, then queries *cannot* be based
on node equality, only on label equality, and such queries are
within the context of a given interpretation.

Thus, per my example illustration in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0314.html

the fact that both statements share the same object node with
literal label "30" does *not* mean that e.g. Jenny has an age
equal to (foo:count,"30") or that Bob has a jersey number equal
to (xsd:integer,"30").

In other words, a tidy literal node does not mean that the node
itself denotes a member of the value space of some datatype! It
is only a string, which may have meaning as a lexical form for an
interpretation within the context of a given datatype. That is
all.

Thus, queries on literal-tidy (or untidy) graphs should always be
based on literal label comparison, not on node comparison.

> In contrast, having untidy literals would imply in a general case that
> each occurrence of a literal needs to be stored using a different
> integer ID. As a consequence, the database size explodes, and the
> queries become prohibitively expensive.

I agree.
And this is one argument in favor of tidy literals, but
no problem for TDL.

> Final remark
> ------------
>
> As a datatyping proposal, TDL introduces an original idiom for
> representing datatypes that utilizes pairs of lexical tokens and data
> values for representing typed data elements.

Actually, a TDL pairing is a lexical form (literal) and a datatype
identity (URI), which together uniquely denote a datatype mapping;
the datatype mapping consists of a member of the lexical space and
its corresponding member of the value space. Thus, references to
values are strictly within the MT, not the "fundamental" TDL model.

> The document [1] shows how
> this idiom can be deployed *without* requiring RDF graphs to be untidy
> on literals, in a way consistent with the current model theory draft
> [3]. The corresponding idiom in [1] is called Idiom P (or S-P).

OK, I think you are seeing what I have outlined in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0314.html

namely that TDL itself is not incompatible with tidy literals.

Also, neither of the TDL idioms states whether the graph must be
tidy or untidy for literals; like TDL itself, both are agnostic
about that.

Patrick

> -- Sergey
>
> [1] http://www-nrc.nokia.com/sw/TDL.html
> [2] http://www-db.stanford.edu/~melnik/rdf/datatyping/
> [3] http://www.w3.org/TR/rdf-mt/

--
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 28 January 2002 05:50:57 UTC