- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Fri, 25 Jan 2002 09:22:32 -0800
- To: RDFCore WG <w3c-rdfcore-wg@w3.org>
A major problem with TDL [1] is that it requires RDF graphs to be untidy
on literals. This requirement would break most RDF applications that I
am aware of. To illustrate this point, let me give the following two
examples.

Example 1: Querying and API access
----------------------------------

Consider the following graph consisting of two RDF statements:

   _1 --dc:Title--> "The Origin of Species"
   _2 --my:book--> "The Origin of Species"

Existing applications that assume RDF graphs to be tidy on literals can
safely conclude that the two literals in the above graph are identical.
In other words, the query

   (X --dc:Title--> Z) & (Y --my:book--> Z)

will succeed and return a variable substitution:

   {X=_1, Y=_2, Z="The Origin of Species"}

In contrast, if literals are considered untidy, such a conclusion cannot
be drawn safely without having access to the schemas that describe the
properties dc:Title and my:book. In fact, if the schema information for
dc:Title or my:book is missing, the two literals in the graph have to be
considered distinct. In such a case, one or both literals would be
"untyped", i.e. could potentially have a different interpretation, so
that their equality does not hold in all valid interpretations.
Consequently, the above query would (have to) fail and produce no
answer.

Similar issues arise for any kind of API access to RDF graphs. The
objects or data structures that represent literals in a programming
language cannot be safely compared without having type information
attached. In other words, the literals would have to carry along the
properties they are used with and/or the schema class(es) used as the
range of such properties. That is, developers would have to make
literals complex objects.

Example 2: Storage
------------------

Currently, the storage backends for RDF graphs can benefit substantially
from the fact that RDF graphs are tidy on literals.
In other words, all literals with the same textual content can be
replaced by the same integer ID, which is then stored as an element of
an RDF statement in the database. This feature facilitates compact
storage of RDF graphs and allows efficient query processing.

In contrast, having untidy literals would imply in the general case that
each occurrence of a literal needs to be stored using a different
integer ID. As a consequence, the database grows with the number of
literal occurrences rather than the number of distinct literals, and the
queries become prohibitively expensive.

Final remark
------------

As a datatyping proposal, TDL introduces an original idiom for
representing datatypes that utilizes pairs of lexical tokens and data
values for representing typed data elements. The document [2] shows how
this idiom can be deployed *without* requiring RDF graphs to be untidy
on literals, in a way consistent with the current model theory draft
[3]. The corresponding idiom in [2] is called Idiom P (or S-P).

-- Sergey

[1] http://www-nrc.nokia.com/sw/TDL.html
[2] http://www-db.stanford.edu/~melnik/rdf/datatyping/
[3] http://www.w3.org/TR/rdf-mt/
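
The two examples in this message can be sketched in a few lines of
Python. This is a toy illustration under stated assumptions, not code
from any existing RDF toolkit: the names LiteralTable, load and query
are hypothetical, and the untidy case is modelled in the simplest
possible way, by giving every literal occurrence a fresh integer ID when
no schema information is available.

```python
from dataclasses import dataclass, field


@dataclass
class LiteralTable:
    """Hypothetical literal table mapping literal text to integer IDs.

    tidy=True  : one ID per distinct string (literals are interned).
    tidy=False : a fresh ID for every occurrence of a literal.
    """
    tidy: bool = True
    text_of: list = field(default_factory=list)  # ID -> literal text
    _ids: dict = field(default_factory=dict)     # text -> ID (tidy only)

    def add(self, text: str) -> int:
        # Tidy graphs reuse the ID of an identical literal seen before.
        if self.tidy and text in self._ids:
            return self._ids[text]
        self.text_of.append(text)
        new_id = len(self.text_of) - 1
        if self.tidy:
            self._ids[text] = new_id
        return new_id


def load(statements, tidy):
    """Store (subject, property, literal) statements with literals as IDs."""
    table = LiteralTable(tidy=tidy)
    triples = [(s, p, table.add(o)) for s, p, o in statements]
    return table, triples


def query(table, triples, p1, p2):
    """Answer (X --p1--> Z) & (Y --p2--> Z) by joining on the literal ID."""
    return [
        {"X": s1, "Y": s2, "Z": table.text_of[o1]}
        for s1, q1, o1 in triples if q1 == p1
        for s2, q2, o2 in triples if q2 == p2 and o1 == o2
    ]


# The graph from Example 1.
GRAPH = [
    ("_1", "dc:Title", "The Origin of Species"),
    ("_2", "my:book", "The Origin of Species"),
]

tidy_table, tidy_triples = load(GRAPH, tidy=True)
untidy_table, untidy_triples = load(GRAPH, tidy=False)

tidy_answers = query(tidy_table, tidy_triples, "dc:Title", "my:book")
untidy_answers = query(untidy_table, untidy_triples, "dc:Title", "my:book")
```

With tidy literals the join on the literal ID yields the substitution
from Example 1 and the table holds a single entry. With untidy literals
the same join yields no answer, and the table holds one entry per
literal occurrence, which is the storage growth described in Example 2.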
Received on Friday, 25 January 2002 11:52:52 UTC