- From: Sergey Melnik <melnik@DB.Stanford.EDU>
- Date: Mon, 26 Aug 2002 06:50:20 -0700 (PDT)
- To: pat hayes <phayes@ai.uwf.edu>
- cc: Sergey Melnik <melnik@DB.Stanford.EDU>, w3c-rdfcore-wg@w3.org
On Tue, 20 Aug 2002, pat hayes wrote:

>
> >I'd like to restate the questions, which Jan raised recently, more explicitly.
> >
> >Much of the ongoing discussion about tidy/untidy literals amounts to
> >arguing about different readings of a given piece of RDF/XML or
> >NTriples syntax. From what I can tell, both tidy and untidy literals
> >are implementable, so we have to pick one and wrap up.
> >
> >To my knowledge, untidy literals have been first suggested in the
> >context of long range datatyping (aka implicit/global idiom).
> >Specifically, untidy literals provide a shortcut for using a bNode
> >with a property (two triples are essentially merged into one).
>
> I think it is rather more fundamental than this.

Can you provide a technical argument for that? What makes an untidy
literal "more" than a shortcut?

> First, bear in mind
> that this entire issue only arises in the context of RDF graph
> syntax; all normal lexicalized notations, including XML, are
> inherently untidy.

RDF makes an explicit distinction between an abstract syntax and
concrete syntaxes. Concrete syntaxes may be inherently untidy. RDF
abstract syntax is, however, a data model.

> I think that databases are usually untidy as well
> (eg consider a RDB table where the second column is all integers and
> the third column is all strings: we could have ten in one and "10"
> in the other, and that would not be considered an implementation
> issue, right?)

There is a difference between defining a column type as integer and
treating it as a social security number in applications. For a database
management system, the values of this column are integers, no more and
no less, with whatever semantics integers have. The relational model is
damn clear about that. A couple of days ago I listened to a talk by
Chris Date at VLDB'02 in Hong Kong (to a large extent, it's due to
Chris that the relational model took off two decades ago).
He talked about the foundations of the relational model, and about
datatyping as well. Guess what: he presented datatyping just as
suggested in the latest proposal. The gist is that the relational model
is agnostic about the nature and internal structure of datatype values,
which may just as well be XML documents and whatnot. Database systems
have been doing perfectly well without untidy literals in the data
model.

> Second, the basic point is that people will, whether
> we like it or not, tend to use things like numerals as names of
> numbers. They are so used almost universally throughout human
> discourse and all programming languages.

It's fine to use all kinds of implicit notations in concrete syntaxes.
RDF abstract syntax is a modeling language; it's about being explicit,
or you are in trouble. If you have a numeral that is a name of a
number, why not model this explicitly?

> However, we are committed to
> RDF incorporating XML datatypes, which means that we cannot build
> this assumption into the language, since the meaning of a numeral
> string depends on the datatype applied to it (it *could* be a string.
> ) So there is a 'natural' default that we are prevented from
> assuming. Our options are to provide a different default (literals
> are strings unless stated otherwise) which will make some
> implementors happy but is unlikely to be found congenial by the rest
> of the world; or, to provide a mechanism which treats 'bare' literals
> as having a incomplete meaning and allows the datatyping information
> to be supplied from elsewhere (range datatyping or some external
> assumption). But the second requires that we allow one occurrence of
> a bare literal to be associated with a different datatype than
> another occurrence of the same literal.

I agree that untidy literals are a neat way of doing long range
datatyping.
What I'm failing to see is that either (a) this use case alone is worth
the trouble of introducing untidiness, or (b) there is another, more
compelling use case. As you illustrate above, using untidy literals
makes us worry about incomplete information, defaults, and, even more
importantly, it couples the RDF data model (abstract syntax) with a
schema language that implementors might not want to support. In
contrast, tidy stuff is orders of magnitude simpler (most database
folks I know would have a hard time following your explanation).

Keep in mind that 99% or more of all datatype use is covered by the XSD
primitive types. Oracle, Microsoft, and IBM support (or are just about
to support) XSD in their database products. The remaining one percent
of non-trivial datatyping can be done easily using resources and
bNodes. In fact, in RDF you'd probably not stretch datatyping to
Addresses and Employees, but would use a schema language for defining
such entity types anyway.

> >Is this shortcut so fundamental that there is value of making it
> >part of the spec?
>
> I think so, cf. above. In other words, its not just a shortcut.

The technical argument is coming, I hope....

> But
> even if it were, we have Mike Dean and Patrick S. insisting that
> saving one triple per entry is critical for their applications (on
> palm pilots and cell phones respectively, I note :-)

Most apps will happily work with the primitive types. Most industrially
deployed databases do, so why won't cell phones?

> BTW, isn't range datatyping exactly like having a datatype associated
> with a table column in a database, rather than having to rewrite it
> in every separate entry? And isn't that normal DB practice?

Aha! In databases, as soon as you insert data, a schema lookup is done
and values of the correct types are generated. It's just like creating
the properly typed literals when parsing RDF/XML into an RDF graph (in
the recent proposal).
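
To make the analogy concrete, here is a minimal sketch in Python. It is
not taken from the proposal: SCHEMA, PARSERS, typed_literal and load
are made-up names, the property names are illustrative, and the mapping
covers only three XSD primitive types. The idea is that the datatype is
looked up once, at load time, and every literal enters the graph as a
(datatype, value) pair, so the graph itself stays tidy.

```python
from decimal import Decimal

# Hypothetical schema: property name -> datatype name (an assumption,
# standing in for a column type or a range declaration).
SCHEMA = {"age": "xsd:integer", "name": "xsd:string", "price": "xsd:decimal"}

# Hypothetical lexical-to-value mappings for a few XSD primitive types.
PARSERS = {"xsd:integer": int, "xsd:string": str, "xsd:decimal": Decimal}


def typed_literal(prop, lexical):
    """Look up the property's datatype and build a (datatype, value) pair."""
    datatype = SCHEMA[prop]
    return (datatype, PARSERS[datatype](lexical))


def load(rows):
    """Turn (subject, property, lexical form) rows into typed triples."""
    return {(s, p, typed_literal(p, lex)) for s, p, lex in rows}


# The same lexical form "10" yields two distinct typed literals,
# because the schema lookup happens while loading, not afterwards.
graph = load([("ex:jenny", "age", "10"), ("ex:jenny", "name", "10")])
for triple in sorted(graph):
    print(triple)
```

In this sketch the bare string "10" never appears in the graph on its
own, so there is no occurrence whose datatype has to be worked out
later.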

> >Is there an appealing use case for untidy literals that is not long
> >range datatyping (aka implicit/global idiom)?
>
> >Are we closing off any important extensibility paths if we go for
> >tidy literals?
>
> With regards to this last point, yes. DAML and OIL and probably OWL
> will need the flexibility of allowing (semantically) untidy literals,

DAML and OIL folks must be using untidy literals for *something*,
right? What is their use case?

Sergey

> and if we forbid them then the DAML spec will need to be rewritten
> and OWL will probably no longer base itself on RDF (or, an
> alternative scenario, the Webont WG will split apart into two rival
> groups which will produce incompatible standards. It is perilously
> close to this already.)
>
> Pat
Received on Monday, 26 August 2002 09:52:32 UTC