- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Tue, 24 Sep 2002 19:53:00 +0100
- To: RDF Core <w3c-rdfcore-wg@w3.org>
As I said earlier, with the aim of building stronger consensus around the tidy/untidy decision, I've been through the submissions last week in support of untidy literal semantics and I've tried to summarize my understanding of them here. There are some editorial comments intertwined. Feel free, to send suggestions/corrections. 1. Ease of Use for the Content Producer ---------------------------------------- Untidy semantics permit long range datatyping whilst maintaining monotonicity. Tidy semantics does not. With untidy semantics, a writer of RDF/XML can specify a range constraint on a property to imply the datatype of the value of the property. With tidy semantics, to maintain monotonicity, the datatype must be specified syntactically and the only way we currently have to do that is to put an rdf:datatype attribute on each property element that represents a datatyped value. [Ed note: This argument would be strengthened if we had specific examples where the burden of adding the datatype attributes was a problem, Owl for example? ] 2. Principle of Least Change ----------------------------- For the RDF Content Creator: Some RDF assumes string based semantics for literals. [There ought to be a reference here, perhaps something from RSS or FOAF?] Some RDF is written as if literals had value based semantics. [Reference CC/PP bitsPerPixel?] They both look the same in RDF/XML, e.g. <rdf:Description> <foo:bar>10</foo:bar> </rdf:Description> There is no way to tell from this alone whether the string "10" is meant or the integer 10. If a generic RDF processor is to know which, then it must be told somehow. With untidy semantics, this can be done with a range constraint in a schema to define the type of the literal. This is considerably more convenient for the content producer. With tidy semantics the content itself must be modified to indicate the actual datatype. For the content producer therefore, the least change is to have untidy literals and range constraints. [Ed Note: 1) This part of the argument is based on the desire to add datatype information to existing content so that the datatype is understood by an RDF processor. The least change is actually to do nothing, when the world is no worse off than it is now. But if it is thought that it is important to capture this datatype information in legacy content, then untidy semantics are the easiest path for the content producer. 2) As expressed, this argument does not address the tradeoff that exists. Whilst adopting untidy semantics makes it easier to specify that a literal really denotes an integer, it adds a burden of requiring the content producer to specify that strings were intended, where that would not be necessary with tidy semantics. It could be, depending on the distribution of data, that its more work to have to add the range constraints on all the literals that are really strings than it is to add the range constraints for other datatypes. We did ask the community which they preferred and they said untidy - see below. 3) This argument would carry greater force if: o we can establish there are a lot of examples where adding this datatype information is desirable o there are important examples where adding this datatype information is necessary (DAML+OIL?) ] For the generic tool developer, the principal of least change suggests doing what is currently done. The tools we currently know about (cwm, euler, redland and jena) implement tidy semantics. Overall, it is easier to change a few tools than a lot of content and some other specifications. 3. The user community asked for untidy --------------------------------------- We asked the user community, as clearly as we could, about the tradeoff between tidy and untidy semantics. There was a clear signal from those who responded, that indicated a preference for untidy semantics. 4. Abbreviated Syntax --------------------- [Ed note: I've added this one, as I was reminded of it whilst writing the above] With tidy semantics, it is not possible to represent datatyped values using property attributes in the abbreviated syntax. Thus it is not possible to embed in html, rdf which represents datatyped values without the browser rendering it.
Received on Tuesday, 24 September 2002 14:58:09 UTC