- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Tue, 24 Sep 2002 19:53:00 +0100
- To: RDF Core <w3c-rdfcore-wg@w3.org>
As I said earlier, with the aim of building stronger consensus around the
tidy/untidy decision, I've been through last week's submissions in
support of untidy literal semantics and I've tried to summarize my
understanding of them here. There are some editorial comments
interspersed. Feel free to send suggestions/corrections.
1. Ease of Use for the Content Producer
----------------------------------------
Untidy semantics permit long-range datatyping whilst maintaining
monotonicity. Tidy semantics do not.
With untidy semantics, a writer of RDF/XML can specify a range constraint
on a property to imply the datatype of the value of the property.
With tidy semantics, to maintain monotonicity, the datatype must be
specified syntactically and the only way we currently have to do that is to
put an rdf:datatype attribute on each property element that represents a
datatyped value.
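A minimal sketch of the two options follows. It reuses the foo:bar property from section 2 and assumes a hypothetical namespace URI (http://example.org/foo#) for it. Under untidy semantics, one schema statement types every foo:bar value. Under tidy semantics, each property element has to carry its own rdf:datatype attribute.

```xml
<!-- Assumption: http://example.org/foo# is a hypothetical namespace. -->

<!-- Untidy semantics: one range constraint in a schema covers every
     foo:bar value, so existing instance data is left unchanged. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/foo#bar">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
  </rdf:Description>
</rdf:RDF>

<!-- Instance data as it is written today, untouched. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foo="http://example.org/foo#">
  <rdf:Description>
    <foo:bar>10</foo:bar>
  </rdf:Description>
</rdf:RDF>

<!-- Tidy semantics: the datatype is stated on every property element
     that carries a datatyped value. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foo="http://example.org/foo#">
  <rdf:Description>
    <foo:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">10</foo:bar>
  </rdf:Description>
</rdf:RDF>
```

The schema statement is written once. The rdf:datatype attribute, by contrast, is repeated at every occurrence of the property.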
[Ed note: This argument would be strengthened if we had specific examples
where the burden of adding the datatype attributes was a problem; OWL, for
example?
]
2. Principle of Least Change
-----------------------------
For the RDF content producer:
Some RDF assumes string-based semantics for literals. [There ought to be a
reference here, perhaps something from RSS or FOAF?] Some RDF is written
as if literals had value-based semantics. [Reference CC/PP bitsPerPixel?]
Both look the same in RDF/XML, e.g.:
<rdf:Description>
  <foo:bar>10</foo:bar>
</rdf:Description>
There is no way to tell from this alone whether the string "10" is meant or
the integer 10.
If a generic RDF processor is to know which, then it must be told
somehow. With untidy semantics, this can be done with a range constraint
in a schema that defines the type of the literal. This is considerably more
convenient for the content producer, because the existing content need not
change. With tidy semantics, the content itself must be modified to indicate
the actual datatype.
For the content producer, therefore, the least change is to have untidy
literals and range constraints.
[Ed Note:
1) This part of the argument is based on the desire to add datatype
information to existing content so that the datatype is understood by an
RDF processor. The least change is actually to do nothing, in which case the
world is no worse off than it is now. But if it is thought important
to capture this datatype information in legacy content, then untidy
semantics are the easiest path for the content producer.
2) As expressed, this argument does not address the tradeoff that
exists. Whilst adopting untidy semantics makes it easier to specify that a
literal really denotes an integer, it adds the burden of requiring the
content producer to specify that strings were intended, where that would
not be necessary with tidy semantics. Depending on the distribution of
data, it could be more work to add range constraints for all the literals
that are really strings than to add the range constraints for other
datatypes. We did ask the community which they preferred and they said
untidy; see section 3 below.
3) This argument would carry greater force if:
o we can establish there are a lot of examples where adding this
datatype information is desirable
o there are important examples where adding this datatype information is
necessary (DAML+OIL?)
]
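A hedged sketch of the tradeoff in point 2 of the note above follows. It assumes hypothetical properties foo:bar (integer-valued) and foo:title (string-valued) in the illustrative namespace http://example.org/foo#. Under untidy semantics, the string-valued property needs its own range constraint before its literals are known to denote strings. Under tidy semantics, that extra statement would not be necessary.

```xml
<!-- Assumption: foo:bar and foo:title are hypothetical properties. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- Integer-valued property: the range constraint is what untidy
       semantics makes possible. -->
  <rdf:Description rdf:about="http://example.org/foo#bar">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
  </rdf:Description>
  <!-- String-valued property: needed under untidy semantics to say that
       strings were intended; not necessary under tidy semantics. -->
  <rdf:Description rdf:about="http://example.org/foo#title">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Description>
</rdf:RDF>
```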
For the generic tool developer, the principle of least change suggests
doing what is currently done. The tools we currently know about (cwm,
euler, redland and jena) implement tidy semantics.
Overall, it is easier to change a few tools than a lot of content and some
other specifications.
3. The user community asked for untidy
---------------------------------------
We asked the user community, as clearly as we could, about the tradeoff
between tidy and untidy semantics. Those who responded gave a clear
signal of a preference for untidy semantics.
4. Abbreviated Syntax
---------------------
[Ed note: I've added this one, as I was reminded of it whilst writing the
above]
With tidy semantics, it is not possible to represent datatyped values using
property attributes in the abbreviated syntax, since rdf:datatype can only
be given on a property element. Thus it is not possible to embed RDF that
represents datatyped values in HTML without the browser rendering it:
browsers display element content but not attribute values.
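A sketch of the two forms follows, again using the hypothetical foo namespace. In the abbreviated form, the value is an attribute of the node element, and an attribute cannot itself carry rdf:datatype. Only a range constraint on foo:bar, under untidy semantics, could type it. In the property element form, the datatype can be given, but the element content is text that a browser would display.

```xml
<!-- Assumption: http://example.org/foo# is a hypothetical namespace. -->

<!-- Abbreviated syntax (property attribute): not rendered by a browser,
     but there is no place to put rdf:datatype. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foo="http://example.org/foo#">
  <rdf:Description foo:bar="10"/>
</rdf:RDF>

<!-- Property element form: the datatype can be stated, but the element
     content "10" would appear in the rendered page. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foo="http://example.org/foo#">
  <rdf:Description>
    <foo:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">10</foo:bar>
  </rdf:Description>
</rdf:RDF>
```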
Received on Tuesday, 24 September 2002 14:58:09 UTC