Re: ISSUE: Datatypes

>From: Dan Connolly <connolly@w3.org>
>Subject: Re: ISSUE: Datatypes
>Date: 16 May 2002 17:02:12 -0500
>
>>  On Thu, 2002-05-16 at 13:29, Peter F. Patel-Schneider wrote:
>>  > TITLE:       Datatypes
>>  > DESCRIPTION: It appears that the RDF Core WG will not produce a solution to
>>  >	     the datatypes issue, as witness the lack thereof in the
>>  >	     current working drafts and the imminent end of the WG.
>>
>>  I don't dispute there's an issue here, but the RDF Core WG
>>  still plans to produce a treatment of datatypes, as far as I know.
>
>OK, I'm willing to revise this to
>
>DESCRIPTION: It appears that the RDF Core WG will not produce a solution to
>	     the datatypes issue in the near future.
>
>>  Re "imminent end of the WG", they've requested an extension thru
>>  something like September 2002.
>
>I hadn't heard about any extension.
>
>>  I'm not sure how long WebOnt should hold its breath, but I still
>>  expect RDF Core to provide a datatypes solution. I think.

If anything, the problem is that we have several of them, and we 
can't decide which one is best.

>Well, it is going to be hard to make semantic progress for the next quite a
>while if we wait for the RDF datatypes solution.  Perhaps the best way to
>proceed would be to put together our own solution and when and if RDF Core
>has a solution harmonize with it.

This issue has produced more contention and internal debate than any 
other. Some time ago the WG decided to produce a datatyping proposal 
based on a 'stake in the ground' draft, but that decision has been 
re-opened and is now once again under active discussion.

The stake in the ground proposal can be summarized as follows.
1. Literal nodes always denote strings, under all circumstances.
2. To refer to the value of a literal under a datatype mapping, a 
bnode is used to denote the value, e.g.
<ex:Jenny> <ex:age> _:xxx .
_:xxx <xsd:integer> "10" .
asserts that Jenny's age is ten.
3. The bnode can be linked to the literal by the datatype name used 
as a property, as above (the relevant semantic constraint is that a 
datatype property must denote the inverse of the datatype 
lexical-to-value mapping).
4. Or, it can be linked by a special property provisionally called 
rdfd:lex which does not constrain the value but which has a special 
relationship to the 'datatype range' of a property, as described 
below.
5. Datatype ranges are defined using rdfd:range (name may be altered), 
which is distinct from rdfs:range. The semantic rules are that

ppp rdfd:range ddd .
aaa ppp _:xxx .
_:xxx rdfd:lex "lll" .

together can only be satisfied if L2V(I(ddd))(lll) = I(_:xxx), which 
amounts to the same as writing the datatype name in place of the 
rdfd:lex; and that

ppp rdfd:range ddd .
aaa ppp "lll" .

can only be satisfied if L2V(I(ddd))(lll) is defined, i.e. the literal 
in this case must be in the lexical space of the datatype.
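
The two rdfd:range rules above can be sketched in Python. This is only an illustration under stated assumptions, not anything taken from the WG drafts: the L2V table, the helper names, and the use of None to mean "L2V is undefined for this string" are all hypothetical, and I(ddd) is collapsed into a plain lookup by datatype name.

```python
from typing import Any, Callable, Dict, Optional


def l2v_integer(lexical: str) -> Optional[int]:
    """Hypothetical lexical-to-value mapping for a decimal integer datatype."""
    try:
        return int(lexical, 10)
    except ValueError:
        return None  # the string is outside the lexical space


# Hypothetical registry: datatype name -> its lexical-to-value mapping.
L2V: Dict[str, Callable[[str], Optional[Any]]] = {
    "xsd:integer": l2v_integer,
}


def lex_idiom_satisfied(datatype: str, lexical: str, bnode_value: Any) -> bool:
    """ppp rdfd:range ddd . aaa ppp _:xxx . _:xxx rdfd:lex "lll" .

    Satisfied only if L2V(I(ddd))(lll) = I(_:xxx).
    """
    value = L2V[datatype](lexical)
    return value is not None and value == bnode_value


def inline_idiom_satisfied(datatype: str, lexical: str) -> bool:
    """ppp rdfd:range ddd . aaa ppp "lll" .

    Satisfied only if L2V(I(ddd))(lll) is defined at all.
    """
    return L2V[datatype](lexical) is not None
```

Note that the second check says nothing about what the literal denotes; it only requires the string to be in the lexical space of the datatype.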

Since rdfd:range and rdfs:range are independent, the datatyping can 
be invoked independently of the actual range of the property. This 
rather baroque set of conditions is designed to accommodate all the 
use cases that various RDF communities seem to require, including 
allowing a property to have values which mix the lexical and value 
spaces of a datatype, or which are agnostic about whether their 
values should be considered to be lexical items or values.

Various combinations of rdfd:range and rdfs:range can be used to 
impose various more or less 'tight' constraints connecting property 
ranges with datatype lexical and value domains, but the basic idea is 
that the datatype check is functionally independent of the actual 
range. This avoids the difficulties arising from range inheritance 
clashing with datatype mappings.

The chief point of controversy arising from this proposal concerns 
the 'in-line' idiom where a literal is used directly as a property 
value, e.g.

<ex:Jenny> <ex:age> "10" .

which on this proposal is unambiguously an assertion that the value 
of the property is a character string, no matter what datatype is 
associated with the property ex:age (in contrast to
<ex:Jenny> <ex:age> _:xxx .
_:xxx <rdfd:lex> "10" .
which can be an assertion that Jenny's age is the number ten, two, 
eight or sixteen, or a string, depending on which rdfd:range is 
asserted of the property.)
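
To make the contrast concrete, here is a small Python sketch. The datatype names (ex:decimal, ex:binary, ex:octal, ex:hex, ex:string) and both helper functions are made up for illustration; only the behaviour they model comes from the proposal as summarized above.

```python
# Hypothetical datatypes, each given by its lexical-to-value mapping.
DATATYPES = {
    "ex:decimal": lambda s: int(s, 10),
    "ex:binary": lambda s: int(s, 2),
    "ex:octal": lambda s: int(s, 8),
    "ex:hex": lambda s: int(s, 16),
    "ex:string": lambda s: s,
}


def bnode_value(datatype_range: str, lexical: str):
    """Value denoted by _:xxx in '_:xxx rdfd:lex "lll"' when the property
    has the given rdfd:range."""
    return DATATYPES[datatype_range](lexical)


def inline_value(datatype_range: str, lexical: str) -> str:
    """Value of an in-line literal under the stake-in-the-ground proposal:
    always the character string itself, whatever the rdfd:range is."""
    return lexical


for name in DATATYPES:
    print(name, repr(bnode_value(name, "10")), repr(inline_value(name, "10")))
```

The first column of values changes with the datatype; the second is the same string every time.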

This rigid interpretation of the in-line idiom, which in effect 
forbids the use of literals as names in the usual sense, is felt by 
many to be unnatural. A majority of the WG at present seem to feel 
that the unnaturalness is a price worth paying for the resulting 
simplicity of the language, but a vocal minority disagree.

-----

The chief costs of allowing the inline idiom to be influenced by 
datatyping are:
1. when no datatyping information is present such a triple will be 
ambiguous, which many feel is inappropriate; in particular, it would 
break pieces of legacy code which treat literals as unique names;
2. the RDF graph syntax will have to treat all literals as distinct, 
since they might get datatyped differently by further rdfd:range 
assertions. The resulting 'untidy graph' syntax has proven unwieldy 
and unintuitive.
3. the model theory becomes more complex and subtle, and therefore 
harder to follow.
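
Cost 2 can be illustrated with a rough Python sketch. The representation of triples as tuples, the convention that a term starting with a double quote is a literal, and the ex:Film / ex:title example are all assumptions made for the illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_literal(term: str) -> bool:
    """Assumed convention: literals are written with a leading double quote."""
    return term.startswith('"')


def tidy_literal_nodes(triples: List[Triple]) -> Set[str]:
    """Tidy graph: one node per distinct literal string."""
    return {o for (_s, _p, o) in triples if is_literal(o)}


def untidy_literal_nodes(triples: List[Triple]) -> List[Tuple[int, str]]:
    """Untidy graph: one node per literal occurrence, tagged by triple index,
    since each occurrence might later be datatyped differently."""
    return [(i, o) for i, (_s, _p, o) in enumerate(triples) if is_literal(o)]


GRAPH: List[Triple] = [
    ("ex:Jenny", "ex:age", '"10"'),
    ("ex:Film", "ex:title", '"10"'),
]

print(len(tidy_literal_nodes(GRAPH)))    # one shared node for "10"
print(len(untidy_literal_nodes(GRAPH)))  # two separate nodes
```

In the untidy case the two occurrences of "10" cannot be merged, because a later rdfd:range assertion on ex:age but not on ex:title would give them different values.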

The benefits that are claimed are that this usage is more natural, 
and that, while the language itself will be more complicated (in 
syntax and semantics), it will be no more complex in practice since 
there would be no need to use the more complex idioms involving 
bnodes.

----

None of the above should be taken to be the opinion or determination 
of the WG. It is my summary of the current situation and chief 
issues, offered solely to show Webont where current WG thinking 
stands and to improve the bandwidth between RDFCoreWG and Webont.

Feedback would be welcome, by the way.

Pat Hayes

Received on Monday, 20 May 2002 10:54:54 UTC