- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Thu, 11 Jul 2002 19:47:24 +0100
- To: www-rdf-logic@w3.org
The RDFCore WG is producing a proposal for how XML Schema datatypes should
be used in RDF. We would like some guidance on a particular tradeoff we
have to make.
The WG requests that you send your considered answers to
www-rdf-comments@w3.org. Please send all responses by 26 July
2002. Questions and discussion should take place on this list.
INTRODUCTION TO DATATYPES
=========================
Let's explain the basic ideas behind our approach to datatyping. The aim
is to define how datatype values, e.g. integers, dates, etc., should be
represented in RDF. We are building on the XML Schema datatypes
specification.
It is important in getting the semantics correct that we distinguish
between a datatype value, e.g. the integer 10, and a lexical representation
of the value, e.g. the string "10".
We are proposing two principal idioms for representing datatyped
information. The first looks like this:
<Jenny> <age> _:a .
_:a <xsdr:decimal> "10" .
This can be written in RDF/XML like this.
<rdf:Description rdf:about="Jenny">
<foo:age xsdr:decimal="10"/>
</rdf:Description>
Here the b-node _:a denotes the integer 10 which can be represented in
decimal form as the string "10".
This idiom treats an XML schema datatype as a mapping from a value to a
lexical representation of the value; this mapping is represented in RDF by
a property.
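As a rough illustration only, a datatype can be modelled as a function
from lexical representations to values. The Python names below are
hypothetical and are not part of RDF or XML Schema; the "octal" datatype
is invented purely to show that one string can denote different values.

```python
from decimal import Decimal

# Hypothetical stand-ins for datatypes: each maps a lexical form to a value.
def decimal_datatype(lexical: str) -> Decimal:
    return Decimal(lexical)

def octal_datatype(lexical: str) -> int:
    # Illustrative only; not an XML Schema datatype.
    return int(lexical, 8)

def string_datatype(lexical: str) -> str:
    return lexical

# Two different lexical forms can denote the same value.
same_value = decimal_datatype("10") == decimal_datatype("10.0")

# The same lexical form denotes a different value under another datatype.
octal_value = octal_datatype("10")

# The string "10" is not the number 10.
string_vs_number = string_datatype("10") == decimal_datatype("10")

print(same_value, octal_value, string_vs_number)
```

The point of the sketch is only that a value and its lexical
representation are different things, and that the datatype is what
connects them.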
We believe this idiom to be quite straightforward, but not sufficient on
its own because it is common practice to write things like:
<Jenny> <age> "10" .
where the author of this fragment of RDF means to represent the fact that
Jenny's age is the number 10. This is the second idiom, which is where we
need some guidance.
SOME TEST CASES
===============
It is here that we need some advice, because we have a choice to make in
the way we define the formal semantics.
A few simple test cases:
Test A:
<Jenny> <ageInYears> "10" .
<John> <ageInYears> "10" .
Should an RDF processor conclude that the values of the ageInYears
property for Jenny and John are the same?
There are variations on this test which should be considered before answering.
Test A2:
<Jenny> <ageInYears> "10" .
<Jenny> <testScore> "10" .
Should an RDF processor conclude that the value of Jenny's ageInYears
property is the same as the value of Jenny's testScore property?
Test A3:
<Jenny> <ageInYears> "10" .
<Film> <title> "10" .
Should an RDF processor conclude that the value of Jenny's ageInYears
property is the same as the value of the Film's title property? If the
value of the <ageInYears> property is an integer, and the value of the
<title> property is a string, they are not the same thing and are thus
not equal.
The answer must be the same for all three of these A tests.
These test cases relate only to the situation where there are no range
constraints on the properties.
Now for a different kind of test. How do the values of the two idioms relate?
Test D:
<Jenny> <ageInYears> "10" .
<ageInYears> rdfs:range xsd:decimal .
<John> <ageInYears> _:a .
_:a xsdr:decimal "10" .
Should an RDF processor conclude that Jenny and John have the same
age? [Note: in this example the range constraint is expressed using
rdfs:range. We may have to introduce a special datatyping range property,
but that is an independent detail for now.]
It is not possible to have the answers to the A tests and Test D both be
yes. Either the A's can be yes or D can be yes, but not both. We have to
decide which of these is the more important to have.
WHY THESE TEST CASES MATTER
===========================
The formal semantics can define the meaning of a literal in one of two
ways, given:
<Jenny> <ageInYears> "10" .
tidy) the <ageInYears> property takes a value which is a numeral, i.e. a
string
untidy) the <ageInYears> property takes a value which is some datatype
value whose string representation is "10", but without further
information, such as a range constraint, we can't tell exactly what the
value is, e.g. the string might be in octal.
If we choose the tidy option, the object of the statement is always a
string, which means that in:
<Jenny> <ageInYears> "10" .
<Film> <title> "10" .
the values of the two properties are the same; they are both the STRING "10".
If we choose the untidy option, the value of the object of the statement is
unknown from this statement alone; a range constraint is required to
determine the value from the literal string:
<Jenny> <ageInYears> "10" .
<ageInYears> <rdfs:range> <xsd:decimal> .
With a range constraint, we can know that the object of the property is the
integer 10.
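The two options can be compared with a toy model. This is a sketch under
simplifying assumptions, not the formal semantics: the function names and
the DATATYPES table are hypothetical, an unknown value is modelled as
None, and the b-node idiom is assumed to denote the decimal value
directly under either option.

```python
from decimal import Decimal

# Hypothetical table from a range datatype to a lexical-to-value mapping.
DATATYPES = {"xsd:decimal": Decimal}

def tidy_value(literal, range_type=None):
    # Tidy: a literal always denotes the string itself, whatever the range.
    return literal

def untidy_value(literal, range_type=None):
    # Untidy: the value is unknown (None) until a range constraint
    # names a datatype.
    if range_type is None:
        return None
    return DATATYPES[range_type](literal)

def can_conclude_same(a, b):
    # Values are known to be the same only if both are known and equal.
    return a is not None and b is not None and a == b

# The b-node idiom (_:a xsdr:decimal "10") is assumed to denote the value.
johns_age_bnode = Decimal("10")

# Test A: two plain literals "10", no range constraint.
test_a_tidy = can_conclude_same(tidy_value("10"), tidy_value("10"))
test_a_untidy = can_conclude_same(untidy_value("10"), untidy_value("10"))

# Test D: literal "10" with range xsd:decimal, compared with the b-node value.
test_d_tidy = can_conclude_same(
    tidy_value("10", "xsd:decimal"), johns_age_bnode)
test_d_untidy = can_conclude_same(
    untidy_value("10", "xsd:decimal"), johns_age_bnode)

print(test_a_tidy, test_a_untidy, test_d_tidy, test_d_untidy)
```

In this model the tidy option answers yes to Test A and no to Test D,
and the untidy option answers no to Test A and yes to Test D, which is
the tradeoff described above.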
CONCLUSION
==========
To end then, please send a message to www-rdf-comments@w3.org (by 26 July
2002) indicating whether you believe it is more important to have the answer
to test cases A be yes, or test case D be yes:
Test A:
<Jenny> <ageInYears> "10" .
<John> <ageInYears> "10" .
Test D:
<Jenny> <ageInYears> "10" .
<ageInYears> <rdfs:range> <xsdr:decimal> .
<John> <ageInYears> _:a .
_:a <xsdr:decimal> "10" .
We would also like to know the reasons for this preference.
Brian McBride
on behalf of the RDFCore WG
Received on Thursday, 11 July 2002 14:48:15 UTC