Re: Input sought on datatyping tradeoff

Brian McBride wrote:

>
> It is important in getting the semantics correct that we distinguish
> between a datatype value, e.g. the integer 10 and a lexical representation
> of the value, e.g. the string "10".

Yes, "10" = "10"

>
> We are proposing two principal idioms for representing datatyped
> information.  The first looks like this:
>
>    <Jenny> <age>          _:a .
>    _:a     <xsdr:decimal> "10" .
>
> This can be written in RDF/XML like this.
>
>    <rdf:Description rdf:about="Jenny">
>      <foo:age xsdr:decimal="10"/>
>    </rdf:Description>

right.

>
> Here the b-node _:a denotes the integer 10 which can be represented in
> decimal form as the string "10".
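To make the first idiom concrete, here is a minimal sketch. It assumes a toy in-memory graph of (subject, predicate, object) tuples and uses Python's Decimal to stand in for the xsd:decimal value space. The names graph and value_of are hypothetical. The blank node is the thing that denotes the number, and the xsdr:decimal arc supplies its lexical form.

```python
from decimal import Decimal

# Hypothetical in-memory graph: (subject, predicate, object) tuples.
# "_:a" plays the blank node; "xsdr:decimal" links it to its lexical form.
graph = {
    ("Jenny", "age", "_:a"),
    ("_:a", "xsdr:decimal", "10"),
}

def value_of(node, graph):
    """Return the value a node denotes, read through its xsdr:decimal arc."""
    for s, p, o in graph:
        if s == node and p == "xsdr:decimal":
            return Decimal(o)  # lexical form "10" -> the number 10
    return None  # no datatype arc: value not determined by this sketch

age_node = next(o for s, p, o in graph if s == "Jenny" and p == "age")
print(age_node, value_of(age_node, graph))  # _:a 10
```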


> We believe this idiom to be quite straightforward, but not sufficient on
> its own because it is common practice to write things like:
>
>    <jenny> <age> "10" .

The danger in interpreting this idiom in any way other than

age = "10"

is non-monotonicity. That is, in the absence of _some other triples_ (i.e. a
schema), the object of the age predicate is the literal string "10". Great
care needs to be taken that any other triples which affect this equality, or
the interpretation of the string, are either _always_ present or _never_
present/considered; otherwise the result is non-monotonic.

That is, if I know:

<jenny> <age> "10" .

no later information should change that fact or my interpretation of it.
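A small sketch of the danger: suppose a processor reads a bare literal as a string by default, but switches to a number when a range triple is present. The interpret function and the tuple graphs below are hypothetical. g2 only adds a triple to g1, yet a conclusion drawn from g1 no longer holds in g2. That is exactly the non-monotonic behaviour described above.

```python
from decimal import Decimal

def interpret(subject, prop, graph):
    """Hypothetical default reading: a literal is a string unless a range says otherwise."""
    lexical = next(o for s, p, o in graph if s == subject and p == prop)
    if (prop, "rdfs:range", "xsd:decimal") in graph:
        return Decimal(lexical)  # schema present: reinterpret as a number
    return lexical  # no schema: the literal string itself

g1 = {("jenny", "age", "10")}
g2 = g1 | {("age", "rdfs:range", "xsd:decimal")}  # g2 only ADDS a triple

# Conclusion drawn from g1: "jenny's age is the string '10'".
print(interpret("jenny", "age", g1) == "10")  # True
# The same conclusion no longer holds in the larger graph g2.
print(interpret("jenny", "age", g2) == "10")  # False
```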


>
> A few simple test cases:
>
> Test A:
>
>    <Jenny> <ageInYears> "10" .
>    <John>  <ageInYears> "10" .
>
> Should an RDF processor conclude that the value of the ageInYears
> properties for Jenny and John are the same?

yes.

>
> There are variations on this test which should be considered before
> answering.
>
> Test A2:
>
>    <Jenny> <ageInYears> "10" .
>    <Jenny> <testScore>  "10" .
>
> Should an RDF processor conclude that the value of Jenny's ageInYears
> property is the same as the value of Jenny's testScore property?

yes.

>
> Test A3:
>
>    <Jenny> <ageInYears>   "10" .
>    <Film>  <title>        "10" .

yes.

>
> Should an RDF processor conclude that the value of Jenny's age property is
> the same as the value of the Film's title property?  If the value of the
> <ageInYears> property is an integer, and the value of the <title> property
> is a string, they are not the same thing and are thus not equal.

Where has it been monotonically defined to be an integer rather than a
string? That is the crux of the entire issue.

>
> The answer must be the same for all three of these A tests.

agreed.
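Under the "tidy" reading (a literal object is just the string), all three A tests come out "yes" together, as argued above. The following is a minimal sketch, assuming a toy list of triples and hypothetical helper names tidy_value and obj:

```python
def tidy_value(literal: str) -> str:
    # Tidy reading: a literal object denotes the string itself.
    return literal

graph = [
    ("Jenny", "ageInYears", "10"),
    ("John", "ageInYears", "10"),
    ("Jenny", "testScore", "10"),
    ("Film", "title", "10"),
]

def obj(subject, prop):
    # Object of the first matching triple (assumes exactly one exists).
    return next(o for s, p, o in graph if s == subject and p == prop)

jenny_age = tidy_value(obj("Jenny", "ageInYears"))
test_a = jenny_age == tidy_value(obj("John", "ageInYears"))
test_a2 = jenny_age == tidy_value(obj("Jenny", "testScore"))
test_a3 = jenny_age == tidy_value(obj("Film", "title"))
print(test_a, test_a2, test_a3)  # True True True
```

Because nothing outside the single triple is consulted, adding a schema later cannot flip any of these answers.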

>
> Now for a different kind of test.  How do the values of the two idioms
> relate?
>
> Test D:
>
>    <Jenny>      <ageInYears> "10" .
>    <ageInYears> rdfs:range xsd:decimal .
>
>    <John>  <ageInYears>   _:a .
>    _:a     xsdr:decimal   "10" .
>
> Should an RDF processor conclude that Jenny and John have the same
> age?  [Note: in this example the range constraint is expressed using
> rdfs:range.  We may have to introduce a special datatyping range property,
> but that is an independent detail for now.]

This _so far_ looks OK, i.e. "yes".

>
> It is not possible to have the answers to Tests A and Test D both be
> yes.  Either the A's can be yes or D can be yes, but not both.  We have to
> decide which of these is the most important to have.

Why not? Surely this is what the model theory is for: to _understand_ that
the <rdfs:range> property has a magic meaning.

One could have two different types of equality, string-eq and value-equal
(à la LISP).

>
>
> WHY THESE TEST CASES MATTER
> ===========================
>
> The formal semantics can define the meaning of a literal in one of two
> ways, given:
>
>    <Jenny> <ageInYears> "10" .
>
>    tidy) the <ageInYears> property takes a value which is a numeral, i.e. a
> string
>
>    untidy) the <ageInYears> property takes a value which is some datatype
> value whose string representation is "10", but without further
> information, such as a range constraint, we can't tell exactly what the
> value is, e.g. the string might be in octal.
>
> If we choose the tidy option, the object of the statement is always a
> string, which means that in:
>
>    <Jenny> <ageInYears> "10" .
>    <Film>  <title>      "10" .
>
> the values of the two properties are the same; they are both the STRING
> "10".
>
> If we choose the untidy option, the value of the object of the statement
> is unknown from this statement alone; a range constraint is required to
> determine the value from the literal string:
>
>    <jenny>      <ageInYears> "10" .
>    <ageInYears> <rdfs:range> <xsd:decimal> .
>
> With a range constraint, we can know that the object of the property is
> the integer 10.
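The untidy option can be sketched as a lookup from the property's range to a lexical-to-value mapping. This assumes a toy tuple graph and a hypothetical LEXICAL_TO_VALUE table. "ex:octal" is invented purely to echo the "might be in octal" remark and is not a real datatype. Without a range triple the sketch returns None, meaning the value is unknown.

```python
from decimal import Decimal
from typing import Any, Callable, Dict

# Hypothetical table of lexical-to-value mappings. "ex:octal" is invented here
# only to echo the "the string might be in octal" remark.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], Any]] = {
    "xsd:decimal": Decimal,
    "ex:octal": lambda s: int(s, 8),
}

def untidy_value(subject: str, prop: str, graph) -> Any:
    """Untidy reading: the literal's value depends on the property's range."""
    lexical = next(o for s, p, o in graph if s == subject and p == prop)
    for s, p, o in graph:
        if s == prop and p == "rdfs:range" and o in LEXICAL_TO_VALUE:
            return LEXICAL_TO_VALUE[o](lexical)
    return None  # no range constraint: the value is unknown

base = {("jenny", "ageInYears", "10")}
as_decimal = base | {("ageInYears", "rdfs:range", "xsd:decimal")}
as_octal = base | {("ageInYears", "rdfs:range", "ex:octal")}

print(untidy_value("jenny", "ageInYears", base))        # None
print(untidy_value("jenny", "ageInYears", as_decimal))  # 10
print(untidy_value("jenny", "ageInYears", as_octal))    # 8
```

The same literal "10" yields different values depending on which range triple is present. That is why the untidy reading cannot answer Test A on its own.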

Again, you have two different tests:

string-eq

value-equal

Just distinguish between the two and let people/inference engines do what
they want.
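Here is a minimal sketch of the two equality tests, in the spirit of the Lisp-style distinction mentioned above. The function names and the two-entry datatype table are hypothetical. Decimal stands in for xsd:decimal, and str stands in for xsd:string.

```python
from decimal import Decimal

# Hypothetical lexical-to-value table; only two datatypes are handled here.
PARSERS = {"xsd:decimal": Decimal, "xsd:string": str}

def string_eq(a: str, b: str) -> bool:
    # Compare lexical forms character by character.
    return a == b

def value_equal(a: str, type_a: str, b: str, type_b: str) -> bool:
    # Compare the values the lexical forms denote under their datatypes.
    return PARSERS[type_a](a) == PARSERS[type_b](b)

print(string_eq("10", "10"))                                     # True
print(string_eq("10", "10.0"))                                   # False
print(value_equal("10", "xsd:decimal", "10.0", "xsd:decimal"))   # True
print(value_equal("10", "xsd:decimal", "10", "xsd:string"))      # False
```

Keeping both predicates lets Test A be answered with string-eq, which needs no schema, while range-aware comparisons use value-equal.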

>
> CONCLUSION
> ==========
>
> To end then, please send a message to www-rdf-comments@w3.org (by 26 July
> 2002) indicating whether you believe it's more important to have the answer
> to test cases A be yes, or test case D be yes:
>
>    Test A:
>
>    <Jenny> <ageInYears> "10" .
>    <John>  <ageInYears> "10" .

=> true (absolutely)

Otherwise you fail the "duh!" test.

I'd like to say (functionally):

eq( ageInYears(Jenny) , ageInYears(John) )

>
> Test D:
>
>    <Jenny>      <ageInYears> "10" .
>    <ageInYears> <rdfs:range> <xsdr:decimal> .
>
>    <John>  <ageInYears>      _:a .
>    _:a     <xsdr:decimal>   "10" .
>
>

=> true (qualified)

I'd say:

value-equal( ageInYears(Jenny), ageInYears(John) )

Note that "value-equal" might be non-monotonic if the <rdfs:range> property
got detached from the other triples, but there is a danger of this type of
behavior almost every time we depend on more than one triple for an
inference!
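The following sketch shows Test D and the detachment concern. It assumes a toy tuple graph, with hypothetical helpers age_value and value_equal. John's age comes from the blank-node idiom. Jenny's age is only recoverable through the range triple, so dropping that one triple removes the value-equal conclusion.

```python
from decimal import Decimal

RANGE = ("ageInYears", "rdfs:range", "xsdr:decimal")

def age_value(person, graph):
    obj = next(o for s, p, o in graph if s == person and p == "ageInYears")
    # Idiom 1: the object is a blank node carrying an xsdr:decimal arc.
    for s, p, o in graph:
        if s == obj and p == "xsdr:decimal":
            return Decimal(o)
    # Idiom 2: a bare literal, typed only if the range triple is present.
    if RANGE in graph:
        return Decimal(obj)
    return None  # no typing information: value unknown

def value_equal(a, b):
    # False here means "cannot conclude equal", not "known to differ".
    return a is not None and a == b

test_d = {
    ("Jenny", "ageInYears", "10"),
    RANGE,
    ("John", "ageInYears", "_:a"),
    ("_:a", "xsdr:decimal", "10"),
}
detached = test_d - {RANGE}

print(value_equal(age_value("Jenny", test_d), age_value("John", test_d)))      # True
print(value_equal(age_value("Jenny", detached), age_value("John", detached)))  # False
```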

Jonathan

Received on Friday, 12 July 2002 09:08:12 UTC