Re: Input sought on datatyping tradeoff from Drew McDermott on 2002-07-12 (www-rdf-logic@w3.org from July 2002)

From: Drew McDermott <drew.mcdermott@yale.edu>
Date: Fri, 12 Jul 2002 17:44:15 -0400 (EDT)
To: www-rdf-logic@w3.org
Message-Id: <200207122144.g6CLiFo06504@pantheon-po01.its.yale.edu>
   
   [Jonathan Borden]
   >That is if I know:
   >
   ><jenny> <age> "10"
   >
   >no later information should change that fact or interpretation of that fact.

   [Brian McBride]
   Yup.

   ....
   Because the A tests have no range constraint.  We either have to decide 
   that literals are self denoting - they always denote themselves in which 
   case the answer to D must be NO, or their denotation depends on a range 
   constraint in which case the answer to A must be NO.

It could be "I don't know."

   [Jonathan]
   >note that "value-equal" might be non-monotonic if the <rdfs:range> property
   >got detached from the other triples

   [Brian]
   that would not be non-monotonic - if you remove a triple then of course you 
   are free to remove some inferences that depend on it.  My understanding of 
   non monotonicity is that you must never withdraw an inference because of 
   adding new triples.

Don't you mean "monotonicity" in that last sentence?

It seems to me that Jonathan has a strong argument.  If the inferences
from a triple must stand when new triples are added, and if the answer
to Test A must be either Yes or No, then it can only be Yes.

I know it is futile to make the point at this late date, but the whole
farcical question stems from the fact that RDF (and XML, and SGML, the
whole ridiculous lineage) have no syntax for string literals.  If
everything is a string, then nothing is a string.  The problem could
be solved very simply if every literal found in an RDF file belonged to
at most one literal class (some apparent literals being ill-formed,
and hence not belonging to any).  That would require strings to be
indicated in some explicit way.  Hey, how about quotes?
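
A minimal sketch of the "at most one literal class" rule follows.  The
grammar is an assumption made for illustration (the message does not
specify one), and the names LITERAL_CLASSES and classify are
hypothetical.

```python
import re

# Assumed lexical grammar (not given in the message): the patterns are
# written so that no lexical form can match more than one of them.
LITERAL_CLASSES = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "decimal": re.compile(r"[+-]?[0-9]+\.[0-9]+"),
    "string": re.compile(r"""(?:'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")""", re.DOTALL),
}


def classify(lexical):
    """Return the one literal class the form belongs to, or None if ill-formed."""
    matches = [name for name, pattern in LITERAL_CLASSES.items()
               if pattern.fullmatch(lexical)]
    # The whole point of the proposal: classes never overlap.
    assert len(matches) <= 1, "grammar must keep literal classes disjoint"
    return matches[0] if matches else None
```

Under this assumed grammar, classify("10") names the integer class,
classify("'10'") names the string class, and an unquoted word such as
ten belongs to no class at all.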

The test cases would all be handled thus:

   Test A:

      <Jenny> <ageInYears> "10" .
      <John>  <ageInYears> "10" .

   Should an RDF processor conclude that the value of the ageInYears 
   properties for Jenny and John are the same?
	
Yes, because "10" would be an integer, without ambiguity.

   There are variations on this test which should be considered before answering.

   Test A2:

      <Jenny> <ageInYears> "10" .
      <Jenny> <testScore>  "10" .

   Should an RDF processor conclude that the value of Jenny's ageInYears 
   property is the same as the value of Jenny's testScore property?

Yes, as before.

   Test A3:

      <Jenny> <ageInYears>   "10" .
      <Film>  <title>        "10" .

Yes, except that the second triple is illegal; the value of <title>
must be a string, and so the second triple should be

<Film> <title> "\"10\""
or 
<Film> <title> "'10'"

   Should an RDF processor conclude that the value of Jenny's age property is 
   the same as the value of the Film's title property?  If the value of the 
   <ageInYears> property is an integer, and the value of the <title> property 
   is a string, they are not the same thing and are thus not equal.

The correct answer, No, is now obvious.

   The answer must be the same for all three of these A tests.

Only because of XML absurdity, which has carried over to N3.
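
The answers given to the A tests can be reproduced with a short
sketch, assuming that bare digits denote an integer and that quoted
text denotes a string.  The helper names denote and same_value are
hypothetical, not part of any RDF toolkit.

```python
import re


def denote(lexical):
    """Map a lexical form to a (class, value) pair, or None if ill-formed.

    Assumed grammar: bare digits are integers, quoted text is a string.
    """
    if re.fullmatch(r"[+-]?[0-9]+", lexical):
        return ("integer", int(lexical))
    if len(lexical) >= 2 and lexical[0] == lexical[-1] and lexical[0] in "'\"":
        return ("string", lexical[1:-1])
    return None


def same_value(a, b):
    """True when both forms are well-formed and denote the same typed value."""
    da, db = denote(a), denote(b)
    return da is not None and da == db


test_a = same_value("10", "10")      # Jenny's and John's ageInYears
test_a2 = same_value("10", "10")     # Jenny's ageInYears and testScore
test_a3 = same_value("10", "'10'")   # ageInYears versus a film title
```

Nothing here consults a range constraint, so adding triples later
cannot change any of the three answers.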

   These test cases only relate to the situation where there are no range 
   constraints on the properties.

   Now for a different kind of test.  How do the values of the two idioms relate?

   Test D:

      <Jenny>      <ageInYears> "10" .
      <ageInYears> rdfs:range xsd:decimal .

      <John>  <ageInYears>   _:a .
      _:a     xsdr:decimal   "10" .

   Should an RDF processor conclude that Jenny and John have the same 
   age?  [Note: in this example the range constraint is expressed using 
   rdfs:range.  We may have to introduce a special datatyping range property, 
   but that is an independent detail for now.]

Yes, because even without the second triple we can tell that the third
element of the first triple is an integer.
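
A sketch of Test D under the same assumption: the literal 10 is
already an integer, so the rdfs:range triple plays no part in
interpreting it.  The tuple encoding of triples and the helper age_of
are hypothetical, and any string starting with "_:" is treated as a
blank node.

```python
from decimal import Decimal

# Test D as plain tuples; the self-typed literal 10 is written as an int.
TEST_D = [
    ("Jenny", "ageInYears", 10),
    ("ageInYears", "rdfs:range", "xsd:decimal"),
    ("John", "ageInYears", "_:a"),
    ("_:a", "xsdr:decimal", 10),
]


def age_of(person, triples):
    """Return the ageInYears value, following one xsdr:decimal blank-node hop."""
    for s, p, o in triples:
        if s == person and p == "ageInYears":
            if isinstance(o, str) and o.startswith("_:"):
                for s2, p2, o2 in triples:
                    if s2 == o and p2 == "xsdr:decimal":
                        return Decimal(o2)
                return None  # blank node with no known value
            return o
    return None


# Dropping the range triple leaves both ages unchanged.
WITHOUT_RANGE = [t for t in TEST_D if t[1] != "rdfs:range"]
same_with_range = age_of("Jenny", TEST_D) == age_of("John", TEST_D)
same_without_range = age_of("Jenny", WITHOUT_RANGE) == age_of("John", WITHOUT_RANGE)
```

Both comparisons come out the same, which is the sense in which the
second triple is not needed to answer the question.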

   It is not possible to have the answers to Tests A and Test D both be 
   yes.  Either the A's can be yes or D can be yes, but not both.  We have to 
   decide which of these is the most important to have.

Well, good luck.

Note that the awkward quote-nesting devices, as in "'10'", can be avoided
if we just drop the outer layer of quotes, at least in cases where there
is no ambiguity about what lies between the property and value in a
triple, as there will not be 99.9% of the time.  This idea works for
all other programming/representation languages; why not for RDF?

The usual objection to this proposal is that someone would have to
decree for all time what the literal syntax should be, and RDF should
be more extensible than that.  This objection strikes me as too weak
to wreck an obviously good idea.  We can always fall back on the
syntax exemplified by Brian's Idiom 1:

   <Jenny> <age>          _:a .
   _:a     <cia:BabylonianEncryptedNumber> "'10'" .

to handle this case, where it's (once again) obvious that '10' is a
string, but only someone with the knowledge of how to parse it can
infer the value of _:a (although everyone will be able to infer that
_:a is a Number as soon as they see information about the range of <age>).
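
The fallback can be sketched as a registry of parsers keyed by
datatype property.  The registry PARSERS and the function interpret
are hypothetical names; the only assumption is that a processor knows
how to parse some properties and not others.

```python
from decimal import Decimal

# Hypothetical registry: datatype property -> parser for its string argument.
PARSERS = {"xsdr:decimal": Decimal}


def interpret(datatype_property, string_literal):
    """Return (known, value); known is False when no parser is registered."""
    parser = PARSERS.get(datatype_property)
    if parser is None:
        # The argument is still plainly a string; only its meaning is unknown.
        return (False, None)
    return (True, parser(string_literal))
```

A processor without the cia parser gets (False, None) for
interpret("cia:BabylonianEncryptedNumber", "10") and can still carry
the string along untouched, while interpret("xsdr:decimal", "10")
yields a usable number.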

I'm sorry to indulge in a bit of sarcasm here and there, but the
persistent desire of the Web community to shoot itself in the foot
over this issue, nay, saw its foot off inch by bloody inch, baffles
me.

                                             -- Drew McDermott
Received on Friday, 12 July 2002 17:44:22 UTC