Re: Semantics of non-datatyped literals: Rationale (version 2)

Brian,

we should also clarify that, no matter how we decide on tidiness,

   <rdf:Description rdf:about="Jenny">
     <foo:age rdf:datatype="&xsd;integer">10</foo:age>
   </rdf:Description>
   <rdf:Description rdf:about="John">
     <foo:shoeSize rdf:datatype="&xsd;integer">10</foo:shoeSize>
   </rdf:Description>

entails

    <jenny> foo:age      _:l .
    <john>  foo:shoeSize _:l .

apparently meaning that Jenny's age is the same as John's shoe size. We
cannot forbid the above entailment, so why do we care about the one below?
We owe an explanation to developers, such as the Adobe folks, who do not
see the problem we are discussing and do not care about one solution or
the other.
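
To make the cases concrete, here is a rough Python sketch of the two readings. It is only an illustration under simplifying assumptions: the names (Literal, denotation, shares_blank_node) are hypothetical and not taken from any RDF toolkit, only xsd:integer is interpreted, and "untidy" is modelled as "a non-datatyped literal takes its value from the range of its property, if one is known".

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str] = None  # None marks a non-datatyped literal


def denotation(lit, tidy, range_datatype=None):
    """Return the value a literal denotes, or None if it is undetermined."""
    datatype = lit.datatype
    if datatype is None:
        if tidy:
            # String based (tidy) reading: the literal denotes its own string.
            return ("string", lit.lexical)
        # Untidy reading (assumption): fall back to the property's range.
        datatype = range_datatype
        if datatype is None:
            return None
    if datatype == XSD_INTEGER:
        return ("integer", int(lit.lexical))
    # Other datatypes are not interpreted in this sketch.
    return (datatype, lit.lexical)


def shares_blank_node(a, b, tidy, range_a=None, range_b=None):
    """True if both objects can be generalised to one blank node _:l."""
    va = denotation(a, tidy, range_a)
    vb = denotation(b, tidy, range_b)
    return va is not None and va == vb


typed = Literal("10", XSD_INTEGER)
plain = Literal("10")

# Datatyped age and shoe size: entailed whichever way tidiness is decided.
print(shares_blank_node(typed, typed, tidy=True))   # True
print(shares_blank_node(typed, typed, tidy=False))  # True

# Non-datatyped age and title: entailed only under the tidy reading.
print(shares_blank_node(plain, plain, tidy=True))   # True
print(shares_blank_node(plain, plain, tidy=False))  # False
```

The mixed case from the message quoted below (one plain "10", one xsd:integer "10") comes out False with tidy=True, and True with tidy=False when range_a=XSD_INTEGER is supplied.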

Sergey



Brian McBride wrote:

> 
> Updated in the light of comments received.  I should have put a change 
> log here.  Sorry folks, I'll try to remember that next time.
> 
> The Issue
> =========
> 
> Given the following RDF/XML
> 
>   <rdf:Description rdf:about="Jenny">
>     <foo:age>10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="John">
>     <foo:age>10</foo:age>
>   </rdf:Description>
> 
> Do Jenny and John have the same age?  It may appear obvious that they 
> do.  But consider a similar example:
> 
>   <rdf:Description rdf:about="Jenny">
>     <foo:age>10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="Film">
>     <foo:title>10</foo:title>
>   </rdf:Description>
> 
> Though the title of the film and the age of Jenny are both written as an 
> rdf literal "10", it can be argued that the meaning of the statement 
> about Jenny's age is that her age is the integer 10, and the meaning of 
> the statement about the title of the film is that it is the string of 
> characters "10".  If that is what they mean then the age of Jenny is not 
> the same as the title of the film.
> 
> The formal statement of this question is whether the second of the 
> RDF fragments above entails (implies) the following (expressed in 
> n-triples):
> 
>   <jenny> foo:age   _:l .
>   <film>  foo:title _:l .
> 
> This is called the tidy entailment.  We say that we are using string 
> based (or tidy) semantics if this entailment holds.  String based 
> semantics has a number of implications.  Given:
> 
>   <rdf:Description rdf:about="Jenny">
>     <foo:age>10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="John">
>     <foo:age rdf:datatype="&xsd;integer">10</foo:age>
>   </rdf:Description>
> 
> Jenny and John do not have the same age, i.e. the above also does not 
> entail:
> 
>   <jenny> foo:age   _:l .
>   <john>  foo:age   _:l .
> 
> The result is actually stronger.  Jenny's age is definitely not equal to 
> John's age, so for example:
> 
>   <rdf:Description rdf:about="Jenny">
>     <foo:age>10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="John">
>     <foo:age rdf:datatype="&xsd;integer">10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="&foo;age">
>     <rdfs:range rdf:resource="&xsd;integer"/>
>   </rdf:Description>
> 
> is inconsistent because the value of the age property should be an 
> integer and it isn't.  (Here we are ruling out the possibility that the 
> range constraint be interpreted as a constraint on the lexical form of 
> the literal, in the interests of simplicity.)
> 
> If the tidy entailment does not hold, then, given:
> 
>   <rdf:Description rdf:about="Jenny">
>     <foo:age>10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="John">
>     <foo:age rdf:datatype="&xsd;integer">10</foo:age>
>   </rdf:Description>
>   <rdf:Description rdf:about="&foo;age">
>     <rdfs:range rdf:resource="&xsd;integer"/>
>   </rdf:Description>
> 
> Jenny and John do have the same age, i.e. this does entail:
> 
>   <jenny> foo:age   _:l .
>   <john>  foo:age   _:l .
> 
> 
> Desiderata
> ==========
> 
> This is an issue that requires a judgement between different options.  
> The WG has not found a solution to satisfy all desirable features. 
> Rather than trying to state precise requirements it is better to define 
> the considerations that bear on that judgement.
> 
> General
> -------
> 
> GD01: Interoperability [Ed note:  I've added that one in the light of 
> Patrick's proposal to say nothing]
> 
> 
> Use
> ---
> 
> UD01: Verbosity of expressing datatyped literals in the RDF/XML syntax.  
> [Note eventually rdf/xml will be written by tools, but for 
> bootstrapping, users see it]
> 
>   Favours value based semantics.
> 
> UD02: preferable that CC/PP schema should not have to change
> 
>   Favours value based semantics
> 
> UD03: Support from RDF customers including daml/webont, rss, cc/pp, 
> dublin core, Adobe XMP, DMOZ, mozilla, Redland and Jena.
> 
>   Unknown:  Action required
> 
> UD04: the ability to put, in a schema, constraints on the *lexical form* 
> of values (gravy, not requirement)
> 
>   I'm not sure about this one.
> 
> UD05: It should be easy to update legacy data with datatype information
> 
>   Favours value based semantics
> 
> UD06: Must be able to merge duplicate statements with the same literal 
> value as object (when there is an applicable range constraint and when 
> there is not)
> 
>   A wash
> 
> UD07: Minimize the number of nodes and arcs to represent a datatype 
> value (scalability)
> 
>   A wash
> 
> UD08: Support xml schema datatypes
> 
>   A wash
> 
> UD09: Need a mechanism to enable queries based on datatype values (before, 
> after, during re dates; less than, greater than)
> 
>   A wash
> 
> UD10: Global type inference
> 
>   dropped - the needs are covered by UD01 and UD05
> 
> UD11: Backward compatibility for existing data and specs (dc, cc/pp, rss)
> 
>   cc/pp favours value based semantics
>   dc and rss are neutral
> 
> UD12: Capture as much of the information as possible in the RDF (e.g. if 
> it's an integer, RDF should know it's an integer)
> 
>   Not sure how to call this one.  Either it's a wash because both 
> approaches are equally expressive, or this favours value based semantics 
> because it's easier to upgrade existing data.  Let's call it:
> 
>   favours value based semantics
> 
> UD13: It should be easy to explain to users.
> 
>   According to Frank - this is a wash (Please confirm Frank)
> 
> UD14: No incompleteness in expressivity
> 
>   Marginally favours string based semantics as the current proposal for 
> value based semantics does not allow reification to be done exactly.
> 
> 
> Implementation
> --------------
> 
> ID01: Minimize burden on implementors
> 
>   Favours string based semantics
> 
> ID02: Monotonic and sound model theory.  Complete inference is not 
> required.
> 
>   A wash
> 
> ID03: Convincing evidence that the solution is implementable
> 
>   Favours string based semantics
> 
> ID04: Do not require implementations to maintain a hash table of literals.
> 
>   A wash - Mike - are you convinced?
> 
> ID05: not have to keep track of each different occurrence of some literal
> 
>   A wash
> 
> ID06: Backward compatibility: existing implementations and applications 
> should be able to upgrade in a backward compatible way
> 
>   comments from the implementors please.
> 
> 
> Process
> -------
> 
> PD01: Speed - the WG needs to finish soon
> 
> Editors:  what effect will the decision have on your schedules?
> Implementors:  please can you report on what effect the different 
> choices might have on your having implementations ready during candidate 
> rec phase
> 
> 
> Other
> -----
> 
> OD01: an endorsement of the practice of using datatype properties, i.e. 
> [ xsdt:date "2002-09-23"]. (gravy, not a requirement)
> 
> I believe the WG agreed to drop datatype properties this time round, in 
> the interests of time and simplicity.  It's also a wash between the two 
> approaches.
> 
> 
> 

Received on Wednesday, 2 October 2002 11:57:25 UTC